NodeJS based website downloader

Download a website locally, without any configuration, right from your terminal.

Note: The script is based entirely on node-website-scraper, an awesome website scraper library :)

Requirements

  • Node.js version >= 8

Installation

npm install -g node-site-downloader

Usage

node-site-downloader download -d DOMAIN -s START_POINT -o OUTPUT_FOLDER [-v] [--output-folder-suffix OUTPUT_FOLDER_SUFFIX] [--include-images]

Example

# Download all of the English Jest documentation
node-site-downloader download -s https://jestjs.io/docs/en/getting-started -d https://jestjs.io/docs/en/ -o jest-docs -v --include-images

For more information, run:

node-site-downloader --help
node-site-downloader download --help

Docker support

You can also run the downloader straight from a Docker container. This way there is no need to install Node.js or node-site-downloader.

Instead, pull the image from Docker Hub:

docker pull gnird/node-site-downloader

Then run the container with all of the relevant options passed to the script (see the Options section), except for --output-folder.

--output-folder isn't passed to the container because the script saves the site inside the container.

Instead, mount a folder on your computer as a volume at /data in the container:

docker run -v /some/path:/data ...

Docker example

docker run -v /tmp/mysite:/data gnird/node-site-downloader download -d https://jestjs.io/docs/en/ -s https://jestjs.io/docs/en/getting-started -v

NOTICE: The first -v configures the volume for the container; the second -v (at the end of the command) is passed to the script to make it verbose.
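
The volume mapping described above can be sketched as a small dry-run helper. This is only an illustration: the function name download_in_docker_cmd is hypothetical and not part of node-site-downloader. It prints the docker command instead of running it, and turns a relative host folder into an absolute path, since a bare name passed to docker's -v is treated as a named volume rather than a folder.

```shell
#!/bin/sh
# Hypothetical helper (not part of node-site-downloader).
# Prints the docker command that mounts HOST_DIR at /data and forwards
# the remaining arguments to the script. Nothing is executed.
# Note: the output is for display only; paths with spaces are not quoted.
download_in_docker_cmd() {
  host_dir="$1"
  shift
  case "$host_dir" in
    /*) ;;                            # already an absolute path
    *) host_dir="$PWD/$host_dir" ;;   # make a relative path absolute
  esac
  printf 'docker run -v %s:/data gnird/node-site-downloader download' "$host_dir"
  for arg in "$@"; do
    printf ' %s' "$arg"
  done
  printf '\n'
}

# Example: the trailing -v is the script's verbose flag, not a volume.
download_in_docker_cmd /tmp/mysite -d https://jestjs.io/docs/en/ -s https://jestjs.io/docs/en/getting-started -v
```

Printing the command first makes it easy to check which -v belongs to docker and which belongs to the script before actually running it.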

Options

  • domain (-d) - The script will download all of the URLs under the specified URL.
  • start point (-s) - The page from which the script should start scraping.
  • include-images (--include-images) - If the flag is present, the script will download relevant images as well.
  • output folder (-o, --output-folder) - The folder in which the script should save the downloaded assets. Note: the folder must not already exist!
  • verbose (-v) - If the flag is present, the script will print every URL that was downloaded.
  • output folder suffix (--output-folder-suffix) - The suffix that will be added to OUTPUT_FOLDER. Defaults to: .site
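
Because the output folder must not exist before the download starts, a small pre-flight check can save a failed run. The sketch below is an assumption-laden illustration: target_folder and ensure_absent are hypothetical helpers, and it assumes the final folder name is simply OUTPUT_FOLDER followed by the suffix (.site by default), as the option descriptions above suggest.

```shell
#!/bin/sh
# Hypothetical helper: compute the folder name the downloader is assumed
# to create, i.e. OUTPUT_FOLDER followed by the suffix (".site" by default).
target_folder() {
  printf '%s%s\n' "$1" "${2-.site}"
}

# Hypothetical helper: succeed only if the given path does not exist yet.
ensure_absent() {
  if [ -e "$1" ]; then
    echo "refusing to continue: $1 already exists" >&2
    return 1
  fi
}

# Usage sketch (commented out so that nothing is downloaded here):
#   out="$(target_folder jest-docs)"
#   ensure_absent "$out" && node-site-downloader download \
#     -s https://jestjs.io/docs/en/getting-started \
#     -d https://jestjs.io/docs/en/ -o jest-docs -v --include-images
```

If the assumption about how the suffix is applied does not match your installed version, adjust target_folder accordingly; node-site-downloader download --help is the authoritative reference.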