Download a website locally without any configuration, right from your terminal.
Note: The script is based entirely on node-website-scraper, an awesome website scraper library :)
- Node.js version >= 8
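If you are not sure whether your Node.js is recent enough, you can run a small shell check before installing. This is a minimal sketch that assumes `node --version` prints a string of the form vMAJOR.MINOR.PATCH. The function name `node_version_ok` is hypothetical and is not part of node-site-downloader.

```shell
# Hypothetical helper: succeeds if a "node --version" string such as "v12.3.1"
# has a major version of at least 8.
node_version_ok() {
  nv_major=${1#v}                     # drop the leading "v"
  nv_major=${nv_major%%.*}            # keep only the part before the first dot
  [ "$nv_major" -ge 8 ] 2>/dev/null   # empty or non-numeric input simply fails
}
```

For example: `node_version_ok "$(node --version)" && npm install -g node-site-downloader`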
npm install -g node-site-downloader
node-site-downloader download DOMAIN START_POINT OUTPUT_FOLDER [VERBOSE] [OUTPUT_FOLDER_SUFFIX] [INCLUDE_IMAGES]
# Download all of the English Jest documentation
node-site-downloader download -s https://jestjs.io/docs/en/getting-started -d https://jestjs.io/docs/en/ -o jest-docs -v --include-images
For more information, run:
node-site-downloader --help
node-site-downloader download --help
You can also run the downloader straight from a Docker container. This way there is no need to install Node.js and node-site-downloader.
Instead, pull the image from Docker Hub:
docker pull gnird/node-site-downloader
Then run the container, passing all of the relevant options to the script (see the options section below) except for --output-folder.
--output-folder isn't passed to the container because the script saves the site inside the container. Instead, mount a folder on your computer as a volume at /data in the container:
docker run -v /some/path:/data ...
docker run -v /tmp/mysite:/data gnird/node-site-downloader download -d https://jestjs.io/docs/en/ -s https://jestjs.io/docs/en/getting-started -v
NOTICE: The first -v configures the volume for the container, and the second -v (at the end of the command) is passed to the script to make it verbose.
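Older Docker releases reject relative host paths in -v, and a bare name such as mysite is always treated as a named volume rather than a folder, so passing an absolute path is the safe choice. The sketch below is a hypothetical wrapper (`nsd_docker` is not part of the project) that resolves the folder to an absolute path, mounts it at /data, and forwards every other argument to the script. It also adds Docker's --rm flag so the finished container is removed; drop it if you want to keep the container.

```shell
# Hypothetical wrapper: nsd_docker HOST_DIR [script options...]
# Mounts HOST_DIR at /data and forwards the remaining options to the script.
nsd_docker() {
  nsd_host_dir=$1
  shift
  # Create the folder first so it is owned by you rather than by the Docker daemon
  mkdir -p "$nsd_host_dir" || return 1
  # Resolve to an absolute path, which is what -v expects for a bind mount
  nsd_abs_dir=$(cd "$nsd_host_dir" && pwd) || return 1
  docker run --rm -v "$nsd_abs_dir:/data" gnird/node-site-downloader download "$@"
}
```

For example, `nsd_docker /tmp/mysite -d https://jestjs.io/docs/en/ -s https://jestjs.io/docs/en/getting-started -v` runs the same command as the docker run example above, plus --rm.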
Options:
- domain (-d) - The script will download all of the URLs under the specified URL.
- start point (-s) - The page from which the script should start scraping.
- include-images (--include-images) - Should the script download relevant images as well?
- output folder (--output-folder) - The folder in which the script should save the downloaded assets. Note: the folder should not already exist!
- verbose (-v) - If this flag is present, the script prints every URL that it downloads.
- output folder suffix (--output-folder-suffix) - The suffix that will be added to OUTPUT_FOLDER, defaults to .site
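Because the output folder should not exist beforehand, a small guard can catch that before a long download starts. This is a minimal sketch with a hypothetical function name (`safe_download`). It assumes, based on the suffix option above, that the final folder name is OUTPUT_FOLDER followed by the default suffix .site, so it checks both names; adjust it if you pass --output-folder-suffix.

```shell
# Hypothetical wrapper: safe_download OUTPUT_FOLDER [other options...]
safe_download() {
  sd_out=$1
  shift
  # Assumption: the final folder is OUTPUT_FOLDER plus the default ".site" suffix,
  # so check both names before starting.
  for sd_candidate in "$sd_out" "$sd_out.site"; do
    if [ -e "$sd_candidate" ]; then
      echo "refusing to run: $sd_candidate already exists" >&2
      return 1
    fi
  done
  node-site-downloader download --output-folder "$sd_out" "$@"
}
```

For example: `safe_download jest-docs -s https://jestjs.io/docs/en/getting-started -d https://jestjs.io/docs/en/ -v`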