Web grep: search all rendered resources used by a URI
This Node command-line utility uses a headless browser (Puppeteer) to render a web page and download every resource it requests. All of these resources, including the original HTML, are saved locally and then searched one by one for a text string. Because every resource is downloaded, the total download size is easy to determine.
- Search using regular expressions
- A screen capture is created (not configurable)
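At a high level the tool drives Puppeteer and listens for response events, writing each response body to disk as the page renders. The snippet below is only a minimal sketch of that idea, not the project's actual source; the fetchAll name, the file-naming scheme and the screenshot path are assumptions made for illustration.

const fs = require('fs');
const path = require('path');
const puppeteer = require('puppeteer');

async function fetchAll(url, outDir = 'output') {
  const browser = await puppeteer.launch();
  const page = await browser.newPage();

  // Write every response body (HTML, CSS, JS, images, ...) under the output directory
  page.on('response', async (response) => {
    try {
      const body = await response.buffer();
      const name = new URL(response.url()).pathname.replace(/\/$/, '/index.html');
      const file = path.join(outDir, name);
      fs.mkdirSync(path.dirname(file), { recursive: true });
      fs.writeFileSync(file, body);
    } catch (err) {
      // redirects and some other responses have no body; skip them
    }
  });

  await page.goto(url, { waitUntil: 'networkidle0' }); // wait for the page to settle
  await page.screenshot({ path: path.join(outDir, 'screenshot.png') });
  await browser.close();
}

fetchAll('https://github.com/stav/wgrep');

A real implementation also has to cope with query strings, colliding file names and failed requests, and then grep the saved files; the sketch leaves all of that out.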
Clone the repository and install its dependencies:
$ git clone https://github.com/stav/wgrep.git
$ cd wgrep
$ npm install
Let's try to find the string "stav" on the repository's GitHub page:
$ npx wgrep stav https://github.com/stav/wgrep
Calling for "stav" in "output" from "https://github.com/stav/wgrep" with user "undefined"
Looking in "output" for 'stav'
Found 1 files
[ 'output/stav/wgrep/index.html' ]
The string was found only in the index.html page.
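Since the first argument is a regular expression, patterns work as well as plain strings; for instance (this particular pattern is only illustrative):

$ npx wgrep 'st[aeiou]v' https://github.com/stav/wgrep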
Now let's see what the total download size was:
$ du -sh output
1.4M output
The full option list:
$ wgrep --help
Usage: wgrep [options] <regex> <url>

Options:
  -V, --version                output the version number
  -d, --directory <directory>  The output directory (default: "output")
  -u, --username <username>    The user to authenticate as
  -h, --help                   output usage information
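For example, -d changes the output directory and -u sets the user to authenticate as; the directory and username below are placeholders:

$ npx wgrep -d /tmp/wgrep-out -u alice stav https://github.com/stav/wgrep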
Run the unit and end-to-end tests:
$ npm test
$ npm run test-e2e
Please file any issues you find.
If you fix a bug or add a new feature, please fork this repo and submit a pull request:
- Fork it (https://github.com/yourname/yourproject/fork)
- Create your feature branch (git checkout -b feature/fooBar)
- Commit your changes (git commit -am 'Add some fooBar')
- Push to the branch (git push origin feature/fooBar)
- Create a new Pull Request
Licensed under Apache 2.0.