Web Archive Crawler

This tool allows you to extract archived URLs for specific domains from sources like the Wayback Machine, Common Crawl, and VirusTotal. It's a powerful resource for researchers, security analysts, and developers looking to explore historical or archived data about websites.

Table of Contents

  • Features
  • Installation
  • Options
  • Examples
  • API Key Setup for VirusTotal
  • Output Format
  • Advanced Examples
  • Contributing
  • License
  • Contact

Features

  • Fetch URLs from the Wayback Machine and Common Crawl archives.
  • Optional integration with VirusTotal for additional URL data.
  • Support for fetching archived versions of specific URLs.
  • Exclude subdomains to focus on primary domains.
  • Write output to a file or display it in the terminal.
  • Show dates of archive snapshots in a human-readable format.

Installation

Prerequisites

  • Go 1.21 or higher installed on your machine.
  • Internet connection to fetch data from archives.

Steps

  1. Install the tool:

    go install github.com/zebbern/url@latest
  2. Run the tool:

    url [options] [domain...]

Options

  • -t <target>: Target domain or file containing a list of domains (one per line).
  • -o <file>: Output file to write results (default: stdout).
  • -d: Show the date of the fetch in the first column of the output.
  • -n: Exclude subdomains of the target domain.
  • -v: List different versions of URLs (from the Wayback Machine).
  • -vt <key>: VirusTotal API key for fetching additional URLs.

Examples

  1. Fetch URLs for a single domain:

    url example.com
  2. Fetch URLs from a file of domains and write to an output file:

    url -t domains.txt -o results.txt
  3. Fetch URLs without subdomains and show fetch dates:

    url -d -n -t example.com
  4. List archived versions of URLs:

    url -v example.com
  5. Fetch URLs including VirusTotal data:

    url -vt YOUR_API_KEY -t example.com

API Key Setup for VirusTotal

To fetch URLs from VirusTotal, you need an API key. You can obtain one by signing up at VirusTotal. Use the key with the -vt option:

url -vt YOUR_API_KEY -t example.com

Output Format

  • With Dates: Each line includes the fetch date in RFC3339 format followed by the URL.
  • Without Dates: Only the URLs are displayed.
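
For illustration, a run with the -d flag might produce lines like the ones below (the timestamps and URLs are made-up examples, not real output); without -d, only the second column appears.

2023-08-14T09:31:02Z https://example.com/index.php?id=1
2023-09-02T17:45:10Z https://example.com/assets/app.js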

Advanced Examples

A guide to getting the most out of the url tool in penetration-testing workflows. The examples below demonstrate advanced commands for reconnaissance and exploitation.

1. Extract URLs Containing Parameters

Identify URLs with query parameters for further injection testing.

Use Case:
Locate endpoints potentially vulnerable to SQLi, XSS, or other injection attacks.

url example.com | grep '?'
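
To go a step further, the matched URLs can be reduced to a deduplicated list of parameter names with standard grep and tr (a sketch; widen the character class if your target uses unusual parameter names):

url example.com | grep -oE '[?&][A-Za-z0-9_-]+=' | tr -d '?&=' | sort -u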

2. Filter by File Extensions

Extract URLs for specific file types such as .php, .aspx, .jsp, or .txt.

Use Case:
Focus on server-side scripts or configuration files for vulnerability analysis.

url example.com | grep -E '\.(php|aspx|jsp|txt)$'
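
Note that the $ anchor skips URLs that carry a query string (for example page.php?id=1). A slightly looser pattern catches both forms:

url example.com | grep -E '\.(php|aspx|jsp|txt)(\?|$)'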

3. Detect Open Redirects

Find URLs with redirect-like parameters (?url=, ?redirect=).

Use Case:
Identify open redirects that can be exploited for phishing or bypasses.

url example.com | grep -E "redirect=|url="
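
Redirect parameters go by many names, so a broader pattern that anchors on the parameter delimiter (avoiding matches such as "curl=") tends to work better; the extra names here are common conventions, not guaranteed to exist on any given target:

url example.com | grep -E '(\?|&)(url|redirect|next|return|returnUrl|dest|goto)='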

4. Hunt for Backup and Config Files

Find URLs ending with backup or configuration file extensions.

Use Case:
Locate sensitive backup files that might expose credentials or database structures.

url example.com | grep -E '\.(bak|old|config|cfg|sql|db)$'
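
To see which of those candidates still resolve, the list can be probed with curl. A minimal sequential sketch (slow on large lists; prints the HTTP status code followed by the URL):

url example.com | grep -E '\.(bak|old|config|cfg|sql|db)$' | while read -r u; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "$u")
  echo "$code $u"
done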

5. Enumerate Subdomains

Identify subdomains from the extracted URLs.

Use Case:
Discover subdomains for further recon or exploitation.

url example.com | grep -oP 'https?://\K[^/]*' | sort -u
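
The -P flag requires GNU grep; on systems without PCRE support (such as macOS's BSD grep), an equivalent sed expression works, assuming every line starts with a scheme:

url example.com | sed -E 's#https?://([^/]+).*#\1#' | sort -u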

6. Save URLs for Burp Suite

Export unique URLs for crawling and fuzzing in Burp Suite.

Use Case:
Import into Burp Suite for automated scanning.

url example.com | sort -u > burp_urls.txt

7. Test LFI Vulnerabilities

Filter URLs for potential Local File Inclusion testing.

Use Case:
Detect vulnerable endpoints allowing file path manipulation.

url example.com | grep -E '\.php\?file='
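
file= is only one common parameter name; a broader heuristic covering other inclusion-style parameters (expect some false positives):

url example.com | grep -E '(\?|&)(file|page|path|template|include|doc)='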

8. Extract Endpoints Containing Login or Admin

Look for URLs that might indicate sensitive areas of the website.

Use Case:
Target administrative or authentication endpoints for brute-forcing or bypass attempts.

url example.com | grep -E 'login|admin'
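
Archived URLs are often mixed-case, so a case-insensitive match with a few extra keywords usually surfaces more candidates:

url example.com | grep -iE 'login|admin|signin|register|dashboard|portal'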

9. Chain with Other Tools

Combine url output with popular security tools.

  • Check Live URLs with httpx:

    url example.com | httpx
  • Identify Patterns with gf (a wrapper around grep for common patterns):

    url example.com | gf xss
  • Expand Data with waybackurls:

    url example.com | waybackurls | sort -u
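
These tools also chain end to end. One possible pipeline, assuming httpx and gf are installed and gf has an xss pattern set loaded:

url example.com | sort -u | httpx | gf xss | tee xss_candidates.txt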

10. Automate and Expand Workflow

Create a Bash script to automate common recon tasks.

Use Case:
Run a single script to collect multiple data types.

#!/bin/bash
# Fetch archived URLs once, then derive filtered lists from the saved output.
domain="$1"
url "$domain" | sort -u | tee urls.txt
grep '\.js$' urls.txt | tee js_files.txt
grep -E '\.(php|aspx|jsp)$' urls.txt | tee scripts.txt
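
Saved as, say, recon.sh (any filename works), the script is made executable and run against a target domain:

chmod +x recon.sh
./recon.sh example.com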

Contributing

Contributions are welcome! Please fork the repository, make your changes, and submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For inquiries, please contact:

  • GitHub: zebbern
  • Inspired by waybackurls by @tomnomnom.
