Web Archive Crawler

This tool allows you to extract archived URLs for specific domains from sources like the Wayback Machine, Common Crawl, and VirusTotal. It's a powerful resource for researchers, security analysts, and developers looking to explore historical or archived data about websites.

Features

  • Fetch URLs from the Wayback Machine and Common Crawl archives.
  • Optional integration with VirusTotal for additional URL data.
  • Support for fetching archived versions of specific URLs.
  • Exclude subdomains to focus on primary domains.
  • Write output to a file or display it in the terminal.
  • Show dates of archive snapshots in a human-readable format.

Installation

Prerequisites

  • Go 1.21 or higher installed on your machine.
  • Internet connection to fetch data from archives.

Steps

  1. Install the tool with go install and add the Go bin directory to your PATH:

    go install github.com/zebbern/url@latest
    export PATH=$PATH:$(go env GOPATH)/bin
  2. Run the tool:

    url [options] [domain...]

Options

  • -t <target>: Target domain or file containing a list of domains (one per line).
  • -o <file>: Output file to write results (default: stdout).
  • -d: Show the date of the fetch in the first column of the output.
  • -n: Exclude subdomains of the target domain.
  • -v: List different versions of URLs (from the Wayback Machine).
  • -vt <key>: VirusTotal API key for fetching additional URLs.

Examples

  1. Fetch URLs for a single domain:

    url example.com
  2. Fetch URLs from a file of domains and write to an output file:

    url -t domains.txt -o results.txt
  3. Fetch URLs without subdomains and show fetch dates:

    url -d -n -t example.com
  4. List archived versions of URLs:

    url -v example.com
  5. Fetch URLs including VirusTotal data:

    url -vt YOUR_API_KEY -t example.com

API Key Setup for VirusTotal

To fetch URLs from VirusTotal, you need an API key. You can obtain one by signing up at VirusTotal. Use the key with the -vt option:

url -vt YOUR_API_KEY -t example.com
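
To avoid repeating the key in every command or committing it to shared scripts, one option is to read it from an environment variable. The sketch below assumes a variable named VT_API_KEY and a wrapper function named fetch_with_vt; both names are illustrative and are not defined by the tool. Only the -vt and -t options come from the option list above.

```shell
# Hypothetical wrapper: VT_API_KEY and fetch_with_vt are illustrative names,
# not something the url tool itself provides.
fetch_with_vt() {
  # Refuse to run when the key is missing instead of passing an empty value.
  if [ -z "${VT_API_KEY:-}" ]; then
    echo "VT_API_KEY is not set" >&2
    return 1
  fi
  # -vt and -t are the documented options; $1 is the target domain or file.
  url -vt "$VT_API_KEY" -t "$1"
}
```

After exporting VT_API_KEY once in the session, a call such as fetch_with_vt example.com behaves like the command above.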

Output Format

  • With Dates: Each line includes the fetch date in RFC3339 format followed by the URL.
  • Without Dates: Only the URLs are displayed.
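
Because RFC3339 timestamps with the same UTC offset sort lexicographically, dated output can be filtered with a plain string comparison. The sketch below assumes the date and the URL are separated by whitespace and that all timestamps share one offset; urls_since is a hypothetical helper name.

```shell
# urls_since CUTOFF
# Reads "DATE URL" lines on stdin and prints the URLs whose date is at or
# after CUTOFF (for example 2020-01-01).
# Assumption: columns are whitespace-separated and timestamps share one offset.
urls_since() {
  # Appending "" forces awk to compare both sides as strings.
  awk -v cutoff="$1" '($1 "") >= (cutoff "") { print $2 }'
}
```

A possible use is url -d example.com | urls_since 2020-01-01, which keeps only URLs dated 2020 or later.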

Advanced Examples

The following examples show how to combine the url tool with standard shell utilities in penetration testing workflows, mainly for reconnaissance and for narrowing down targets before testing.


1. Extract URLs Containing Parameters

Identify URLs with query parameters for further injection testing.

Use Case:
Locate endpoints potentially vulnerable to SQLi, XSS, or other injection attacks.

url example.com | grep '?'
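
Once parameterized URLs are isolated, it often helps to know which parameter names occur at all. The sketch below lists the distinct names found in a URL list read from stdin; param_names is a hypothetical helper, not part of the tool.

```shell
# param_names: read URLs on stdin and print each distinct query parameter name.
# Hypothetical helper for illustration.
param_names() {
  # Match "?name=" or "&name=", strip the delimiters, then deduplicate.
  grep -oE '[?&][^=&#]+=' | tr -d '?&=' | LC_ALL=C sort -u
}
```

For example, url example.com | param_names prints one parameter name per line.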

2. Filter by File Extensions

Extract URLs for specific file types such as .php, .aspx, .jsp, or .txt.

Use Case:
Focus on server-side scripts or configuration files for vulnerability analysis.

url example.com | grep -E '\.(php|aspx|jsp|txt)(\?|$)'
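
To decide which extensions are worth filtering for, a quick count per extension can help. This is a minimal sketch that strips the scheme, host, query string, and fragment before looking at the path suffix; ext_counts is a hypothetical helper name.

```shell
# ext_counts: read URLs on stdin and print "COUNT .ext" lines, most common first.
# Hypothetical helper for illustration.
ext_counts() {
  # Drop scheme and host, then the query string and fragment, so only the path remains.
  sed -E -e 's#^[A-Za-z]+://[^/]*##' -e 's/[?#].*$//' \
    | grep -oE '\.[A-Za-z0-9]+$' \
    | tr '[:upper:]' '[:lower:]' \
    | sort | uniq -c | sort -rn
}
```

For example, url example.com | ext_counts shows at a glance whether the archive holds mostly .php, .js, or other file types.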

3. Detect Open Redirects

Find URLs with redirect-like parameters (?url=, ?redirect=).

Use Case:
Identify open redirects that can be exploited for phishing or bypasses.

url example.com | grep -E "redirect=|url="
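
The pattern above also matches names such as returnurl= and values that are only relative paths. A stricter sketch, shown below, keeps only URLs where a parameter named exactly redirect or url carries an absolute http(s) target, either literal or percent-encoded. external_redirects is a hypothetical helper name, and the stricter pattern is an assumption about what makes a stronger candidate, not a rule of the tool.

```shell
# external_redirects: read URLs on stdin and keep those whose redirect= or url=
# parameter value starts with http: or https: (the colon may be encoded as %3A).
# Hypothetical helper for illustration.
external_redirects() {
  grep -Ei '[?&](redirect|url)=https?(:|%3a)'
}
```

For example, url example.com | external_redirects narrows the list to entries that already redirect to a full URL.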

5. Hunt for Backup and Config Files

Find URLs ending with backup or configuration file extensions.

Use Case:
Locate sensitive backup files that might expose credentials or database structures.

url example.com | grep -E '\.(bak|old|config|cfg|sql|db)(\?|$)'

6. Enumerate Subdomains

Identify subdomains from the extracted URLs.

Use Case:
Discover subdomains for further recon or exploitation.

url example.com | grep -oP 'https?://\K[^/]*' | sort -u
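
The -P option used above needs a grep built with PCRE support, which is not available everywhere (for example, the default grep on macOS). A portable sketch using awk and sed is shown below; hosts_from_urls is a hypothetical helper name, and it also strips any port number.

```shell
# hosts_from_urls: read URLs on stdin and print the distinct hostnames.
# Hypothetical helper for illustration; works without grep -P.
hosts_from_urls() {
  # With "/" as the field separator, the host is the third field of http(s)://host/path.
  awk -F/ '/^https?:\/\// { print $3 }' | sed -E 's/:[0-9]+$//' | LC_ALL=C sort -u
}
```

For example, url example.com | hosts_from_urls gives the same kind of list as the grep -oP command.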

7. Save URLs for Burp Suite

Export unique URLs for crawling and fuzzing in Burp Suite.

Use Case:
Import into Burp Suite for automated scanning.

url example.com | sort -u > burp_urls.txt

8. Test LFI Vulnerabilities

Filter URLs for potential Local File Inclusion testing.

Use Case:
Detect vulnerable endpoints allowing file path manipulation.

url example.com | grep -E '\.php\?file='

9. Extract Endpoints Containing Login or Admin

Look for URLs that might indicate sensitive areas of the website.

Use Case:
Target administrative or authentication endpoints for brute-forcing or bypass attempts.

url example.com | grep -E 'login|admin'

10. Chain with Other Tools

Combine url output with popular security tools.

  • Check Live URLs with httpx:

    url example.com | httpx
  • Identify Patterns with gf (a grep wrapper that needs pattern files such as xss installed):

    url example.com | gf xss
  • Merge Results with waybackurls (which reads domains, not URLs, on stdin):

    { url example.com; echo example.com | waybackurls; } | sort -u

11. Automate and Expand Workflow

Create a Bash script to automate common recon tasks.

Use Case:
Run a single script to collect multiple data types.

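#!/bin/bash
# Usage: pass the target domain as the first argument.
domain="$1"
# Query the archives once, then filter the saved list instead of fetching again.
url "$domain" | tee urls.txt
grep -E '\.js(\?|$)' urls.txt | tee js_files.txt
grep -E '\.(php|aspx|jsp)(\?|$)' urls.txt | tee scripts.txt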

Contributing

Contributions are welcome! Please fork the repository, make your changes, and submit a pull request.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Contact

For inquiries, please contact:

  • GitHub: zebbern
  • Inspired by waybackurls by @tomnomnom.

Warning

This tool is intended for educational and ethical hacking purposes only. Use it only to test systems you own or have explicit permission to test. Testing third-party websites or systems without consent is illegal and unethical.
