Passive Recon Tool

TL;DR: Passive OSINT pipeline that maps a domain’s attack surface without direct interaction, correlates multi-source data, and generates a structured risk-based security report.

Report First Page

Report First Page

I developed a passive reconnaissance pipeline capable of mapping a target domain’s attack surface using only publicly available data (OSINT).

Unlike traditional scanning approaches, this tool performs zero direct interaction with the target, eliminating detection risks while still providing meaningful security insights.

The system aggregates data from multiple external sources, correlates findings, and transforms fragmented information into a structured and actionable security report.

This project simulates a real-world reconnaissance phase used in penetration testing and attack surface analysis, focusing on scalability, automation, and low-noise information gathering.

This approach ensures:

  • No direct contact with the target
  • No legal or ethical risks
  • Real-world applicability in reconnaissance phases

Technical Decisions

Several design choices were made to ensure reliability, scalability, and data consistency:

  • VirusTotal (Passive DNS):
    Selected for historical DNS resolution data, allowing timeline-based infrastructure analysis and identification of IP reuse.
    The implementation deduplicates records and tracks first/last seen timestamps.

  • crt.sh (Certificate Transparency):
    Used to enumerate subdomains from SSL certificates, enabling discovery of hidden or legacy assets often missed by DNS-based methods.
    Wildcard certificates are handled separately to avoid noise.

  • Threaded Resolution (Concurrency):
    Subdomain and infrastructure validation use multithreading to significantly reduce execution time.

  • Data Normalization Layer:
    WHOIS data is normalized into a structured domain model to handle inconsistent formats across registrars.

  • Deduplication Strategy:
    DNS records, certificates, and subdomains are aggregated using key-based correlation to avoid redundant entries.

  • Resilience Handling:
    API failures and timeouts are handled, ensuring the pipeline continues execution even with partial data.

  • Modular Architecture:
    Each stage (collection, processing, enrichment, reporting) is isolated, enabling easy extension and maintenance.

Objectives

This project was built with the following goals:

  • Automate passive reconnaissance workflows
  • Aggregate multiple OSINT sources into a single pipeline
  • Identify security-relevant patterns in public data
  • Generate a clean and readable report for analysis
  • Simulate a real-world reconnaissance phase used in pentesting

Key Concepts

This project explores important cybersecurity concepts:

  • Passive Reconnaissance (OSINT)
  • Attack Surface Mapping
  • Certificate Transparency
  • Metadata Exposure
  • Infrastructure Enumeration
  • Risk Assessment Modeling

Architecture

The tool follows a modular pipeline:

Pipeline Diagram

Pipeline Diagram

                
[Input Layer]
    └── Target Domain

[Collection Layer]
    ├── WHOIS Collection
    ├── Passive DNS (VirusTotal API)
    ├── Certificate Transparency (crt.sh)
    └── Wayback Machine (Document Discovery)

[Processing Layer]
    ├── Data Normalization (WHOIS parsing)
    ├── Deduplication (DNS, certs, subs)
    ├── Timeline Correlation (first_seen / last_seen)
    └── Metadata Classification

[Enrichment Layer]
    ├── IP Intelligence (IPInfo API)
    └── Active/Inactive Resolution (Threaded DNS lookup)

[Analysis Layer]
    ├── Subdomain Risk Detection (keyword-based)
    ├── Metadata Exposure Analysis
    ├── Infrastructure Sizing
    └── Domain Lifecycle Analysis

[Scoring Engine]
    └── Weighted Risk Model (Low/Medium/High → Score → Average)

[Output Layer]
    └── HTML Report Generator

External APIs Used

Project Structure

.
├── recon.py             # Main execution script
├── passive.py           # OSINT data collection
├── data_filter.py       # WHOIS data processing
├── domain.py            # Domain data model
├── pdfgenerator.py      # Report generation
├── requirements.txt
└── template/
    └── report_template_passive_css.html

Data Collection Techniques

1. WHOIS Analysis

Used to extract:

  • Registrar information
  • Domain lifecycle dates
  • Organization details
Whois Section

Whois Section

Security insight: Missing or obfuscated data may indicate privacy protection or misconfiguration.

2. Passive DNS

Collected via VirusTotal API:

  • Historical IP resolutions
  • Infrastructure mapping

Security insight: Helps identify infrastructure changes and potential shared hosting risks.

DNS Records Section

DNS Records Section

3. Subdomain Enumeration

Using certificate transparency logs (crt.sh):

  • Discovers hidden or forgotten subdomains
Identified Subs Section

Identified Subs Section

Security insight: Subdomains like dev, test, or admin may expose sensitive environments.

Analyzing the collected data reveals domain reuse of gabrielpassos.com through the observed dates.

4. Infrastructure Analysis

IP enrichment using external APIs:

  • ASN
  • Organization
  • Country
Identified IP’s Section

Identified IP’s Section

Security insight: Expands the attack surface visibility.

5. Public Document Discovery

Wayback Machine is used to find:

  • PDFs
  • DOC/DOCX files

Security insight: Documents may expose:

  • Internal usernames
  • Software versions
  • Sensitive metadata

Risk Assessment Model

The tool implements a heuristic-based risk scoring system that evaluates multiple aspects of the target:

  • Domain Management:
    Based on expiration proximity and WHOIS completeness

  • Subdomain Exposure:
    Detection of sensitive naming patterns such as dev, test, admin, and internal

  • Metadata Exposure:
    Volume of publicly accessible documents and potential information leakage

  • Infrastructure Exposure: Size and distribution of the identified infrastructure

Risk Assessment Section

Risk Assessment Section

Each category is classified as Low, Medium, or High, and assigned a numerical score:

  • Low = 1
  • Medium = 2
  • High = 3

The overall risk is calculated using an average-based scoring model:

  • ≥ 2.5 → High
  • ≥ 1.5 → Medium
  • < 1.5 → Low

This approach provides a simple but effective way to prioritize potential risks based on aggregated indicators.

Results

The tool successfully automates the reconnaissance workflow and produces a structured security report in a short time frame.

Example outcomes include:

  • Identification of multiple subdomains through certificate transparency logs
  • Detection of active vs inactive assets via DNS resolution
  • Discovery of publicly accessible documents from archived sources
  • Correlation of infrastructure data (ASN, organization, geolocation)
  • Automated classification of risk across multiple categories

Execution is optimized through concurrency, allowing multiple enrichment and validation tasks to run in parallel, significantly reducing total runtime.

The final output is a comprehensive HTML report that consolidates all findings into a readable and actionable format.

Limitations

While effective, the tool has some limitations:

  • Relies on third-party APIs, which may introduce rate limits or incomplete data
  • Passive-only approach may miss assets not exposed through public sources
  • WHOIS data inconsistency across registrars can affect accuracy
  • Risk scoring is heuristic-based and does not replace manual analysis

These limitations reflect real-world constraints of passive reconnaissance techniques.

Output

The final output is an HTML report containing:

  • WHOIS data
  • DNS records
  • Subdomains (active/inactive)
  • Certificates
  • Infrastructure details
  • Metadata findings
  • Risk assessment

Download HTML Report Example 🔗

Key Challenges

During development, some challenges included:

  • Handling inconsistent WHOIS formats
  • Normalizing data from multiple APIs
  • Avoiding duplicate records
  • Managing API failures and timeouts
  • Designing a clean and readable report

What I Learned

This project helped me improve:

  • OSINT data correlation
  • Secure data handling
  • Writing modular Python code
  • Designing analysis pipelines
  • Thinking like an attacker during recon

Future Improvements

Planned enhancements:

  • Integration with additional OSINT sources
  • PDF export support
  • Improved risk scoring model
  • Web-based interface
  • Caching for faster execution

Ethical Considerations

This tool operates strictly in a passive manner, meaning:

  • No direct interaction with the target
  • No scanning or exploitation
  • Only publicly available data is used

Conclusion

This project demonstrates how publicly available data can be systematically leveraged to map a target’s attack surface without active interaction.

Beyond data collection, the focus was on building a structured pipeline capable of:

  • Correlating multiple OSINT sources
  • Reducing noise through normalization and deduplication
  • Automating analysis and reporting
  • Providing actionable security insights

This work reflects my ability to design scalable reconnaissance workflows and think in terms of attack surface analysis, a critical component in modern cybersecurity operations.

Repository

GitHub Repository Link Here 🔗

Author

Gabriel Passos Cybersecurity Student & Security Enthusiast