Passive Recon Tool

TL;DR: Passive OSINT pipeline that maps a domain’s attack surface without direct interaction, correlates multi-source data, and generates a structured risk-based security report.

[Image: first page of the generated report]

I developed a passive reconnaissance pipeline capable of mapping a target domain’s attack surface using only publicly available data (OSINT).

Unlike traditional scanning approaches, this tool sends no probes or requests directly to the target's hosts, which greatly reduces the risk of detection while still providing meaningful security insights.

The system aggregates data from multiple external sources, correlates findings, and transforms fragmented information into a structured and actionable security report.

This project simulates a real-world reconnaissance phase used in penetration testing and attack surface analysis, focusing on scalability, automation, and low-noise information gathering.

This approach ensures:

No direct contact with the target
Much lower legal and ethical risk than active scanning
Real-world applicability in reconnaissance phases

Technical Decisions

Several design choices were made to ensure reliability, scalability, and data consistency:

VirusTotal (Passive DNS):
Selected for historical DNS resolution data, allowing timeline-based infrastructure analysis and identification of IP reuse.
The implementation deduplicates records and tracks first/last seen timestamps.
crt.sh (Certificate Transparency):
Used to enumerate subdomains from SSL/TLS certificates, enabling discovery of hidden or legacy assets often missed by DNS-based methods.
Wildcard certificates are handled separately to avoid noise.
Threaded Resolution (Concurrency):
Subdomain and infrastructure validation use multithreading to significantly reduce execution time.
Data Normalization Layer:
WHOIS data is normalized into a structured domain model to handle inconsistent formats across registrars.
Deduplication Strategy:
DNS records, certificates, and subdomains are aggregated using key-based correlation to avoid redundant entries.
Resilience Handling:
API failures and timeouts are caught rather than propagated, so the pipeline keeps running and reports on whatever data was collected.
Modular Architecture:
Each stage (collection, processing, enrichment, reporting) is isolated, enabling easy extension and maintenance.
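
The resilience handling described in the list above can be illustrated with a small wrapper around the collectors. This is a minimal sketch, not the project's actual code: `collect_all` and the idea of passing collectors as a name-to-function mapping are assumptions made for the example.

```python
import logging
from typing import Callable, Dict

logger = logging.getLogger("recon")


def collect_all(domain: str, collectors: Dict[str, Callable[[str], list]]) -> Dict[str, list]:
    """Run every collector; a failing source yields an empty list instead of aborting."""
    results: Dict[str, list] = {}
    for name, collect in collectors.items():
        try:
            results[name] = collect(domain)
        except Exception as exc:  # broad on purpose: any single source may fail
            logger.warning("collector %s failed for %s: %s", name, domain, exc)
            results[name] = []  # partial data: later stages see "no findings"
    return results
```

With this shape, a timeout in one source only removes that source's findings from the report; the other stages run unchanged.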

Objectives

This project was built with the following goals:

Automate passive reconnaissance workflows
Aggregate multiple OSINT sources into a single pipeline
Identify security-relevant patterns in public data
Generate a clean and readable report for analysis
Simulate a real-world reconnaissance phase used in pentesting

Key Concepts

This project explores important cybersecurity concepts:

Passive Reconnaissance (OSINT)
Attack Surface Mapping
Certificate Transparency
Metadata Exposure
Infrastructure Enumeration
Risk Assessment Modeling

Architecture

The tool follows a modular pipeline:

                
[Input Layer]
    └── Target Domain

[Collection Layer]
    ├── WHOIS Collection
    ├── Passive DNS (VirusTotal API)
    ├── Certificate Transparency (crt.sh)
    └── Wayback Machine (Document Discovery)

[Processing Layer]
    ├── Data Normalization (WHOIS parsing)
    ├── Deduplication (DNS, certs, subs)
    ├── Timeline Correlation (first_seen / last_seen)
    └── Metadata Classification

[Enrichment Layer]
    ├── IP Intelligence (IPInfo API)
    └── Active/Inactive Resolution (Threaded DNS lookup)

[Analysis Layer]
    ├── Subdomain Risk Detection (keyword-based)
    ├── Metadata Exposure Analysis
    ├── Infrastructure Sizing
    └── Domain Lifecycle Analysis

[Scoring Engine]
    └── Average-Based Risk Model (Low/Medium/High → Score → Average)

[Output Layer]
    └── HTML Report Generator

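External APIs Used

VirusTotal API (passive DNS)
crt.sh (certificate transparency logs)
IPInfo API (IP intelligence: ASN, organization, country)
Wayback Machine (archived document discovery)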

Project Structure

.
├── recon.py             # Main execution script
├── passive.py           # OSINT data collection
├── data_filter.py       # WHOIS data processing
├── domain.py            # Domain data model
├── pdfgenerator.py      # Report generation (HTML output)
├── requirements.txt
└── template/
    └── report_template_passive_css.html

Data Collection Techniques

1. WHOIS Analysis

Used to extract:

Registrar information
Domain lifecycle dates
Organization details

Security insight: Missing or obfuscated data may indicate privacy protection or misconfiguration.
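
Because registrars return dates and fields in different shapes, normalization mostly comes down to coercing each field into one type. The sketch below is illustrative only: `DomainRecord` is a hypothetical stand-in for the model in domain.py, and the input keys (`registrar`, `org`, `creation_date`, `expiration_date`) and the date layouts are assumptions about what a WHOIS parser might return.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Any, Optional

# Date layouts often seen in WHOIS output; illustrative, not exhaustive.
_DATE_FORMATS = ("%Y-%m-%dT%H:%M:%SZ", "%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%d-%b-%Y", "%Y.%m.%d")


@dataclass
class DomainRecord:  # hypothetical model, not the project's domain.py
    registrar: Optional[str]
    organization: Optional[str]
    created: Optional[datetime]
    expires: Optional[datetime]


def _first(value: Any) -> Any:
    """Some sources return a list of values for one field; keep the first."""
    if isinstance(value, (list, tuple)):
        return value[0] if value else None
    return value


def parse_date(value: Any) -> Optional[datetime]:
    value = _first(value)
    if value is None or isinstance(value, datetime):
        return value
    text = str(value).strip()
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    return None  # unknown layout: leave the field empty rather than guess


def clean_text(value: Any) -> Optional[str]:
    value = _first(value)
    if value is None:
        return None
    return str(value).strip() or None


def normalize_whois(raw: dict) -> DomainRecord:
    return DomainRecord(
        registrar=clean_text(raw.get("registrar")),
        organization=clean_text(raw.get("org")),
        created=parse_date(raw.get("creation_date")),
        expires=parse_date(raw.get("expiration_date")),
    )
```

Fields that cannot be parsed come back as `None`, which lets later stages treat "missing" and "unparseable" WHOIS data the same way.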

2. Passive DNS

Collected via VirusTotal API:

Historical IP resolutions
Infrastructure mapping

Security insight: Helps identify infrastructure changes and potential shared hosting risks.
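
Deduplicating passive DNS records while keeping first/last seen timestamps can be done with a dictionary keyed on the (host, IP) pair. This is a sketch under assumptions: `build_timeline` is a hypothetical helper, and the input is assumed to be a list of flat dicts with `host_name`, `ip_address`, and a Unix-timestamp `date`, which may differ from the shape the tool actually works with.

```python
from datetime import datetime, timezone
from typing import Dict, Iterable, List


def build_timeline(resolutions: Iterable[dict]) -> List[dict]:
    """Collapse repeated (host, ip) observations into one row with first/last seen."""
    seen: Dict[tuple, dict] = {}
    for item in resolutions:
        key = (item["host_name"].lower(), item["ip_address"])
        ts = datetime.fromtimestamp(item["date"], tz=timezone.utc)
        row = seen.get(key)
        if row is None:
            seen[key] = {"host": key[0], "ip": key[1], "first_seen": ts, "last_seen": ts}
        else:
            # Same pair observed again: widen the time window instead of adding a row.
            row["first_seen"] = min(row["first_seen"], ts)
            row["last_seen"] = max(row["last_seen"], ts)
    # Oldest infrastructure first, which reads naturally as a timeline.
    return sorted(seen.values(), key=lambda r: r["first_seen"])
```

An IP whose window ends well before another one begins suggests an infrastructure change; the same IP appearing under unrelated hosts hints at shared hosting.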

3. Subdomain Enumeration

Using certificate transparency logs (crt.sh):

Discovers hidden or forgotten subdomains

Security insight: Subdomains like dev, test, or admin may expose sensitive environments.

In the example run against gabrielpassos.com, the dates observed in the collected data indicate that the domain has been reused over time.
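
The certificate transparency step in this section can be sketched as a pure parsing function over crt.sh results. It assumes the entries were already fetched as JSON and that each one carries a `name_value` field with one or more newline-separated names; `extract_names` is a hypothetical helper, not the tool's own function. Wildcard names are kept in a separate set, in line with handling them separately to avoid noise.

```python
from typing import Iterable, Set, Tuple


def extract_names(entries: Iterable[dict], domain: str) -> Tuple[Set[str], Set[str]]:
    """Return (subdomains, wildcard_patterns) found in crt.sh-style entries."""
    domain = domain.lower().rstrip(".")
    subdomains: Set[str] = set()
    wildcards: Set[str] = set()
    for entry in entries:
        # One certificate can list several names, separated by newlines.
        for name in str(entry.get("name_value", "")).splitlines():
            name = name.strip().lower().rstrip(".")
            in_scope = name == domain or name.endswith("." + domain)
            if not name or not in_scope:
                continue  # skip blanks and names outside the target domain
            if name.startswith("*."):
                wildcards.add(name)  # a pattern, not a resolvable host
            else:
                subdomains.add(name)
    return subdomains, wildcards
```

Using sets removes the duplicates that appear when the same name is present in many renewed certificates.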

4. Infrastructure Analysis

IP enrichment using external APIs:

ASN
Organization
Country

Security insight: Shows which networks, providers, and countries host the target's assets, giving a fuller view of the attack surface.
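
As an illustration of the enrichment step, the sketch below turns one IP lookup result into the three fields listed above. It assumes the response is a dict with `ip`, `country`, and an `org` string of the form "AS15169 Google LLC"; that shape, and the helper name `parse_ip_record`, are assumptions for the example rather than a description of the tool's code.

```python
import re
from typing import Optional, TypedDict


class IpProfile(TypedDict):
    ip: str
    asn: Optional[str]
    organization: Optional[str]
    country: Optional[str]


# "AS<number> <organization name>", the layout assumed for the org field.
_ORG_PATTERN = re.compile(r"^(AS\d+)\s+(.*)$")


def parse_ip_record(record: dict) -> IpProfile:
    org_field = (record.get("org") or "").strip()
    match = _ORG_PATTERN.match(org_field)
    if match:
        asn, organization = match.group(1), match.group(2)
    else:
        # No ASN prefix: keep whatever text is there, or None if empty.
        asn, organization = None, org_field or None
    return {
        "ip": record.get("ip", ""),
        "asn": asn,
        "organization": organization,
        "country": record.get("country"),
    }
```

Grouping the resulting profiles by `asn` gives a quick picture of how many distinct providers the target depends on.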

5. Public Document Discovery

Wayback Machine is used to find:

PDFs
DOC/DOCX files

Security insight: Documents may expose:

Internal usernames
Software versions
Sensitive metadata
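
Document discovery through the Wayback Machine can be sketched in two parts: building a query for the CDX index and filtering the returned rows by extension. The query parameters shown (`url`, `output`, `fl`, `collapse`) are the ones assumed for this example, and both helper names are hypothetical; the tool itself may query the archive differently. The rows are expected in the JSON layout where the first row is a header and each following row starts with the original URL.

```python
from typing import Iterable, List
from urllib.parse import urlencode, urlsplit

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
DOC_EXTENSIONS = (".pdf", ".doc", ".docx")


def build_cdx_url(domain: str) -> str:
    """Build a CDX query for every archived URL under the domain."""
    params = {
        "url": f"{domain}/*",
        "output": "json",
        "fl": "original,mimetype",
        "collapse": "urlkey",  # one row per distinct URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def filter_documents(rows: Iterable[List[str]]) -> List[str]:
    """Keep unique archived URLs whose path ends with a document extension."""
    found: List[str] = []
    seen = set()
    for row in rows:
        if not row or row[0] == "original":  # skip empty rows and the header row
            continue
        url = row[0]
        path = urlsplit(url).path.lower()  # ignore query strings and case
        if path.endswith(DOC_EXTENSIONS) and url not in seen:
            seen.add(url)
            found.append(url)
    return found
```

Only the URL list is produced here; extracting usernames or software versions from the documents' metadata would be a separate step.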

Risk Assessment Model

The tool implements a heuristic-based risk scoring system that evaluates multiple aspects of the target:

Domain Management:
Based on expiration proximity and WHOIS completeness
Subdomain Exposure:
Detection of sensitive naming patterns such as dev, test, admin, and internal
Metadata Exposure:
Volume of publicly accessible documents and potential information leakage
Infrastructure Exposure:
Size and distribution of the identified infrastructure
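
The keyword-based subdomain check from the list above could look like the following. It is a sketch, not the tool's implementation: `flag_sensitive` is a hypothetical name, and matching on whole label tokens (split on dots, hyphens, and underscores) is a design choice made here so that a name like "latest" is not flagged for containing "test".

```python
import re
from typing import Dict, Iterable, List

SENSITIVE_KEYWORDS = ("dev", "test", "admin", "internal")


def flag_sensitive(subdomains: Iterable[str], domain: str) -> Dict[str, List[str]]:
    """Map each flagged subdomain to the sensitive keywords found in its labels."""
    flagged: Dict[str, List[str]] = {}
    suffix = "." + domain.lower()
    for name in subdomains:
        name = name.lower()
        # Only inspect the part in front of the registered domain.
        prefix = name[: -len(suffix)] if name.endswith(suffix) else name
        tokens = set(re.split(r"[.\-_]", prefix))
        hits = [kw for kw in SENSITIVE_KEYWORDS if kw in tokens]
        if hits:
            flagged[name] = hits
    return flagged
```

The number of flagged names can then feed the Low/Medium/High classification for the Subdomain Exposure category; the thresholds for that mapping are left out because the text does not state them.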

Each category is classified as Low, Medium, or High, and assigned a numerical score:

Low = 1
Medium = 2
High = 3

The overall risk is calculated using an average-based scoring model:

≥ 2.5 → High
≥ 1.5 → Medium
< 1.5 → Low

This approach provides a simple but effective way to prioritize potential risks based on aggregated indicators.
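
The scoring rules above translate directly into a few lines of code. The function name `overall_risk` and the dictionary input are assumptions for illustration; the score values and thresholds are the ones stated in this section.

```python
from statistics import mean
from typing import Mapping, Tuple

LEVEL_SCORES = {"Low": 1, "Medium": 2, "High": 3}


def overall_risk(categories: Mapping[str, str]) -> Tuple[float, str]:
    """Average the per-category scores and map the result back to a level."""
    average = mean(LEVEL_SCORES[level] for level in categories.values())
    if average >= 2.5:
        label = "High"
    elif average >= 1.5:
        label = "Medium"
    else:
        label = "Low"
    return average, label
```

For example, Low, High, Medium, and Medium across the four categories give (1 + 3 + 2 + 2) / 4 = 2, which falls in the Medium band. Note that a single High category is averaged away when the others are Low, which is one reason the score does not replace manual review.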

Results

The tool successfully automates the reconnaissance workflow and produces a structured security report in a short time frame.

Example outcomes include:

Identification of multiple subdomains through certificate transparency logs
Detection of active vs inactive assets via DNS resolution
Discovery of publicly accessible documents from archived sources
Correlation of infrastructure data (ASN, organization, geolocation)
Automated classification of risk across multiple categories

Execution is optimized through concurrency, allowing multiple enrichment and validation tasks to run in parallel, significantly reducing total runtime.
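
The active/inactive check is a good fit for a thread pool because each lookup spends most of its time waiting on the network. The sketch below shows one way to do it with the standard library; the function names, the default of 20 workers, and the injectable `resolver` parameter are assumptions made for the example.

```python
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable, Optional


def resolve_host(name: str) -> Optional[str]:
    """Return an IPv4 address for name, or None if it does not resolve."""
    try:
        return socket.gethostbyname(name)
    except (OSError, UnicodeError):  # covers DNS failures and malformed names
        return None


def resolve_all(
    names: Iterable[str],
    resolver: Callable[[str], Optional[str]] = resolve_host,
    max_workers: int = 20,
) -> Dict[str, Optional[str]]:
    """Resolve names concurrently; None marks an inactive (non-resolving) name."""
    unique = list(dict.fromkeys(names))  # dedupe while keeping order
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        addresses = list(pool.map(resolver, unique))  # results keep input order
    return dict(zip(unique, addresses))
```

Passing the resolver as a parameter keeps the function testable without network access, for instance by supplying a dictionary's `get` method as a stand-in.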

The final output is a comprehensive HTML report that consolidates all findings into a readable and actionable format.

Limitations

While effective, the tool has some limitations:

Relies on third-party APIs, which may introduce rate limits or incomplete data
Passive-only approach may miss assets not exposed through public sources
WHOIS data inconsistency across registrars can affect accuracy
Risk scoring is heuristic-based and does not replace manual analysis

These limitations reflect real-world constraints of passive reconnaissance techniques.

Output

The final output is an HTML report containing:

WHOIS data
DNS records
Subdomains (active/inactive)
Certificates
Infrastructure details
Metadata findings
Risk assessment
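
To show how findings might be turned into HTML, here is a deliberately small sketch that uses the standard library's `string.Template` instead of the project's own template file. The page layout, the `render_report` name, and the (name, active) tuple format are all assumptions for illustration; only two of the report sections listed above are rendered.

```python
from html import escape
from string import Template
from typing import Iterable, Tuple

PAGE = Template("""<!DOCTYPE html>
<html><head><meta charset="utf-8"><title>Passive recon: $domain</title></head>
<body>
<h1>$domain</h1>
<p>Overall risk: $risk</p>
<h2>Subdomains</h2>
<ul>
$subdomain_items
</ul>
</body></html>
""")


def render_report(domain: str, risk: str, subdomains: Iterable[Tuple[str, bool]]) -> str:
    """Render a minimal report; every value is escaped before it reaches the page."""
    items = "\n".join(
        f"<li>{escape(name)} ({'active' if active else 'inactive'})</li>"
        for name, active in subdomains
    )
    return PAGE.substitute(domain=escape(domain), risk=escape(risk), subdomain_items=items)
```

Escaping matters here because names and metadata come from third-party sources and should not be trusted as markup.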

Download HTML Report Example 🔗

Key Challenges

During development, some challenges included:

Handling inconsistent WHOIS formats
Normalizing data from multiple APIs
Avoiding duplicate records
Managing API failures and timeouts
Designing a clean and readable report

What I Learned

This project helped me improve:

OSINT data correlation
Secure data handling
Writing modular Python code
Designing analysis pipelines
Thinking like an attacker during recon

Future Improvements

Planned enhancements:

Integration with additional OSINT sources
PDF export support
Improved risk scoring model
Web-based interface
Caching for faster execution

Ethical Considerations

This tool operates strictly in a passive manner, meaning:

No direct interaction with the target
No scanning or exploitation
Only publicly available data is used

Conclusion

This project demonstrates how publicly available data can be systematically leveraged to map a target’s attack surface without active interaction.

Beyond data collection, the focus was on building a structured pipeline capable of:

Correlating multiple OSINT sources
Reducing noise through normalization and deduplication
Automating analysis and reporting
Providing actionable security insights

This work reflects my ability to design scalable reconnaissance workflows and think in terms of attack surface analysis, a critical component in modern cybersecurity operations.

Repository

GitHub Repository Link Here 🔗