Find Similar Files Using ssdeep Fuzzy Hashing

ssdeep implements context-triggered piecewise hashing (CTPH). It generates fuzzy hashes that let you detect similar files even after they have been slightly modified, repacked, or obfuscated. This makes it valuable for identifying evolved malware variants that evade exact-match cryptographic hashes such as MD5 or SHA-256, which change completely when even a single byte changes.

Malicious actors frequently repack malware or make minor changes to it to bypass signature-based detection. ssdeep helps security teams cluster related samples, track malware families, and uncover attacker infrastructure hosting similar payloads.

Step-by-Step Guide

  1. From the left navigation menu, select Advanced Query Builder > Xperimental Queries > ssdeep find similar.

  2. Enter the full ssdeep hash of a known malicious file (e.g., from a malware sample, VirusTotal, or a previous Silent Push scan result).

    • Example (placeholder value in the blocksize:hash1:hash2 format): 12288:abc123xyz456+def789:abcxyz

  3. Set a minimum hash similarity cutoff (e.g., 70–90%) to filter out weak matches and focus on strongly related files.

    • Use lower thresholds (e.g., 40–60%) for broader family clustering and higher thresholds (80% or above) for near-identical variants.

  4. Control how many results are returned.

    • Limit the number of returned results (e.g., 50–200) to manage large datasets.

    • Use skip to paginate through additional matches if needed.

  5. Click Search.

  6. Review the results. Each match includes a similarity score, file details (e.g., hosting URL, domain, IP), and scan metadata.

    • Sort by similarity score descending to prioritize the closest matches.

    • Pivot on associated domains/IPs for reputation, risk scores, or further DNS/web scans.

    • Cluster results to map malware distribution networks or C2 infrastructure.

    • Export matches to block hosting sites, enrich alerts, or feed into your threat hunting workflow.
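If you export matches and want to repeat the same triage offline, the steps above can be approximated in a short script. The following is a minimal sketch, not Silent Push's implementation: the record fields (similarity, domain, ip), the function names, and the sample values are all hypothetical, so adjust them to whatever your export actually contains. It checks that a pasted value looks like a full ssdeep hash, applies a similarity cutoff, sorts by score descending, pages with skip and limit, and groups matches by hosting domain.

```python
import re
from collections import defaultdict

# ssdeep text format: "<block size>:<signature>:<double-block signature>"
SSDEEP_RE = re.compile(r"^(\d+):([A-Za-z0-9+/]*):([A-Za-z0-9+/]*)$")


def is_ssdeep_hash(value: str) -> bool:
    """Return True if the value has the three-field ssdeep layout."""
    return SSDEEP_RE.match(value.strip()) is not None


def triage(matches, min_similarity=70, skip=0, limit=50):
    """Keep strong matches, sort best-first, then return one page."""
    strong = [m for m in matches if m["similarity"] >= min_similarity]
    strong.sort(key=lambda m: m["similarity"], reverse=True)
    return strong[skip:skip + limit]


def group_by_domain(matches):
    """Cluster matches by hosting domain to support pivoting."""
    clusters = defaultdict(list)
    for match in matches:
        clusters[match["domain"]].append(match)
    return dict(clusters)


# Hypothetical exported matches (field names are assumptions).
sample = [
    {"similarity": 62, "domain": "files.example.org", "ip": "192.0.2.10"},
    {"similarity": 95, "domain": "cdn.example.net", "ip": "198.51.100.7"},
    {"similarity": 81, "domain": "cdn.example.net", "ip": "198.51.100.8"},
    {"similarity": 74, "domain": "dl.example.com", "ip": "203.0.113.5"},
]

first_page = triage(sample, min_similarity=70, skip=0, limit=2)
second_page = triage(sample, min_similarity=70, skip=2, limit=2)
clusters = group_by_domain(triage(sample, min_similarity=70))
```

Here first_page holds the two closest matches, second_page holds the remaining match above the cutoff, and clusters maps each domain to its matches so that a domain hosting several similar payloads stands out.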

Save Query

  1. Specify query parameters.

  2. Click Save Query.

  3. Provide a Name and Description for context.

  4. Click Save. The query appears in Private Queries.

Tip

Combine ssdeep searches with other filters (if available in advanced mode), such as a datasource (e.g., web resources) or recent scan dates, to hunt for live threats. Save high-value queries as recurring alerts via webhooks to your SIEM, and catch new variants of tracked malware families as soon as they're observed in the wild.

ssdeep hashes are calculated using an algorithm that generates a fingerprint, or hash value, from a file's contents.
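To give a feel for why small edits change only part of a fuzzy hash, here is a deliberately simplified sketch of the context-triggered piecewise idea. It is a toy and is not compatible with real ssdeep output: the function names are hypothetical, a plain sum of the last 7 bytes stands in for ssdeep's rolling hash, SHA-256 stands in for its per-piece hash, and difflib's SequenceMatcher stands in for its edit-distance scoring. The principle is the same, though: piece boundaries depend only on nearby bytes, so a local change disturbs only the nearby signature characters.

```python
import hashlib
from difflib import SequenceMatcher

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
WINDOW = 7  # bytes of context that decide where a piece ends


def _piece_char(piece: bytes) -> str:
    """Reduce one piece to a single base64-alphabet character."""
    return ALPHABET[hashlib.sha256(piece).digest()[0] % 64]


def toy_fuzzy_hash(data: bytes, block_size: int = 64) -> str:
    """Toy context-triggered piecewise hash (NOT real ssdeep)."""
    signature = []
    piece_start = 0
    for i in range(len(data)):
        # Context value over the last WINDOW bytes (simplified stand-in
        # for a rolling hash).
        context = sum(data[max(0, i - WINDOW + 1):i + 1])
        # A trigger value ends the current piece at this byte.
        if context % block_size == block_size - 1:
            signature.append(_piece_char(data[piece_start:i + 1]))
            piece_start = i + 1
    # Hash whatever is left after the last trigger.
    if piece_start < len(data):
        signature.append(_piece_char(data[piece_start:]))
    return "".join(signature)


def toy_similarity(a: str, b: str) -> int:
    """Score two toy signatures from 0 to 100 (stand-in scoring)."""
    return round(SequenceMatcher(None, a, b).ratio() * 100)


original = bytes(range(256)) * 16
variant = original[:2000] + b"\x00" + original[2001:]  # change one byte
print(toy_similarity(toy_fuzzy_hash(original), toy_fuzzy_hash(variant)))
```

The script prints a score between 0 and 100 for the original and its one-byte variant. A cryptographic hash of the same two inputs would share nothing, which is why exact-match hashes miss lightly modified samples.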