Understand body data


Web Scanner lets users target the body data of a webpage in several ways, so security teams can scan for similar websites based on hash values, similarity scores, JavaScript data, and dark web content.

Below is a selection of useful data types. The full list of field names is documented separately.

Data types at a glance

Each entry below gives the data type, what it does, a typical use case, its field names, and an example.

SHA-256 Body Data
What it does: Generates exact-match SHA-256 hashes of a webpage's body, header, or footer content.
Use case: Detect identical pages (e.g., error or holding pages).
Field names: body_analysis.body_sha256, body_analysis.header_sha256, body_analysis.footer_sha256
Example: Two websites with identical 404 error pages share the same body_sha256 hash.
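To show why SHA-256 suits exact matching, the minimal sketch below hashes page bodies locally with Python's hashlib. It assumes the body is hashed as UTF-8 bytes; how Web Scanner extracts and normalizes the body, header, and footer before hashing is not described here, so its stored digests may not equal ones computed this way. The page strings are hypothetical.

```python
import hashlib

def body_sha256(body: str) -> str:
    """Return the hex SHA-256 digest of a page body encoded as UTF-8 (an assumption)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()

# Hypothetical 404 pages served by different websites
page_a = "<html><body><h1>404 Not Found</h1></body></html>"
page_b = "<html><body><h1>404 Not Found</h1></body></html>"
page_c = "<html><body><h1>404 Not Found!</h1></body></html>"

print(body_sha256(page_a) == body_sha256(page_b))  # True: identical bodies, identical hashes
print(body_sha256(page_a) == body_sha256(page_c))  # False: one extra character changes the hash
```

Because a single changed byte produces a completely different digest, this field only groups pages that are exactly the same.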

JavaScript Data
What it does: Tracks JavaScript files using SHA-256 (exact) and ssdeep (fuzzy) hashes.
Use case: Identify variants of the same JS file, even when they are loaded with different parameters (e.g., ?v=1.2).
Field names: body_analysis.js_sha256, body_analysis.js_ssdeep
Example: A phishing site loads login.js?v=1.1; ssdeep shows it is 95% similar to login.js?v=1.2 on another site.
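The contrast between exact and fuzzy hashing can be sketched as follows. This is not ssdeep: ssdeep uses context-triggered piecewise hashing, and here Python's difflib.SequenceMatcher is swapped in as a stand-in that also yields a 0-100 score, so its numbers are not comparable with real ssdeep scores. The file contents are hypothetical.

```python
import difflib
import hashlib

def js_sha256(source: str) -> str:
    """Exact-match fingerprint: any change to the file changes the digest."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()

def fuzzy_score(a: str, b: str) -> int:
    """0-100 similarity via difflib (an illustrative stand-in, not ssdeep)."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return round(matcher.ratio() * 100)

# Hypothetical contents of login.js?v=1.1 and login.js?v=1.2
login_v11 = "function login(u, p) { send('/auth', u, p); } // build 1.1"
login_v12 = "function login(u, p) { send('/auth', u, p); } // build 1.2"

print(js_sha256(login_v11) == js_sha256(login_v12))  # False: exact hashes differ
print(fuzzy_score(login_v11, login_v12))             # high score: the files are near-identical
```

The exact hash treats the two versions as unrelated, while the fuzzy score still links them, which is why the two field types are useful together.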

Language Data
What it does: Identifies the website's language to reveal its intended audience.
Use case: Flag mismatches (e.g., Chinese content on a .co.uk domain).
Field names: body_analysis.language
Example: A .co.uk site with language: zh-CN (Chinese) may indicate fraud.
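A mismatch check like the one described above could be sketched as a lookup from domain suffix to expected languages. The mapping here is hypothetical and deliberately small, and the sketch assumes the language value is a tag such as zh-CN, as in the example. A legitimate site may serve other languages, so treat a flag as a prompt for review rather than proof of fraud.

```python
# Hypothetical mapping from domain suffix to expected primary language subtags.
EXPECTED_PRIMARY_LANGUAGES = {
    ".co.uk": {"en", "cy", "gd"},  # English, Welsh, Scottish Gaelic
    ".de": {"de"},
    ".fr": {"fr"},
}

def language_mismatch(hostname: str, language: str) -> bool:
    """Return True if the page language is unexpected for the hostname's suffix."""
    host = hostname.lower().rstrip(".")
    primary = language.split("-")[0].lower()  # "zh-CN" -> "zh"
    for suffix, expected in EXPECTED_PRIMARY_LANGUAGES.items():
        if host.endswith(suffix):
            return primary not in expected
    return False  # no expectation recorded for this suffix, so nothing to flag

print(language_mismatch("shop.example.co.uk", "zh-CN"))  # True
print(language_mismatch("shop.example.co.uk", "en-GB"))  # False
```

Comparing only the primary subtag keeps regional variants such as en-US and en-GB from triggering false alarms.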

Onion Data
What it does: Lists Tor .onion addresses referenced in the HTML body.
Use case: Connect clearnet sites to dark web activity (e.g., phishing kit sales).
Field names: body_analysis.onion
Example: A clearnet site links to abc123.onion, a page for purchasing a phishing kit.
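The sketch below shows one way to pull .onion hostnames out of an HTML body with a regular expression. It is an illustrative assumption, not Web Scanner's extraction logic. The pattern is intentionally loose so that the placeholder abc123.onion matches; real version 3 onion addresses are 56 base32 characters, and a stricter pattern could enforce that.

```python
import re

# Loose pattern: any hostname ending in .onion, optionally with subdomains.
ONION_PATTERN = re.compile(r"\b((?:[a-z0-9-]+\.)*[a-z0-9-]+\.onion)\b", re.IGNORECASE)

def extract_onion_addresses(html: str) -> list[str]:
    """Return unique .onion hostnames, lowercased, in order of first appearance."""
    found: list[str] = []
    for match in ONION_PATTERN.findall(html):
        host = match.lower()
        if host not in found:
            found.append(host)
    return found

html = '<a href="http://abc123.onion/kit">Buy kit</a> <p>Mirror: ABC123.onion</p>'
print(extract_onion_addresses(html))  # ['abc123.onion']
```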

Script Hash Value (SHV)
What it does: Creates a fingerprint of script names, ignoring parameters, for fuzzy matching.
Use case: Group similar websites that use common scripts (e.g., phishing kits that include jQuery).
Field names: body_analysis.SHV
Example: Two phishing sites load jquery-2.1.4.min.js with different ?buildTime parameters; SHV treats them as identical.
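How SHV is computed is not documented here, so the following is only a hypothetical sketch of the idea: strip query strings and fragments from each script URL, keep the file name, sort the unique names, and hash the result. The function name and the choice of SHA-256 over sorted file names are assumptions.

```python
import hashlib
import posixpath
from urllib.parse import urlsplit

def script_hash_value(script_urls: list[str]) -> str:
    """Hypothetical SHV-style fingerprint of script file names, ignoring parameters."""
    names = set()
    for url in script_urls:
        path = urlsplit(url).path          # drops ?query and #fragment
        names.add(posixpath.basename(path))  # keep only the script file name
    fingerprint = "\n".join(sorted(names))   # sorting makes script order irrelevant
    return hashlib.sha256(fingerprint.encode("utf-8")).hexdigest()

site_a = ["/js/jquery-2.1.4.min.js?buildTime=1700000000", "/js/login.js"]
site_b = ["https://cdn.example.net/js/jquery-2.1.4.min.js?buildTime=1712345678", "/static/login.js"]
print(script_hash_value(site_a) == script_hash_value(site_b))  # True
```

Sorting and de-duplicating the names means two sites that load the same scripts in a different order, or with different cache-busting parameters, still share one fingerprint.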

HTML Body Similarity
What it does: Measures the similarity (0–100) between the current and previous scans of a webpage.
Use case: Track content changes (e.g., a score of 91 means a 9% difference).
Field names: html_body_similarity
Example: A site's similarity score drops to 85, indicating a 15% content change and possibly a new phishing page.
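The arithmetic behind the score is simple: the percentage difference is 100 minus the similarity. The sketch below shows that conversion plus a difflib-based stand-in for producing a 0-100 score; Web Scanner's own similarity metric is not described here and may differ. The review threshold of 90 is a hypothetical choice.

```python
import difflib

def html_body_similarity(previous: str, current: str) -> int:
    """Score two scans of a page body from 0 (nothing shared) to 100 (identical).

    Uses difflib as an illustrative stand-in; Web Scanner's metric may differ.
    """
    matcher = difflib.SequenceMatcher(None, previous, current, autojunk=False)
    return round(matcher.ratio() * 100)

def percent_changed(similarity: int) -> int:
    """Difference implied by a 0-100 similarity score, e.g. 91 -> 9."""
    return 100 - similarity

def flag_change(similarity: int, threshold: int = 90) -> bool:
    """Flag scans whose similarity falls below a hypothetical review threshold."""
    return similarity < threshold

print(percent_changed(91))  # 9
print(percent_changed(85))  # 15
print(flag_change(85))      # True

previous_scan = "<html><body><h1>Welcome</h1><p>Opening hours</p></body></html>"
current_scan = "<html><body><h1>Welcome</h1><form action='/login'></form></body></html>"
score = html_body_similarity(previous_scan, current_scan)
print(score, percent_changed(score), flag_change(score))
```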