Understand HTML data

Web Scanner lets users target a webpage's HTML data in several ways, so security teams can scan for similar websites.

Here's an explanation of some useful data types; the full list of field names is documented separately.

HTML body data

Field name

html_body_similarity

Explanation

This field returns a numerical value from 0 to 100 that represents how similar the HTML body from the current scan of a webpage is to the HTML body from the previous scan.

Example

A value of 91 implies roughly a 9% difference in content compared to the previous scan.

This calculation is based on the html_body_ssdeep field, an ssdeep fuzzy hash of the HTML body.

Because this metric compares the underlying HTML rather than the rendered page, it may not always correlate with the visual similarity of a website.
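To make the 0 to 100 scale concrete, the sketch below computes a comparable score for two HTML bodies and reads a score as an approximate percentage difference. It is only an illustration under stated assumptions: because ssdeep is not in Python's standard library, it swaps in difflib.SequenceMatcher, so its scores will not match html_body_similarity. The function names body_similarity and difference_percent are hypothetical.

```python
from difflib import SequenceMatcher


def body_similarity(previous_html: str, current_html: str) -> int:
    """Score two HTML bodies from 0 (unrelated) to 100 (identical).

    Stand-in only: uses difflib's SequenceMatcher ratio, NOT ssdeep,
    so results will differ from the product's html_body_similarity.
    """
    # autojunk=False stops difflib from ignoring frequent characters,
    # which are common in long HTML documents.
    ratio = SequenceMatcher(None, previous_html, current_html, autojunk=False).ratio()
    return round(ratio * 100)


def difference_percent(similarity: int) -> int:
    """Read a 0-100 similarity score as an approximate percentage difference."""
    if not 0 <= similarity <= 100:
        raise ValueError("similarity must be between 0 and 100")
    return 100 - similarity


print(difference_percent(91))  # 9
```

A real ssdeep comparison works on the fuzzy hashes rather than the raw HTML, which is why scores on tiny pages can behave differently from this character-level stand-in.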

HTML title data

Field name

htmltitle

Explanation

The htmltitle field contains the page's HTML title and often serves as an initial point of investigation.

Example

htmltitle = "Mythic" could indicate the presence of the Mythic C2 Framework.

HTML titles are particularly useful for correlating groups of phishing websites, especially when paired with JavaScript file hashes.
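A minimal sketch of how titles might be extracted and paired with JavaScript file hashes to group pages. It assumes each page is available as a (url, html, js_hash) tuple, where js_hash is some precomputed hash of the page's JavaScript; the tuple layout, the sample URLs, and the helper names are hypothetical and not part of Web Scanner.

```python
from collections import defaultdict
from html.parser import HTMLParser


class TitleExtractor(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def extract_title(html: str) -> str:
    parser = TitleExtractor()
    parser.feed(html)
    parser.close()
    return parser.title.strip()


def group_by_title_and_js(pages):
    """Group URLs by (title, js_hash); pages is an iterable of (url, html, js_hash)."""
    groups = defaultdict(list)
    for url, html, js_hash in pages:
        groups[(extract_title(html), js_hash)].append(url)
    return dict(groups)


pages = [
    ("https://a.example/", "<html><head><title>Mythic</title></head></html>", "js-1"),
    ("https://b.example/", "<title>Mythic</title>", "js-1"),
    ("https://c.example/", "<title>Mythic</title>", "js-2"),
]
print(group_by_title_and_js(pages)[("Mythic", "js-1")])
# ['https://a.example/', 'https://b.example/']
```

Keying on the title alone would merge all three pages; adding the JavaScript hash separates pages that share a title but serve different scripts.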

Header hash values (HHV)

Field name

hhv

Explanation

The header hash value is a fingerprint composed of strings derived from the keys (not the values) of the HTTP response headers.

Example

Location: https://facebook.com/
Content-Type: text/plain
Server: proxygen-bolt
Date: Fri, 16 Feb 2024 01:57:33 GMT
Connection: keep-alive
Content-Length: 0
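The sketch below illustrates the key-only idea using the example headers. The actual hhv algorithm is not described here, so the normalization (lowercased keys joined in the order received), the use of SHA-256, and the 16-character truncation are all assumptions; header_key_fingerprint is a hypothetical name.

```python
import hashlib


def header_key_fingerprint(headers: list[tuple[str, str]]) -> str:
    """Hypothetical HHV-style fingerprint built from header keys only.

    Keys are lowercased and joined in the order received; values are ignored.
    The real hhv algorithm may differ (ordering, normalization, hash choice).
    """
    keys = ",".join(name.lower() for name, _value in headers)
    return hashlib.sha256(keys.encode("utf-8")).hexdigest()[:16]


example = [
    ("Location", "https://facebook.com/"),
    ("Content-Type", "text/plain"),
    ("Server", "proxygen-bolt"),
    ("Date", "Fri, 16 Feb 2024 01:57:33 GMT"),
    ("Connection", "keep-alive"),
    ("Content-Length", "0"),
]

# Same keys, different Date value: the fingerprint does not change.
later = [(k, "Sat, 17 Feb 2024 09:00:00 GMT") if k == "Date" else (k, v) for k, v in example]
print(header_key_fingerprint(example) == header_key_fingerprint(later))  # True
```

Adding, removing, or renaming a header would change the fingerprint, while per-response values such as timestamps would not.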

Because they capture a large amount of information about a single website, HHVs are often used as prefilters when conducting URL scans.
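A minimal sketch of the prefilter pattern: an exact hhv match narrows the candidate set before a more specific check runs. The record layout (dicts with url, hhv, and htmltitle keys), the sample values, and the function names are hypothetical.

```python
def prefilter_by_hhv(records: list[dict], target_hhv: str) -> list[dict]:
    """Cheap first pass: keep only records whose hhv matches exactly."""
    return [r for r in records if r.get("hhv") == target_hhv]


def find_candidates(records: list[dict], target_hhv: str, title: str) -> list[str]:
    """Run the more specific htmltitle check only on the prefiltered records."""
    return [
        r["url"]
        for r in prefilter_by_hhv(records, target_hhv)
        if r.get("htmltitle") == title
    ]


scans = [
    {"url": "https://a.example/", "hhv": "hhv-1", "htmltitle": "Mythic"},
    {"url": "https://b.example/", "hhv": "hhv-2", "htmltitle": "Mythic"},
    {"url": "https://c.example/", "hhv": "hhv-1", "htmltitle": "Welcome"},
]
print(find_candidates(scans, "hhv-1", "Mythic"))  # ['https://a.example/']
```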