Web Scanner lets users target the HTML data of a webpage in several ways, which helps security teams scan for similar websites.
Here's an explanation of useful data types. Click here for a full list of field names.
HTML body data
Field name
html_body_similarity
Explanation
This field holds a numerical value from 0 to 100 that represents the similarity between the current scan and the previous scan of a webpage.
A value of 91 implies a 9% difference in data content compared to the previous scan.
This calculation is based on the html_body_ssdeep field.
This metric may not always correlate with the visual similarity of a website.
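As a rough illustration of how the score can be read, the sketch below converts a similarity value into a percentage difference and flags scans that changed substantially since the previous scan. The record layout (a list of dictionaries with a `url` key), the helper names, and the threshold of 90 are assumptions made for this example and are not part of Web Scanner.

```python
def difference_percent(similarity: int) -> int:
    """Convert a 0-100 similarity score into a percentage difference."""
    if not 0 <= similarity <= 100:
        raise ValueError("similarity must be between 0 and 100")
    return 100 - similarity


def changed_scans(scans, threshold=90):
    """Return the scans whose body similarity fell below the threshold.

    `scans` is a hypothetical list of dicts; only the
    html_body_similarity key mirrors a documented field name.
    """
    return [s for s in scans if s["html_body_similarity"] < threshold]


# Illustrative data only, not real scan results.
scans = [
    {"url": "https://example.com/", "html_body_similarity": 91},
    {"url": "https://example.org/", "html_body_similarity": 42},
]

print(difference_percent(91))
print([s["url"] for s in changed_scans(scans)])
```

A low score only indicates that the body content changed; as noted above, it may not reflect how different the page looks.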
HTML title data
Field name
htmltitle
Explanation
htmltitle often serves as an initial point of investigation.
htmltitle = "Mythic" could indicate the presence of the Mythic C2 Framework.
HTML titles are particularly useful for correlating groups of phishing websites, especially when paired with JavaScript file hashes.
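One way to picture this correlation is to group scan records by the pair (HTML title, JavaScript file hash) and keep only the groups that contain more than one site. This is a minimal sketch over hypothetical, already-exported records: the `js_hash` and `url` keys, the function name, and the sample values are assumptions for illustration, and only `htmltitle` is a documented field name.

```python
from collections import defaultdict


def group_by_title_and_script(records):
    """Cluster URLs that share both an HTML title and a JavaScript hash.

    `records` is a hypothetical list of dicts with the keys
    "url", "htmltitle", and "js_hash" (the last one is an assumed name).
    """
    groups = defaultdict(list)
    for record in records:
        key = (record["htmltitle"], record["js_hash"])
        groups[key].append(record["url"])
    # A cluster is only interesting if at least two sites share the pair.
    return {key: urls for key, urls in groups.items() if len(urls) > 1}


# Illustrative data only, not real scan results.
records = [
    {"url": "https://a.example/", "htmltitle": "Sign in", "js_hash": "abc123"},
    {"url": "https://b.example/", "htmltitle": "Sign in", "js_hash": "abc123"},
    {"url": "https://c.example/", "htmltitle": "Sign in", "js_hash": "ffff00"},
]

clusters = group_by_title_and_script(records)
for (title, js_hash), urls in clusters.items():
    print(title, js_hash, urls)
```

Requiring both values to match narrows the result: a common title such as "Sign in" on its own would group many unrelated sites.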
Header hash values (HHV)
Field name
hhv
Explanation
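The header hash value is a fingerprint composed of various strings derived from the keys (not the values) of the response headers. For example, given the following response headers, only the key names (Location, Content-Type, Server, and so on) contribute to the HHV: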
Location: https://facebook.com/
Content-Type: text/plain
Server: proxygen-bolt
Date: Fri, 16 Feb 2024 01:57:33 GMT
Connection: keep-alive
Content-Length: 0
Given the amount of data they contain relating to a single website, HHVs are often used as prefilters when conducting URL scans.
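The exact algorithm behind hhv is not described here, so the following is only a sketch of the general idea of fingerprinting header keys while ignoring their values. The choices made in it (lowercasing the names, preserving their order, joining them with commas, and truncating a SHA-256 digest to 16 characters) are assumptions for illustration, and its output will not match real hhv values.

```python
import hashlib

# The response headers from the example above.
RESPONSE = """\
Location: https://facebook.com/
Content-Type: text/plain
Server: proxygen-bolt
Date: Fri, 16 Feb 2024 01:57:33 GMT
Connection: keep-alive
Content-Length: 0
"""


def header_keys(raw_headers: str) -> list[str]:
    """Extract the header names (keys) in order, discarding the values."""
    keys = []
    for line in raw_headers.strip().splitlines():
        # Split on the first colon only, since values may contain colons.
        name, sep, _value = line.partition(":")
        if sep:
            keys.append(name.strip().lower())
    return keys


def header_key_fingerprint(raw_headers: str) -> str:
    """Hash the ordered header names into a short fingerprint.

    Assumed scheme for illustration only; not the real hhv algorithm.
    """
    joined = ",".join(header_keys(raw_headers))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]


print(header_keys(RESPONSE))
print(header_key_fingerprint(RESPONSE))
```

Because only the keys are hashed, two responses with the same set and order of headers produce the same fingerprint even when values such as Date or Location differ, which is the property that makes a key-based fingerprint usable as a prefilter.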