Web Scanner lets security teams analyze a webpage’s body data in several ways and identify similar websites using hash values, similarity scores, JavaScript data, and dark web content.

Here's a selection of useful data types. Click here for a full list of field names.

SHA-256 body data

Field names

body_analysis.body_sha256
body_analysis.header_sha256
body_analysis.footer_sha256

Explanation

The above fields are part of the HTTP response body analysis, specifically the SHA-256 hash of the content contained within the <body> tag of a webpage.

Matching hashes indicate that the content is identical across two or more pages.

This, however, is quite rare and generally only occurs with basic web pages and holding pages, e.g., error pages or directory listings.
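
As a rough illustration of exact matching, the sketch below groups pages by the SHA-256 of their body content. The sample pages and the helper names (body_sha256, group_by_body_hash) are hypothetical, and how Web Scanner extracts or normalizes the <body> content before hashing is not described here, so this is a sketch under those assumptions rather than the product's implementation.

```python
import hashlib
from collections import defaultdict


def body_sha256(body: str) -> str:
    """Return the hex SHA-256 digest of a page body (UTF-8 encoded)."""
    return hashlib.sha256(body.encode("utf-8")).hexdigest()


def group_by_body_hash(pages: dict[str, str]) -> dict[str, list[str]]:
    """Map each body hash to the sorted list of URLs that share it."""
    groups = defaultdict(list)
    for url, body in pages.items():
        groups[body_sha256(body)].append(url)
    return {digest: sorted(urls) for digest, urls in groups.items()}


# Hypothetical sample data: two identical holding pages and one unique page.
pages = {
    "http://a.example/": "<h1>403 Forbidden</h1>",
    "http://b.example/": "<h1>403 Forbidden</h1>",
    "http://c.example/": "<h1>Welcome to my shop</h1>",
}

groups = group_by_body_hash(pages)
# Keep only the hashes shared by two or more pages.
shared = [urls for urls in groups.values() if len(urls) > 1]
```

Only the two identical error pages end up in the same group; a single changed byte in either body would split them apart.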

JavaScript body data

Field names

body_analysis.js_sha256
body_analysis.js_ssdeep

Explanation

The above fields contain a comprehensive list of all referenced JavaScript files, whether loaded through URLs or embedded as inline scripts, along with their corresponding SHA-256 and ssdeep hashes.

These hashes are specific to each file version and parameter. For example, a file referenced with different query parameters, such as ?v=1.2 and ?v=2.0, would produce distinct hashes. This matters because even minor differences between file versions can lead to significant variations in hash values.

body_analysis.js_ssdeep is a fuzzy hash, so it can show how similar two files are. The more content two files share, the more similar their ssdeep hashes will be.
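
The contrast between exact and fuzzy matching can be sketched as follows. The script contents are hypothetical, and the similarity measure uses Python's difflib as a stand-in: it is not ssdeep and will not produce ssdeep scores, it only illustrates the idea that near-identical files score as highly similar while their SHA-256 digests are unrelated.

```python
import hashlib
from difflib import SequenceMatcher

# Hypothetical script contents: two versions that differ only in a version string.
js_v1 = "function track(){var v='1.2';send('/collect',v);}"
js_v2 = "function track(){var v='2.0';send('/collect',v);}"

sha_v1 = hashlib.sha256(js_v1.encode("utf-8")).hexdigest()
sha_v2 = hashlib.sha256(js_v2.encode("utf-8")).hexdigest()

# Exact hashing: any change to the content yields a different digest.
exact_match = sha_v1 == sha_v2

# Stand-in for a fuzzy comparison (NOT ssdeep): share of matching characters, 0.0 to 1.0.
similarity = SequenceMatcher(None, js_v1, js_v2).ratio()
```

Here exact_match is False even though the two files are almost the same, which is the gap a fuzzy hash is meant to close.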

Language data

Field names

body_analysis.language

Explanation

The above field indicates the intended audience and language of the website. Be wary of body_analysis.language results that do not match the country code of the TLD, as this may suggest targeting discrepancies or misconfiguration.

Example

Chinese-language content on a website with a ".co.uk" TLD
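
A check of this kind could be sketched as below. The mapping from country-code TLD to expected languages is a deliberately tiny, hypothetical table, and the sketch assumes the language value is a two-letter code such as "en" or "zh"; the actual format of body_analysis.language is not specified here.

```python
# Hypothetical, deliberately small mapping from ccTLD to expected language codes.
EXPECTED_LANGUAGES = {
    "uk": {"en"},
    "de": {"de"},
    "cn": {"zh"},
}


def language_mismatch(hostname: str, language: str) -> bool:
    """Return True when the detected language is unexpected for the ccTLD.

    TLDs without an entry (for example .com) return False because no
    expectation is defined for them.
    """
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    expected = EXPECTED_LANGUAGES.get(tld)
    return expected is not None and language.lower() not in expected


# Chinese-language content on a ".co.uk" host is flagged; English is not.
flagged = language_mismatch("shop.example.co.uk", "zh")
not_flagged = language_mismatch("shop.example.co.uk", "en")
```

A mismatch is only a prompt for a closer look, since many legitimate sites serve content in languages other than the one their TLD suggests.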

Onion data

Field names

body_analysis.onion

Explanation

The above field contains a list of Tor onion addresses referenced within the HTML body of a webpage. This information can be leveraged to connect with Tor scan data.

Example

A clearnet website promoting Command and Control (C2) or phishing kits provides a link to their .onion purchasing page.

By identifying these connections, you can conduct further investigation into the hosting IP, domain, and other associated data.
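
To illustrate how such references might be collected, the sketch below pulls onion hostnames out of raw HTML with a regular expression. It assumes v3 onion addresses (56 base32 characters followed by ".onion"); the helper name and the sample page are hypothetical, and this is not necessarily how body_analysis.onion is populated.

```python
import re

# v3 onion addresses are 56 base32 characters (a-z, 2-7) followed by ".onion".
ONION_RE = re.compile(r"\b([a-z2-7]{56}\.onion)\b", re.IGNORECASE)


def extract_onions(html: str) -> list[str]:
    """Return the unique onion hostnames found in the HTML, lowercased and sorted."""
    return sorted({match.lower() for match in ONION_RE.findall(html)})


# Hypothetical page: the address is a made-up placeholder, not a real service.
fake_onion = "a" * 56 + ".onion"
html = (
    f'<a href="http://{fake_onion}/buy">Buy</a> '
    f'<a href="http://{fake_onion}/faq">FAQ</a>'
)
onions = extract_onions(html)
```

Both links point at the same hidden service, so the result contains a single hostname that can then be looked up in Tor scan data.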

SHV data

Field names

body_analysis.SHV

Explanation

The Script Hash Value (SHV) is a fingerprint generated from the alphabetically ordered list of all script names (excluding parameters) that appear on a webpage. It excels at pinpointing groups of perfect matches.

This method is a somewhat fuzzy search: because parameters are disregarded, variations such as "jquery-2.1.4.min.js?buildTime=1708035548" and "jquery-2.1.4.min.js?buildTime=999999990" are treated as identical, even though the "buildTime" value differs between the two.

Unlike a SHA-256 search (which allows only exact matches for single files), the SHV's fuzzy nature facilitates the identification of similar groups. While a single-file search using ssdeep is possible, partial hash matches may not always yield useful results.

Example

If a phishing kit includes commonly used JavaScript files such as jQuery, along with two custom JS files whose versions vary via a "?v=" parameter, the SHV fingerprint enables the discovery of these variations.
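
The idea can be sketched as follows. The description above only says that script names are stripped of parameters and ordered alphabetically; using the file name without its directory path, joining names with newlines, and hashing with SHA-256 are all assumptions made for this illustration, and the function names and kit file names are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def script_name(src: str) -> str:
    """Return the script file name with the query string and fragment removed."""
    path = urlsplit(src).path
    return path.rsplit("/", 1)[-1]


def shv_sketch(script_srcs: list[str]) -> str:
    """Hash the alphabetically sorted script names (illustrative only)."""
    names = sorted(script_name(src) for src in script_srcs)
    return hashlib.sha256("\n".join(names).encode("utf-8")).hexdigest()


# Two hypothetical deployments of the same kit with different versions and script order.
page_a = [
    "/js/jquery-2.1.4.min.js?buildTime=1708035548",
    "/js/kit-core.js?v=1.2",
    "/js/kit-form.js?v=1.2",
]
page_b = [
    "/js/kit-form.js?v=2.0",
    "/js/jquery-2.1.4.min.js?buildTime=999999990",
    "/js/kit-core.js?v=2.0",
]

same_fingerprint = shv_sketch(page_a) == shv_sketch(page_b)
```

Because parameters and ordering are ignored, both deployments produce the same fingerprint, while adding or removing a script would change it.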

HTML body data

Field names

html_body_similarity

Explanation

The above field holds a numerical value from 0 to 100 representing the similarity between the current scan and the previous scan of a webpage.

Example

A value of 91 implies a 9% difference in content compared to the previous scan.

This calculation is based on the html_body_ssdeep field.

This metric may not always correlate with the visual similarity of a website.
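
The arithmetic in the example can be sketched as below, together with a simple filter for pages that changed substantially between scans. The scan records and the 25% threshold are hypothetical; only the html_body_similarity field name comes from the text above.

```python
def percent_changed(similarity: int) -> int:
    """Convert a 0-100 similarity score into the implied percentage difference."""
    if not 0 <= similarity <= 100:
        raise ValueError("similarity must be between 0 and 100")
    return 100 - similarity


# Hypothetical scan records.
scans = [
    {"url": "http://a.example/", "html_body_similarity": 100},
    {"url": "http://b.example/", "html_body_similarity": 91},
    {"url": "http://c.example/", "html_body_similarity": 40},
]

# Hypothetical threshold: flag pages that changed by more than 25% since the previous scan.
changed = [s["url"] for s in scans if percent_changed(s["html_body_similarity"]) > 25]
```

With these sample values only the third page is flagged; the page scoring 91 differs by 9% and stays under the threshold.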