Body data
    • 17 Feb 2024


    Web Scanner lets users target the body data of a webpage in a number of ways, so security teams can search for similar websites based on hash values, similarity scores, JavaScript data, and dark web content.

    Here's a selection of useful data types. Click here for a full list of field names.

    SHA-256 body data

    Field names

    • body_analysis.body_sha256
    • body_analysis.header_sha256
    • body_analysis.footer_sha256

    Explanation

    The above fields are part of the HTTP response body analysis, i.e. the SHA-256 hash of whatever is contained in the <body> tag of a webpage.

    Matching hashes mean that the content is exactly the same across two or more pages.

    This, however, is quite rare and generally only occurs with basic web pages and holding pages, e.g. error pages or directory listings.
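
    As a rough illustration of why exact matches are rare, the sketch below hashes the content of the <body> tag for three small pages. It assumes the body is simply everything between the opening and closing <body> tags; Web Scanner's actual extraction and normalisation rules are not described here, and the function name body_sha256 is only borrowed from the field name for readability.

```python
import hashlib
import re

# Assumption: "body" means everything between <body ...> and </body>.
# Web Scanner's real extraction rules may differ.
BODY_RE = re.compile(r"<body\b[^>]*>(.*?)</body\s*>", re.IGNORECASE | re.DOTALL)


def body_sha256(html: str) -> str:
    """Return the SHA-256 hex digest of the <body> content.

    Falls back to hashing the whole document if no <body> tag is found.
    """
    match = BODY_RE.search(html)
    content = match.group(1) if match else html
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


# Two holding pages with different <head> sections but identical bodies.
page_a = "<html><head><title>A</title></head><body><h1>Index of /</h1></body></html>"
page_b = "<html><head><title>B</title></head><body><h1>Index of /</h1></body></html>"
# Same page as page_a with one extra word in the body.
page_c = "<html><head><title>A</title></head><body><h1>Index of /files</h1></body></html>"

same = body_sha256(page_a) == body_sha256(page_b)  # identical bodies match
diff = body_sha256(page_a) != body_sha256(page_c)  # any change breaks the match
```

    Any change to the body, however small, produces a completely different digest, which is why this kind of search mostly surfaces static pages.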

    JavaScript body data

    Field names

    • body_analysis.js_sha256
    • body_analysis.js_ssdeep

    Explanation

    The above fields contain a comprehensive list of all referenced JavaScript files, whether through URLs or embedded as script, along with their corresponding SHA-256 and ssdeep hashes.

    These hashes are specific to each file version and parameter. For example, a file referenced with different query parameters, such as ?v=1.2 or ?v=2.0, would result in distinct hashes. This is crucial, as even minor differences between file versions can lead to significant variations in hash values.

    body_analysis.js_ssdeep, being a fuzzy hash, indicates how similar two files are. The more content two files share byte for byte, the more similar their ssdeep hashes will be.
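
    The contrast between an exact hash and a fuzzy comparison can be sketched as follows. This is not the ssdeep algorithm: difflib.SequenceMatcher from the Python standard library is used purely as a stand-in to show the idea of a similarity score, and the two script bodies are invented for the example.

```python
import hashlib
from difflib import SequenceMatcher


def sha256_hex(data: bytes) -> str:
    """Exact hash: any byte change yields an unrelated digest."""
    return hashlib.sha256(data).hexdigest()


def rough_similarity(a: bytes, b: bytes) -> int:
    """Stand-in similarity score from 0 to 100 (NOT the ssdeep algorithm)."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    return round(matcher.ratio() * 100)


# Hypothetical script contents: two versions of the same file.
script_v1 = b"function track(id) { send('/collect', {id: id, version: '1.2'}); }"
script_v2 = b"function track(id) { send('/collect', {id: id, version: '2.0'}); }"

exact_match = sha256_hex(script_v1) == sha256_hex(script_v2)
score = rough_similarity(script_v1, script_v2)
```

    Here exact_match is False even though only the version string changed, while score stays high. A fuzzy hash such as ssdeep is aimed at the same kind of near-match.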

    Language data

    Field names

    • body_analysis.language

    Explanation

    The above field serves as an indicator of the intended audience and language of the website. Be wary of body_analysis.language results that don't match the country code of the top-level domain (TLD).

    Example

    Chinese-language content on a website with a ".co.uk" TLD
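
    A check of this kind could be sketched as below. Everything here is an assumption made for illustration: the EXPECTED_LANGUAGES mapping is hypothetical and deliberately tiny, and the sketch assumes body_analysis.language holds a short language code such as "zh" or "en", which this article does not confirm.

```python
# Hypothetical mapping from country-code TLD to languages one might expect there.
EXPECTED_LANGUAGES = {
    "uk": {"en", "cy"},
    "de": {"de"},
    "fr": {"fr"},
    "cn": {"zh"},
}


def language_mismatch(hostname: str, language: str) -> bool:
    """Return True when the page language is unexpected for the hostname's ccTLD.

    Generic or unmapped TLDs (e.g. .com) are never flagged.
    """
    tld = hostname.rstrip(".").rsplit(".", 1)[-1].lower()
    expected = EXPECTED_LANGUAGES.get(tld)
    if expected is None:
        return False
    # Reduce codes like "zh-CN" to their primary subtag before comparing.
    primary = language.lower().split("-")[0]
    return primary not in expected


flag_uk_chinese = language_mismatch("shop.example.co.uk", "zh")
flag_uk_english = language_mismatch("shop.example.co.uk", "en")
flag_generic = language_mismatch("example.com", "zh")
```

    A mismatch is only a prompt for a closer look, since many legitimate sites serve content in languages other than the one their TLD suggests.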

    Onion data

    Field names

    • body_analysis.onion

    Explanation

    The above field contains a list of Tor onion addresses referenced within the HTML body of a webpage. This information can be leveraged to connect with Tor scan data.

    Example

    A clearnet website promoting Command and Control (C2) or phishing kits provides a link to their .onion purchasing page.

    By identifying these connections, you can conduct further investigation into the hosting IP, domain, and other associated data.
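
    For readers who want to reproduce this kind of extraction on their own HTML samples, a minimal sketch is shown below. It assumes onion addresses follow the usual shape of 56 base32 characters (v3) or 16 base32 characters (legacy v2) followed by ".onion"; it only matches on that shape and does not validate the address. How Web Scanner populates body_analysis.onion is not described here.

```python
import re

# 56 (v3) or 16 (legacy v2) base32 characters followed by ".onion".
# The lookbehind stops the pattern from matching the tail of a longer string.
ONION_RE = re.compile(
    r"(?<![a-z2-7])([a-z2-7]{56}|[a-z2-7]{16})\.onion\b",
    re.IGNORECASE,
)


def extract_onions(html: str) -> list[str]:
    """Return the unique onion hostnames referenced in the HTML, lower-cased and sorted."""
    found = {label.lower() + ".onion" for label in ONION_RE.findall(html)}
    return sorted(found)


# Hypothetical address built only to have the right length and alphabet.
fake_v3 = ("abcdefghijklmnopqrstuvwxyz234567" * 2)[:56]
sample_html = (
    f'<a href="http://{fake_v3}.onion/buy">Buy here</a> '
    f'<a href="http://{fake_v3.upper()}.onion/">Mirror</a> '
    '<a href="https://example.com/">Clearnet</a>'
)

onions = extract_onions(sample_html)
```

    The resulting list can then be compared against the values seen in Tor scan data.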

    SHV data

    Field names

    • body_analysis.SHV

    Explanation

    The Script Hash Value (SHV) is a fingerprint generated by alphabetically ordering the list of all script names (excluding parameters) as they appear on a webpage. It excels at pinpointing groups of perfect matches.

    This method is a somewhat fuzzy search: because parameters are disregarded, variations such as "jquery-2.1.4.min.js?buildTime=1708035548" and "jquery-2.1.4.min.js?buildTime=999999990" are treated as identical, even though the "buildTime" value differs between the two.

    Unlike a SHA-256 search (which allows only exact matches for single files), the SHV's fuzzy nature facilitates the identification of similar groups. While a single-file search using ssdeep is possible, it may not always yield useful results when looking for partial hash matches.

    Example

    If a phishing kit includes commonly used JavaScript files such as jQuery, along with two custom JS files with varying versions denoted by a "?v=" parameter, the SHV fingerprint enables the discovery of these variations.
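
    The idea behind the fingerprint can be sketched as follows. The article only states that script names are ordered alphabetically with parameters excluded; the rest is assumed for illustration, including the use of the file name without its path, the newline separator, and SHA-256 as the final digest. The function names and the kit file names are hypothetical.

```python
import hashlib
from urllib.parse import urlsplit


def script_name(src: str) -> str:
    """Return the file name of a script reference, without path, query or fragment."""
    return urlsplit(src).path.rsplit("/", 1)[-1]


def shv_fingerprint(script_srcs: list[str]) -> str:
    """Illustrative fingerprint: sort the bare script names and hash the joined list.

    Assumption: the real SHV may use a different separator or digest.
    """
    names = sorted(script_name(src) for src in script_srcs)
    return hashlib.sha256("\n".join(names).encode("utf-8")).hexdigest()


# Two deployments of the same hypothetical kit with different parameters and order.
kit_a = [
    "jquery-2.1.4.min.js?buildTime=1708035548",
    "/js/login.js?v=1.2",
    "/js/exfil.js?v=1.2",
]
kit_b = [
    "/js/exfil.js?v=2.0",
    "jquery-2.1.4.min.js?buildTime=999999990",
    "/js/login.js?v=2.0",
]
# A page that shares jQuery but not the custom scripts.
unrelated = ["jquery-2.1.4.min.js", "/js/carousel.js"]

same_kit = shv_fingerprint(kit_a) == shv_fingerprint(kit_b)
different_site = shv_fingerprint(kit_a) != shv_fingerprint(unrelated)
```

    Both kit deployments collapse to the same fingerprint because only the set of script names matters, while the unrelated page does not match.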

    HTML body data

    Field names

    • html_body_similarity

    Explanation

    The above field outputs a numerical value ranging from 0 to 100, representing the similarity between the current scan and the previous scan of a webpage.

    Example

    A value of 91 implies a 9% difference in data content compared to the previous scan.

    This calculation is based on the html_body_ssdeep field.

    This metric may not always correlate with the visual similarity of a website.
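
    The arithmetic in the example can be written out as a small helper, together with a filter that picks out pages whose content changed noticeably between scans. The record layout (a dictionary with "url" and "html_body_similarity" keys) and the threshold of 80 are assumptions made for this sketch, not part of the product's output format.

```python
def difference_percent(similarity: int) -> int:
    """Convert a 0-100 similarity score into the percentage of content that differs."""
    if not 0 <= similarity <= 100:
        raise ValueError("similarity must be between 0 and 100")
    return 100 - similarity


def changed_pages(records: list[dict], max_similarity: int = 80) -> list[str]:
    """Return the URLs whose similarity to the previous scan is at or below the threshold."""
    return [
        record["url"]
        for record in records
        if record["html_body_similarity"] <= max_similarity
    ]


# Hypothetical scan records.
records = [
    {"url": "https://a.example/", "html_body_similarity": 91},
    {"url": "https://b.example/", "html_body_similarity": 42},
    {"url": "https://c.example/", "html_body_similarity": 100},
]

delta = difference_percent(91)
changed = changed_pages(records)
```

    With the sample records, only the second page falls below the threshold.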

