Web Scanner Data Types for Analysis


This guide explains key Web Scanner data types used to analyze webpages for similarity, malicious activity, or specific characteristics. These data types, represented by field names in SPQL queries, enable security teams to identify phishing websites, track dark web connections, detect malware, and analyze URL navigation paths. For a complete list of field names, refer to Understand Field Names.

Body Data

Body data fields allow analysis of a webpage’s HTML content, JavaScript, language, and dark web references, helping identify identical or similar websites.

| Field Name | Description | Type |
| --- | --- | --- |
| body_analysis.body_sha256 | SHA-256 hash of the <body> content. Matches indicate identical content (rare, e.g., error pages). | String |
| body_analysis.header_sha256 | SHA-256 hash of the <header> content. | String |
| body_analysis.footer_sha256 | SHA-256 hash of the <footer> content. | String |
| body_analysis.js_sha256 | List of referenced JavaScript files with SHA-256 hashes (includes URLs and query parameters, e.g., ?v=1.2). | String |
| body_analysis.js_ssdeep | List of JavaScript files with SSDeep fuzzy hashes for similarity detection. | String |
| body_analysis.language | Comma-separated list of HTML languages (most to least used). A mismatch with the TLD may indicate that the site targets an audience other than the one its domain suggests. | String |
| body_analysis.onion | List of Tor .onion addresses in the HTML body, useful for dark web investigations. | String |
| body_analysis.SHV | Script Hash Value (SHV), a fingerprint of script names (excluding parameters, e.g., jQuery-2.1.4.min.js). Identifies similar script groups. | String |

Example

  • Query: body_analysis.onion = *market* AND datasource = torscan

    • Finds .onion sites referencing “market” in their HTML body.

  • Query: body_analysis.SHV = jquery* AND htmltitle = *login*

    • Identifies potential phishing sites using specific JavaScript files and login-related titles.
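The difference between body_analysis.js_sha256 (which includes URLs and query parameters) and body_analysis.SHV (script names only, parameters excluded) can be illustrated with a minimal Python sketch. This guide does not specify how Web Scanner computes SHV, so the function names, the sorting and lowercasing of names, and the use of SHA-256 below are assumptions made purely for illustration.

```python
import hashlib
from posixpath import basename
from urllib.parse import urlsplit


def script_names(script_urls):
    """Return script file names with query parameters and fragments dropped."""
    return [basename(urlsplit(url).path) for url in script_urls]


def script_group_fingerprint(script_urls):
    """Hypothetical fingerprint over script names only (not file contents).

    Assumption: names are lowercased and sorted so that two pages loading
    the same set of scripts produce the same value regardless of order.
    """
    names = sorted(name.lower() for name in script_names(script_urls) if name)
    return hashlib.sha256("\n".join(names).encode("utf-8")).hexdigest()


# Two hypothetical sites serving the same scripts from different paths,
# with different cache-busting parameters.
site_a = [
    "https://a.example/js/jQuery-2.1.4.min.js?v=1.2",
    "https://a.example/js/app.js",
]
site_b = [
    "https://b.example/static/app.js?cache=9",
    "https://b.example/static/jQuery-2.1.4.min.js",
]

print(script_names(site_a))
print(script_group_fingerprint(site_a) == script_group_fingerprint(site_b))
```

Because only the names feed the fingerprint in this sketch, the two sites group together even though their full script URLs differ.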

Favicon Data

Favicon fields target website icons to identify similar websites based on hash values or paths.

| Field Name | Description | Type |
| --- | --- | --- |
| favicon_md5 | MD5 hash of the .ico favicon (typically at /favicon.ico). | String |
| favicon_murmur3 | Murmur3 hash of the .ico favicon. | String |
| favicon_path | Path to the .ico favicon. | String |
| favicon2_md5 | MD5 hash of the non-.ico favicon. | String |
| favicon2_murmur3 | Murmur3 hash of the non-.ico favicon. | String |
| favicon2_path | Path to the non-.ico favicon. | String |
| favicon_urls | List of favicon URLs, including the root /favicon.ico even when the page does not reference it. | String |

Notes:

  • favicon fields refer to .ico files; favicon2 fields cover non-.ico formats (e.g., PNG). Websites may have both.

  • Browsers and Web Scanner automatically check for /favicon.ico even if not referenced in code.  

Example

  • Query: favicon_murmur3 = 1234567890 AND datasource = webscan

    • Finds websites with a specific .ico favicon hash, indicating visual similarity.
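As a rough illustration of what the MD5 and Murmur3 favicon fields represent, the following Python sketch hashes a favicon's bytes both ways. It is a sketch under stated assumptions: this guide does not say whether Web Scanner hashes the raw file bytes or an encoded form, nor whether the Murmur3 value is stored signed or unsigned, so `favicon_hashes` and `to_signed` are hypothetical helpers. The Murmur3 routine is a pure-Python version of the standard 32-bit (x86) variant.

```python
import hashlib

MASK32 = 0xFFFFFFFF


def _rotl32(x: int, r: int) -> int:
    """Rotate a 32-bit value left by r bits."""
    return ((x << r) | (x >> (32 - r))) & MASK32


def murmur3_32(data: bytes, seed: int = 0) -> int:
    """MurmurHash3 (x86, 32-bit); returns an unsigned integer."""
    c1, c2 = 0xCC9E2D51, 0x1B873593
    h = seed & MASK32
    n_blocks = len(data) // 4

    # Body: mix each full 4-byte little-endian block into the state.
    for i in range(n_blocks):
        k = int.from_bytes(data[4 * i:4 * i + 4], "little")
        k = (_rotl32((k * c1) & MASK32, 15) * c2) & MASK32
        h = (_rotl32(h ^ k, 13) * 5 + 0xE6546B64) & MASK32

    # Tail: mix the remaining 1-3 bytes, if any.
    tail = data[4 * n_blocks:]
    if tail:
        k = int.from_bytes(tail, "little")
        k = (_rotl32((k * c1) & MASK32, 15) * c2) & MASK32
        h ^= k

    # Finalization: fold in the length and avalanche the bits.
    h ^= len(data)
    h ^= h >> 16
    h = (h * 0x85EBCA6B) & MASK32
    h ^= h >> 13
    h = (h * 0xC2B2AE35) & MASK32
    h ^= h >> 16
    return h


def to_signed(h: int) -> int:
    """Reinterpret an unsigned 32-bit value as a signed one."""
    return h - (1 << 32) if h & 0x80000000 else h


def favicon_hashes(icon_bytes: bytes) -> dict:
    """Hypothetical helper. Assumption: the raw file bytes are hashed."""
    return {
        "md5": hashlib.md5(icon_bytes).hexdigest(),
        "murmur3": murmur3_32(icon_bytes),
    }


# Placeholder bytes standing in for a downloaded favicon file.
print(favicon_hashes(b"\x00\x00\x01\x00placeholder-icon-bytes"))
```

If values produced this way do not match what a query returns, the most likely differences are the input encoding and the signed or unsigned representation, both of which are assumptions here.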

HTML Data

HTML data fields analyze webpage titles, similarity scores, and response headers for correlation and filtering.

| Field Name | Description | Type |
| --- | --- | --- |
| html_body_similarity | Numerical value (0–100) showing similarity to the previous scan, based on html_body_ssdeep. A value of 91 indicates a 9% difference. May not reflect visual similarity. | Number |
| htmltitle | HTML title, useful for initial investigations (e.g., detecting phishing or C2 frameworks). | String |
| hhv | Header Hash Value, a fingerprint of response header keys (not values, e.g., Content-Type, Server). Used as a prefilter for URL scans. | String |

Example

  • Query: htmltitle = "Mythic" AND body_analysis.js_sha256 = *mythic*

    • Correlates websites using the Mythic C2 Framework via title and JavaScript hashes.

  • Query: hhv = *proxygen* AND datasource = webscan

    • Prefilters scanned websites for specific server software (e.g., proxygen-bolt).
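The idea behind hhv, a fingerprint built from response header keys rather than their values, can be sketched in Python as follows. The actual HHV algorithm is not described in this guide; the function name, the lowercasing, the preservation of header order, and the truncated SHA-256 output are all assumptions for illustration.

```python
import hashlib


def header_key_fingerprint(headers):
    """Hypothetical fingerprint over header names only; values are ignored.

    Assumption: names are lowercased and kept in the order received, so
    servers that emit the same keys in a different order hash differently.
    """
    keys = [name.strip().lower() for name, _value in headers]
    return hashlib.sha256(",".join(keys).encode("utf-8")).hexdigest()[:16]


# Two hypothetical responses: same header keys, different values.
response_a = [("Content-Type", "text/html"), ("Server", "nginx/1.24.0")]
response_b = [("Content-Type", "application/json"), ("Server", "nginx/1.18.0")]
# Same keys, different order.
response_c = [("Server", "nginx/1.24.0"), ("Content-Type", "text/html")]

print(header_key_fingerprint(response_a) == header_key_fingerprint(response_b))
print(header_key_fingerprint(response_a) == header_key_fingerprint(response_c))
```

A key-only fingerprint like this stays stable when values such as version strings or content types change, which is what makes it usable as a coarse prefilter before more expensive matching.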

SSL Data

SSL data fields target certificate characteristics to identify unique or malicious servers.

| Field Name | Description | Type |
| --- | --- | --- |
| ssl.chv | Certificate Hash Value, a fingerprint of issuer, subject, and extension keys, formatted as <hash>:<w or x>:<SANs_count> (e.g., 487049c6c39ee487049c6c39ee7646766df07c6:w:0005). Identifies unique certificates, especially self-signed ones. | String |
| ssl.san | List of domains in the Subject Alternative Names (SANs) field. | List of domains |

Notes:

  • ssl.chv includes:

    • First part: Hashes of issuer, subject, and extension keys (often identical for issuer and subject).

    • Second part: w (wildcard certificate) or x (non-wildcard).

    • Third part: Number of SANs (e.g., 0005 for 5 domains).

Example:

  • Query: ssl.chv = *w:0001 AND datasource = services

    • Finds wildcard SSL certificates with a single SAN in non-HTTP services.
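The three-part ssl.chv layout described in the notes above can be unpacked with a short Python sketch. The `Chv` class and `parse_chv` function are hypothetical names introduced here for illustration; the sample value is the one given in the field description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chv:
    key_hash: str    # first part: hashes of issuer, subject, and extension keys
    wildcard: bool   # second part: True for "w", False for "x"
    san_count: int   # third part: number of SANs


def parse_chv(value: str) -> Chv:
    """Split a CHV string of the form <hash>:<w or x>:<SANs_count>."""
    key_hash, flag, count = value.rsplit(":", 2)
    flag = flag.lower()
    if flag not in ("w", "x"):
        raise ValueError(f"unexpected wildcard flag: {flag!r}")
    return Chv(key_hash=key_hash, wildcard=(flag == "w"), san_count=int(count))


chv = parse_chv("487049c6c39ee487049c6c39ee7646766df07c6:w:0005")
print(chv.wildcard, chv.san_count)
```

Read this way, a query such as ssl.chv = *w:0001 is a wildcard match on the second and third parts only, leaving the hash portion unconstrained.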

JARM Data

JARM data fields use TLS handshake characteristics to fingerprint servers or identify malware.

| Field Name | Description | Type |
| --- | --- | --- |
| jarm | JARM fingerprint, a hash of TLS handshake characteristics (ciphersuites, extensions). Value-based, useful as a prefilter or for identifying unique TLS responses (e.g., malware). | String |

Example

  • Query: jarm = *abc123* AND datasource = webscan

    • Targets websites with a specific TLS fingerprint, potentially linked to malware.

Origin and Redirect Data

Origin and redirect fields track the initial and redirected paths of scanned URLs.

| Field Name | Description | Type |
| --- | --- | --- |
| origin_domain | Domain originally scanned. | String |
| origin_hostname | Hostname of the originally scanned domain. | String |
| origin_ip | IP of the originally scanned URL. | String |
| origin_path | Path of the originally scanned URL. | String |
| origin_port | Port of the originally scanned URL. | String |
| origin_scheme | Scheme of the originally scanned URL (e.g., http). | String |
| origin_url | URL originally scanned (e.g., http://3.1.104.127). | URL |
| redirect | Boolean indicating if a redirect occurred. | Boolean |
| redirect_to_https | Boolean indicating if a redirect led to HTTPS. | Boolean |
| redirect_list | List of URLs in the redirect chain (e.g., https://20.160.240.124/sslvpn/Login/login). | String |

Notes:

  • Queries for domain, hostname, path, or url automatically search corresponding origin_xxx fields for exact, wildcard, or regex matches (e.g., domain = silentpush.com searches domain and origin_domain).

  • Negative matches (e.g., domain != silentpush.com) only search the specified field.

Example:

  • Query: origin_url = http://20.160.240.124 AND redirect_to_https = true

    • Finds scans starting at a specific URL that redirected to HTTPS.
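The asymmetry described in the notes above, where positive matches also search the corresponding origin_xxx field but negative matches search only the named field, can be modelled with a small Python sketch. It is a simplified model of the described behavior for exact matches only, not the SPQL engine itself; the record contents, the `ORIGIN_ALIASES` mapping, and the function names are hypothetical.

```python
# Fields whose positive matches also search an origin_xxx counterpart.
ORIGIN_ALIASES = {
    "domain": "origin_domain",
    "hostname": "origin_hostname",
    "path": "origin_path",
    "url": "origin_url",
}


def matches_equal(record: dict, field: str, value: str) -> bool:
    """Positive match: check the field and, if it has one, its origin alias."""
    fields = [field]
    if field in ORIGIN_ALIASES:
        fields.append(ORIGIN_ALIASES[field])
    return any(record.get(name) == value for name in fields)


def matches_not_equal(record: dict, field: str, value: str) -> bool:
    """Negative match: check only the specified field."""
    return record.get(field) != value


# Hypothetical scan record: the scan started at silentpush.com and was
# redirected to a different final domain.
record = {"domain": "example.net", "origin_domain": "silentpush.com"}

print(matches_equal(record, "domain", "silentpush.com"))
print(matches_not_equal(record, "domain", "silentpush.com"))
```

In this model a redirected scan can satisfy both domain = silentpush.com (through origin_domain) and domain != silentpush.com (through domain alone), which is worth keeping in mind when combining positive and negative conditions.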