This guide explains key Web Scanner data types used to analyze webpages for similarity, malicious activity, or specific characteristics. These data types, represented by field names in SPQL queries, enable security teams to identify phishing websites, track dark web connections, detect malware, and analyze URL navigation paths. For a complete list of field names, refer to Understand Field Names.
Body Data
Body data fields allow analysis of a webpage’s HTML content, JavaScript, language, and dark web references, helping identify identical or similar websites.
Field Name | Description | Type |
---|---|---|
| SHA-256 hash of the | String |
| SHA-256 hash of the | String |
| SHA-256 hash of the | String |
| body_analysis.js_sha256 | List of referenced JavaScript files with SHA-256 hashes (includes URLs and query parameters, e.g., | String |
| List of JavaScript files with SSDeep fuzzy hashes for similarity detection | String |
| Comma-separated list of HTML languages (most to least used). Mismatches with | String |
| body_analysis.onion | List of Tor .onion addresses in the HTML body, useful for dark web investigations | String |
| body_analysis.SHV | Script Hash Value (SHV), a fingerprint of script names (excluding parameters, e.g., | String |
Example
Query:
body_analysis.onion = *market* AND datasource = torscan
Finds .onion sites referencing “market” in their HTML body.
Query:
body_analysis.SHV = jquery* AND htmltitle = *login*
Identifies potential phishing sites using specific JavaScript files and login-related titles.
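The idea of fingerprinting script names while ignoring their parameters can be sketched in Python. This is only an illustration under stated assumptions: the helper names (`script_names`, `script_fingerprint`) and the choice of SHA-256 over the sorted names are hypothetical, and the actual SHV computation used by Web Scanner is not described in this guide.

```python
import hashlib
from html.parser import HTMLParser
from urllib.parse import urlsplit


class ScriptCollector(HTMLParser):
    """Collects the src attribute of every <script> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def script_names(html):
    """Return script file names with paths and query parameters removed."""
    collector = ScriptCollector()
    collector.feed(html)
    return [urlsplit(src).path.rsplit("/", 1)[-1] for src in collector.sources]


def script_fingerprint(html):
    """Hypothetical fingerprint: SHA-256 over the sorted script names.

    Assumption for illustration only; not the real SHV algorithm.
    """
    joined = ",".join(sorted(script_names(html)))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Because query parameters are dropped before hashing, two pages that load the same scripts with different cache-busting parameters produce the same fingerprint in this sketch.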
Favicon Data
Favicon fields target website icons to identify similar websites based on hash values or paths.
Field Name | Description | Type |
---|---|---|
| MD5 hash of .ico favicon (typically at | String |
| favicon_murmur3 | Murmur3 hash of .ico favicon | String |
| Path to .ico favicon | String |
| MD5 hash of non-.ico favicon | String |
| Murmur3 hash of non-.ico favicon | String |
| Path to non-.ico favicon | String |
| List of favicon URLs, including unreferenced root-favicon | String |
Notes:
- favicon fields refer to .ico files; favicon2 fields cover non-.ico formats (e.g., PNG). Websites may have both.
- Browsers and Web Scanner automatically check for /favicon.ico even if it is not referenced in code.
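The notes above can be sketched in Python: collect the favicons a page references, always add the root /favicon.ico, and hash the icon bytes. The function names are hypothetical and this is not Web Scanner's implementation. Only the MD5 step is shown, because Murmur3 is not available in the Python standard library.

```python
import hashlib
from urllib.parse import urljoin


def favicon_candidates(page_url, referenced_hrefs):
    """Resolve referenced favicon paths and append the root /favicon.ico.

    The root icon is added even when the page does not reference it.
    """
    urls = [urljoin(page_url, href) for href in referenced_hrefs]
    root_icon = urljoin(page_url, "/favicon.ico")
    if root_icon not in urls:
        urls.append(root_icon)
    return urls


def favicon_md5(icon_bytes):
    """Return the MD5 hex digest of the raw favicon bytes."""
    return hashlib.md5(icon_bytes).hexdigest()
```

For a page at https://example.com/login/index.html that references img/icon.png, the sketch yields both the PNG URL and https://example.com/favicon.ico.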
Example
Query:
favicon_murmur3 = 1234567890 AND datasource = webscan
Finds websites with a specific .ico favicon hash, indicating visual similarity.
HTML Data
HTML data fields analyze webpage titles, similarity scores, and response headers for correlation and filtering.
Field Name | Description | Type |
---|---|---|
| Numerical value (0–100) showing similarity to the previous scan, based on | Number |
| htmltitle | HTML title, useful for initial investigations (e.g., detecting phishing or C2 frameworks). | String |
| hhv | Header Hash Value, a fingerprint of response header keys (not values, e.g., | String |
Example
Query:
htmltitle = "Mythic" AND body_analysis.js_sha256 = *mythic*
Correlates websites using the Mythic C2 Framework via title and JavaScript hashes.
Query:
hhv = *proxygen* AND datasource = webscan
Prefilters scanned websites for specific server software (e.g., proxygen-bolt).
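To illustrate what "a fingerprint of response header keys, not values" means, here is a minimal Python sketch. The function name, the lowercasing, the order-sensitive join, and the use of SHA-256 are all assumptions made for the example; the real HHV format is not specified in this guide and may differ.

```python
import hashlib


def header_key_fingerprint(headers):
    """Hypothetical key-only fingerprint of an HTTP response.

    headers is a list of (name, value) pairs in the order received.
    Values are ignored, so only the set and order of keys matter.
    """
    keys = [name.lower() for name, _value in headers]
    return hashlib.sha256(",".join(keys).encode("ascii")).hexdigest()
```

Two responses with the same header keys in the same order share a fingerprint in this sketch even when every value differs, which is why a key-based fingerprint is useful for grouping servers running the same software.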
SSL Data
SSL data fields target certificate characteristics to identify unique or malicious servers.
Field Name | Description | Type |
---|---|---|
| ssl.chv | Certificate Hash Value, a fingerprint of issuer, subject, and extension keys, formatted as <hashes>:<w or x>:<SANs_count> (e.g., 487049c6c39ee487049c6c39ee7646766df07c6:w:0005). Identifies unique certificates, especially self-signed ones. | String |
| List of domains in the Subject Alternative Names (SANs) field | List of domains. |
Notes:
ssl.chv includes:
- First part: Hashes of issuer, subject, and extension keys (often identical for issuer and subject).
- Second part: w (wildcard certificate) or x (non-wildcard).
- Third part: Number of SANs (e.g., 0005 for 5 domains).
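The three-part layout described above can be unpacked with a few lines of Python. This is a sketch that assumes the parts are always separated by colons, as in the example value; `parse_chv` and `ChvParts` are hypothetical names, not part of any Web Scanner API.

```python
from typing import NamedTuple


class ChvParts(NamedTuple):
    key_hashes: str   # first part: issuer, subject, and extension key hashes
    wildcard: bool    # second part: True for "w", False for "x"
    san_count: int    # third part: number of SANs


def parse_chv(value):
    """Split an ssl.chv value into its three documented parts."""
    key_hashes, flag, count = value.rsplit(":", 2)
    if flag not in ("w", "x"):
        raise ValueError(f"unexpected wildcard flag: {flag!r}")
    return ChvParts(key_hashes, flag == "w", int(count))
```

Applied to the example value 487049c6c39ee487049c6c39ee7646766df07c6:w:0005, the sketch reports a wildcard certificate with 5 SANs.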
Example:
Query:
ssl.chv = *w:0001 AND datasource = services
Finds wildcard SSL certificates with a single SAN in non-HTTP services.
JARM Data
JARM data fields use TLS handshake characteristics to fingerprint servers or identify malware.
Field Name | Description | Type |
---|---|---|
| jarm | JARM fingerprint, a hash of TLS handshake characteristics (cipher suites, extensions). Value-based, useful as a prefilter or for identifying unique TLS responses (e.g., malware). | String |
Example
Query:
jarm = *abc123* AND datasource = webscan
Targets websites with a specific TLS fingerprint, potentially linked to malware.
Origin and Redirect Data
Origin and redirect fields track the initial and redirected paths of scanned URLs.
Field Name | Description | Type |
---|---|---|
| origin_domain | Domain originally scanned | String |
| Hostname of originally scanned domain | String |
| IP of originally scanned URL | String |
| Path of originally scanned URL | String |
| Port of the originally scanned URL | String |
| Scheme of the originally scanned URL (e.g., | String |
| origin_url | URL originally scanned (e.g., | URL |
| Boolean indicating if a redirect occurred | Boolean |
| redirect_to_https | Boolean indicating if a redirect led to HTTPS | Boolean |
| List of URLs in the redirect chain (e.g., | String |
Notes:
- Queries for domain, hostname, path, or url automatically search the corresponding origin_xxx fields for exact, wildcard, or regex matches (e.g., domain = silentpush.com searches domain and origin_domain).
- Negative matches (e.g., domain != silentpush.com) only search the specified field.
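The matching rule in these notes can be modeled as a small Python function. This is only a model of the documented behavior, not the query engine itself; `fields_searched` is a hypothetical name, and apart from origin_domain the origin field names are assumed to follow the origin_xxx pattern.

```python
# Fields whose positive matches also search the origin_xxx counterpart.
ORIGIN_MIRRORED = {"domain", "hostname", "path", "url"}


def fields_searched(field, operator):
    """Return the fields a single condition is evaluated against.

    Positive matches on a mirrored field also search origin_<field>;
    negative matches ("!=") search only the field that was named.
    """
    if field in ORIGIN_MIRRORED and operator != "!=":
        return [field, f"origin_{field}"]
    return [field]
```

So `fields_searched("domain", "=")` covers both domain and origin_domain, while `fields_searched("domain", "!=")` covers domain only.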
Example:
Query:
origin_url = http://20.160.240.124 AND redirect_to_https = true
Finds scans starting at a specific URL that redirected to HTTPS.
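As a rough illustration of how the origin and redirect values relate to a scanned URL, the following Python sketch splits a URL into its components and derives the two redirect booleans from a redirect chain. Several details are assumptions: the dictionary keys other than redirect_to_https are hypothetical, the default port is inferred from the scheme, and a redirect to HTTPS is taken to mean an HTTP origin whose final URL uses HTTPS.

```python
from urllib.parse import urlsplit

# Assumed defaults when the URL carries no explicit port.
DEFAULT_PORTS = {"http": 80, "https": 443}


def origin_parts(url):
    """Split the originally scanned URL into scheme, hostname, port, and path."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "hostname": parts.hostname,
        "port": parts.port or DEFAULT_PORTS.get(parts.scheme),
        "path": parts.path or "/",
    }


def redirect_flags(origin_url, redirect_chain):
    """Derive redirect booleans from the URLs visited after the origin URL."""
    redirected = len(redirect_chain) > 0
    final_url = redirect_chain[-1] if redirected else origin_url
    to_https = (
        redirected
        and urlsplit(origin_url).scheme == "http"
        and urlsplit(final_url).scheme == "https"
    )
    return {"redirect": redirected, "redirect_to_https": to_https}
```

With the origin URL from the example query and a chain ending at an https:// address, both flags come out true in this sketch.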