Signature-Based Detection
What is signature-based detection?
Signature-based detection matches known patterns (signatures) against observed artefacts (files, network traffic, logs). It’s the classic approach used by AV, IDS/IPS (Snort/Suricata), email gateways, and many EDR rules. Signatures can be exact matches (file hash), pattern matches (byte sequence), structural rules (YARA), or behavioral/log patterns (SIEM rules).
Common signature types
- File hashes (exact-match signatures)
  - MD5, SHA-1, SHA-256 (and SHA-512). Used to uniquely identify a file or sample. Fast to compute, cheap to compare.
  - Recommendations: use SHA-256 for new work (collision resistance + wide adoption). MD5/SHA-1 are weak for cryptographic guarantees but still used as legacy identifiers.
- Fuzzy / similarity hashes
  - ssdeep (context-triggered piecewise hashing): measures similarity between files; useful for variants that differ by minor edits (it is far less reliable once a sample is packed or encrypted).
  - TLSH (Trend Micro Locality Sensitive Hash): another similarity hash.
  - Use when exact hashes differ due to minor changes but you still want to detect family variants.
- YARA rules (file content / structure rules)
  - Very flexible: match strings, hex patterns, file offsets, metadata, boolean logic, and modules for PE/ELF parsing. Great for malware family detection and hunting.
  - Example (very small):
- IDS/IPS rules (Snort/Suricata)
  - Signatures for network traffic: HTTP patterns, protocol anomalies, specific payload bytes, flows, port-based detections. Example:
    alert tcp any any -> any 80 (msg:"SQLi"; content:"UNION SELECT"; nocase; sid:1000001; rev:1;)
- Log/behavior signatures (Sigma, SIEM rules)
  - Detect sequences in logs (process spawn chains, suspicious command lines, lateral movement patterns). Sigma is a vendor-agnostic rule format that can be converted into Splunk/Elastic/QRadar queries.
- IOC lists
  - Simple indicators: filenames, mutex names, registry keys, IPs, domains, URLs. Often used in blocklists or quick detection.
- Non-cryptographic hashes for indexing
  - CRC32, MurmurHash: used internally for fast lookups (not for security).
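As a rough illustration of how an IOC list is typically checked, the Python sketch below tests an observed event against small indicator sets. The indicator values, the event field names (`url`, `dst_ip`, `filename`) and the helper names are all hypothetical; a real deployment would load indicators from a threat intel feed.

```python
from urllib.parse import urlsplit

# Hypothetical indicator sets (documentation-range IPs, example domains).
BAD_DOMAINS = {"evil.example", "c2.example.net"}
BAD_IPS = {"203.0.113.7"}
BAD_FILENAMES = {"invoice_scan.exe"}


def domain_matches(host, blocklist):
    """True if host equals a listed domain or is a subdomain of one."""
    labels = host.lower().rstrip(".").split(".")
    # Check "a.b.c", then "b.c", then "c" against the blocklist.
    return any(".".join(labels[i:]) in blocklist for i in range(len(labels)))


def check_event(event):
    """Return the IOC types that an observed event hits, in a fixed order."""
    hits = []
    host = urlsplit(event.get("url", "")).hostname or ""
    if host and domain_matches(host, BAD_DOMAINS):
        hits.append("domain")
    if event.get("dst_ip") in BAD_IPS:
        hits.append("ip")
    if event.get("filename", "").lower() in BAD_FILENAMES:
        hits.append("filename")
    return hits
```

Matching on the registered domain and its subdomains, rather than on the exact string, avoids the most trivial bypass (prefixing a new subdomain), at the cost of slightly broader matches.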
Properties of different hash families (short)
- MD5: fast, 128-bit; collisions are now trivial to produce. Fine as a legacy identifier, but not secure against an attacker who can craft files.
- SHA-1: 160-bit; practical collision attacks exist. Avoid for security-sensitive use.
- SHA-2 (SHA-256/512): secure for current practical needs. Use SHA-256 for file identification and signing workflows.
- ssdeep / TLSH: not cryptographically secure, but their similarity scores are useful for clustering variants.
How signatures are used in practice
- AV engine: exact hashes for known malicious binaries + YARA for families + heuristics for packed/obfuscated code.
- IDS: network pattern matching + protocol decoding + rule thresholds (to avoid alert floods).
- EDR & hunting: YARA + fuzzy hashes + behavioral detection (abnormal process creation, suspicious command lines).
- Threat intel sharing: publish hashes, YARA rules, domains, IPs as IOCs.
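The layering described for an AV engine can be sketched in a few lines of Python: try the cheap exact-hash lookup first, then fall back to content patterns. This is only an illustration; the blocklist entry, the family name and the byte patterns are made up, and the pattern check merely stands in for real YARA rules and heuristics.

```python
import hashlib

# Hypothetical blocklist of SHA-256 digests for confirmed-malicious files.
KNOWN_BAD_SHA256 = {hashlib.sha256(b"known bad sample").hexdigest()}

# Hypothetical family patterns, standing in for YARA-style content rules.
FAMILY_PATTERNS = {"DemoFamily": [b"demo_c2_beacon", b"\xde\xad\xbe\xef"]}


def scan(data):
    """Return a verdict string, running the cheapest check first."""
    if hashlib.sha256(data).hexdigest() in KNOWN_BAD_SHA256:
        return "block: exact hash match"
    for family, patterns in FAMILY_PATTERNS.items():
        # Require every pattern to be present to keep false positives down.
        if all(pattern in data for pattern in patterns):
            return "alert: content matches " + family
    return "no signature match"
```

For example, `scan(b"known bad sample")` takes the exact-hash branch, while a file that merely contains both family patterns falls through to the content check.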
Strengths and weaknesses
Strengths:
- Very precise for known threats (exact signatures rarely produce false positives and reliably catch identical samples).
- Fast and deterministic.
- Easy to share (hash lists, YARA rules).
Weaknesses:
- Only detects what’s known: fails against novel malware, zero-days, or significant polymorphism.
- Evasion: trivial binary changes break exact hashes; packers, encryption, polymorphism, and runtime code generation avoid static signatures.
- False positives/negatives: poorly written rules can match benign content or miss variants.
- Volume / performance: large rule sets or regex-heavy signatures can tax endpoints or sensors.
Common evasion techniques
- Changing a single byte or timestamp to alter the exact hash.
- Packing/repacking with custom packers (changing the file envelope).
- Polymorphic/encrypted payloads; unpacking only at runtime.
- Domain generation algorithms (DGAs) for network indicators.
- Living-off-the-land (LoL): using signed/legitimate binaries to bypass file-based detection.
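The first technique is easy to demonstrate. The Python sketch below flips a single bit in a made-up byte string (a stand-in for a real sample) and compares the SHA-256 digests before and after.

```python
import hashlib

# Made-up content standing in for a sample file.
original = b"MZ" + b"\x00" * 62 + b"payload bytes"

modified = bytearray(original)
modified[10] ^= 0x01  # flip one bit in one byte
modified = bytes(modified)

h1 = hashlib.sha256(original).hexdigest()
h2 = hashlib.sha256(modified).hexdigest()

changed_bytes = sum(a != b for a, b in zip(original, modified))
differing_hex = sum(a != b for a, b in zip(h1, h2))
print(changed_bytes, "byte changed;", differing_hex, "of 64 hex digits differ")
```

Because a cryptographic hash is designed so that any input change scrambles the output, a blocklist entry for `h1` says nothing about the modified file.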
Mitigations & best practices
- Layered detection: don’t rely only on hashes. Combine static signatures (hashes, YARA) with behavioral detection, heuristics, telemetry, and sandboxing.
- Prefer strong hashes (SHA-256) for IOC publication. Include ssdeep or TLSH for family clustering.
- YARA + modules: use metadata, file format checks, PE/ELF parsing to reduce false positives.
- Triage & threat intel: validate IOCs (avoid blind blocking), add context (first seen, source reputation).
- Update cadence: keep signature/rule feeds current; roll out safely to avoid mass false positives.
- Testing: test signatures in a staging environment and tune thresholds.
- Canonicalization and normalization: normalize URLs/paths before rule matching to avoid trivial evasion.
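To make the normalization point concrete, here is a minimal Python sketch that decodes and case-folds a request string before applying a content signature. The signature text and function names are illustrative assumptions, and the bounded decode loop is a simplification: real sensors normalize per protocol buffer and handle many more encodings.

```python
import re
from urllib.parse import unquote_plus

SIGNATURE = "union select"  # illustrative pattern, matched after normalization


def canonicalize(raw):
    """Decode and normalize a URL or query string before matching."""
    text = raw
    # Decode repeatedly (bounded) to undo double percent-encoding.
    for _ in range(3):
        decoded = unquote_plus(text)
        if decoded == text:
            break
        text = decoded
    text = re.sub(r"\s+", " ", text)  # collapse whitespace runs
    return text.lower()


def matches(raw):
    """True if the signature appears in the canonical form of raw."""
    return SIGNATURE in canonicalize(raw)
```

A request such as `/item?id=1%20UnIoN%2520SeLeCt%201` never contains the literal bytes `UNION SELECT`, yet it matches once decoded and lower-cased.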
Practical examples
Compute common hashes on Linux:
- md5sum sample.exe
- sha1sum sample.exe
- sha256sum sample.exe
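Where a script is more convenient than the command-line tools, the same digests can be computed with Python's standard `hashlib` module. The sketch below reads the file once in chunks; the function name and chunk size are arbitrary choices.

```python
import hashlib


def file_digests(path, chunk_size=65536):
    """Return MD5, SHA-1 and SHA-256 hex digests of a file in one pass."""
    hashers = {name: hashlib.new(name) for name in ("md5", "sha1", "sha256")}
    with open(path, "rb") as handle:
        # Read in chunks so large samples do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}
```

Calling `file_digests("sample.exe")` returns a dictionary keyed by algorithm name, which is convenient for writing all three values into an IOC record.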
Small YARA example (file detection + metadata):
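As a rough stand-in written in Python rather than YARA syntax, the sketch below mirrors the parts a small rule usually has: a metadata block, a text string, a hex pattern, and a condition that also checks the file header. The rule name, metadata values and patterns are hypothetical, and real YARA compiles and evaluates rules quite differently.

```python
# Hypothetical rule expressed as data: meta + strings (+ condition in code).
RULE = {
    "name": "Demo_Family_Dropper",
    "meta": {"family": "DemoFamily", "confidence": "medium", "source": "internal"},
    "strings": {
        "$s1": b"demo_dropper_config",          # plain text string
        "$h1": bytes.fromhex("e8000000005d"),   # hex pattern
    },
}


def rule_matches(data, rule=RULE):
    """Condition: data starts with 'MZ' and every rule string is present."""
    is_pe = data[:2] == b"MZ"  # anchored check at offset 0
    return is_pe and all(pattern in data for pattern in rule["strings"].values())
```

Requiring the `MZ` header at offset 0 in addition to the strings is the kind of file-format check that keeps a content rule from firing on documents or logs that merely quote the same bytes.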
Small ssdeep example (compare similarity):
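- Generate fuzzy hashes and save them to signature files:
  ssdeep -b sample1.exe > sample1.ssdeep
  ssdeep -b sample2.exe > sample2.ssdeep
- Compare the saved signatures:
  ssdeep -k sample1.ssdeep sample2.ssdeep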
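To show what a similarity score means without depending on the ssdeep tool, the Python sketch below scores two byte strings by the overlap of their byte n-grams. This is explicitly not ssdeep's context-triggered piecewise hashing or TLSH; it is a much simpler Jaccard measure, used here only to illustrate how a 0 to 100 score differs from an exact-match hash.

```python
def ngrams(data, n=4):
    """Set of all overlapping n-byte windows in data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def similarity(a, b, n=4):
    """Jaccard similarity of byte n-gram sets, scaled to 0-100."""
    grams_a, grams_b = ngrams(a, n), ngrams(b, n)
    if not grams_a and not grams_b:
        return 100  # two inputs too short to window are treated as identical
    return round(100 * len(grams_a & grams_b) / len(grams_a | grams_b))
```

A variant with a couple of patched bytes keeps almost all of its n-grams and scores close to 100, while unrelated content scores near 0. Unlike a real fuzzy hash, this approach needs both files at hand rather than a compact signature.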
Rule writing tips
- Use anchored strings or hex patterns with offsets when possible (reduces false positives).
- Avoid overly broad regexes across large inputs.
- Add metadata (malware family, confidence, source) to rules.
- Rate-limit noisy rules in network sensors (threshold options) to avoid alert storms.
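The rate-limiting tip can be sketched as a small per-rule, per-source limiter. The Python class below is loosely modeled on the idea behind IDS threshold options that track by source; the class name, parameters and sliding-window behavior are assumptions for illustration, not how any particular sensor implements it.

```python
from collections import defaultdict


class AlertLimiter:
    """Allow at most `limit` alerts per (rule, source) within `window` seconds."""

    def __init__(self, limit=1, window=60.0):
        self.limit = limit
        self.window = window
        # (sid, src_ip) -> timestamps of alerts that were actually emitted
        self._recent = defaultdict(list)

    def should_alert(self, sid, src_ip, now):
        key = (sid, src_ip)
        # Keep only emitted alerts that are still inside the sliding window.
        self._recent[key] = [t for t in self._recent[key] if now - t < self.window]
        if len(self._recent[key]) >= self.limit:
            return False  # suppressed: this rule already fired for this source
        self._recent[key].append(now)
        return True
```

Passing the timestamp in as `now` keeps the logic deterministic and easy to test; a sensor would supply the packet or event time.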
When to use which signature
- Use exact hashes for blocking confirmed, immutable malicious files (but with caution).
- Use fuzzy hashes for hunting and clustering variants.
- Use YARA for family detection, structural checks, and hunting in repositories.
- Use IDS rules for network IOCs and protocol anomalies.
- Use Sigma / SIEM rules for log-based behavioral detection.
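For the log-based case, the following Python sketch evaluates a Sigma-like rule against a single process-creation event. The rule is a hypothetical dictionary, not real Sigma YAML; the field names follow common Sysmon-style naming as an assumption, and only the `endswith` and `contains` modifiers are handled.

```python
# Hypothetical rule in a Sigma-like shape: "Field|modifier" -> accepted values.
SIGMA_LIKE_RULE = {
    "title": "Office application spawning encoded PowerShell",
    "detection": {
        "ParentImage|endswith": ["\\winword.exe", "\\excel.exe"],
        "Image|endswith": ["\\powershell.exe"],
        "CommandLine|contains": [" -enc ", " -encodedcommand "],
    },
}


def field_matches(value, modifier, candidates):
    """Case-insensitive comparison of one event field against its candidates."""
    value = value.lower()
    if modifier == "endswith":
        return any(value.endswith(c.lower()) for c in candidates)
    if modifier == "contains":
        return any(c.lower() in value for c in candidates)
    return value in [c.lower() for c in candidates]  # no modifier: exact match


def log_rule_matches(event, rule=SIGMA_LIKE_RULE):
    """All detection fields must match (AND); values within a field are OR-ed."""
    for key, candidates in rule["detection"].items():
        field, _, modifier = key.partition("|")
        if not field_matches(event.get(field, ""), modifier, candidates):
            return False
    return True
```

The rule fires on the process relationship and command line rather than on any file content, which is why this kind of signature still works when the payload itself is new or repacked.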