Malware Analysis
Malware analysis is the practice of taking a malicious program apart to answer three questions: what does it do, how do we detect it, and how do we clean it up. It sits at the heart of incident response, threat intelligence, and detection engineering.
Handling live samples is only safe and lawful when you own the analysis lab and are authorised to study the file. Treat every sample as loaded and pointed at your own machine. Detonate malware exclusively in an isolated environment, never on a host you care about or a network you share with others.
Why Analysts Take Malware Apart
A finished report answers practical questions that defenders act on immediately:
- Classification. Is this a trojan, ransomware, a loader, a rootkit, or commodity adware? The family drives the response.
- Indicators of compromise (IOCs). File hashes, mutex names, registry keys, dropped files, domains, and IP addresses that let a team hunt for other infected hosts.
- Capabilities. Does it steal credentials, encrypt files, persist across reboots, or open a backdoor for remote control?
- Scope. Forensics teams use the findings to decide whether one laptop or the whole domain was touched.
- Detection. Analysts convert behaviour into YARA rules, Sigma rules, and EDR signatures so the next infection is caught automatically.
Build a Safe Lab First
Never run an unknown binary on your daily machine. Stand up a dedicated analysis virtual machine and follow a few non-negotiable rules:
- Use a host-only or isolated internal network so the sample cannot reach the internet or your LAN. Simulate services with tools like INetSim or FakeNet-NG when the malware expects a callback.
- Take a clean snapshot before every run and revert to it afterwards. Assume the VM is burned once malware executes.
- Disable shared folders and clipboard sharing, and be aware that some samples detect virtualisation and refuse to run.
- Keep two toolkits ready: REMnux (Linux, for triage and network analysis) and a Windows VM built with FLARE-VM (for Windows malware).
Analysis splits into two complementary phases: static (examine without running) and dynamic (run and observe).
Static Analysis
Static analysis inspects the file at rest. It is fast, low-risk, and often reveals enough to triage a sample before you ever execute it.
Fingerprint and Triage
Start with identity and reputation. Compute cryptographic hashes and check whether the sample is already known before spending hours on it:
file suspicious.bin
sha256sum suspicious.bin
Submit the hash, not the file, to a reputation service, or search internal threat intel. Uploading the sample itself can tip off an operator who watches public services for their own malware. Then pull human-readable strings to spot URLs, IP addresses, commands, and error messages:
# ASCII and 16-bit Unicode strings, minimum length 8
strings -n 8 suspicious.bin
strings -e l -n 8 suspicious.bin
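The same fingerprint-and-strings pass can be scripted so the results land in your notes automatically. The following is a minimal sketch using only the Python standard library; the function names and the two regular expressions are illustrative assumptions, and the patterns are deliberately loose, so expect false positives.

```python
import hashlib
import re

# Printable ASCII runs of at least MIN_LEN bytes (mirrors `strings -n 8`).
MIN_LEN = 8
ASCII_RE = re.compile(rb"[\x20-\x7e]{%d,}" % MIN_LEN)
# The same characters stored as UTF-16LE (mirrors `strings -e l -n 8`).
UTF16_RE = re.compile(rb"(?:[\x20-\x7e]\x00){%d,}" % MIN_LEN)

# Loose, illustrative IOC patterns; they do not validate what they match.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def sha256_of(data):
    """Return the SHA-256 hex digest of a bytes object."""
    return hashlib.sha256(data).hexdigest()


def extract_strings(data):
    """Return ASCII and UTF-16LE strings found in the raw bytes."""
    found = [m.group().decode("ascii") for m in ASCII_RE.finditer(data)]
    found += [m.group().decode("utf-16-le") for m in UTF16_RE.finditer(data)]
    return found


def candidate_iocs(strings):
    """Collect URL-like and IPv4-like substrings for manual review."""
    urls, ips = set(), set()
    for s in strings:
        urls.update(URL_RE.findall(s))
        ips.update(IPV4_RE.findall(s))
    return {"urls": urls, "ipv4": ips}


def triage(path):
    """Read a file once and return (sha256, candidate IOCs)."""
    with open(path, "rb") as fh:
        blob = fh.read()
    return sha256_of(blob), candidate_iocs(extract_strings(blob))
```

Calling triage("suspicious.bin") inside the analysis VM returns the hash and a dictionary of candidates. Treat every hit as a lead to verify, not as a confirmed indicator.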
Inspect Structure
For Windows executables, the PE header lists imported functions, sections, and the compile timestamp. Imports such as CryptEncrypt, InternetOpenUrl, or VirtualAllocEx hint at ransomware, network callbacks, or code injection. Command-line tooling makes this scriptable:
pip install pefile
python -c "import pefile,sys; pe=pefile.PE(sys.argv[1]); print([s.Name.rstrip(b'\x00').decode(errors='ignore') for s in pe.sections])" suspicious.bin
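The import hints mentioned above can be checked the same way. This is a rough sketch, not a classifier: the hint table is hypothetical and contains only the three APIs named in the text, and matching is by prefix because Windows commonly exports ANSI and Unicode variants with an A or W suffix. The list_imports helper assumes pefile is installed.

```python
# Hypothetical hint table: API-name prefixes mapped to what they may suggest.
# Only the three APIs named in the text are listed; extend it for your own triage.
IMPORT_HINTS = {
    "CryptEncrypt": "encryption (possible ransomware)",
    "InternetOpenUrl": "network callback",
    "VirtualAllocEx": "memory allocation in another process (possible injection)",
}


def hints_for(import_names):
    """Return {import_name: hint} for names that start with a known prefix."""
    hits = {}
    for name in import_names:
        for prefix, hint in IMPORT_HINTS.items():
            if name.startswith(prefix):
                hits[name] = hint
    return hits


def list_imports(path):
    """Collect imported function names from a PE file using pefile."""
    import pefile  # third-party: pip install pefile

    pe = pefile.PE(path)
    names = []
    # DIRECTORY_ENTRY_IMPORT is absent when the file has no import table.
    for entry in getattr(pe, "DIRECTORY_ENTRY_IMPORT", []):
        for imp in entry.imports:
            if imp.name:  # imports by ordinal carry no name
                names.append(imp.name.decode(errors="ignore"))
    return names
```

For example, hints_for(list_imports("suspicious.bin")) returns only the imports that match the table. A hit is a hint about capability, since legitimate software uses these APIs too.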
High-entropy sections or a tiny import table often indicate that the payload is packed or encrypted. Packing largely defeats static analysis of the file as it sits on disk: you either unpack it first or switch to dynamic analysis.
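Entropy here means Shannon entropy measured in bits per byte, which ranges from 0 for constant data to 8 for uniformly random data. A minimal sketch of the calculation follows; the section_entropies helper is an illustrative name and assumes pefile is installed. Any cut-off you apply (values close to 8 are the ones analysts tend to look at) is a rule of thumb, not a fixed threshold.

```python
import math
from collections import Counter


def shannon_entropy(data):
    """Shannon entropy in bits per byte: 0.0 (constant) to 8.0 (uniform)."""
    if not data:
        return 0.0
    total = len(data)
    counts = Counter(data)  # frequency of each byte value
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def section_entropies(path):
    """Map each PE section name to the entropy of its raw data."""
    import pefile  # third-party: pip install pefile

    pe = pefile.PE(path)
    return {
        s.Name.rstrip(b"\x00").decode(errors="ignore"): shannon_entropy(s.get_data())
        for s in pe.sections
    }
```

Compressed resources and embedded images also score high, so read the number alongside the import table and section names before concluding that a sample is packed.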
Reverse Engineering
To read the actual logic, disassemble or decompile the binary. Ghidra (free, from the NSA) and IDA are the standard disassemblers; radare2/rizin and the x64dbg debugger round out the toolkit. Reverse engineering is how analysts recover hardcoded keys, understand a custom protocol, or find the exact condition that triggers a payload. Pair it with your knowledge of operating systems and the Linux shell, because malware constantly abuses OS APIs and built-in commands.
Dynamic Analysis
Dynamic analysis runs the sample in the sandbox and records what it touches. Behaviour is harder to obfuscate than code, so this phase frequently reveals what static analysis hid behind packing.
Watch four channels at once:
- Processes. Process Monitor and Process Hacker (now developed as System Informer) on Windows show new processes, injected threads, and child spawns.
- Filesystem and registry. Track dropped files, modified startup keys, and scheduled tasks — the usual persistence mechanisms.
- Network. Capture traffic with Wireshark or tcpdump to expose command-and-control domains and beaconing intervals.
- API calls. A debugger or an automated sandbox logs the API and system calls the malware makes.
Automated sandboxes such as CAPE or Cuckoo orchestrate all of this, detonate the sample, and produce a report of dropped files, network indicators, and behaviour tags. They are a force multiplier for triage, but a determined analyst still confirms the important findings by hand.
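For the filesystem channel, a before-and-after comparison is often enough to spot dropped or altered files when you confirm a finding by hand. The sketch below uses only the standard library; snapshot and diff_snapshots are illustrative names, and it reads whole files into memory, which is fine for a small analysis VM but not for large directory trees.

```python
import hashlib
import os


def snapshot(root):
    """Map each file path (relative to root) to the SHA-256 of its contents."""
    state = {}
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            try:
                with open(full, "rb") as fh:
                    digest = hashlib.sha256(fh.read()).hexdigest()
            except OSError:
                continue  # unreadable or removed mid-walk; skip it
            state[os.path.relpath(full, root)] = digest
    return state


def diff_snapshots(before, after):
    """Return files created, deleted, and modified between two snapshots."""
    common = before.keys() & after.keys()
    return {
        "created": sorted(set(after) - set(before)),
        "deleted": sorted(set(before) - set(after)),
        "modified": sorted(p for p in common if before[p] != after[p]),
    }
```

Take one snapshot of a directory of interest before detonation and one after, then pass both to diff_snapshots. Malware running with sufficient privileges can hide files from a scan made inside the guest, so treat an empty diff as inconclusive.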
Map Behaviour to ATT&CK
Frame what you observe using the MITRE ATT&CK framework. Describing actions as techniques (for example, T1547 Boot or Logon Autostart Execution for persistence) gives defenders a shared vocabulary and makes it easier to compare your sample with techniques documented for known threat actors.
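One lightweight way to keep that mapping consistent across reports is a lookup table maintained alongside your notes. The sketch below is an assumption about how you might organise it: the behaviour labels on the left are hypothetical names invented for illustration, while the technique IDs and names on the right are taken from ATT&CK.

```python
# Hypothetical behaviour labels mapped to ATT&CK technique IDs and names.
ATTACK_MAP = {
    "run_key_written": ("T1547.001", "Registry Run Keys / Startup Folder"),
    "scheduled_task_created": ("T1053.005", "Scheduled Task"),
    "remote_thread_injected": ("T1055", "Process Injection"),
    "http_beacon": ("T1071.001", "Web Protocols"),
    "files_encrypted": ("T1486", "Data Encrypted for Impact"),
}


def tag_behaviours(observed):
    """Split observed labels into ATT&CK-tagged lines and unmapped leftovers."""
    tagged, unmapped = [], []
    for label in observed:
        if label in ATTACK_MAP:
            technique_id, name = ATTACK_MAP[label]
            tagged.append(f"{technique_id} {name}")
        else:
            unmapped.append(label)  # needs a manual look-up in ATT&CK
    return tagged, unmapped
```

Returning the unmapped labels separately keeps behaviour that does not fit the table visible instead of silently dropping it from the report.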
Turn Findings into Detection
Analysis only pays off when it hardens defences. The classic output is a YARA rule that matches distinctive strings or byte patterns:
rule Example_Loader_Strings
{
    meta:
        author = "analyst"
        description = "Matches unique strings from the loader"
    strings:
        $a = "botnet_config_v3" ascii
        $b = { 6A 40 68 00 30 00 00 } // push 0x40; push 0x3000
    condition:
        uint16(0) == 0x5A4D and all of them
}
Ship the resulting IOCs and rules to your SIEM, EDR, and firewall so the same threat is blocked everywhere. Feed novel families — especially anything exploiting a zero-day — back to the wider malware research community to speed up collective defence.
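When a sample yields many distinctive strings, drafting the rule text programmatically saves typing and avoids escaping mistakes. This is a minimal sketch under stated assumptions: build_rule and yara_escape are illustrative helpers, not part of YARA; the rule name is assumed to be a valid YARA identifier; and only backslashes and double quotes are escaped, so strings with non-printable characters need hex patterns written by hand.

```python
def yara_escape(text):
    """Escape backslashes and double quotes for a YARA text string."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def build_rule(name, strings, description="", require_pe=True):
    """Return YARA rule text that requires every given string to match."""
    if not strings:
        raise ValueError("at least one string is required")
    lines = [
        f"rule {name}",
        "{",
        "    meta:",
        f'        description = "{yara_escape(description)}"',
        "    strings:",
    ]
    for i, s in enumerate(strings):
        # `ascii wide` matches both single-byte and UTF-16LE encodings.
        lines.append(f'        $s{i} = "{yara_escape(s)}" ascii wide')
    condition = "all of them"
    if require_pe:
        # Same MZ-header check as the hand-written rule above.
        condition = "uint16(0) == 0x5A4D and " + condition
    lines += ["    condition:", f"        {condition}", "}"]
    return "\n".join(lines) + "\n"
```

Review and test every generated rule against known-clean files before shipping it, because an "all of them" condition over generic strings can still produce false positives.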
Hands-on Lab: Triage the EICAR Test File
Before you ever touch a live sample, rehearse the whole triage loop on the EICAR test file, a harmless 68-byte string that antivirus products flag as malicious by convention. It carries no real payload, so it is a good way to build muscle memory without risk. Do it inside your isolated analysis VM anyway, and snapshot first out of habit.
- Create the sample. In REMnux or any Linux VM, write the canonical EICAR string to disk:
printf '%s' 'X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*' > eicar.com
- Fingerprint it. Record identity before doing anything else — exactly what you would do with an unknown binary:
file eicar.com
sha256sum eicar.com
- Pull strings. Confirm the readable content that gives the file away:
strings eicar.com
The full EICAR-STANDARD-ANTIVIRUS-TEST-FILE marker appears in clear text — the kind of distinctive artefact you hunt for in a real sample.
- Write a detection. Turn that observation into a signature. Save this as eicar.yar:
rule EICAR_Test_File
{
    meta:
        description = "Detects the EICAR antivirus test string"
    strings:
        $marker = "$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"
    condition:
        $marker
}
- Verify the rule fires:
yara eicar.yar eicar.com
A line reading EICAR_Test_File eicar.com means your rule matched. You have just walked the complete chain — sample to hash to strings to working signature — on a target that can never hurt your lab. Once this feels routine, graduate to packed or live samples with the same discipline.
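If you want to rehearse the same loop from a script, for instance on a VM where yara is not installed yet, the steps can be approximated in Python. This is a sketch only: the substring check stands in for the YARA rule and is not a replacement for it, and write_sample and triage_eicar are illustrative names.

```python
import hashlib

# Standard EICAR test string; the raw literal keeps the backslash as-is.
EICAR = rb"X5O!P%@AP[4\PZX54(P^)7CC)7}$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"
# The same marker the YARA rule above looks for.
MARKER = b"$EICAR-STANDARD-ANTIVIRUS-TEST-FILE!$H+H*"


def write_sample(path):
    """Write the EICAR string to disk with no trailing newline."""
    with open(path, "wb") as fh:
        fh.write(EICAR)


def triage_eicar(path):
    """Fingerprint the file and check for the marker, like the manual steps."""
    with open(path, "rb") as fh:
        data = fh.read()
    return {
        "size": len(data),
        "sha256": hashlib.sha256(data).hexdigest(),
        "marker_found": MARKER in data,
    }
```

A result with size 68 and marker_found set to True confirms the file was written byte for byte. If antivirus is active in the VM, it may quarantine the file as soon as it is created, which is the expected behaviour for this test string.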
Working Habits That Keep You Effective
- Revert religiously. Every detonation gets a fresh snapshot. Cross-contaminating runs ruins your evidence.
- Document as you go. Record hashes, timestamps, commands, and observed behaviour; a report you cannot reproduce is not a report.
- Assume anti-analysis tricks. Modern samples check for VMs and debuggers, and sleep past sandbox timeouts. Patch these checks or run longer when a sample “does nothing.”
- Stay legal and current. Only handle samples you are authorised to, follow the legal and ethical rules of your jurisdiction, and keep learning as families evolve.