A Binary Detection Classification Model
Abstract
Detections (programmatic rules that translate machine-generated log data into an alert that indicates potential attacker behavior) can be thought of as being of two types: anomalous and malicious. The type a detection belongs to has practical consequences for its design, maintenance, and handling by a response team.
The Theory
General mental models, while at times tedious and cumbersome, can be useful. I think I learn a lot from pithy insights from people who know more than me, and a bit from didactic, general models. Maybe that's because useful didactic, general models that endure are rare and hard to discover.
Nonetheless, while working as a software engineer and interfacing frequently with detection systems, I've come to see that there is a way of classifying detections based on their functionality that can influence design. The model I propose may be limited in certain ways, fundamentally incomplete, and difficult or unappealing to people who prefer a particular intellectual aesthetic. But if you're reading this document, it's because I found the model useful in practice and therefore worth sharing.
While I understand the model below may sound obvious and perhaps “self-evident” to those familiar with the space, I’ve still found it quite useful in practice. You’d be surprised how easy it is to conceive of an anomalous detection with no baseline, or how lengthy debates about “sunsetting” malicious detections can be. The model provides answers for both of these issues. Its intuitive, simplistic nature is, in my view, its strength.
For the purposes of this document, I define a detection as a discrete set of programmatic rules that translate machine-generated log data into an alert that indicates potential attacker behavior. These are most commonly correlation- or filter-based, although ML has been applied to detections in certain contexts. The following model classifies detections into two categories:
Anomalous detections
Behavior which is not necessarily known to be malicious inherently but that is not typical for the environment
For example:
The use of wmic.exe to query information about a remote computer
The use of powershell.exe with a base64 encoded command
The use of an HTTP library in a VBA macro in a Word or Excel doc
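As a rough illustration of the second example, here is a minimal sketch of a filter-based rule for powershell.exe launched with a base64 encoded command. The event shape (a dict with "process_name" and "command_line" keys) and the function name are assumptions made for this sketch, not the schema of any particular product, and the pattern only covers a few common spellings of the encoded-command flag.

```python
import re

# Matches a few common spellings of PowerShell's encoded-command flag
# followed by a long base64-looking argument. Illustrative only: real
# command lines allow more abbreviations than are listed here.
ENCODED_FLAG = re.compile(
    r"\s-(?:e|ec|enc|encodedcommand)\s+[A-Za-z0-9+/]{16,}={0,2}(?:\s|$)",
    re.IGNORECASE,
)


def is_encoded_powershell(event: dict) -> bool:
    """Return True if a process-start event looks like encoded PowerShell.

    `event` is a hypothetical dict with "process_name" and "command_line".
    """
    name = event.get("process_name", "").lower()
    command_line = event.get("command_line", "")
    return name == "powershell.exe" and ENCODED_FLAG.search(command_line) is not None
```

Note that a match here only says "this is what the rule looks for". Whether it is actually unusual depends on how often the environment does this legitimately, which is the baseline question discussed later.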
Malicious detections
Behavior which is associated with tools that are used explicitly and only for taking control of a system in an unauthorized manner
For example:
An invocation of a hack tool (mimikatz, nmap, cobalt strike) on a managed computer based on file name and process start telemetry
Software performing unequivocally undesirable actions: "rm -rf /", overwriting or erasing critical system files, subverting an end user's ability to physically control the device
Unique character strings in malware or hack tools that are unlikely to be found elsewhere (kill switches, file names, IP addresses, registry key / value pairs)
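The first example above (a hack tool spotted by file name in process start telemetry) can be sketched in a few lines. The "image_path" key, the tool list, and the function name are hypothetical choices for this sketch. Matching on file name alone is easy to evade by renaming the binary, so treat this as the simplest possible instance of the idea rather than a robust rule.

```python
# Hypothetical list of file names associated with known hack tools.
KNOWN_HACK_TOOLS = {"mimikatz.exe", "nmap.exe"}


def is_known_hack_tool(event: dict) -> bool:
    """Return True if the started process's file name is on the list.

    `event` is a hypothetical dict with an "image_path" key holding the
    full path of the executable that was started.
    """
    image_path = event.get("image_path", "")
    # Normalize Windows separators, then keep only the final path component.
    file_name = image_path.replace("\\", "/").rsplit("/", 1)[-1].lower()
    return file_name in KNOWN_HACK_TOOLS
```

Unlike the anomalous case, this rule needs no knowledge of what is normal for the environment. It only needs prior knowledge of the tool.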
If you subscribe to this binary detection classification model (anomalous / malicious), there are a few consequences at design time:
Anomalous detections
Anomalous detections require knowledge of a baseline, ideally measured over a period of months. You cannot identify what is abnormal if you don't know what's normal.
Anomalous detections also require a different type of health monitoring and periodic re-evaluation. What was abnormal yesterday may be normal today.
Anomalous detections are also lower priority for response teams than malicious detections. "Something is unusual" is not the same as "something is malicious."
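To make the baseline requirement concrete, here is a minimal sketch that counts how often each (host, process) pair appeared in historical events and flags pairs seen fewer than a threshold number of times. The event keys, the function names, and the `min_count` threshold are assumptions for illustration. A real baseline would likely also account for time windows and re-learning.

```python
from collections import Counter


def build_baseline(events: list) -> Counter:
    """Count occurrences of each (host, process_name) pair in past events."""
    return Counter((e["host"], e["process_name"]) for e in events)


def is_anomalous(event: dict, baseline: Counter, min_count: int = 5) -> bool:
    """Flag an event whose (host, process_name) pair is rare in the baseline.

    Refuses to answer when the baseline is empty: without knowing what is
    normal, there is no way to say what is abnormal.
    """
    if not baseline:
        raise ValueError("no baseline available; cannot judge what is abnormal")
    return baseline[(event["host"], event["process_name"])] < min_count
```

Rebuilding the baseline on a schedule is one simple way to handle the re-evaluation point: behavior that becomes routine stops alerting once it crosses the threshold.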
Malicious detections
Malicious detections require previous exposure to the threat, either through experience or simulation.
Malicious detections don't have a clear lifecycle in terms of their efficacy. They may remain relevant until invalidated by design.
Malicious detections with no automated response action are higher in priority than anomalous detections for response teams.
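The priority rules for response teams can be sketched as a sort key over alerts. The `Alert` type and its fields are hypothetical. The text only says that malicious detections with no automated response outrank anomalous ones, so placing malicious detections that do have an automated response in the middle is an assumption of this sketch.

```python
from dataclasses import dataclass
from enum import Enum


class DetectionType(Enum):
    ANOMALOUS = "anomalous"
    MALICIOUS = "malicious"


@dataclass(frozen=True)
class Alert:
    name: str
    detection_type: DetectionType
    has_automated_response: bool = False


def triage_rank(alert: Alert) -> int:
    """Lower rank means the response team should look at it sooner."""
    if alert.detection_type is DetectionType.MALICIOUS:
        # Assumption: an automated response lowers urgency, but the alert
        # still ranks above anomalous ones.
        return 1 if alert.has_automated_response else 0
    return 2


def triage_order(alerts: list) -> list:
    """Return alerts sorted by triage rank (stable for equal ranks)."""
    return sorted(alerts, key=triage_rank)
```

For example, a queue containing an encoded PowerShell alert (anomalous) and a mimikatz alert with no automated response (malicious) would surface the mimikatz alert first.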
All of these aspects of the two types of detections can influence detection and response operations in a meaningful way.
I’ve also included the original, handwritten form of this paper, for fun (: (I still like to handwrite things, I’m old school) (the link below is to SquareSpace’s CDN, hence the long, strange, GUID-filled URL)
Also … a plug for https://remarkable.com/, which is what I used to write the handwritten version of this paper. The Remarkable tablet is awesome; I use it almost every day. It’s super easy to jot something down, upload it to my Google account and include it in documents at work to foster collaboration. I also love using it to visualize design for algorithms and edge cases in system mechanics when I’m doing root cause analysis.
It’s just an awesome tool anyhow. It really feels like paper, it makes writing super smooth, and it’s just seamless. Other tablets always feel too “slippery”, are too finicky or unresponsive, or just intrude on the flow of my writing process. The Remarkable does not. Mad props to whoever built it. It’s great for writing down your theories as they arise (:
Remarkable isn’t paying me anything for saying this, I just think it’s an awesome tool.