Contrary to what you may have read, machine learning (ML) isn’t magic pixie dust. In general, ML is good for narrowly scoped problems where large, relevant datasets are available and the patterns of interest are highly repeatable or predictable. Most security problems neither require nor benefit from ML. Many experts, including the folks at Google, suggest that when solving a complex problem you should exhaust all other approaches before trying ML.

ML is a broad collection of statistical techniques that allows us to train a computer to estimate an answer to a question even when we haven’t explicitly coded the correct answer. A well-designed ML system applied to the right type of problem can unlock insights that would not have been attainable otherwise.

A successful ML example is natural language processing
(NLP). NLP allows computers to “understand” human language, including things like idioms and metaphors. In many ways, cybersecurity faces the same challenges as language processing. Attackers may not use idioms, but many of their techniques are analogous to homonyms: words that share a spelling or pronunciation but have different meanings. Some attacker techniques likewise closely resemble actions a system administrator might take for perfectly benign reasons.

IT environments vary across organizations in purpose, architecture, prioritization, and risk tolerance. It’s impossible to create algorithms, ML or otherwise, that broadly address security use cases in all scenarios. This is why most successful applications of ML in security combine multiple methods to address a very specific issue. Good examples include spam filters, DDoS or bot mitigation, and malware detection.

Garbage In, Garbage Out

The biggest challenge in ML is availability of relevant, usable data to solve your problem. For supervised ML, you need a large, correctly labeled dataset. To build a model that identifies cat photos, for example, you train the model on many photos of cats labeled “cat” and many photos of things that aren’t cats labeled “not cat.” If you don’t have enough photos or they’re poorly labeled, your model won’t work well.
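The mechanics of supervised learning can be sketched with a toy nearest-centroid classifier. Everything here is invented for illustration: the “features” stand in for hypothetical measurable properties of a photo, and the labels are the ground truth the model learns from.

```python
# Toy supervised learning: a nearest-centroid "cat vs. not cat" classifier.
# Feature vectors and labels are invented purely for illustration.

def train(samples):
    """Compute the mean feature vector (centroid) for each label."""
    sums, counts = {}, {}
    for features, label in samples:
        counts[label] = counts.get(label, 0) + 1
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, value in enumerate(features):
            acc[i] += value
    return {label: [v / counts[label] for v in acc] for label, acc in sums.items()}

def predict(centroids, features):
    """Return the label whose centroid is closest (Euclidean distance)."""
    def dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid)) ** 0.5
    return min(centroids, key=lambda label: dist(centroids[label]))

labeled = [
    ([0.9, 0.8], "cat"), ([0.8, 0.9], "cat"),          # labeled "cat"
    ([0.1, 0.2], "not cat"), ([0.2, 0.1], "not cat"),  # labeled "not cat"
]
model = train(labeled)
print(predict(model, [0.85, 0.85]))  # → cat
```

With too few samples, or with mislabeled ones, the centroids land in the wrong place and the predictions degrade — which is the “garbage in, garbage out” problem in miniature.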

In security, a well-known supervised ML use case is signatureless malware detection. Many endpoint protection platform (EPP) vendors label huge quantities of malicious and benign samples, then train a model on “what malware looks like.” These models can correctly identify evasive mutating malware and other trickery where a file is altered enough to dodge a signature but remains malicious. Rather than matching a signature, the model predicts maliciousness from a different feature set, and can often catch malware that signature-based methods miss.
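The contrast between signature matching and feature-based prediction can be sketched as follows. The payloads, the hash “signature database,” and the coarse feature vector here are all hypothetical; real EPP features are far richer.

```python
# Hypothetical contrast: an exact-hash signature vs. a coarse feature vector.
# Payloads and features are invented for illustration only.
import hashlib

KNOWN_BAD = {hashlib.sha256(b"MALWARE-PAYLOAD-v1").hexdigest()}

def signature_match(data: bytes) -> bool:
    """Signature check: exact hash lookup against known-bad samples."""
    return hashlib.sha256(data).hexdigest() in KNOWN_BAD

def feature_vector(data: bytes):
    """Toy features a model might use: length and non-ASCII byte count."""
    return (len(data), sum(1 for b in data if b > 127))

original = b"MALWARE-PAYLOAD-v1"
mutated = b"MALWARE-PAYLOAD-v2"  # trivially altered to dodge the hash

print(signature_match(original))  # → True: exact hash is known
print(signature_match(mutated))   # → False: one byte changed, signature evaded
print(feature_vector(original) == feature_vector(mutated))  # → True: features survive
```

A one-byte mutation defeats the hash entirely, while the (toy) feature vector is unchanged — this is why a model trained on features can generalize where an exact signature cannot.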

However, because ML models are probabilistic, there’s a trade-off. ML can catch malware that signatures miss, but it may also miss malware that signatures catch. This is why modern EPP tools use hybrid methods that combine ML and signature-based techniques for optimal coverage.

Something, Something, False Positives

Even if the model is well-crafted, ML presents some additional challenges when it comes to interpreting the output, including:

  • The result is a probability.
    The ML model outputs the likelihood of something. If your model is designed to identify cats, you’ll get results like “this thing is 80% cat.” This uncertainty is an inherent characteristic of ML systems and can make the result difficult to interpret. Is 80% cat enough?
  • The model can’t be tuned, at least not by the end user.
    To handle the probabilistic outcomes, a tool might have vendor-set thresholds that collapse them to binary results. For example, the cat-identification model may report that anything >90% “cat” is a cat. Your business’s tolerance for cat-ness may be higher or lower than what the vendor set.
  • False negatives (FN), the failure to detect real evil, are one painful consequence of ML models, especially poorly tuned ones.
    We dislike false positives (FP) because they waste time. But there is an inherent trade-off between FP and FN rates. ML models are tuned to optimize the trade-off, prioritizing the “best” FP-FN rate balance. However, the “correct” balance varies among organizations, depending on their individual threat and risk assessments. When using ML-based products, you must trust vendors to select the appropriate thresholds for you.
  • Not enough context for alert triage.
    Part of the ML magic is extracting powerful predictive but arbitrary “features” from datasets. Imagine that identifying a cat happened to be highly correlated with the weather. No human would reason this way. But this is the point of ML — to find patterns we couldn’t otherwise find and to do so at scale. Yet, even if the reason for the prediction can be exposed to the user, it’s often unhelpful in an alert triage or incident response situation. This is because the “features” that ultimately define the ML system’s decision are optimized for predictive power, not practical relevance to security analysts.
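The threshold and FP/FN trade-off described above can be made concrete with a toy example. The scores and ground-truth labels below are invented; the point is only how moving the cutoff shifts errors from one column to the other.

```python
# Toy illustration: collapsing probabilistic scores to binary alerts with a
# threshold, and how that threshold trades false positives for false negatives.
# Scores and labels are invented for illustration.

samples = [  # (model_score, actually_malicious)
    (0.95, True), (0.85, True), (0.70, True), (0.55, True),
    (0.60, False), (0.40, False), (0.20, False), (0.10, False),
]

def fp_fn(threshold):
    """Count false positives and false negatives at a given alert threshold."""
    fp = sum(1 for score, bad in samples if score >= threshold and not bad)
    fn = sum(1 for score, bad in samples if score < threshold and bad)
    return fp, fn

print(fp_fn(0.50))  # → (1, 0): permissive cutoff, one benign sample flagged
print(fp_fn(0.65))  # → (0, 1): stricter cutoff, one real threat slips by
print(fp_fn(0.90))  # → (0, 3): very strict, quiet console, three misses
```

Every cutoff is a policy decision: the strict one costs missed detections, the loose one costs analyst time. With a vendor-set threshold, that policy decision has been made for you.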

Would “Statistics” by Any Other Name Smell as Sweet?

Beyond the pros and cons of ML, there’s one more catch: Not all “ML” is really ML. Statistics gives you some conclusions about your data. ML makes predictions about data you didn’t have based on data you did have. Marketers have enthusiastically latched onto “machine learning” and “artificial intelligence” to signal a modern, innovative, advanced technology product of some kind. However, there’s often very little regard for whether the tech even uses ML, never mind whether ML was the right approach.
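The distinction can be shown with a toy least-squares fit (all numbers invented): a mean is a statistical conclusion about the data you observed, while extrapolating a fitted line is a prediction about data you never had.

```python
# Statistics summarizes the data you have; ML-style modeling predicts data
# you don't. Toy numbers, roughly following y = 2x, invented for illustration.

xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.1, 3.9, 6.2, 7.8]

# Statistics: a conclusion about the observed data.
mean_y = sum(ys) / len(ys)

# Modeling: fit slope/intercept by least squares, then generalize.
n = len(xs)
mx, my = sum(xs) / n, mean_y
slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
intercept = my - slope * mx

prediction = slope * 5.0 + intercept  # predict y at x = 5, which we never observed
print(round(mean_y, 2), round(prediction, 2))
```

The mean tells you nothing about x = 5; the fitted model makes a (fallible) claim about it. Products doing only the former are doing statistics, however they are marketed.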

So, Can ML Detect Evil or Not?

ML can detect evil when “evil” is well-defined and narrowly scoped. It can also detect deviations from expected behavior in highly predictable systems. The more stable the environment, the more likely ML is to correctly identify anomalies. But not every anomaly is malicious, and the operator isn’t always equipped with enough context to respond. ML’s superpower is not in replacing but in extending the capabilities of existing methods, systems, and teams for optimal coverage and efficiency.