Machine learning can be applied to security in various ways, for instance to analyze malware, make predictions, and cluster security events. It can also be used to detect previously unknown attacks that have no established signature.
Wendy Edwards, a software developer interested in the intersection of cybersecurity and data science, spoke about applying machine learning to security at The Diana Initiative 2021.
Artificial Intelligence (AI) can be applied to detect anomalies by finding unusual patterns. But unusual doesn’t necessarily mean malicious, as Edwards explained:
For example, maybe your web server is experiencing higher than usual traffic because something is trending on social media. You may be able to examine things related to the traffic to decide whether it is benign. For example, are there a number of HTTP requests with the “User-Agent” set to something not typically associated with normal web browsing? Is there a lot of unexplained traffic originating from a single IP or IP range? An unusual sequence of accesses to endpoints might suggest fuzzing.
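The questions above can be turned into simple checks over parsed access-log records. The following is a minimal sketch, not Edwards's method: the record fields (ip, user_agent), the list of browser markers, the sample data, and the 50% share threshold are all assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical markers of ordinary browser User-Agent strings (an assumption).
BROWSER_MARKERS = ("Mozilla", "Chrome", "Safari", "Firefox", "Edg")


def summarize_traffic(records, ip_share_threshold=0.5):
    """Return IPs that dominate the traffic and the non-browser User-Agents seen.

    records: iterable of dicts with "ip" and "user_agent" keys (assumed format).
    """
    records = list(records)
    total = len(records)
    ip_counts = Counter(r["ip"] for r in records)
    # IPs responsible for more than the given share of all requests.
    heavy_ips = sorted(
        ip for ip, n in ip_counts.items() if total and n / total > ip_share_threshold
    )
    # User-Agent values that contain none of the browser markers.
    odd_agents = Counter(
        r["user_agent"]
        for r in records
        if not any(m in r["user_agent"] for m in BROWSER_MARKERS)
    )
    return heavy_ips, odd_agents


# Invented sample: one address sends most requests with a scripted client.
sample = (
    [{"ip": "203.0.113.7", "user_agent": "python-requests/2.25.1"}] * 8
    + [{"ip": "198.51.100.2", "user_agent": "Mozilla/5.0 (X11; Linux x86_64)"}] * 2
)
heavy_ips, odd_agents = summarize_traffic(sample)
print(heavy_ips)
print(odd_agents.most_common(3))
```

Neither signal proves an attack on its own; the point is to gather the context an analyst would need to tell a social media spike from something malicious.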
With AI and machine learning, there are techniques that can weigh a number of input variables and reach a conclusion. Edwards gave forecasting as an example: it uses time series data to make predictions about the future, and can account for trends, seasonality, and cycles:
This could be useful for measuring CPU utilization or total web server access. It’s quite possible that a system will normally be busiest during certain times of the day. Perhaps the hits on a new website are gradually trending up. Statistical metrics can also be useful, e.g. mean and standard deviation. This can help us determine what an “unusual” amount of activity from a single IP or IP range actually is.
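As a rough illustration of the statistical approach, the sketch below flags request counts that sit far above the mean of a historical baseline. The sample counts, the function name is_unusual, and the three-standard-deviation cutoff are assumptions; a real baseline would also need to account for the time-of-day and trend effects mentioned above.

```python
from statistics import mean, stdev


def is_unusual(history, observed, max_sigma=3.0):
    """Return True if observed exceeds the historical mean by more than
    max_sigma sample standard deviations."""
    mu = mean(history)
    sigma = stdev(history)  # requires at least two data points
    if sigma == 0:
        # A perfectly flat baseline: anything above it counts as unusual.
        return observed > mu
    return (observed - mu) / sigma > max_sigma


# Hypothetical requests per minute from one IP range during normal operation.
baseline = [12, 15, 11, 14, 13, 16, 12, 15, 14, 13]

print(is_unusual(baseline, 17))  # slightly above the usual range
print(is_unusual(baseline, 90))  # far above the usual range
```

With this baseline, 17 requests per minute stays within three standard deviations of the mean and 90 does not, so only the second value is flagged.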
Edwards showed how machine learning can be used to cluster security events:
Clustering is a machine learning technique to create groups of data points that are more similar to each other than to points outside the group. Security incidents are sets of events, and often the same set of events with the same root cause shows up in multiple locations.
For example, a Trojan Horse might attack a number of machines, but the root cause and remediation would be the same.
Clustering helps Security Operations Center (SOC) analysts identify similar incidents, which would generally require the same response. This can save time by eliminating a lot of tedious work, Edwards mentioned.
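One simple way to picture this grouping is to treat each incident as a set of event types and put incidents together when their sets overlap heavily. The sketch below is a greedy, single-pass grouping based on Jaccard similarity, used here as a stand-in for illustration rather than the clustering algorithm from the talk; the host names, event labels, and 0.6 threshold are hypothetical.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def group_incidents(incidents, threshold=0.6):
    """Greedily assign each incident to the first group whose representative
    (the group's first member) is at least `threshold` similar to it."""
    groups = []  # list of (representative_event_set, [incident names])
    for name, events in incidents.items():
        for rep, members in groups:
            if jaccard(rep, events) >= threshold:
                members.append(name)
                break
        else:
            # No existing group is similar enough, so start a new one.
            groups.append((events, [name]))
    return [members for _, members in groups]


# Invented incidents: three hosts share a root cause, one does not.
incidents = {
    "host-a": {"macro_doc_opened", "powershell_spawned", "c2_beacon"},
    "host-b": {"macro_doc_opened", "powershell_spawned", "c2_beacon", "new_service"},
    "host-c": {"failed_login", "failed_login_burst", "account_lockout"},
    "host-d": {"macro_doc_opened", "powershell_spawned", "c2_beacon"},
}
print(group_incidents(incidents))
```

Here the three hosts showing the same infection pattern land in one group and the login-failure incident in another, so an analyst can handle the first three with a single response. For larger data sets, a library implementation such as the clustering tools in scikit-learn would be the more usual choice.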
InfoQ interviewed Wendy Edwards about how machine learning is being applied in security.
InfoQ: What’s the state of practice on applying artificial intelligence in IT security?
Wendy Edwards: It’s steadily improving, though I think there will always be a need for skilled practitioners; artificial intelligence and machine learning are unlikely to replace people. Artificial intelligence has grown significantly over the past 15 years, and cybersecurity has also become even more challenging because of greater complexity in computing.
At this point, there’s been extensive research and development related to potential applications of artificial intelligence in cybersecurity, including intrusion detection, malware analysis, phishing detection, and finding bot accounts on social media. Natural language processing has played a role as well, most obviously in spam detection, but also in identifying malicious code in obfuscated scripts.
Just look at the number of vendors telling you about how their products use machine learning! However, there’s not a widely accepted set of best practices about AI and cybersecurity at this point.
InfoQ: You mentioned in your talk that anomaly-based detection has the potential to detect previously unknown attacks with no established signature. How does this work?
Edwards: This relates to the question about establishing what’s normal and what’s malicious. A signature is a set of rules related to a known attack, so there wouldn’t be any for an attack that hadn’t been seen before.
When we see something anomalous with no benign explanation, something may be wrong. For example, if something on your website is trending on social media, you may see increased activity and that’s OK. But if you’re seeing a lot of activity that does not correspond with normal user behavior, you may be under attack.
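The difference between the two approaches can be sketched in a few lines. In the hedged example below, a signature check only matches patterns it already knows, while an anomaly score measures how much of a session's endpoint-to-endpoint navigation was never seen in baseline traffic. The signature patterns, endpoint paths, and scoring rule are all assumptions for illustration, not a description of any specific product.

```python
import re

# Hypothetical signatures for known attacks (illustrative patterns only).
SIGNATURES = [re.compile(r"\.\./"), re.compile(r"(?i)union\s+select")]


def matches_signature(path):
    """True if the request path matches any known-attack pattern."""
    return any(sig.search(path) for sig in SIGNATURES)


def transitions(session):
    """Consecutive (from, to) endpoint pairs in a session."""
    return list(zip(session, session[1:]))


def anomaly_score(session, known_transitions):
    """Fraction of the session's transitions that are absent from the baseline."""
    pairs = transitions(session)
    if not pairs:
        return 0.0
    unseen = sum(1 for p in pairs if p not in known_transitions)
    return unseen / len(pairs)


# Baseline built from sessions assumed to represent normal user behavior.
normal_sessions = [
    ["/", "/login", "/dashboard", "/reports"],
    ["/", "/products", "/cart", "/checkout"],
]
known = {p for s in normal_sessions for p in transitions(s)}

# Invented session that probes sensitive paths without matching any signature.
suspect = ["/admin", "/backup", "/config", "/admin"]
print(any(matches_signature(p) for p in suspect))
print(anomaly_score(suspect, known))
print(anomaly_score(["/", "/login", "/dashboard"], known))
```

The suspect session matches no signature, yet none of its transitions appear in the baseline, which is the kind of unexplained deviation that would prompt a closer look.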
InfoQ: What AI tools are available and how can we use them?
Edwards: There are a number of established freely available tools; for example, Python has scikit-learn. Google and Facebook have released the TensorFlow and PyTorch libraries respectively.
Scikit-learn offers a lot of useful tools, including regression, clustering, classification, and more.
TensorFlow and PyTorch support more complex tasks, like deep learning. Generally, PyTorch is considered easier for experienced Python programmers to use, and TensorFlow is considered more ready for use in a production setting.
InfoQ: What do you expect the future will bring when it comes to AI and IT security?
Edwards: I think adversaries will also leverage artificial intelligence in attacks. The Internet of Things (IoT) and other growing technologies will create an increasingly large attack surface, and attackers may leverage AI to find ways to exploit this. According to the National Academies of Sciences, Engineering, and Medicine report “Implications of Artificial Intelligence for Cybersecurity,” the use of AI and ML for finding and weaponizing new vulnerabilities is in the conceptualization and development stage in the United States, and likely in China and Israel as well.
Adversarial machine learning refers to attempts to fool machine learning algorithms. For example, a spammer may attempt to evade filtering by misspelling “bad” words and including “good” words not commonly associated with filters. If operational data is used to train future systems, an attacker may attempt to contaminate this data.
One example of this is the Microsoft “Tay” bot. After being bombarded by racist and sexist messages from trolls, Tay began to tweet offensive things and ended up being shut down after about 16 hours.
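The evasion tactic described above, misspelling “bad” words and padding a message with “good” ones, can be demonstrated against a deliberately naive filter. The word weights, threshold, and messages below are invented for illustration; real spam filters are far more sophisticated, but the weakness has the same shape.

```python
# Hypothetical per-word weights: positive looks like spam, negative looks benign.
WORD_WEIGHTS = {
    "prize": 3.0, "winner": 2.0, "free": 1.5,
    "meeting": -1.5, "invoice": -1.0, "thanks": -1.0,
}


def spam_score(message):
    """Average weight of the words in the message; unknown words count as 0."""
    words = message.lower().split()
    if not words:
        return 0.0
    return sum(WORD_WEIGHTS.get(w, 0.0) for w in words) / len(words)


def is_spam(message, threshold=1.0):
    """Classify a message as spam when its average word weight reaches the threshold."""
    return spam_score(message) >= threshold


original = "free prize winner"
# Misspell the "bad" words so they no longer match the filter's vocabulary.
misspelled = "fr3e pr1ze w1nner"
# Keep the "bad" words but dilute them with "good" ones.
padded = "free prize winner meeting invoice thanks thanks meeting"

for msg in (original, misspelled, padded):
    print(is_spam(msg), round(spam_score(msg), 2))
```

Only the original message is caught. Both modified versions slip under the threshold, which is why a filter that is retrained on operational data also has to consider whether that data has been deliberately shaped by an attacker.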