Machine Learning and Cybersecurity: Made for Each Other

01 Feb

in Blog, Perspectives

by Nikfar Khaleeli

In our recent webinar “Behavioral Analytics and Machine Learning: What You Need to Know”, Dr. Ankur Teredesai, Professor of Computer Science and Systems at the Center of Data Science at the University of Washington, put the spotlight on a very interesting topic: why there is such spirited interest in applying machine learning to cybersecurity. If you missed the live event, I encourage you to view the on-demand version.

To illustrate the point, he drew on the DIMS (Distributed Incident Management System) project, which was created to protect public cyber infrastructure. The DIMS project, funded by the DHS, is an open source prototype that uses behavioral analytics to counter targeted and advanced threats. As input for its behavioral analytics modules, DIMS uses log data collected from multiple Washington state municipalities, enabling cross-organization correlation. Its stakeholders include multiple public and private organizations focused on reducing cyberthreats (e.g., PRISEM, True Digital Security, Western Cyber Exchange, NCFTA, U.S. Secret Service).

Machine learning is well suited to take advantage of the large volumes of data produced by cybersecurity systems. But it’s not as simple as using one of the many available machine learning software libraries. It takes security domain knowledge, data science skills, and more to apply machine learning well to cybersecurity. And it’s worth the investment. Consider one problem where that investment pays off: hackers use domain generation algorithms (DGAs) to automatically create domains. These domains point to the command and control (C&C) servers that communicate with and provide instructions to compromised systems within businesses. In the past, automatically generated domains were typically random sequences of characters but, as hacker sophistication has increased, the domains have become far more human readable, so it’s more difficult to detect them and associate them with malware.
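To make the idea concrete, here is a minimal sketch of a toy DGA in Python. It is not taken from the webinar or from any real malware family; the seed string, the hashing scheme, and the reserved `.example` suffix are all assumptions chosen for illustration. What it shows is that a compromised machine and its operator can derive the same list of domains from a shared seed and the current date, so a static blocklist goes stale as soon as the date changes.

```python
import hashlib
from datetime import date


def toy_dga(seed: str, day: date, count: int = 3, length: int = 12) -> list[str]:
    """Derive `count` pseudo-random domain names from a shared seed and a date.

    Hypothetical illustration only: real DGAs differ by malware family.
    """
    domains = []
    for i in range(count):
        # Hash seed + date + counter so both sides compute identical output.
        digest = hashlib.sha256(f"{seed}|{day.isoformat()}|{i}".encode()).digest()
        # Map the first `length` bytes of the digest onto lowercase letters.
        label = "".join(chr(ord("a") + b % 26) for b in digest[:length])
        # ".example" is a TLD reserved for documentation use.
        domains.append(label + ".example")
    return domains


today = toy_dga("shared-secret", date(2020, 1, 1))
tomorrow = toy_dga("shared-secret", date(2020, 1, 2))
print(today)
print(tomorrow)
```

The two printed lists differ even though the seed is the same, and the labels look like random character strings. That is the older style of generated domain described above; the more human-readable style would typically combine dictionary words instead of raw hash output.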


In the webinar, Dr. Teredesai also discussed how machine learning can be used to identify the malware families associated with these automatically generated domains. With careful feature engineering, it can produce very accurate results using only a subset of the many features that could be extracted from the data (which is why I recently blogged about the importance of choosing meaningful features). Relying on fewer features makes real-time extraction faster and the solution more scalable.
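As a rough illustration of what a small feature set might look like, the Python sketch below computes four cheap lexical features from a domain name. The choice of features (length, character entropy, digit ratio, vowel ratio) and the sample domains are my own assumptions for illustration, not the features used in the webinar; a real classifier would be trained on labeled data using features like these as input.

```python
import math
from collections import Counter

VOWELS = set("aeiou")


def shannon_entropy(text: str) -> float:
    """Bits per character of the empirical character distribution."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def domain_features(domain: str) -> dict[str, float]:
    """Compute a small, cheap feature vector from the leftmost label."""
    label = domain.lower().split(".")[0]
    letters = [c for c in label if c.isalpha()]
    return {
        "length": float(len(label)),
        "entropy": shannon_entropy(label),
        "digit_ratio": sum(c.isdigit() for c in label) / max(len(label), 1),
        "vowel_ratio": sum(c in VOWELS for c in letters) / max(len(letters), 1),
    }


# Hypothetical sample names: one random-looking, one human readable.
for name in ("xkq7zt2vbw9r.example", "securelogin.example"):
    print(name, domain_features(name))
```

Each feature needs only a single pass over a short string, which is the kind of cost that makes real-time extraction practical. The random-looking name has higher entropy and no vowels, while the human-readable name looks much more like ordinary text on these measures, which hints at why readable generated domains call for more careful feature choices.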

Much more was covered in this webinar. Dr. Teredesai also discussed how graph analysis in conjunction with machine learning can be used for:

  • Anomaly detection, showing how an infection within the business network plays out over time.
  • Behavioral modeling such as user behavior analytics (UBA) and user and entity behavior analytics (UEBA), to reveal the network effect of a user getting compromised.
  • Attack pattern extraction, where analyzing past activity and simulating future scenarios can help prevent potential attacks.
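To give a feel for the graph side of this, here is a minimal Python sketch of one simple graph question: if a user is compromised, which other entities are reachable, and in how many hops? The access graph, the entity names, and the `blast_radius` helper are all hypothetical and are not from the webinar; a real system would build the graph from log data and combine it with learned models.

```python
from collections import deque

# Hypothetical access graph: an edge A -> B means A has access to B.
ACCESS = {
    "user:alice": ["host:laptop-1", "host:fileserver"],
    "host:laptop-1": ["host:fileserver"],
    "host:fileserver": ["host:db-1"],
    "user:bob": ["host:laptop-2"],
    "host:laptop-2": [],
    "host:db-1": [],
}


def blast_radius(graph: dict[str, list[str]], start: str) -> dict[str, int]:
    """Return {entity: hops} for everything reachable from `start` via BFS."""
    hops = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in hops:  # first visit is the shortest path
                hops[neighbor] = hops[node] + 1
                queue.append(neighbor)
    return hops


print(blast_radius(ACCESS, "user:alice"))
print(blast_radius(ACCESS, "user:bob"))
```

In this toy graph, a compromise of alice reaches the database server within two hops, while a compromise of bob stays confined to a single laptop. That difference in reach is a simple version of the network effect mentioned in the list above.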

There’s a lot of great information in this webinar, so be sure to check out the on-demand version.
