Like almost every emerging technology, machine learning (ML) is a double-edged sword: it can be used to solve problems or to cause harm, depending on the situation. It is only natural to assume that adversaries will become more and more interested in learning how to attack ML models. We already know of many security solutions that use ML, such as malware classifiers, behavior analytics, spam filters, and several detonation chamber technologies. All of them are targeted every day, which widens the attack surface considerably.

But let’s get to the point. Whatever the purpose of an ML technology or model, the attacks launched against it can be abstracted as either White Box or Black Box, by analogy with a standard penetration testing engagement.

White Box attacks assume the adversary has direct access to a specific model: the code, the architecture, the trained parameters, and in some cases the data set used. This type of attack can seem unfeasible in real life, since it implies considerable knowledge of a model that is usually kept private (hosted in the cloud, with the user limited to API access). However, model stealing attacks can be performed first in order to set up a White Box scenario. An important misconception must be dispelled here: in ML attack jargon, model stealing refers to creating a clone of a target ML model. It doesn’t refer to stealing it in the sense of extracting the source code or files without permission. One might also naively think that stealing an ML model is just a matter of applying “fuzzing” techniques, collecting inputs and outputs, and learning from that information. While this is technically true, random probing alone does not yield relevant information about the model itself, or at least not enough for a copy.

Model stealing can be very versatile. Models are often deployed on end devices that a malicious user can access, where a process like reverse engineering can be applied. Moreover, research has shown that adversarial samples can be transferable, which means an adversarial sample crafted for one model that is similar to the target may also work against the target. In this way, the attacker can train a local model for the same use case and generate an adversarial sample for it, which in turn can potentially be effective against the target model.
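The transferability idea can be illustrated with a deliberately small sketch. Everything here is an assumption made for illustration: the two-dimensional synthetic data, the nearest-centroid linear classifier standing in for both models, and the helper names (sample, train_centroid_classifier, craft_adversarial). The perturbation is also exaggerated, since a two-feature toy needs a large step, whereas attacks on high-dimensional inputs can spread a small change over many features.

```python
import random


def sample(rng, n):
    """Draw n labelled 2-D points: class +1 near (2, 2), class -1 near (-2, -2)."""
    data = []
    for _ in range(n):
        label = rng.choice([1, -1])
        point = [rng.gauss(2.0 * label, 0.5), rng.gauss(2.0 * label, 0.5)]
        data.append((point, label))
    return data


def train_centroid_classifier(data):
    """Fit a linear nearest-centroid classifier and return (weights, bias)."""
    dim = len(data[0][0])
    means = {}
    for label in (1, -1):
        pts = [x for x, y in data if y == label]
        means[label] = [sum(p[i] for p in pts) / len(pts) for i in range(dim)]
    weights = [means[1][i] - means[-1][i] for i in range(dim)]
    midpoint = [(means[1][i] + means[-1][i]) / 2 for i in range(dim)]
    bias = -sum(w * m for w, m in zip(weights, midpoint))
    return weights, bias


def predict(model, x):
    """Return +1 or -1 depending on the sign of the linear score."""
    weights, bias = model
    score = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if score >= 0 else -1


def craft_adversarial(model, x, label, eps):
    """Move each feature by eps against the given model's weights."""
    weights, _ = model
    sign = lambda v: (v > 0) - (v < 0)
    return [xi - eps * label * sign(w) for xi, w in zip(x, weights)]


rng = random.Random(0)
# The "remote" target model: the attacker never looks at its parameters.
target = train_centroid_classifier(sample(rng, 200))
# The attacker's local model, trained on separately drawn data for the same task.
surrogate = train_centroid_classifier(sample(rng, 200))

x, label = [2.0, 2.0], 1
# The adversarial sample is computed from the surrogate only.
x_adv = craft_adversarial(surrogate, x, label, eps=2.5)
print("target on original:", predict(target, x))
print("target on adversarial:", predict(target, x_adv))
```

Because both models learn roughly the same decision boundary from similar data, a sample pushed across the surrogate's boundary also crosses the target's, even though the target was only used for the final check.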

This brings us to Black Box attacks, where we do not have direct access to the target model and, in many cases, access is limited to simple queries over the internet or through the specific ML provider. We also have no knowledge of its internals, architecture, or data. This kind of attack works in the same way as a Blind SQL injection: the attacker performs iterative (and educated) queries against the target model and observes its outputs in order to build a copy of it, and then applies White Box techniques to that copy. The complexity of the attack varies with factors such as the availability of a data set, the dimensionality of the input space, the complexity of the use case, and the complexity of the model. One important thing to notice here is that if the model is a simple linear model and the dimensionality of the input is low, the attacker can simply solve a few equations and create an exact replica of the deployed model.

All of these concepts sound quite technical and complex, and in fact they are. But how will they really affect the daily lives of ordinary people? As we might expect, ML technologies are likely to spread across the whole world, and in the coming years more and more interaction with these models will expand the possibilities for attack and compromise. For example, a military drone misidentifies enemy tanks as friendlies, or mistakes innocents for targets. A self-driving car swerves into oncoming traffic. An NLP bot creates a sophisticated deep fake of a president or political actor, initiating a major crisis. These are just examples of how ML (AI) systems can be hacked, which is an area of increased focus for government and industry leaders alike.
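Returning to the linear-model case mentioned earlier in this section, the following is a minimal sketch of the "solve a few equations" idea. It assumes the deployed model computes f(x) = w·x + b and that its API returns the raw numeric score rather than only a label. The names make_target and extract_linear_model are hypothetical, and the target is a local stand-in for a remote service. Querying the zero vector and each unit vector is simply the easiest system of equations to solve by hand: f(0) = b and f(e_i) - b = w_i.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def make_target(weights: Vector, bias: float) -> Callable[[Vector], float]:
    """Local stand-in (hypothetical) for a remote API exposing a linear model."""
    def query(x: Vector) -> float:
        return sum(w * xi for w, xi in zip(weights, x)) + bias
    return query


def extract_linear_model(query: Callable[[Vector], float],
                         dim: int) -> Tuple[Vector, float]:
    """Recover (weights, bias) of a linear model using dim + 1 queries."""
    zero = [0.0] * dim
    bias = query(zero)  # f(0) = b
    weights = []
    for i in range(dim):
        unit = zero.copy()
        unit[i] = 1.0
        weights.append(query(unit) - bias)  # f(e_i) - b = w_i
    return weights, bias


if __name__ == "__main__":
    # The attacker only ever calls secret_api, never reads its parameters.
    secret_api = make_target([2.0, -0.5, 3.25], 1.5)
    stolen_weights, stolen_bias = extract_linear_model(secret_api, dim=3)
    print(stolen_weights, stolen_bias)
```

With three input features the replica costs four queries. If the API returned only class labels, or if the model were non-linear, this shortcut would no longer apply and the attacker would need many more educated queries, which is the varying complexity described above.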

Machine learning for cybercriminals

On the other hand, ML technologies and capabilities can also be used within the traditional kill chain model, giving attackers a better chance of succeeding in causing damage. The following are just some examples of kill chain stages that machine learning can support.

Information gathering

Malicious actors may use classification algorithms to characterize a potential victim as belonging to a relevant group. This means that, after collecting thousands of emails, an attacker sends malware only to those who are likely to click on the link. The attacker thus reduces the chances of early detection of the planned attack. Numerous factors may assist here. For example, the attacker can separate the users of social networking sites who write about IT from those focused on “food-and-cats” topics. The latter group might be less aware of threats.

Various clustering and classification methods, from K-means and random forests to neural networks, can be used in this case on top of NLP analysis applied to the victim’s posts on social networks. Additionally, if an attacker knows a victim and has his or her picture, ML can assist further: image recognition tools make it easy to find the matching social media accounts.


In the new era of AI, companies can create not only fake texts but also fake voices or videos. Lyrebird, a startup specializing in media and video that can mimic voices, has demonstrated that it can make a bot that speaks exactly like you. With the growing amount of data and evolving networks, hackers may achieve even better results.

Just a few years ago, videos and images generated by neural networks were of poor quality and useful only for research articles. Now, almost anybody can generate a fake video of a celebrity or a world-famous politician saying things they have never said or doing something they have never done (e.g. “You Won’t Believe What Obama Says In This Video”). This can be achieved with the help of publicly available tools such as Deepfake.

Unauthorized access

If cybercriminals need to get unauthorized access to a user’s session, the obvious way is to compromise the account. For mass brute forcing (or password spraying), one of the annoying obstacles is bypassing the captcha. A number of computer programs can solve simple captcha tests, but the most complicated part is object segmentation.

One of the most inspiring papers on this kind of bypass was presented at a Black Hat conference. The research paper was called “I am a Robot”. Its authors broke the latest semantic image captcha and compared various machine learning algorithms. The paper claimed 98% accuracy in breaking Google’s reCAPTCHA.

Attack/exploitation phase

One of the most common methods of vulnerability discovery is fuzzing. It involves feeding random input to an application and monitoring whether it crashes. There are two steps that benefit from automation and AI aid. The first is the generation of examples. Typically, a researcher takes, for example, a PDF document and edits it by randomly changing some fields. Smarter approaches to mutation generation can significantly speed up the process of finding new documents that crash the application.
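The random-mutation baseline described above can be sketched in a few lines. This is a toy under stated assumptions: parse_record is a hypothetical, deliberately buggy parser standing in for the application under test, and mutate changes a single random byte of a seed input. A smarter generator, including an ML-guided one, would replace mutate while the surrounding loop stays the same.

```python
import random


def parse_record(data: bytes) -> bytes:
    """Toy target (hypothetical): a length-prefixed record parser with a bug.

    The first byte declares the payload length, and the parser trusts it
    without comparing it to the real payload size.
    """
    if not data:
        raise ValueError("empty input")  # expected, graceful rejection
    length = data[0]
    payload = data[1:]
    checksum = 0
    for i in range(length):
        checksum ^= payload[i]  # bug: IndexError when length is too large
    return payload[:length]


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Replace one randomly chosen byte of the seed with a random value."""
    buf = bytearray(seed)
    pos = rng.randrange(len(buf))
    buf[pos] = rng.randrange(256)
    return bytes(buf)


def fuzz(target, seed: bytes, iterations: int = 500, rng_seed: int = 0):
    """Feed mutated inputs to the target and collect those that crash it."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        candidate = mutate(seed, rng)
        try:
            target(candidate)
        except ValueError:
            continue  # the parser rejected the input cleanly, not a crash
        except Exception as exc:
            crashes.append((candidate, type(exc).__name__))
    return crashes


if __name__ == "__main__":
    found = fuzz(parse_record, b"\x05hello")
    print(len(found), "crashing inputs found")
```

Only mutations that hit the length byte trigger the bug here, so most iterations are wasted. That inefficiency is what a learned mutation strategy aims to reduce.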

Reinforcement learning approaches like the ones used by AlphaGo can also be applied. If an AlphaGo-style model can find a glitch in a game, it can help in finding security issues as well. The second step, the analysis of crashes, follows vulnerability discovery. Every analysis requires a great deal of manual work. If a model can be trained to pick out the more relevant crashes, it will save time and effort and make vulnerability discovery much cheaper.

Automation of tasks

Experienced hackers can use machine learning to automate tasks in various areas. It’s almost impossible to predict when and what exactly will be automated, but it is worth keeping in mind that cybercrime organizations have hundreds of members who require different types of software, such as support portals or support bots.

As for specific cybercrime tasks, there is a new term, “hivenet”, which stands for smart botnets. The idea is that, whereas cybercriminals manage botnets manually, hivenets can have a sort of brain that lets them react to particular events and change their behavior accordingly. Multiple bots will sit in the devices and decide, based on the task, which of them will use a victim’s resources.

As this brief overview shows, attackers armed with AI capabilities pose a formidable and unprecedented threat. Bad actors are constantly looking for loopholes and ways to exploit systems, and with the right tools provided by AI they can succeed at a scale unachievable by humans. Moreover, traditional pentesting methodology will not cover assessment against these attacks, because most of them are highly domain specific. Testing an ML system against the potential vectors discussed above requires “exquisite domain expertise”.

We must ensure that AI becomes part of our security solution set, powering complex models of malicious behaviour and threats in general, and conducting every analysis faster and more accurately than humans could. Even more crucial is developing training programs for security researchers and ML practitioners in order to educate them on the topics above.