Machine Learning - Adversarial

Adversarial machine learning is a subfield of machine learning that focuses on studying the vulnerability of machine learning models to adversarial attacks. An adversarial attack is a deliberate attempt to fool a machine learning model by introducing small perturbations in the input data. These perturbations are often imperceptible to humans, but they can cause the model to make incorrect predictions with high confidence. Adversarial attacks can have serious consequences in real-world applications, such as autonomous driving, security systems, and healthcare.

There are several types of adversarial attacks, including −

Evasion attacks − These attacks manipulate the input data at prediction time so that the model misclassifies it. Evasion attacks can be targeted, where the attacker wants the input assigned to a specific class of their choosing, or untargeted, where the attacker only wants to cause any misclassification.
Poisoning attacks − These attacks aim to manipulate the training data to bias the model towards a particular class or to reduce its overall accuracy. Poisoning attacks can be either data poisoning, where the attacker modifies the training data, or model poisoning, where the attacker modifies the model itself.
Model inversion attacks − These attacks aim to infer sensitive information about the training data, such as reconstructing representative inputs for a class, by observing the outputs of the model.
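To make the targeted and untargeted evasion cases concrete, the following is a minimal sketch of a fast-gradient-sign step against a logistic-regression model, written in plain NumPy. The weights, bias, input, and the helper names fgsm_linear and predict are hypothetical values chosen for illustration; they are not part of any library.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def predict(x, w, b):
    """Return the predicted class (0 or 1) of a logistic-regression model."""
    return int(x @ w + b > 0)

def fgsm_linear(x, y, w, b, eps, targeted=False):
    """One fast-gradient-sign step against logistic regression.

    With p = sigmoid(w.x + b) and binary cross-entropy loss, the gradient
    of the loss with respect to the input x is (p - y) * w.
    Untargeted: y is the true label and we step to increase the loss.
    Targeted: y is the class the attacker wants and we step to decrease the loss.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y) * w
    direction = -1.0 if targeted else 1.0
    return x + direction * eps * np.sign(grad_x)

# Hypothetical toy model and input (assumed values, for illustration only)
w = np.array([2.0, -1.0])
b = 0.0
x = np.array([0.3, 0.2])   # w.x + b = 0.4, so the model predicts class 1

x_untargeted = fgsm_linear(x, y=1.0, w=w, b=b, eps=0.25)
x_targeted = fgsm_linear(x, y=0.0, w=w, b=b, eps=0.25, targeted=True)

print("clean prediction:     ", predict(x, w, b))
print("untargeted prediction:", predict(x_untargeted, w, b))
print("targeted prediction:  ", predict(x_targeted, w, b))
```

With only two classes the targeted and untargeted steps coincide, since pushing the input away from class 1 is the same as pushing it towards class 0. With more classes they differ: the targeted variant follows the gradient of the loss for the chosen target class.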

To defend against adversarial attacks, researchers have proposed several techniques, including −

Adversarial training − This technique involves augmenting the training data with adversarial examples to make the model more robust to adversarial attacks.
Defensive distillation − This technique involves training a second model on the softened class probabilities produced by the first model, rather than on the hard labels, to make it more resistant to adversarial attacks.
Randomization − This technique involves adding random noise to the input data or the model parameters to make it harder for attackers to craft adversarial examples.
Detection and rejection − This technique involves detecting adversarial examples and rejecting them before they are processed by the model.
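As an illustration of the first defense in this list, here is a minimal sketch of adversarial training for a logistic-regression model in NumPy. The toy data, the hyperparameters (eps, lr, epochs), and the helper names are assumptions made for this sketch. At every step, adversarial examples are crafted against the current model and added to the training batch.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """Fast-gradient-sign examples for logistic regression.

    The gradient of the cross-entropy loss w.r.t. each input is (p - y) * w.
    """
    p = sigmoid(x @ w + b)
    grad_x = (p - y)[:, None] * w[None, :]
    return x + eps * np.sign(grad_x)

def adversarial_train(x, y, eps=0.5, lr=0.1, epochs=200):
    """Gradient descent on clean data augmented with adversarial examples."""
    w = np.zeros(x.shape[1])
    b = 0.0
    for _ in range(epochs):
        x_adv = fgsm(x, y, w, b, eps)        # craft against the current model
        x_aug = np.vstack([x, x_adv])        # clean + adversarial inputs
        y_aug = np.concatenate([y, y])       # adversarial copies keep true labels
        p = sigmoid(x_aug @ w + b)
        w -= lr * (x_aug.T @ (p - y_aug)) / len(y_aug)
        b -= lr * np.mean(p - y_aug)
    return w, b

def accuracy(x, y, w, b):
    return np.mean((x @ w + b > 0) == (y == 1))

# Hypothetical toy data: two well-separated 2-D clusters
rng = np.random.default_rng(0)
x = np.vstack([rng.normal(-2.0, 0.5, (100, 2)), rng.normal(2.0, 0.5, (100, 2))])
y = np.concatenate([np.zeros(100), np.ones(100)])

w, b = adversarial_train(x, y)
clean_acc = accuracy(x, y, w, b)
robust_acc = accuracy(fgsm(x, y, w, b, eps=0.5), y, w, b)
print(f"clean accuracy: {clean_acc:.2f}, accuracy under attack (eps=0.5): {robust_acc:.2f}")
```

The same loop structure carries over to neural networks, where the input gradient comes from automatic differentiation instead of a closed-form expression.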

Implementation in Python

In Python, several libraries provide implementations of adversarial attacks and defenses, including −

CleverHans − This library provides a collection of adversarial attacks and defenses; its current major version supports JAX, PyTorch, and TensorFlow 2.
ART (Adversarial Robustness Toolbox) − This library provides a comprehensive set of tools to evaluate and defend against adversarial attacks in machine learning models.
Foolbox − This library provides a collection of adversarial attacks; its current major version supports PyTorch, TensorFlow, and JAX.

In the following example, we implement an adversarial attack using the Adversarial Robustness Toolbox (ART) −

First, we need to install the ART package using pip −

pip install adversarial-robustness-toolbox

Then, we can train a small model and create adversarial examples against it using the ART library.

Example

import tensorflow as tf
from tensorflow.keras.datasets import mnist
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Flatten, Conv2D, MaxPooling2D
from tensorflow.keras.optimizers import Adam
from tensorflow.keras.utils import to_categorical
from art.attacks.evasion import FastGradientMethod
from art.estimators.classification import KerasClassifier

# ART's KerasClassifier requires eager execution to be disabled for tf.keras models
tf.compat.v1.disable_eager_execution()

# Load the MNIST dataset
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Preprocess the data
x_train = x_train.reshape(-1, 28, 28, 1).astype('float32') / 255
x_test = x_test.reshape(-1, 28, 28, 1).astype('float32') / 255
y_train = to_categorical(y_train, 10)
y_test = to_categorical(y_test, 10)

# Define the model architecture
model = Sequential()
model.add(Conv2D(32, kernel_size=(3, 3), activation='relu', input_shape=(28, 28, 1)))
model.add(MaxPooling2D(pool_size=(2, 2)))
model.add(Flatten())
model.add(Dense(10, activation='softmax'))

# Compile the model
model.compile(loss='categorical_crossentropy', optimizer=Adam(learning_rate=0.001), metrics=['accuracy'])

# Wrap the model with ART KerasClassifier
classifier = KerasClassifier(model=model, clip_values=(0, 1), use_logits=False)

# Train the model
classifier.fit(x_train, y_train)

# Evaluate the underlying Keras model on the test set (returns [loss, accuracy])
accuracy = model.evaluate(x_test, y_test, verbose=0)[1]
print("Accuracy on test set: %.2f%%" % (accuracy * 100))

# Generate adversarial examples using the FastGradientMethod attack
attack = FastGradientMethod(estimator=classifier, eps=0.1)
x_test_adv = attack.generate(x_test)

# Evaluate the underlying Keras model on the adversarial examples
accuracy_adv = model.evaluate(x_test_adv, y_test, verbose=0)[1]
print("Accuracy on adversarial examples: %.2f%%" % (accuracy_adv * 100))

In this example, we first load and preprocess the MNIST dataset. Then, we define a simple convolutional neural network (CNN) model and compile it using categorical cross-entropy loss and the Adam optimizer.

We wrap the model with the ART KerasClassifier to make it compatible with ART attacks. We then train the model on the training set and evaluate it on the test set. Because no epoch count is passed to fit, ART's default of 20 epochs applies, which matches the "Epoch 1/20" lines in the output.

Next, we generate adversarial examples using the FastGradientMethod attack with a maximum perturbation (eps) of 0.1, meaning that no pixel changes by more than 0.1 on the [0, 1] scale. Finally, we evaluate the model on the adversarial examples.
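One way to confirm that the perturbation budget is respected is to measure the largest per-pixel change for each image. The sketch below uses randomly generated stand-in arrays shaped like an MNIST batch so that it runs on its own; the helper name linf_perturbation is hypothetical and not part of ART.

```python
import numpy as np

def linf_perturbation(x_clean, x_adv):
    """Return the largest absolute per-pixel change for each example."""
    diff = np.abs(x_adv - x_clean)
    return diff.reshape(len(x_clean), -1).max(axis=1)

# Stand-in data (assumed): 4 random "images" and a sign-based step of size 0.1,
# clipped to the valid pixel range [0, 1]
eps = 0.1
rng = np.random.default_rng(0)
x_clean = rng.random((4, 28, 28, 1)).astype("float32")
step = (eps * np.sign(rng.standard_normal(x_clean.shape))).astype("float32")
x_adv = np.clip(x_clean + step, 0.0, 1.0)

per_example = linf_perturbation(x_clean, x_adv)
print("max per-pixel change per image:", per_example)
```

In the tutorial code the same check would be linf_perturbation(x_test, x_test_adv), and every value should be at most 0.1, up to floating-point rounding.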

Output

When you execute this code, it will produce output similar to the following (the training log is truncated here) −

Train on 60000 samples
Epoch 1/20
60000/60000 [==============================] - 17s 277us/sample - loss: 0.3530 - accuracy: 0.9030
Epoch 2/20
60000/60000 [==============================] - 15s 251us/sample - loss: 0.1296 - accuracy: 0.9636
Epoch 3/20
60000/60000 [==============================] - 18s 300us/sample - loss: 0.0912 - accuracy: 0.9747
Epoch 4/20
60000/60000 [==============================] - 18s 295us/sample - loss: 0.0738 - accuracy: 0.9791
Epoch 5/20
60000/60000 [==============================] - 18s 300us/sample - loss: 0.0654 - accuracy: 0.9809
-------continue