Activation Function in a Neural Network: Sigmoid vs Tanh


Introduction

Activation functions are essential to the functioning of neural networks because of the non-linearity they introduce into the output of neurons. Sigmoid and tanh are two of the most frequently used activation functions. The sigmoid function, which maps input values to a range between 0 and 1, is commonly used in the output layer for binary classification problems. The tanh function, which maps input values to a range between -1 and 1, is frequently applied in the hidden layers of neural networks. Both have advantages and disadvantages, so the choice between them depends on the particular requirements of the problem being solved. In this post we examine the differences between the sigmoid and tanh activation functions and offer some guidance on which function may be most appropriate for certain kinds of problems.

What is an Activation Function?

An activation function in a neural network is a mathematical function applied to the output of each neuron in a layer. By introducing non-linearity into the neuron's output, the activation function allows the network to model more intricate relationships between the input and output variables.

In a neural network, each neuron takes input from the layer before it and passes the result through the activation function. The activation function transforms the neuron's output, and the transformed output is then sent as input to the next layer of the network.
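To make this concrete, here is a minimal NumPy sketch of a single neuron's forward pass. The input values, weights, and bias below are made up purely for illustration, and the activation used is tanh, one of the functions discussed later in this article.

    import numpy as np

    # Hypothetical neuron with three inputs (values are made up)
    x = np.array([0.5, -1.2, 3.0])   # outputs from the previous layer
    w = np.array([0.4, 0.7, -0.2])   # learned weights of this neuron
    b = 0.1                          # learned bias of this neuron

    z = np.dot(w, x) + b             # weighted sum of the inputs
    a = np.tanh(z)                   # activation function applied to z
    print(a)                         # value passed on to the next layer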

Many types of activation function are used in neural networks, such as sigmoid, tanh, ReLU (Rectified Linear Unit), and softmax. Each activation function has its own mathematical form and set of characteristics that make it suited to particular classes of problems and network architectures.

Activation functions are a crucial component that allows neural networks to learn non-linear relationships between the input and output variables. Without activation functions, neural networks would only be able to model linear relationships, which is insufficient for many real-world applications.

Sigmoid Activation Function

In neural networks, the sigmoid activation function is frequently employed. It is a mathematical function that maps a neuron's input to a value between 0 and 1.

The sigmoid function has the following mathematical form −

σ(x) = 1 / (1 + exp(-x))

where x is the input to the neuron.
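As a quick sketch, the formula can be implemented directly in NumPy; the sample inputs below are arbitrary.

    import numpy as np

    def sigmoid(x):
        # Maps any real-valued input into the range (0, 1)
        return 1.0 / (1.0 + np.exp(-x))

    # The output approaches 0 for large negative inputs and 1 for large positive inputs
    for x in [-10.0, -1.0, 0.0, 1.0, 10.0]:
        print(x, sigmoid(x))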

As the input becomes large and positive, the sigmoid function's output approaches 1; as the input becomes large and negative, the output approaches 0. The smooth S-shaped curve of the sigmoid function makes it differentiable and well suited to the backpropagation algorithms used in neural network training.

The sigmoid function is often used in the output layer of binary classification problems, where the output of the network needs to be a probability value between 0 and 1. It can also be used in the hidden layers of shallow neural networks, although it suffers from the vanishing gradient problem, where the gradient of the function becomes very small as the input moves far from zero in either direction.
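The vanishing gradient can be seen numerically: the sigmoid's derivative is σ(x)(1 − σ(x)), which peaks at 0.25 at x = 0 and shrinks rapidly as the input moves away from zero. A small sketch, repeating the sigmoid definition from above so it runs on its own:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        # Derivative of the sigmoid: sigma(x) * (1 - sigma(x))
        s = sigmoid(x)
        return s * (1.0 - s)

    for x in [0.0, 2.0, 5.0, 10.0]:
        print(x, sigmoid_grad(x))
    # Gradients: 0.25, ~0.105, ~0.0066, ~0.000045 -- they vanish as |x| grows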

While the sigmoid function was widely used in the past, its use has decreased in recent years in favor of other activation functions, such as ReLU and its variants, due to their superior performance on deep neural networks.

Tanh Activation Function

The tanh (hyperbolic tangent) activation function is also frequently used in neural networks. It is a mathematical function that maps a neuron's input to a value between -1 and 1.

The tanh function has the following mathematical form −

tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))

where x is the input to the neuron.
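Below is a direct transcription of the formula in NumPy. Note that in practice the built-in np.tanh is preferred, since exp(x) can overflow for large inputs; the sample values here are small enough to be safe.

    import numpy as np

    def tanh(x):
        # Maps any real-valued input into the range (-1, 1)
        return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))

    for x in [-5.0, -1.0, 0.0, 1.0, 5.0]:
        print(x, tanh(x), np.tanh(x))  # the two columns should agree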

The tanh function features a smooth S-shaped curve, similar to the sigmoid function, making it differentiable and appropriate for backpropagation methods used in neural network training.

One advantage of the tanh function over the sigmoid function is that it is centered around zero, which means that its output is symmetric around zero. This property makes it useful in hidden layers of neural networks because it allows the network to model positive and negative relationships between the input and output variables.

The tanh function is often used in the hidden layers of neural networks because it introduces non-linearity into the network and can capture small changes in the input. However, it also suffers from the vanishing gradient problem: the gradient of the function becomes very small as the input moves far from zero, which can slow down the training of deep neural networks.

Overall, the tanh function is a useful activation function for neural networks, particularly in hidden layers where it can capture complex relationships between the input and output variables.
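As a rough sketch of how the two functions are often combined, the toy binary classifier below uses tanh in its hidden layer and sigmoid in its output layer. The architecture, random weights, and input example are made up for illustration only.

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    rng = np.random.default_rng(0)

    x  = np.array([0.2, -0.4, 1.5])      # one input example with 3 features
    W1 = 0.5 * rng.normal(size=(4, 3))   # hidden layer with 4 units
    b1 = np.zeros(4)
    W2 = 0.5 * rng.normal(size=(1, 4))   # output layer with 1 unit
    b2 = np.zeros(1)

    h = np.tanh(W1 @ x + b1)             # hidden activations in (-1, 1)
    p = sigmoid(W2 @ h + b2)             # output probability in (0, 1)
    print(p)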

Sigmoid vs Tanh

Sigmoid Function

  • Maps input values to a range between 0 and 1.

  • Has a smooth S-shaped curve.

  • Commonly used in the output layer for binary classification problems.

  • Suffers from the vanishing gradient problem: the gradient becomes very small when the input is far from zero.

  • Introduces non-linearity into the network and can capture small changes in the input.

Tanh Function

  • Maps input values to a range between -1 and 1.

  • Has a smooth S-shaped curve.

  • Commonly used in the hidden layers of neural networks.

  • Zero-centered, so it can capture both positive and negative relationships between the input and output variables.

  • Also suffers from the vanishing gradient problem.

  • Introduces non-linearity into the network and can capture complex relationships between the input and output variables.

Criteria | Sigmoid Function | Tanh Function
--- | --- | ---
Mathematical form | σ(x) = 1 / (1 + exp(-x)) | tanh(x) = (exp(x) - exp(-x)) / (exp(x) + exp(-x))
Output range | 0 to 1 | -1 to 1
Centered around zero | No | Yes
Use cases | Output layer of binary classification problems; hidden layers of shallow neural networks | Hidden layers of neural networks
Advantages | Differentiable; introduces non-linearity; output can be read as a probability in binary classification | Zero-centered; steeper gradient around zero captures small changes in the input
Disadvantages | Suffers from the vanishing gradient problem; output not centered around zero | Suffers from the vanishing gradient problem; its steeper gradient can contribute to exploding gradients in very deep networks
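The difference in gradient steepness is easy to check numerically: the sigmoid's derivative peaks at 0.25 at zero, while tanh's derivative, 1 − tanh²(x), peaks at 1. A small comparison sketch:

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def sigmoid_grad(x):
        s = sigmoid(x)
        return s * (1.0 - s)

    def tanh_grad(x):
        return 1.0 - np.tanh(x) ** 2

    print(sigmoid_grad(0.0), tanh_grad(0.0))   # 0.25 vs 1.0 at the origin

    # Both gradients shrink toward zero as the input moves away from zero
    for x in [2.0, 5.0]:
        print(x, sigmoid_grad(x), tanh_grad(x))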

Conclusion

In conclusion, the sigmoid and tanh functions are two of the activation functions most frequently used in neural networks. The sigmoid function is commonly used in the output layer of binary classification tasks and in the hidden layers of shallow networks, while the tanh function is often preferred in hidden layers because its gradient near zero is steeper than the sigmoid's. Both functions suffer from the vanishing gradient problem, and tanh's steeper gradient can also contribute to exploding gradients in very deep networks. When selecting an activation function, it is important to consider the characteristics of the dataset and the task at hand, along with the advantages and drawbacks of each function.
