Computer science has shifted from manually crafted programs to machine learning-driven solutions. The most recent game-changer is deep learning, which leverages vast amounts of data to tackle challenges once deemed insurmountable. On such problems, deep learning often outperforms traditional techniques by stacking hidden layers of artificial neurons, loosely modeled on the human brain, and training them with backpropagation.
Different parts of the human brain are responsible for activities like intuitive decision-making, language processing, image recognition, etc.; similarly, different deep learning algorithms are responsible for solving different problems.
For example, while a multilayer perceptron is typically used for decision-making based on historical data, a recurrent neural network is suited to language-based problems.
“Deep” in deep learning refers to networks with more than three layers, while networks with two or three layers are basic neural networks.
This article focuses on one such neural network algorithm, widely used to build computer vision applications: the Convolutional Neural Network, i.e., the CNN algorithm. However, before getting into the details and answering what CNN is, let’s start with some background and a basic understanding of this class of artificial neural networks.
What is CNN?
CNN is the algorithm widely credited with starting the modern revolution in computer vision. Its capability lies in its architectural design. That design is discussed later; first, let’s understand where the CNN algorithm comes from and what the term means.
Background
Before understanding CNN in detail, let us first look at the origin and development of the field over time.
- 1950s – 1980s
In the early days of AI, researchers were convinced of the potential of artificial neural networks but found it difficult to solve complex non-linear problems with them. It is, therefore, no surprise that they also had a hard time understanding and processing visual data through neural networks.
The search for a solution continued. Having started in the 1950s, it began to pay off in the 1980s, when researcher Yann LeCun developed a new neural network technique known as the Convolutional Neural Network (CNN).

- 1980s – 2010s
Initially, LeNet (the early version of CNN, named after LeCun) could only recognize handwritten digits. Even so, it was considered a great leap forward in deep learning, as it was not previously known how to solve such complex problems using computers.
The algorithm started gaining traction when it was adopted to read PIN and ZIP codes in the postal sector. The issue was that the algorithm, in its form at the time, could not scale due to its design and the limited resources available.
CNN needed large, well-labeled datasets to train on, and sufficient computing resources were not readily available then, which restricted its widespread adoption.
Things began to change as computer hardware advanced. With the rise of GPUs, cloud computing, and faster processors, the capability of CNN began to surface.

The watershed moment came in 2012, when an AI system called AlexNet, developed by and named after its primary creator, Alex Krizhevsky, won the ILSVRC (ImageNet Large-Scale Visual Recognition Challenge).
LeNet used two convolutional and two pooling layers, followed by fully connected layers and an RBF (radial basis function) classifier in the output layer. Krizhevsky went for a more complex architecture with five convolutional, three pooling, and three fully connected layers, the last of which feeds a softmax function in the output layer.
Using a large, well-labeled dataset like ImageNet together with advanced computational resources allowed him to revive CNN, and AlexNet went on to win the 2012 ImageNet computer vision contest.
The system won the competition with a top-5 accuracy of roughly 85% (a top-5 error rate of 15.3%), while the runner-up reached only about 74% (26.2% error), making it the undisputed winner.
AlexNet's result showed that a deep network could handle visual recognition at a level earlier systems could not, giving other researchers the confidence to revisit multiple-layered neural networks, i.e., deep learning, for solving computer vision problems. Since then, much work has gone into this branch of AI.
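To make the "five convolutional, three pooling" stack described above more concrete, here is a minimal sketch that traces how the width and height of an image shrink as it passes through such a stack. The kernel sizes, strides, padding, channel counts, and the 227-pixel input are commonly cited AlexNet-style values, not figures taken from this article, so treat them as assumptions; `out_size` and `trace` are hypothetical helper names.

```python
def out_size(size, kernel, stride=1, padding=0):
    """Output width/height of a conv or pooling layer on a square input."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kernel, stride, padding, output channels)
# Assumed AlexNet-style settings, used here for illustration only.
ALEXNET_LIKE = [
    ("conv1", 11, 4, 0, 96),
    ("pool1", 3, 2, 0, 96),
    ("conv2", 5, 1, 2, 256),
    ("pool2", 3, 2, 0, 256),
    ("conv3", 3, 1, 1, 384),
    ("conv4", 3, 1, 1, 384),
    ("conv5", 3, 1, 1, 256),
    ("pool3", 3, 2, 0, 256),
]


def trace(layers, input_size):
    """Return (layer name, spatial size, channels) after each layer."""
    size = input_size
    shapes = []
    for name, kernel, stride, padding, channels in layers:
        size = out_size(size, kernel, stride, padding)
        shapes.append((name, size, channels))
    return shapes


shapes = trace(ALEXNET_LIKE, 227)  # 227x227 input is an assumption
for name, size, channels in shapes:
    print(f"{name}: {size}x{size}x{channels}")

# Number of values handed to the fully connected layers after flattening.
_, last_size, last_channels = shapes[-1]
flattened = last_size * last_size * last_channels
print("flattened features:", flattened)
```

Under these assumptions the spatial size drops from 227 to 6 across the eight layers, and the final 6x6x256 feature map flattens to 9,216 values that the fully connected layers then classify.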

Defining Convolutional Neural Network
A Convolutional Neural Network, also known as a CNN or ConvNet, is a type of deep learning algorithm that uses a mathematical operation known as convolution in place of general matrix multiplication in at least one of its layers.
Convolution is a mathematical operation on two functions that produces a third function describing how the shape of one is modified by the other.
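As a rough illustration of that definition, the sketch below slides a small kernel over a toy image and records a weighted sum at each position. The function name `conv2d_valid`, the 4x4 image, and the 2x2 kernel are all made up for this example. The kernel is flipped, as the mathematical definition requires; most deep learning libraries skip the flip (strictly speaking, they compute cross-correlation), which rarely matters in practice because the kernel values are learned.

```python
def conv2d_valid(image, kernel):
    """2D convolution with no padding and stride 1, on nested lists."""
    kh, kw = len(kernel), len(kernel[0])
    # Flip the kernel in both directions (true convolution).
    flipped = [row[::-1] for row in kernel[::-1]]
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    output = []
    for i in range(out_h):
        out_row = []
        for j in range(out_w):
            # Weighted sum of the image patch under the kernel.
            total = 0
            for di in range(kh):
                for dj in range(kw):
                    total += image[i + di][j + dj] * flipped[di][dj]
            out_row.append(total)
        output.append(out_row)
    return output


# Toy 4x4 "image" with a vertical edge between columns 1 and 2.
image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
]

# Hypothetical 2x2 kernel that responds to left-to-right intensity changes.
kernel = [
    [1, -1],
    [1, -1],
]

feature_map = conv2d_valid(image, kernel)
for row in feature_map:
    print(row)
```

The resulting 3x3 feature map is non-zero only in the middle column, which is where the edge sits. Note that the same four kernel weights are reused at every position, whereas a fully connected layer would need a separate weight for every input-output pair.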
One can go deep into the mathematics behind CNN, but to implement it properly and perform computer vision tasks, what matters most is a solid understanding of its architecture and the way it functions.
This is exactly what will be explored in the next two sections.