Imagine I throw you a ball. What would your first reaction be? You would simply catch it, right? That is what happens on the front end. But what about the back end: why did you catch it, when nobody told you in advance that it was coming? OK, no more questions. Let’s first understand the back-end procedure behind this simple task.
When I throw a ball at you, the image of the ball passes through your eye and strikes your retina. The retina does some elementary analysis and passes the result on to the brain, where the visual cortex analyzes the image more thoroughly and sends it on to the rest of the cortex. The cortex compares it to everything it already knows, in short scanning for any similar earlier incident. It then classifies the objects and their dimensions, and finally decides on the next action, which is to raise your hand.
You catch the ball only because you have predicted its path. All these calculations and decisions happen within a tiny fraction of a second, with no conscious effort, and they almost never fail. Now imagine trying to recreate this complex process: creating a machine that sees like we do, thinks like we do and acts like we do. This next-to-impossible task is going to be a reality soon, but it is hard, because we have little idea where to start in the first place.
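To make the "predicting its path" step concrete, here is a minimal sketch of the arithmetic a machine would need. It assumes ideal projectile motion (constant gravity, no air resistance), which is a simplification and not a claim about how the brain does it. The function name predict_catch_point and all of its parameters are made up for this illustration.

```python
import math


def predict_catch_point(x0, y0, vx, vy, catch_height, g=9.81):
    """Return (time, x) where the ball falls to catch_height, or None.

    Assumption: ideal projectile motion with gravity g and no air resistance.
    Units are metres, metres per second and seconds.
    """
    # Solve y0 + vy*t - 0.5*g*t**2 = catch_height for t.
    discriminant = vy * vy + 2.0 * g * (y0 - catch_height)
    if discriminant < 0:
        return None  # the ball never reaches that height
    # Take the later root: the moment the ball passes the height on its way down.
    t = (vy + math.sqrt(discriminant)) / g
    if t < 0:
        return None  # the crossing would lie in the past
    return t, x0 + vx * t


# Hypothetical throw: released at 1.5 m, caught at the same height.
catch_time, catch_x = predict_catch_point(
    x0=0.0, y0=1.5, vx=5.0, vy=4.905, catch_height=1.5
)
print(f"Raise your hand at x = {catch_x:.2f} m after {catch_time:.2f} s")
```

Even this toy version needs the ball's position and velocity as inputs, and extracting those from raw camera pixels is exactly the hard part.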
If just imagining this is so difficult, and thinking through the background details of how you caught the ball is so complex, how hard would it be to design a system like this? It may sound too difficult for us, but apparently not for AI pioneer Marvin Minsky, who in 1966 is famously said to have described the task to a student as “connect a camera to a computer and have it describe what it sees.”
It was easy for him to describe this as a simple process, yet 50 years later we are still trying to convince the camera to describe what it sees. Serious research on this problem began in the 1950s along three distinct lines: replicating the eye (difficult), replicating the visual cortex (very difficult), and replicating the rest of the brain (the most difficult).
All of these may make you ask: is this even possible? Where do you start? How can you duplicate the eye, the visual cortex, or for that matter the rest of the brain? No wonder research has been going on for the past 50 years. But thanks go to the researchers and scientists who turn all our hows and impossibles into “why not?” and “let’s make it possible.”
Of all the hows and whys, let me tell you, reinventing the eye has been the most successful part to date. The past decades have witnessed the creation of sensors and image processors that match and even exceed the capabilities of the human eye. With near-perfect lenses and semiconductor sub-pixels manufactured at nanometer scales, the precision and sensitivity of modern cameras is just extraordinary.
Today we have cameras that record thousands of pictures per second and measure distances with great precision. But for all the high fidelity of their results, these devices are overall no better than a pinhole camera from the 19th century: they merely record the distribution of photons arriving from a given direction. Even the best camera designed to date could never recognize a ball, let alone catch it.
Let’s put it this way: this work needs both hardware and software, and just using a camera is not enough. Hardware is severely limited without proper software, and the biggest problem is designing that software. On the positive side, today’s camera technology has come a long way and provides a rich, flexible platform to work on. So we can expect some great results in the future, fingers crossed.
It’s interesting to note that more of the brain is dedicated to vision than to any other task, and that this specialization goes all the way down to the cells themselves. Billions of cells work together to extract patterns from the noisy, unsynchronized signal coming from the retina.
Imagine the kind of work our brain and every cell in our body is engaged in. So the next time someone asks what you do, explain to them how busy your brain and each and every one of your cells are, and that they are disturbing that work right now. Coming back to the cells: sets of neurons excite one another when there is contrast along a line at a certain angle, say, or rapid motion in one direction.
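As a rough software analogy for neurons that fire on contrast along a line at a certain angle, here is a minimal sketch using two small Sobel-style filters in plain Python. It is only an illustration of orientation-selective detection, not a model of real neurons. The names VERTICAL_EDGE, HORIZONTAL_EDGE and response, and the tiny test image, are invented for this example.

```python
# 3x3 filters that respond to brightness contrast across a vertical
# or a horizontal line (Sobel-style weights).
VERTICAL_EDGE = [[-1, 0, 1],
                 [-2, 0, 2],
                 [-1, 0, 1]]
HORIZONTAL_EDGE = [[-1, -2, -1],
                   [0, 0, 0],
                   [1, 2, 1]]


def response(image, kernel):
    """Sum of absolute filter responses over every 3x3 patch of the image."""
    rows, cols = len(image), len(image[0])
    total = 0
    for r in range(rows - 2):
        for c in range(cols - 2):
            acc = 0
            for i in range(3):
                for j in range(3):
                    acc += kernel[i][j] * image[r + i][c + j]
            total += abs(acc)
    return total


# A 5x5 image: dark on the left, bright on the right,
# so the only contrast runs along a vertical line.
image = [[0, 0, 1, 1, 1] for _ in range(5)]

vertical_response = response(image, VERTICAL_EDGE)
horizontal_response = response(image, HORIZONTAL_EDGE)
print("vertical detector:", vertical_response)
print("horizontal detector:", horizontal_response)
```

The vertical detector responds strongly to this image while the horizontal one stays silent, which is the kind of selective "excitement" described above.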
In higher-level networks, these patterns combine into meta-patterns, such as a circle moving upwards. Other patterns add that the circle is white with red lines, or that it is growing in size. Early research in computer vision, regarding these networks as deeply complex, tried a different approach: the top-down approach, along the lines of "a bike looks like /this/ and moves like /this/". We are still trying to come up with a definition of how our minds work, or of how to simulate them.
It’s easy enough to design a system that recognizes different types of books, from different angles and in different situations, whether at rest or in motion, and yet the same system would not be able to recognize a chair. Nor could it ever tell you what a book actually is, whether it is for reading or for some other purpose. So the problem is not the hardware or the software, but the lack of an operating system.
To explain this in simple words, our mind has short-term and long-term memory, input from all the other senses, attention and cognition, and billions of lessons learned from trillions of interactions with the world. All of this is written, by methods too complex for us to understand, into a network of interconnected neurons far more complicated than anything we can think of.
We can expect the future of computer vision to lie in integrating the powerful but specific systems created today. Even with our best engineers, computer scientists, psychologists, neuroscientists and philosophers working to understand how our minds work, we can still barely simulate them. But there’s hope; this is not the end.
Think of cameras that recognize faces and smiles, timers, self-driving cars, automatic cookers, systems that read traffic signs, and the list goes on. If advances in technology have made all of these possible, we can hope that we will soon understand how our brain functions, and finally work out the logic for operating machines the way our brain does.
This whole complex process might take a long time to complete, but if we can turn our imagination into reality to this extent, we can imagine a bigger picture and get through all three tasks: the difficult, the very difficult and the most difficult.