Computer Vision – All You Need to Know | Augmented Reality
In a very short and abstract sense, Computer Vision is the name for any computation that involves understanding and analyzing visual content. That content can be anything composed of pixels, be it photos, videos, icons and many more. Even though such techniques have existed since the 1960s, it is only with advancements in computing power and data storage that the field has progressed by leaps and bounds. Let us look at some of the core building blocks of this technology:
Object Classification: The process of assigning a new object to one of a set of known classes. Example: classifying the objects in a classroom as chairs or benches.
Object Identification: The process of differentiating between objects that belong to the same class. Example: telling blue chairs apart from red chairs.
Motion Analysis: Estimating how fast an object is moving with respect to the camera.
Segmentation: Partitioning the image into multiple regions or segments.
Restoration: Removing or editing out any sort of noise within the images.
These are but a few examples.
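To make two of these building blocks concrete, here is a minimal sketch in plain Python. It is only an illustration under stated assumptions: the tiny grayscale grid, the function names and the threshold of 128 are all made up for this example, and a median filter and simple thresholding are just two of the simplest possible stand-ins for restoration and segmentation, not the methods any particular product uses.

```python
from statistics import median


def median_filter(image):
    """Restoration sketch: replace each pixel with the median of its
    3x3 neighbourhood (the window is clipped at the image borders)."""
    height, width = len(image), len(image[0])
    result = []
    for r in range(height):
        new_row = []
        for c in range(width):
            neighbours = [
                image[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, c - 1), min(width, c + 2))
            ]
            new_row.append(int(median(neighbours)))
        result.append(new_row)
    return result


def threshold_segment(image, threshold=128):
    """Segmentation sketch: label a pixel 1 (bright region) if its value
    is at least `threshold`, otherwise 0 (dark region)."""
    return [[1 if value >= threshold else 0 for value in row] for row in image]


# Hypothetical grayscale image: dark on the left, bright on the right,
# with one noisy pixel (255) in the middle of the dark area.
noisy = [
    [10, 12, 11, 200, 205],
    [11, 255, 10, 202, 201],
    [12, 10, 13, 199, 203],
]

restored = median_filter(noisy)
segments = threshold_segment(restored)

for row in segments:
    print(row)
```

Segmenting the noisy image directly would wrongly mark the stray bright pixel as part of the bright region; running the restoration step first removes it, so the two regions come out clean.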
How Computer Vision Works
Even the best neuroscientists cannot fully explain how the brain works, because neuroscience has yet to figure out exactly how the billions of neurons firing in our heads drive our bodies. Even though some breakthroughs in this area came from studies of frog vision, it has to be said that we humans are a far cry from amphibians. A similar problem exists in computer vision. Just as we do not know exactly how our brain classifies images from what our eyes see, we cannot yet say how closely the algorithms behind computer vision approximate that process.
The way machines interpret images is comparatively simple: they treat an image as a grid of pixels. In a grayscale image, each pixel is denoted by a number, usually ranging from 0 to 255, and the software gives each pixel a specific position by row and column. Now imagine that color is introduced. Each pixel is then denoted by three values, one each for red, green and blue (RGB), so every pixel carries three numbers instead of one. To put that in scale, each color value takes 8 bits, so three colors per pixel come to 24 bits, and a normal 1024 x 768 image holds roughly 19 million bits (about 2.4 MB uncompressed). This calculation is for one single image, and we would need thousands of similar images to train a model with any sort of meaningful accuracy.
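The arithmetic above is easy to check with a short sketch. The function name, the tiny sample image and the figure of 10,000 training images are assumptions chosen purely for illustration; the sizes are for raw, uncompressed pixel data.

```python
def image_size_bits(width, height, channels=3, bits_per_channel=8):
    """Return the raw (uncompressed) size of an image in bits."""
    return width * height * channels * bits_per_channel


# A tiny 2 x 2 color image stored as rows of (R, G, B) tuples,
# where each value lies in the range 0-255.
tiny_image = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

# A pixel is addressed by its row and column.
top_right = tiny_image[0][1]  # pure green

# Size of a single 1024 x 768 RGB image.
bits = image_size_bits(1024, 768)
megabytes = bits / 8 / 1_000_000

# Hypothetical training set of 10,000 such images.
dataset_gigabytes = bits * 10_000 / 8 / 1_000_000_000

print(f"One image: {bits:,} bits ({megabytes:.2f} MB)")
print(f"10,000 images: {dataset_gigabytes:.1f} GB")
```

A single image comes to 18,874,368 bits, which is the "roughly 19 million" quoted above, and even a modest collection of ten thousand such images already runs to tens of gigabytes before any processing begins.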
It is evident that the computing power and storage required to bring this technology to fruition are vast. While big names all over the world are developing it for breakthrough models in their respective sectors, a few firms have already set out to integrate computer vision into applications that can be useful in the day-to-day life of the common man. iBoson Innovations, a young firm here in India, has developed its own computer vision technology and integrated it into its augmented reality application, UniteAR, which stands at the forefront of augmented reality apps in India. UniteAR can be put to practical use in all sectors by allowing users to create their own AR experiences, and with the introduction of computer vision it can now identify and classify objects in the physical world in real time, presenting clients with a plethora of options.