The war for best AI will be won with Visual Data
Major technology companies and new startups are at war over who will have the most valuable artificial intelligence, and at the core of this war is unique, high-quality visual data.
This battle will be won by owning the connected camera. The majority of the data our brains analyze is visual, so most of the data artificial intelligence needs to reach human (or better than human) skill will depend on the ability of computers to interpret high-quality visual data.
One of the business sectors that will be revolutionized by artificial intelligence is e-commerce. The Echo Look is a smart stake in the ground for Amazon. Adding a camera to the Echo validates a prediction I made last year, which I called the Internet of Eyes: a world in which all inanimate objects can see. Inanimate objects with cameras let companies own the first step in gathering the data that computer vision and artificial intelligence algorithms analyze.
To date, Amazon has mostly relied on their customers searching on their website for products and clothes to buy. The Look is their first step to empowering their customers to buy products via Selfies instead, and it provides the company with trends of visual data so their artificial intelligence algorithms can learn our favorite clothes, styles and products.
Their core goal is to capture unique and proprietary visual data of their customers so their computers can learn as much as possible about us through the Selfies we capture via the Echo Look. This helps them make our shopping experience even more frictionless.
Fei-Fei Li, Director of the Stanford University Artificial Intelligence Lab and Chief Scientist of AI/ML at Google Cloud, says, "More than 500 million years ago, vision became the primary driving force of evolution’s ‘big bang’, the Cambrian Explosion, which resulted in explosive speciation of the animal kingdom. 500 million years later, AI technology is at the verge of changing the landscape of how humans live, work, communicate and shape our environment".
"As nature discovered early on, vision is one of the most powerful secret weapons of an intelligent animal to navigate, survive, interact and change the complex world it lives in. The same is true for intelligence systems. More than 80% of the web is data in pixel format (photos, videos, etc.), there are more smartphones with cameras than the number of people on earth, and every device, every machine and every inch of our space is going to be powered by smart sensors", says Fei-Fei. "The only path to build intelligent machines is to enable it with powerful visual intelligence, just like what animals did in evolution. While many are searching for the ‘killer app’ of vision, I’d say, vision is the ‘killer app’ of AI and computing".
Society is driven by its narcissistic desire to capture Selfies that show what we are wearing and eating, where we are on vacation and who we are with. The main reason people take pictures is to communicate visually.
E-commerce is already driven by photos and videos online, and in the future it will be driven by inanimate objects with cameras that leverage computer vision and artificial intelligence. There will be cameras in our refrigerators, cameras all over our cars, security cameras, and visual sensors managing everything from the watering of our gardens to the temperature in our homes. These cameras will analyze many different types of visual data, from photographic and thermal to X-ray, ultrasound and white light, to deliver high-quality signals unlike anything we’ve had previously.
IBM Watson has been working with North Face, Macy’s, Sears and other retailers to empower their shopping experiences with artificial intelligence.
I hate shopping. I am part of that segment of men who haven’t changed their fashion styles in years. It drives me crazy when I finally go shopping to replace the favorite red pants I have worn for years and, of course, they don’t make that version anymore.
I have always wanted a camera in my house to photograph me every day and learn my style over time. Eventually the Amazon Echo Look, Google’s Assistant and other objects with cameras in our home will deliver a solution for us.
By leveraging computer vision and artificial intelligence, a camera will hopefully shop for me proactively, without my needing to search online or in stores. Ideally, Amazon Alexa would send me an email saying, "Looks like your favorite red pants are wearing out because you wear them all of the time. We noticed a hole in your back pocket and thought you would like to know that we have two of those pants in the same color and size in stock. Would you like me to order you one or two of them?" Yes!
One-click checkout will have them sent to my door, and this will solve my dislike for shopping. It would make me a very happy customer and replace the activity of searching.
I also hate shopping for food and bathroom supplies. It will be great when all of the cameras in my home talk to each other so that we never run out of what we need and, most importantly, never have to go shopping again, whether online or in a store.
Scaling the storage and processing infrastructure to handle exponentially more visual data from cameras around the house is just one of the challenges, but fortunately Amazon has one of the largest cloud computing networks. The more difficult challenge, and the real war, is applying the right artificial intelligence algorithms to decipher and understand the valuable signals in photos, videos and other visual data, so that a company's AI becomes the smartest.
Serge Belongie, a computer vision expert and Professor of Computer Vision at Cornell Tech, says, "A majority of the human brain involves processing of visual data, for purposes such as scene interpretation and spatial navigation. Visual data is central to the way we learn about the world, and it follows that the pursuit of intelligent machines will require substantial advances in our ability to process and interpret visual data".
Google will follow soon with a camera on its Assistant, which will probably communicate with Nest and Dropcam. The point of sale will be your Selfies and other visual data captured by the Internet of Eyes.
Searching online to buy products will be replaced by artificial intelligence that learns from your visual data. I am excited to see all of the new startups building e-commerce businesses that leverage unique visual data and artificial intelligence. It will be exciting to watch big companies battle to power cameras and interpret our visual data in the hope of making our lives easier and more fun while increasing their own revenue.