The topic of artificial intelligence has permanently made its way into the headlines, mainly thanks to ChatGPT. While AI based on natural language stirs up a lot of emotions, we can’t ignore the role of deep learning neural networks in image recognition. One of the pioneers in this field is Google, with its Vision AI environment.
Over five years ago, a team of developers from Google Cloud started collaborating with one of the most influential newspapers, The New York Times. The newspaper's vast archive is estimated to hold five to seven million photos taken over the past 100 years. Many of them hold untold stories or unpublished evidence.
In 2015, when a part of this archive ended up underwater due to a burst pipe, the management of NYT decided to take steps to protect this priceless asset. They chose Google Cloud, including Cloud Storage, Cloud Pub/Sub, and Google Kubernetes Engine, to create a digital, well-secured photo database.
Although the goal was achieved and they could have stopped there, they decided to go further and leverage machine learning models to better understand the content of the archive. This is where computer vision technology, like Google Cloud Vision AI, comes into play. It enables automatic identification of places and objects in the photographs, at a scale far beyond what people could process by hand.
What is Vision AI?
Vision AI is an environment that offers several key Google Cloud services combining machine learning capabilities (Vertex AI Vision) with image recognition and search (Vision API, AutoML Vision). Let’s focus on the latter two for now.
AutoML Vision
This solution allows you to train machine learning models to classify images according to your own defined labels. With AutoML Vision, you can train models based on labeled images and evaluate the accuracy of the labels assigned.
Furthermore, you can apply machine learning to label the remaining images based on existing human-created annotations. The ultimate goal is to create registries of trained models that can be accessed through the AutoML API.
Computer vision applications – Vision AI for wind turbine inspections
AutoML Vision is a tool that enables companies to save a significant amount of time and money. One example is AES, a renewable energy distributor operating in 15 countries. AES owns eight wind farms, each equipped with 50 to 300 turbines.
These devices require regular technical inspections. Detecting rotor blade damage early can prevent tragedies. Traditional inspections took about two weeks for each farm. AES decided to outsource the inspections to a specialised company that uses drones, which reduced the inspection time to just two days.
During the audit, up to 30,000 photos are taken, which then need to be verified to check for any cracks on the surface of the wind turbines. Before using AutoML Vision models, the entire burden of verification fell on highly skilled engineers who needed four weeks to review the photos. Training a model with the tool available through Google Cloud allowed them to cut that time in half, leaving only 15,000 photos for manual approval.
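One way a reduction like this could work is confidence-based triage: photos the model scores as clearly undamaged are set aside, and only the rest go to the engineers. The sketch below is purely illustrative. The `triage` function, the scores, and the 0.2 threshold are hypothetical and are not taken from the AES project.

```python
from typing import Dict, List, Tuple


def triage(damage_scores: Dict[str, float],
           threshold: float = 0.2) -> Tuple[List[str], List[str]]:
    """Split photos into (needs_review, auto_cleared) by model damage score.

    damage_scores maps a photo name to the model's estimated probability
    (0.0 to 1.0) that the photo shows blade damage. The scores and the
    threshold are hypothetical values chosen for illustration.
    """
    needs_review: List[str] = []
    auto_cleared: List[str] = []
    for photo, score in sorted(damage_scores.items()):
        if score >= threshold:
            needs_review.append(photo)   # an engineer should look at this one
        else:
            auto_cleared.append(photo)   # model is fairly sure there is no damage
    return needs_review, auto_cleared


if __name__ == "__main__":
    scores = {
        "blade_001.jpg": 0.03,
        "blade_002.jpg": 0.91,
        "blade_003.jpg": 0.45,
        "blade_004.jpg": 0.10,
    }
    review, cleared = triage(scores)
    print(f"{len(review)} of {len(scores)} photos go to engineers: {review}")
```

A lower threshold sends more photos to the engineers and reduces the risk of missing a crack, so in practice the value would be tuned against the model's evaluation results.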
How does computer vision work?
In AutoML Vision, regardless of the type of data processed, the workflow is similar and consists of six stages:
- Prepare data for training.
- Create a dataset.
- Train the model.
- Evaluate the model and iterate.
- Obtain predictions.
- Interpret the results.
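The first two stages can be sketched in a few lines: labeled images are split into training, validation, and test sets and written out as an import file. This is a minimal sketch under stated assumptions. The bucket name and labels are made up, and the `split,path,label` row layout is an assumption that should be checked against the current AutoML import documentation.

```python
import csv
import hashlib
import io
from typing import Iterable, List, Tuple


def assign_split(uri: str, validation_pct: int = 10, test_pct: int = 10) -> str:
    """Deterministically assign an image to a split from a hash of its URI.

    Hashing keeps the assignment stable across runs, so an image never
    moves between the training and test sets when the file is rebuilt.
    """
    bucket = int(hashlib.sha256(uri.encode("utf-8")).hexdigest(), 16) % 100
    if bucket < test_pct:
        return "TEST"
    if bucket < test_pct + validation_pct:
        return "VALIDATION"
    return "TRAIN"


def build_import_rows(labeled: Iterable[Tuple[str, str]]) -> List[List[str]]:
    """Turn (image URI, label) pairs into rows of [split, URI, label]."""
    return [[assign_split(uri), uri, label] for uri, label in labeled]


def to_csv(rows: List[List[str]]) -> str:
    """Serialize the rows as CSV text (assumed import layout, see lead-in)."""
    buf = io.StringIO()
    csv.writer(buf, lineterminator="\n").writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    # Hypothetical bucket and labels, for illustration only.
    labeled_images = [
        ("gs://example-bucket/blades/img_0001.jpg", "cracked"),
        ("gs://example-bucket/blades/img_0002.jpg", "intact"),
        ("gs://example-bucket/blades/img_0003.jpg", "intact"),
    ]
    print(to_csv(build_import_rows(labeled_images)))
```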
AutoML works with four types of data: images, videos, text, and tabular data. These are ready-made solutions. But if none of them meets the specific requirements of your project, you can build and train a custom model in Vertex AI. There, you can configure the computing resources for ML training as needed, including the type and number of virtual machines, GPUs, or TPUs.
Cloud Vision API
Google Cloud Vision API provides access to advanced pre-trained machine learning models through REST API and RPC API interfaces. With the Vision API, you can label images and quickly classify them into millions of predefined categories. It offers a wide range of capabilities, including image classification, image labeling, object detection, object tracking, text recognition (both printed and handwritten text, OCR – optical character recognition), and the generation of image metadata.
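To make the REST interface more concrete, here is a minimal sketch that builds the JSON body for an `images:annotate` call. It only constructs the request and does not send it. The `build_annotate_request` helper and the bucket path are hypothetical, and authentication is left out entirely.

```python
import json
from typing import Dict, List

# REST endpoint for batch image annotation.
ANNOTATE_URL = "https://vision.googleapis.com/v1/images:annotate"


def build_annotate_request(image_uri: str,
                           feature_types: List[str],
                           max_results: int = 10) -> Dict:
    """Build the JSON body for one image and a list of feature types.

    feature_types holds Vision API feature names such as
    "LABEL_DETECTION", "TEXT_DETECTION" or "OBJECT_LOCALIZATION".
    """
    return {
        "requests": [
            {
                "image": {"source": {"imageUri": image_uri}},
                "features": [
                    {"type": feature, "maxResults": max_results}
                    for feature in feature_types
                ],
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical Cloud Storage path, for illustration only.
    body = build_annotate_request(
        "gs://example-bucket/archive/photo_0001.jpg",
        ["LABEL_DETECTION", "TEXT_DETECTION"],
    )
    print("POST", ANNOTATE_URL)
    print(json.dumps(body, indent=2))
```

In a real application you would more likely use one of the official client libraries, which handle authentication and retries for you.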
The Cloud Vision API allows you to launch new applications for image and video analysis within minutes. For labels that the pre-trained models do not cover, you can train machine learning models to classify images using AutoML Vision or custom models. It’s worth noting the easy integration of Cloud Vision API with BigQuery and Cloud Functions, expanding its scope of functionality.
Combining Vision API with AutoML Vision
To harness the power of machine learning and artificial intelligence even further, it is beneficial to combine the capabilities of AutoML Vision with the Vision API. The diagram below illustrates a commonly used architecture.
In the first step, the user uploads images, which are stored in Cloud Storage. Each time a new image is added, a notification is sent to App Engine via Pub/Sub. In the next step, App Engine calls the machine learning API, and there are two possible paths.
In the first path, the Vision API recognizes general objects and scenes in the photos. The labels it returns make it possible to search for specific elements in the images later.
In the second path, AutoML Vision recognizes custom labels using models you have trained beforehand.
In both cases, the results are sent back to App Engine, and the user can now find the desired image based on keywords.
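The flow described above can be sketched roughly as follows. This is an illustration under several assumptions: the annotator functions are hypothetical stand-ins for the Vision API and AutoML Vision calls, the in-memory `LabelIndex` replaces whatever store the application would really use, and the notification attributes (`eventType`, `bucketId`, `objectId`) follow the Cloud Storage notification format as I understand it.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Set

# An annotator takes an image URI and returns a list of labels.
Annotator = Callable[[str], List[str]]


class LabelIndex:
    """In-memory keyword index: label -> set of image URIs."""

    def __init__(self) -> None:
        self._by_label: Dict[str, Set[str]] = defaultdict(set)

    def add(self, image_uri: str, labels: List[str]) -> None:
        for label in labels:
            self._by_label[label.lower()].add(image_uri)

    def search(self, keyword: str) -> List[str]:
        return sorted(self._by_label.get(keyword.lower(), set()))


def handle_notification(attributes: Dict[str, str],
                        annotators: List[Annotator],
                        index: LabelIndex) -> Optional[str]:
    """Process one upload notification; return the image URI, or None if skipped."""
    if attributes.get("eventType") != "OBJECT_FINALIZE":
        return None  # only react to newly created objects
    image_uri = f"gs://{attributes['bucketId']}/{attributes['objectId']}"
    for annotate in annotators:
        index.add(image_uri, annotate(image_uri))
    return image_uri


# Hypothetical stand-ins for the two paths; real code would call the APIs.
def generic_labels(image_uri: str) -> List[str]:
    return ["Bridge", "Sky"]


def custom_labels(image_uri: str) -> List[str]:
    return ["brooklyn-bridge"]


if __name__ == "__main__":
    index = LabelIndex()
    event = {"eventType": "OBJECT_FINALIZE",
             "bucketId": "example-bucket",
             "objectId": "uploads/photo_0001.jpg"}
    handle_notification(event, [generic_labels, custom_labels], index)
    print(index.search("bridge"))
```

Passing the annotators in as a list keeps the handler independent of which path is used: one, the other, or both.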
Benefits of Vision AI in Google Cloud
We already know that training models for image recognition reduces work time and delivers significant cost savings. It also impacts the quality of business decisions. With access to complete data, we can make informed judgements instead of relying on a representative sample. The models are continuously improved and can analyse not only specific objects but also emotions and facial expressions.
Thanks to Vision AI tools for extracting text (OCR) from images, we can create and search through rich datasets more quickly. This is valuable for both archiving and content organisation purposes.
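As a small illustration of searching OCR results, the sketch below pulls the recognized text out of an annotate response and looks for a phrase across several images. The sample response is hand-written rather than real API output, the helper names are hypothetical, and the `fullTextAnnotation.text` field reflects the response shape as I understand it from the Vision API documentation.

```python
from typing import Dict, List


def extract_text(annotate_response: Dict) -> str:
    """Return the full recognized text from a single-image annotate response."""
    responses = annotate_response.get("responses", [])
    if not responses:
        return ""
    return responses[0].get("fullTextAnnotation", {}).get("text", "")


def find_images_mentioning(texts_by_image: Dict[str, str], phrase: str) -> List[str]:
    """Case-insensitive search for a phrase in the OCR text of each image."""
    needle = phrase.casefold()
    return sorted(
        name for name, text in texts_by_image.items()
        if needle in text.casefold()
    )


if __name__ == "__main__":
    # Hand-written sample shaped like a text detection response.
    sample = {
        "responses": [
            {"fullTextAnnotation": {"text": "PENN STATION\nAugust 1942"}}
        ]
    }
    texts = {
        "photo_0001.jpg": extract_text(sample),
        "photo_0002.jpg": "Times Square at night",
    }
    print(find_images_mentioning(texts, "penn station"))
```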
How to use AutoML Vision and Vision API in practice?
To better leverage the potential of Google’s Vision AI tools, seek assistance from a Google Cloud partner. Just a few minutes with a certified Cloud Architect can help you decide whether a particular tool is suitable for your case. You will also learn about costs and ways to optimise them.