Image Enrichment: Giving your business vision.
So far in this series we have looked at text-based content, like Microsoft Word documents or PDF files, and how we enrich it to add business context and value. But what about images? As a business - especially if you’re an asset management company, or perhaps in the engineering space - you probably have mountains of them! Do you ever really use them though? In fact, can you even use them? Could you find what you’re looking for even if you wanted to?
The challenge with classifying images
If you file images anything like I file my holiday snaps, you may have some folders that are named after the holiday or day trip with the kids, but the photos themselves will probably have meaningless names made up of numbers and a few random letters.
It’s the same situation that we observe for our customers. Often, these images will have been related to a process at some point in time. However, like many processes, the structured data and unstructured data end up in two different places, with little metadata to connect them. This is made even worse with images, as there is no text content that we can use to link this content to its structured counterpart.
So, what if we could analyse these images? What if we could extract key features of an image and store them as metadata? What if we could apply Optical Character Recognition (OCR) to regions of text in an image, for example the number plate on a car or a road sign? What if we could even determine where a picture was taken?
Extracting metadata from images
Well, through our research we discovered that both Google Cloud Platform and Microsoft Cognitive Services provide cloud-based APIs that can be used to analyse images and extract this metadata for us.
Google provides a whole bunch of features, including extracting objects or features of an image (for example, a wheel or a door), returning similar images, applying OCR with automatic language identification, detecting faces, and even identifying popular places and landmarks. In fact, we have been using this API on images belonging to one of our customers. The image shows some roadworks, and the Google Cloud Vision API was able to tell us the name of the road in Cambridge where the picture was taken.
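To make this a little more concrete, here is a minimal sketch of the request you would send to the Google Cloud Vision API's `images:annotate` REST endpoint to ask for labels, OCR text, and landmarks in a single call. The endpoint and request shape come from Google's public documentation; the image bytes below are a stand-in placeholder, and authentication is omitted.

```python
import base64

# Public REST endpoint for batch image annotation (Cloud Vision API v1).
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"

def build_annotate_request(image_bytes: bytes) -> dict:
    """Build one Vision API request asking for labels, OCR text,
    and landmark detection in a single round trip."""
    return {
        "requests": [
            {
                # Images are sent inline as base64-encoded content.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [
                    {"type": "LABEL_DETECTION", "maxResults": 10},
                    {"type": "TEXT_DETECTION"},       # OCR with language auto-detection
                    {"type": "LANDMARK_DETECTION"},   # popular places and landmarks
                ],
            }
        ]
    }

# Stand-in bytes; in practice you would read the real image file.
payload = build_annotate_request(b"\x89PNG-example-bytes")
```

The resulting `payload` would then be POSTed to `VISION_ENDPOINT` with your API credentials attached; the response contains one annotation block per requested feature.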
It’s a similar story with Microsoft. Their API offers some of the same features as Google, such as object detection. However, they don’t provide quite as much in the way of location and landmark detection. Having said that, Microsoft do provide a rather neat machine-generated text-based description of the image in question - something which Google Cloud Platform does not have.
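That machine-generated description comes back from Microsoft's Analyze Image operation as a list of candidate captions, each with a confidence score. The sketch below parses a trimmed, invented sample of that response; the field names follow the documented schema, but the caption text and scores are illustrative only.

```python
# Trimmed, invented example of the JSON shape returned by Azure Computer
# Vision's "Analyze Image" operation when the Description feature is requested.
sample_response = {
    "description": {
        "tags": ["road", "outdoor", "sign"],
        "captions": [
            {"text": "a road with traffic cones and a sign", "confidence": 0.82},
            {"text": "a street scene", "confidence": 0.41},
        ],
    }
}

def best_caption(analysis: dict) -> str:
    """Pick the highest-confidence machine-generated caption, if any."""
    captions = analysis.get("description", {}).get("captions", [])
    if not captions:
        return ""
    return max(captions, key=lambda c: c["confidence"])["text"]

caption = best_caption(sample_response)
```

A caption like this is a handy candidate for an image's searchable text content, precisely because the image has none of its own.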
Enriching images using metadata
Naturally, upon discovering these APIs from both Google and Microsoft, we set about building a series of enrichment steps for our product, InsightMaker, to allow us to start adding context to images.
These enrichment steps manifest themselves as standard InsightMaker enrichment steps (you can find out more about these in my previous posts, linked at the end of this blog). They populate the ‘text content’ for the image - which is what we use for full-text indexing - and also add the labels, landmarks and other fields as metadata against the image. From InsightMaker's user interface, you can then easily search all of your images using full-text search and advanced filters on the metadata and entities.
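The shape of such an enrichment step can be sketched as follows. This is a simplified illustration, not InsightMaker's actual schema: the field names (`labels`, `landmarks`, `text_content`) and the example values are all hypothetical.

```python
def enrich_image_record(record: dict, labels, landmarks, ocr_text: str) -> dict:
    """Fold image-analysis output into a copy of the image's metadata record."""
    enriched = dict(record)  # leave the original record untouched
    enriched["labels"] = sorted(set(labels))       # de-duplicated label metadata
    enriched["landmarks"] = list(landmarks)
    # OCR text feeds the full-text index, making the image keyword-searchable.
    enriched["text_content"] = ocr_text.strip()
    return enriched

# Hypothetical record and analysis results, for illustration only.
record = {"id": "IMG_4821", "path": "/assets/roadworks.jpg"}
enriched = enrich_image_record(
    record,
    labels=["road", "cone", "sign", "road"],
    landmarks=["Cambridge"],
    ocr_text="  ROAD CLOSED  ",
)
```

Once fields like these sit alongside the image, the full-text search and metadata filters described above have something to bite on.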
To wrap up this little visionary blog (you see what I did there!), if you, like many organisations, have images in your business which you know you can't use effectively to support your processes, then you may want to consider how you make them discoverable. This is all possible with technologies from Microsoft and Google, along with InsightMaker. Bear in mind that you don’t need to boil the ocean and process them all; you might just choose high-value areas, such as your asset pictures, and leave those staff Christmas party snaps…probably for the best.
Cheers and speak soon, Paul
If you missed my blogs in the 12 Days of Information Enrichment series, you can catch up here.
- Day 1 - What is enrichment? Creating wealth from information
- Day 2 - Starting at the beginning with Text Extraction
- Day 3 - Structuring the unstructured with Business Entity Extraction
- Day 4 - Solving the GDPR, PII and PCI problem
- Day 5 - Sustainable Document Classification
- Day 6 - Image Enrichment: Giving your business vision
- Day 7 - Advanced Entity Extraction with Natural Language Processing
- Day 8 - Understanding customers with Speech to Text translation
- Day 9 - Accelerating classification with Document Clustering
- Day 10 - Giving users what they need, when they need it
- Day 11 - Understanding documents with Dynamic Topics
- Day 12 - The power of Enrichment