So far in this series we have looked at text-based content, like Microsoft Word documents or PDF files, and how we go about enriching these to add additional business context and value to them. But what about images? As a business - especially if you’re an asset management company, or perhaps in the engineering space - you probably have mountains of them! Do you ever really use them though? In fact, can you even use them? Could you find what you’re looking for even if you wanted to?

The challenge with classifying images

If you file images anything like I file my holiday snaps, you may have some folders named after the holiday or day trip with the kids, but the photos themselves will probably have meaningless names that are just a string of numbers and random letters.

It’s the same situation that we observe for our customers. Often, these images will have been related to a process at some point in time. However, as with many processes, the structured data and unstructured data end up in two different places, with little metadata to connect them. Images make this even worse: with no text content, there is nothing we can use to link them to their structured counterpart.

So, what if we could analyse these images? What if we could extract key features of an image and store them as metadata? What if we could apply Optical Character Recognition (OCR) to regions of text in an image - for example, the number plate on a car or a road sign? What if we could even determine where a picture was taken?

Extracting metadata from images

Well, through our research we discovered that both Google Cloud Platform and Microsoft Cognitive Services provide cloud-based APIs that can be used to analyse images and extract this metadata for us.

Google provides a whole bunch of features, including extracting objects or features of an image (for example, a wheel, a door, etc.), returning similar images, applying OCR with automatic language identification, detecting faces, and even identifying popular places and landmarks. In fact, we have been using this API on one of our customers' images. The image shows some roadworks, and the Google Cloud Platform Vision API was able to tell us the name of the road in Cambridge where the picture was taken.
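To make that concrete, here is a minimal sketch of a request to the Vision API's REST endpoint (`POST https://vision.googleapis.com/v1/images:annotate`). The feature type names come from Google's documentation; the helper function name and the choice of features are just our illustration, and sending the request needs your own API key.

```python
import base64

def build_vision_request(image_bytes):
    """Build an images:annotate body asking for labels, landmarks and OCR text."""
    return {
        "requests": [{
            # The Vision API expects the raw image bytes base64-encoded.
            "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
            "features": [
                {"type": "LABEL_DETECTION", "maxResults": 10},    # objects/features
                {"type": "LANDMARK_DETECTION", "maxResults": 5},  # places
                {"type": "TEXT_DETECTION"},                       # OCR
            ],
        }]
    }

# Sending it is omitted here because it needs credentials, e.g.:
# requests.post("https://vision.googleapis.com/v1/images:annotate?key=YOUR_KEY",
#               json=build_vision_request(open("roadworks.jpg", "rb").read()))
```

One call can request several feature types at once, which is handy when you want labels, landmarks and OCR text in a single round trip per image.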

It’s a similar story with Microsoft. Their API offers some of the same features as Google, such as object detection. However, they don’t provide quite as much in the way of location and landmark detection. Having said that, Microsoft do provide a rather neat machine-generated text-based description of the image in question - something which Google Cloud Platform does not have.
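For comparison, a sketch of the Microsoft side: the Azure Computer Vision "Analyze Image" REST operation (v3.2) returns that machine-generated caption under the `description` field of the response. The endpoint path and `visualFeatures` values come from Microsoft's documentation; the helper below and its parameters are our own illustration.

```python
def build_analyze_call(endpoint, subscription_key):
    """Assemble the URL, query params and headers for an Analyze Image call."""
    url = endpoint.rstrip("/") + "/vision/v3.2/analyze"
    # Ask for the text description plus detected objects.
    params = {"visualFeatures": "Description,Objects"}
    headers = {
        "Ocp-Apim-Subscription-Key": subscription_key,
        "Content-Type": "application/octet-stream",  # raw image bytes in the body
    }
    return url, params, headers

# e.g. requests.post(url, params=params, headers=headers, data=image_bytes)
# The generated caption sits in the JSON response at description.captions[0].text.
```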

Enriching images using metadata

Naturally, upon discovering these APIs from both Google and Microsoft, we set about building a series of enrichment steps for our product, InsightMaker, to allow us to start adding context to images.

These enrichment steps manifest themselves as standard InsightMaker enrichment steps (you can find out more about these in my previous posts, linked at the end of this blog). They populate the ‘text content’ for the image - which is what we use for full-text indexing - and also add the labels, landmarks and other fields as metadata against the image. From InsightMaker's user interface, you can then easily search all of your images using full-text search and advanced filters on the metadata and entities.
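InsightMaker's actual enrichment steps aren't public, but an image-enrichment step along the lines described above might boil down to something like this sketch, which reduces a Vision-style response (the `labelAnnotations`, `landmarkAnnotations` and `fullTextAnnotation` fields are from Google's response format; everything else here is a hypothetical illustration) to a text blob for the full-text index plus filterable metadata.

```python
def enrich_image(annotations):
    """Turn a Vision-style annotation response into text content + metadata."""
    labels = [l["description"] for l in annotations.get("labelAnnotations", [])]
    landmarks = [l["description"] for l in annotations.get("landmarkAnnotations", [])]
    ocr_text = annotations.get("fullTextAnnotation", {}).get("text", "")
    return {
        # Everything searchable goes into one field for the full-text index.
        "text_content": " ".join(labels + landmarks + [ocr_text]).strip(),
        # Structured fields are kept separately so they can drive filters.
        "metadata": {"labels": labels, "landmarks": landmarks},
    }
```

The design point is the split: the same extracted values feed both the free-text search box and the structured filters, so a user can either type "roadworks" or filter on a landmark field.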

To wrap up this little visionary blog (you see what I did there!), if you, like many organisations, have images in your business which you know you can't use effectively to support your processes, then you may want to consider how you make them discoverable. This is all possible with technologies from Microsoft and Google, along with InsightMaker. Bear in mind that you don’t need to boil the ocean and process them all; you might just choose high-value areas, such as your asset pictures, and leave those staff Christmas party snaps…probably for the best.

Cheers and speak soon, Paul

If you missed my previous blogs in the 12 Days of Information Enrichment series, you can catch up here.

Day 1 - What is enrichment? Creating wealth from information

Day 2 - Starting at the beginning with Text Extraction

Day 3 - Structuring the unstructured with Business Entity Extraction

Day 4 - Solving the GDPR, PII and PCI problem

Day 5 - Sustainable Document Classification