Metadata: How to remain competitive in a data-driven world.
Our recent blog, ‘What is metadata?’, describes the different ways that labels can be used to describe both structured and unstructured information, and the purposes that this information can serve.
As Head of Solutions Engineering at Aiimi, and as a former consultant in electronic document and records management with broad experience of implementing metadata solutions, I am in a privileged position to watch this world evolve at breakneck pace.
Historically, metadata was largely the preserve of organisations that either monetised documents (law and professional services firms) or relied on metadata to prove the provenance of information and protect people and assets from harm. The Piper Alpha disaster in 1988 and the collapse of Arthur Andersen in 2002 are the ultimate examples of poor metadata and associated document management practices – both led to information being mishandled or destroyed because of inadequate controls. Events like these are the reason why financial services, engineering, and asset-heavy industries now use metadata to carefully manage their information.
But why isn’t metadata more widely used? Given that it can help mission-critical services perform better and more safely, you’d imagine that all companies could usefully take advantage of it. The historical problem is that it takes time, effort, and sometimes expertise that's in short supply to accurately label your information. But technology capabilities have evolved rapidly over the last few years and are now huge enablers for this world of metadata labelling. They will change the equation for all kinds of businesses, to make it worthwhile to exploit this information, to improve competitive edge and knowledge worker experience, and to comply with regulations and policy.
Metadata meets Machine Learning
Five years ago, we were involved in a project to automate the extraction of metadata from 200,000 engineering documents. By developing tailored ontologies and taxonomies, and then deploying tools that quickly brought together classification technology, the content of the documents, and the business logic, we were able to probabilistically label the documents with metadata. The majority were labelled with a certainty within acceptable thresholds, and a minority required human checking. Python and Elasticsearch were two of the key technologies used, and both are underpinning technologies in our Aiimi Insight Engine. New thought processes were important as well. Aside from labelling information, we wanted to be able to offer knowledge workers a view of the document with the relevant content already highlighted and in view. Linking metadata to the viewing and review experience, along with the automation of metadata extraction, saved our customer 56 person-years of effort in achieving their goal, a massive return on investment.
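The split between labels that fall within acceptable thresholds and those that need human checking can be illustrated with a minimal Python sketch. This is not the code from that project: the 0.85 threshold, the `LabelPrediction` structure, and the sample document IDs and labels are all assumptions made purely for illustration.

```python
from dataclasses import dataclass

# Hypothetical threshold: predictions at or above it are accepted automatically.
AUTO_ACCEPT_THRESHOLD = 0.85


@dataclass
class LabelPrediction:
    document_id: str
    label: str
    confidence: float  # classifier probability in the range 0 to 1


def route_predictions(predictions, threshold=AUTO_ACCEPT_THRESHOLD):
    """Split predictions into auto-accepted labels and a human review queue."""
    accepted, needs_review = [], []
    for prediction in predictions:
        if prediction.confidence >= threshold:
            accepted.append(prediction)
        else:
            needs_review.append(prediction)
    return accepted, needs_review


# Illustrative predictions only; these are not real project data.
predictions = [
    LabelPrediction("doc-001", "Piping and Instrumentation Diagram", 0.97),
    LabelPrediction("doc-002", "Inspection Report", 0.62),
    LabelPrediction("doc-003", "Maintenance Procedure", 0.85),
]

accepted, needs_review = route_predictions(predictions)
print("Auto-labelled:", [p.document_id for p in accepted])
print("Human check:", [p.document_id for p in needs_review])
```

In practice the threshold would be tuned per label against a sample of human-verified documents, so that the review queue stays small without letting too many wrong labels through.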
These technologies are commoditised now and have more recently been joined by Machine Learning capabilities for identifying the names of people, places, and organisations, transcribing video and audio, image recognition, identifying key phrases and topics, and generating longer, more complex metadata labels, such as automated summaries. These techniques are beginning to remove the need to pre-build taxonomies, saving effort and time. They are revealing the unknown, automatically picking out metadata that organisations may previously have been totally unaware of. The hidden passport or credit card number supplied by a customer to help you track down their account. The witness who wishes to remain anonymous. The product or service sentiment on social media. The root cause of a repeated failure. The actors who appeared in your advertising videos. The anomaly data behind your best-performing new service.
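To make the hidden credit card example concrete, here is a deliberately simple sketch. It uses a rule-based approach (a regular expression plus the Luhn checksum) rather than a Machine Learning model, and the `contains_card_number` label name and the sample message are assumptions for illustration only.

```python
import re

# Candidate pattern: 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_valid(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for position, char in enumerate(reversed(digits)):
        value = int(char)
        if position % 2 == 1:  # double every second digit, counting from the right
            value *= 2
            if value > 9:
                value -= 9
        total += value
    return total % 10 == 0


def find_card_numbers(text: str) -> list:
    """Return digit strings in the text that look like valid card numbers."""
    found = []
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"[ -]", "", match.group())
        if luhn_valid(digits):
            found.append(digits)
    return found


# Illustrative message using a well-known test card number.
message = "Hi, my card is 4111 1111 1111 1111 and my order ref is 1234567890123456."
labels = {"contains_card_number": bool(find_card_numbers(message))}
print(labels)
```

The checksum step matters because long digit strings such as order references are common in correspondence, and flagging all of them would swamp reviewers with false positives.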
Exploiting generative AI with metadata
Looking to the near future, metadata and labelling are going to be key in getting the most out of the explosion of generative ‘GPT’ AI technologies and large language models. These will allow knowledge workers to produce output at breathtaking speed, but the key to getting valuable output will be the use of labels to pre-filter the source information used to generate output, to review and cite information sources, and to get accurate and defensible output.
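As a rough sketch of what pre-filtering by label might look like, the Python below selects only the passages whose metadata matches the required labels, then assembles a prompt that carries each source ID so the output can be cited and reviewed. The `Passage` structure, the label names (`asset`, `status`), and the prompt wording are assumptions, and no particular language model or API is implied.

```python
from dataclasses import dataclass, field


@dataclass
class Passage:
    source_id: str
    text: str
    labels: dict = field(default_factory=dict)  # metadata labels for this passage


def prefilter(passages, required_labels):
    """Keep only passages whose labels match every required key and value."""
    return [
        p for p in passages
        if all(p.labels.get(key) == value for key, value in required_labels.items())
    ]


def build_prompt(question, passages):
    """Assemble a prompt that exposes source IDs so answers can cite them."""
    context = "\n".join(f"[{p.source_id}] {p.text}" for p in passages)
    return (
        "Answer using only the sources below and cite them by ID.\n"
        f"{context}\n"
        f"Question: {question}"
    )


# Illustrative corpus; the documents and labels are invented for this example.
corpus = [
    Passage("DOC-1", "Valve V-12 was replaced in March.",
            {"asset": "V-12", "status": "approved"}),
    Passage("DOC-2", "Draft note: V-12 may need replacing.",
            {"asset": "V-12", "status": "draft"}),
    Passage("DOC-3", "Pump P-7 inspection passed.",
            {"asset": "P-7", "status": "approved"}),
]

selected = prefilter(corpus, {"asset": "V-12", "status": "approved"})
prompt = build_prompt("When was valve V-12 last replaced?", selected)
print(prompt)
```

Filtering on a label such as `status` before generation keeps draft or superseded material out of the model's context entirely, which is easier to defend than trying to correct the output afterwards.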
Generative techniques for producing metadata from an organisation’s information will drive new paradigms for legacy data. They allow older platforms and document storage to be decommissioned, while increasing the value of the stored information by making it accessible and understandable without human intervention. Cost reduction, risk reduction, and the identification of new opportunities from your legacy data go hand in hand here.
Customer services and interaction will benefit from metadata too, as inbound communication links seamlessly to accounts, orders, complaints, and service delivery information through common metadata fields. Generative AI can then use that linked information to suggest next actions, or to generate outbound correspondence with the right reference material.
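A minimal sketch of linking by a common metadata field might look like the following. The order reference format (`ORD-` followed by five digits), the in-memory `orders` and `accounts` lookups, and the sample records are all hypothetical; a real system would query its own stores.

```python
import re

# Hypothetical order reference format used as the common metadata field.
ORDER_REF = re.compile(r"\bORD-\d{5}\b")

# Illustrative records standing in for order and account systems.
orders = {"ORD-10432": {"account_id": "ACC-77", "status": "delayed"}}
accounts = {"ACC-77": {"name": "Example Customer Ltd"}}


def link_message(message: str):
    """Extract an order reference and join it to order and account records."""
    match = ORDER_REF.search(message)
    if match is None:
        return None
    order_ref = match.group()
    order = orders.get(order_ref)
    if order is None:
        return None
    return {
        "order_ref": order_ref,
        "order": order,
        "account": accounts.get(order["account_id"]),
    }


linked = link_message("Where is my order ORD-10432? It was due last week.")
print(linked)
```

The linked record is the kind of structured context that could then be passed to a generative model when drafting a reply or suggesting a next action.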
What once was seen as a necessary chore for a small selection of industries is rapidly becoming automated, commoditised, and essential for all organisations that want to remain competitive in a data-driven world.