Enrichment: Solving the GDPR, PII and PCI Problem.

by Paul Maker

In my last blog in my 12 Days of Information Enrichment series, I spoke about how we extract business entities from information and then use these entities to better structure, navigate and use that information. Quite a simple concept, yet something so fundamental to information enrichment.

So, how does this help your organisation comply with GDPR? Well, about two years ago, as the hype around GDPR was reaching its peak, we decided that we could use our entity extraction capabilities to help find Personally Identifiable Information (PII) and Payment Card Information (PCI). We could then couple this with some GDPR-specific apps so that users could quickly and easily understand their risk, get their house in order and be compliant with the new legislation.

Developing a PII & PCI finder solution

As with all new solutions, there was quite a bit of learning to come as we started to solve this problem for our customers. The main challenge was the dreaded false positive. What is this? Essentially, a false positive in this case would be determining that a piece of data is, let's say, a national insurance number when in fact it isn’t. This is an obvious problem: it makes it very hard for the customer to see the wood for the trees, and even harder to rely on their solution to help them achieve GDPR compliance.

After thinking through this issue, we decided that we needed to research some approaches that we could use to contextually reinforce that the pieces of information we were finding were genuine examples of PII or PCI.

Identifying PII with proximity indicators

We started out by looking at using proximity indicators. These work by checking the distance between the extracted entity, say a national insurance number, and a word that would reinforce the determination, for example, 'NI Number'. We extended this to include synonyms for each indicator word, increasing the reliability of the process. In addition, by using the distance between the indicator word and the entity in question, we were able to compute a confidence score to give as much transparency as possible to the user.
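To make the idea concrete, here is a minimal sketch of a proximity indicator check in Python. It is an illustration under stated assumptions, not our production implementation: the simplified national insurance number pattern, the list of indicator phrases, the 100-character window and the linear confidence formula are all hypothetical choices made for the example.

```python
import re

# Simplified, hypothetical pattern for a UK national insurance number.
# Real validation rules are stricter (for example, some prefixes are not issued).
NI_PATTERN = re.compile(r"\b[A-Z]{2}\s?\d{2}\s?\d{2}\s?\d{2}\s?[A-D]\b")

# Hypothetical indicator phrase plus synonyms, matched case-insensitively.
INDICATOR_PATTERN = re.compile(r"NI number|national insurance|NINO", re.IGNORECASE)


def proximity_confidence(text, max_distance=100):
    """Return (entity, confidence) pairs for each candidate NI number.

    Confidence is 1.0 when an indicator touches the entity and falls
    linearly to 0.0 at max_distance characters (an assumed formula).
    """
    indicator_spans = [m.span() for m in INDICATOR_PATTERN.finditer(text)]
    results = []
    for entity in NI_PATTERN.finditer(text):
        best = 0.0
        for start, end in indicator_spans:
            # Character gap between the indicator and the entity.
            if end <= entity.start():
                gap = entity.start() - end
            elif start >= entity.end():
                gap = start - entity.end()
            else:
                gap = 0
            if gap <= max_distance:
                best = max(best, 1.0 - gap / max_distance)
        results.append((entity.group(), round(best, 2)))
    return results


# Usage: the first string has a nearby indicator, the second has none.
print(proximity_confidence("NI Number: AB123456C"))
print(proximity_confidence("Order ref AB123456C"))
```

Measuring the gap in characters keeps the sketch short; counting tokens or sentences instead would be an equally reasonable design choice.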

Using context to identify PII

Next, we looked at how we could infer meaning based on the presence of multiple pieces of information. A great example of this is payment card information, or PCI as it’s known. A 3-digit number means little in isolation, but if it occurs in a document along with a 16-digit number, a postcode, a name and a date, suddenly this looks like credit card details. We built algorithms to intelligently detect multiple items and then classify the document based on what was present, using a risk score to help users focus on the right things first.
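The co-occurrence idea can be sketched roughly as follows. This is a simplified illustration, not the algorithm we actually built: the regular expressions, the weights and the rule that a 16-digit number must be present before anything scores are all assumptions for the example, and name detection is left out because it needs a proper entity extractor.

```python
import re

# Hypothetical, deliberately simple patterns for each supporting item.
PATTERNS = {
    "card_number": re.compile(r"\b(?:\d{4}[ -]?){3}\d{4}\b"),
    "security_code": re.compile(r"\b\d{3}\b"),
    "expiry_date": re.compile(r"\b(?:0[1-9]|1[0-2])/\d{2}\b"),
    "postcode": re.compile(r"\b[A-Z]{1,2}\d[A-Z\d]?\s?\d[A-Z]{2}\b"),
}

# Assumed weights: how much each item contributes to the risk score.
WEIGHTS = {
    "card_number": 0.5,
    "security_code": 0.2,
    "expiry_date": 0.2,
    "postcode": 0.1,
}


def pci_risk_score(text):
    """Return (score, found_items) for a document's text.

    Supporting items only count when a 16-digit number is also present,
    because a lone 3-digit number or postcode means little by itself.
    """
    found = {name for name, pattern in PATTERNS.items() if pattern.search(text)}
    if "card_number" not in found:
        return 0.0, found
    score = sum(WEIGHTS[name] for name in found)
    return round(score, 2), found


# Usage: a document with several co-occurring items versus a lone 3-digit number.
print(pci_risk_score("Card 4111 1111 1111 1111 exp 12/27 CVV 123 billing LS1 4AP"))
print(pci_risk_score("Meet in room 123"))
```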

How visible is your information?

Telling this story would not be complete without including another piece of capability that we developed – the information visibility metric.

This metric informs a user about how visible a piece of data or a document is within the enterprise. This is possible because, as part of the enrichment process, we store all the access permissions for each piece of information that we index. We can take these visibility details and use them to boost documents that can be seen by lots of people to the top of the queue. The rationale for this is that you are more liable to get yourself into GDPR-related hot water if you are storing sensitive data in a location that is wide open to the whole business.
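As a rough sketch of how a visibility metric could feed into prioritisation, consider the Python below. The `IndexedDocument` type, the definition of visibility as the fraction of users with read access, and the choice to multiply risk by visibility are all assumptions made for illustration; they are not a description of how our product computes the metric.

```python
from dataclasses import dataclass


@dataclass
class IndexedDocument:
    """Hypothetical record for an indexed document."""
    path: str
    risk_score: float              # e.g. a PII/PCI risk score between 0 and 1
    readers: frozenset = frozenset()  # user IDs with read access, stored at index time


def visibility(doc, total_users):
    """Fraction of all users who can read the document (assumed definition)."""
    return len(doc.readers) / total_users if total_users else 0.0


def prioritise(docs, total_users):
    """Order documents so that risky, widely visible ones come first."""
    return sorted(
        docs,
        key=lambda d: d.risk_score * visibility(d, total_users),
        reverse=True,
    )


# Usage: a high-risk file seen by two people versus a medium-risk file open to all.
docs = [
    IndexedDocument("hr/payroll.xlsx", 0.9, frozenset({"u1", "u2"})),
    IndexedDocument("shared/customers.csv", 0.6, frozenset(f"u{i}" for i in range(100))),
]
for doc in prioritise(docs, total_users=100):
    print(doc.path, round(doc.risk_score * visibility(doc, 100), 3))
```

Multiplying the two values means a moderately risky file that everyone can open outranks a very risky file that only two people can see, which matches the rationale above.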

Something we quickly learnt about PII and PCI is that customers have far more of it present than they realise. Undertaking this process of discovery usually reveals sensitive personal data sitting in places like network drives, where it remains for years and is usually undiscoverable. And, because customers have so much information like this, it becomes an impossible task to remedy manually. Being able to attach a confidence score to the items found, along with prioritising those items that are accessible to numerous users across the business, offers organisations a pragmatic and progressive way to address their personal data problem for GDPR.

Cheers, and see you soon, Paul…

If you missed my blogs in the 12 Days of Information Enrichment series, you can catch up here.
