A well-built Master Data Management (MDM) solution can solve many of the headaches a typical enterprise faces. Specifically, it gives the enterprise visibility into its data, where that data is stored and how it is used, while keeping it current and relevant.
In this article, I want to look at this still poorly understood field of data management and discuss how it can serve as a fundamental step towards compliance with the EU’s upcoming General Data Protection Regulation (GDPR).
Here at Grakn Labs, we have worked with a number of companies in different industries, including financial services (banks, hedge funds, etc.), to deliver these types of solutions. Amongst other things, we are asked to help enterprises represent their master data in a Grakn knowledge base. I also want to share some of the lessons we have learned along the way.
Master Data Management
Before delving into the details, it is worth defining exactly what we mean by MDM and GDPR.
MDM is a set of technology-enabled processes and techniques that enable an organisation to connect its most important data into one file, which then serves as the one point of reference for that organisation. It ensures that every single IT system, platform, and architecture across the organisation uses data in the same way. Five components in MDM can be identified:
Content defines all the different types of entities within an organisation (e.g. item names, customers and suppliers), while Relationships define their groupings, hierarchies, and business rules, i.e. the organisation’s data standards.
Access relates to the policies an organisation has on who can access and edit which data sets. Change Management deals with the management and monitoring of changes in data, and how this is then communicated across the organisation.
Finally, Processing defines the rules around matching and identification of data. For example, when two similar-looking customer entries are found in two different systems, these rules determine whether they actually refer to the same customer. Processing also includes the policies on how changes to the master data file are communicated to the systems of origin.
MDM is still in its infancy, although it is maturing rapidly, so many challenges remain. A common one is data complexity: just representing all of the master data’s entities and their relationships and hierarchies can be a daunting task. What is more, master data can sit in different and overlapping data silos. Identifying these, and agreeing on common domain values across systems, can be difficult. Further, we often see that executives do not know where to start with an MDM solution.
With this brief understanding of MDM, I would now like to touch on GDPR and discuss how MDM can be a solution to this looming regulatory behemoth.
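To make the matching rules concrete, here is a minimal sketch in Python. It is an illustration only, under stated assumptions: the field names (name, email, postcode), the 0.85 threshold and the helper functions are hypothetical, and a production MDM system would normally apply richer rules than this.

```python
from difflib import SequenceMatcher


def normalise(value: str) -> str:
    """Lower-case and collapse whitespace so cosmetic differences are ignored."""
    return " ".join(value.lower().split())


def similarity(a: str, b: str) -> float:
    """Return a score between 0 and 1 for how alike two strings are."""
    return SequenceMatcher(None, normalise(a), normalise(b)).ratio()


def likely_same_customer(rec_a: dict, rec_b: dict, threshold: float = 0.85) -> bool:
    """Hypothetical rule: identical e-mail, or same postcode plus a similar name."""
    email_a = normalise(rec_a.get("email", ""))
    email_b = normalise(rec_b.get("email", ""))
    if email_a and email_a == email_b:
        return True
    postcode_a = normalise(rec_a.get("postcode", ""))
    postcode_b = normalise(rec_b.get("postcode", ""))
    if not postcode_a or postcode_a != postcode_b:
        return False
    return similarity(rec_a["name"], rec_b["name"]) >= threshold


# Entries held in different systems (made-up data for illustration).
crm_entry = {"name": "Jon  Smith", "email": "", "postcode": "N1 9GU"}
billing_entry = {"name": "John Smith", "email": "", "postcode": "n1 9gu"}
other_entry = {"name": "Jane Doe", "email": "", "postcode": "N1 9GU"}

print(likely_same_customer(crm_entry, billing_entry))
print(likely_same_customer(crm_entry, other_entry))
```

The threshold is the main design choice here: set it too low and distinct customers are merged, set it too high and duplicates survive. In practice it would be tuned against records whose true identity is already known.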
GDPR & MDM Solutions
The General Data Protection Regulation (GDPR) is an EU-wide regulation that applies from 25 May 2018. It is designed to harmonise data privacy laws and protect EU citizens’ data privacy. It requires any company that manages or stores EU citizens’ personal data to know exactly what data about an individual it stores and uses, and to have a lawful basis for doing so, such as the individual’s explicit consent.
The fines the regulation sets for non-compliance are staggering: up to 4% of worldwide annual turnover or €20 million, whichever is higher. There is also a personal fine of up to €500K applicable to the data controller in charge. It is, therefore, surprising that so few enterprises have begun to implement the new requirements. Partly this reflects the far-reaching scope of the regulation, as executives find it difficult to take the first step towards compliance.
Some have therefore proposed MDM as an answer to GDPR. GDPR forces companies to maintain a reliable and correct overview of individual customers, and MDM offers exactly this overview. It can, therefore, serve as the technical basis for GDPR compliance.
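The "whichever is higher" rule is easy to misread, so here is a small worked sketch in Python. The function name is hypothetical, and the result is only the upper limit stated above, not a prediction of what a regulator would actually impose.

```python
def max_gdpr_fine_eur(worldwide_annual_turnover_eur: int) -> float:
    """Upper limit of the fine: the higher of 4% of turnover and EUR 20 million."""
    four_percent = worldwide_annual_turnover_eur * 4 / 100
    return max(four_percent, 20_000_000.0)


# A smaller company is still exposed to the EUR 20 million figure,
# while for a large one the 4% figure dominates.
print(max_gdpr_fine_eur(100_000_000))
print(max_gdpr_fine_eur(2_000_000_000))
```

With a turnover of €100 million the limit is €20 million, because 4% would only be €4 million. With a turnover of €2 billion the limit rises to €80 million.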
For example, data about individuals can be arranged as data identifiers and extended attributes. Here, a key advantage of using MDM for GDPR is that individuals can rely on their data being accurate. In MDM, this data is maintained centrally, which means that no sub-system in a large organisation is forgotten when the data changes. Not only does this simplify data management, it also greatly improves the quality and timeliness of the data.
That being said, we must not forget that GDPR also covers personal data that is not master data. This includes data that has limited reuse, or that is specific to a single application and, therefore, falls outside the scope of MDM. Thus, despite the usefulness of MDM for GDPR, an organisation should take care not to pull non-master data into MDM just for GDPR’s sake.
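The idea of central maintenance can be sketched as follows. This is a simplified illustration, not how any particular MDM product works: the MasterRecord and MasterDataHub classes, the subscriber mechanism and the sample values are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MasterRecord:
    """One individual's record: stable identifiers plus extended attributes."""
    identifiers: Dict[str, str]
    attributes: Dict[str, str] = field(default_factory=dict)


ChangeListener = Callable[[str, str, str], None]


class MasterDataHub:
    """Holds the master records and tells every sub-system about each change."""

    def __init__(self) -> None:
        self._records: Dict[str, MasterRecord] = {}
        self._listeners: List[ChangeListener] = []

    def register(self, master_id: str, record: MasterRecord) -> None:
        self._records[master_id] = record

    def subscribe(self, listener: ChangeListener) -> None:
        self._listeners.append(listener)

    def get(self, master_id: str) -> MasterRecord:
        return self._records[master_id]

    def update_attribute(self, master_id: str, name: str, value: str) -> None:
        # Change the master copy first, then push the change to all sub-systems.
        self._records[master_id].attributes[name] = value
        for listener in self._listeners:
            listener(master_id, name, value)


def make_sync(target: Dict[str, Dict[str, str]]) -> ChangeListener:
    """Build a listener that mirrors changes into a sub-system's local copy."""
    def sync(master_id: str, name: str, value: str) -> None:
        target.setdefault(master_id, {})[name] = value
    return sync


# Two hypothetical sub-systems, each keeping its own copy of customer data.
crm_copy: Dict[str, Dict[str, str]] = {}
billing_copy: Dict[str, Dict[str, str]] = {}

hub = MasterDataHub()
hub.subscribe(make_sync(crm_copy))
hub.subscribe(make_sync(billing_copy))
hub.register("cust-001", MasterRecord(identifiers={"passport": "X1234567"}))
hub.update_attribute("cust-001", "email", "new.address@example.com")
```

Because every change goes through the hub, both sub-systems receive the new e-mail address without either having to know about the other.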
How to get started
As mentioned above, a common theme we see among executives is not knowing where to start with an MDM solution. After all, in a large enterprise, beginning to represent and model your master data can become very complex, very fast. An example:
‘When we talk about a customer, do we mean a first name? Last name? Or both?’
More complex issues include cross-departmental definitions of ‘High Net Worth Customers’ and modelling the access authorities to different data sets by various groups in different geographies. Another challenge is how to ask intelligent questions of this master data while letting the computer work out how to navigate the complexity of the underlying data structure.
This is where Grakn comes in. Grakn is a hyper-relational database built to model and represent exactly these types of complex networks. Rooted in Symbolic AI, it provides the knowledge base foundation for intelligent systems, including MDM.
The workflow we recommend is as follows. To begin, you should task a person or team within your organisation (or hire an outside consultant) to chart your master data. They will, amongst other things, speak to your relevant data processors and controllers, and map the different permissions and authorities, file structures, data lineages, etc.
As this information is gathered, we help your team integrate the content, relationships, access, change management and processing (MDM’s five components) into one consolidated knowledge base. There will usually be several iterations before the entire knowledge base is completed, which may take anywhere between two and six months. This typically includes several in-house workshops to ensure the system is successfully implemented and adopted within your team.
Early on in this process, as we integrate each database at every iteration, we begin to uncover questions that we are now able to answer, such as:
‘Find me all the owners of a dataset containing personal information of country X’
‘Find me all the operational datasets that are used in a specific application and jurisdiction’
‘Find me the owners of all operational datasets in a category for a specific jurisdiction/level’
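To give a feel for what answering these questions involves, here is a minimal sketch over a small in-memory catalogue. It is written in plain Python rather than Grakn’s own query language, and it does not use any Grakn API. The Dataset fields, the function names and the sample data are all hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Dataset:
    name: str
    owner: str
    kind: str                      # e.g. "operational" or "analytical"
    category: str                  # e.g. "customer" or "transactions"
    jurisdiction: str
    country: str
    has_personal_data: bool
    applications: FrozenSet[str]   # applications that use this dataset


CATALOGUE = [
    Dataset("customer_accounts", "Retail Banking", "operational", "customer",
            "EU", "DE", True, frozenset({"crm", "billing"})),
    Dataset("trade_blotter", "Markets", "operational", "transactions",
            "EU", "FR", False, frozenset({"risk"})),
    Dataset("marketing_leads", "Marketing", "analytical", "customer",
            "EU", "DE", True, frozenset({"crm"})),
    Dataset("us_customer_accounts", "US Retail", "operational", "customer",
            "US", "US", True, frozenset({"crm"})),
]


def owners_of_personal_data(catalogue: List[Dataset], country: str) -> Set[str]:
    """Owners of datasets holding personal information of the given country."""
    return {d.owner for d in catalogue
            if d.has_personal_data and d.country == country}


def operational_datasets_in(catalogue: List[Dataset], application: str,
                            jurisdiction: str) -> List[str]:
    """Operational datasets used by one application in one jurisdiction."""
    return sorted(d.name for d in catalogue
                  if d.kind == "operational"
                  and application in d.applications
                  and d.jurisdiction == jurisdiction)


def owners_of_operational_category(catalogue: List[Dataset], category: str,
                                   jurisdiction: str) -> Set[str]:
    """Owners of operational datasets in a category for one jurisdiction."""
    return {d.owner for d in catalogue
            if d.kind == "operational"
            and d.category == category
            and d.jurisdiction == jurisdiction}


print(owners_of_personal_data(CATALOGUE, "DE"))
print(operational_datasets_in(CATALOGUE, "crm", "EU"))
print(owners_of_operational_category(CATALOGUE, "customer", "EU"))
```

In this flat sketch every filter has to be spelled out by hand. The point of a consolidated knowledge base is that such questions can be asked against the modelled relationships instead, without the person asking needing to know how the underlying data is laid out.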