A typical office desk. Image by User: Mattes / Public Domain
The majority of organizations cite poor data quality as a key reason why they do not trust their data. In essence, this is the old cliché at work: “garbage in, garbage out”. The primary concern is that data-driven organizations depend on modern technologies and artificial intelligence to get the most out of their data assets, and that task is hampered when they repeatedly struggle with data quality issues.
Kirk Haslbeck, VP of Data Quality at Collibra, has explained to Digital Journal the seven most common data quality issues. Based on Haslbeck’s comments, we can distil these as:
Duplicate data: There is bound to be duplication and overlap when you face an onslaught of data from all directions.
Inaccurate data: Inaccuracies in data can be traced back to several factors, including human error, data drift, and data decay.
Ambiguous data: Column headings can be misleading, formatting can have issues, and spelling errors can go undetected, introducing multiple flaws in reporting and analytics.
Hidden data: Most organizations use only a part of their data, while the rest may be lost in data silos or dumped in data graveyards. Hidden data means missing out on opportunities to improve services, design innovative products, and optimize processes.
Inconsistent data: When you’re working with multiple data sources, mismatches in the same information across sources are likely. The discrepancies may be in formats, units, or spellings. Inconsistent data can also be introduced during migrations or company mergers. If not reconciled constantly, inconsistencies tend to build up and destroy the value of the data.
Too much data: Given the focus on data-driven analytics and its benefits, too much data does not seem like a data quality issue. But it is. When you are searching for data relevant to an analytical project, it is easy to get lost in a glut of irrelevant data.
Data downtime: There can be short periods when data is unreliable or not ready, especially during events like mergers and acquisitions, reorganizations, infrastructure upgrades, and migrations. This data downtime can have significant consequences for companies, from customer complaints to poor analytical results.
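To make the duplicate and inconsistent data problems above concrete, here is a minimal sketch in pure Python using hypothetical customer records from two sources. The record fields, date formats, and helper names are illustrative assumptions, not anything from Collibra’s tooling: the idea is simply that normalizing formats first lets duplicates surface.

```python
from datetime import datetime

# Hypothetical records merged from two sources; casing, whitespace,
# and date formats differ even where the underlying entity is the same.
records = [
    {"name": "Ada Lovelace",  "signup": "2023-01-15"},
    {"name": "ada lovelace ", "signup": "15/01/2023"},  # same person, different format
    {"name": "Alan Turing",   "signup": "2023-03-02"},
]

def normalize(record):
    """Normalize casing/whitespace and coerce dates to ISO 8601."""
    name = " ".join(record["name"].split()).title()
    raw = record["signup"]
    for fmt in ("%Y-%m-%d", "%d/%m/%Y"):  # the formats our two sources use
        try:
            signup = datetime.strptime(raw, fmt).date().isoformat()
            break
        except ValueError:
            continue
    else:
        raise ValueError(f"unrecognized date format: {raw!r}")
    return {"name": name, "signup": signup}

def deduplicate(records):
    """Keep the first occurrence of each normalized record."""
    seen, unique = set(), []
    for rec in map(normalize, records):
        key = (rec["name"], rec["signup"])
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique

clean = deduplicate(records)
print(len(clean))  # 2 -- the two Ada Lovelace rows collapse into one
```

Note the ordering: deduplicating before normalizing would miss the duplicate entirely, because the raw strings differ. Real pipelines apply the same principle with fuzzier matching and many more source formats.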
Haslbeck adds that the real power in data lies in uniting every person, team, and system throughout an organization, coupled with predictive data quality, so that organizations can improve trust in their data.