No one has perfect data. We would like to think that someday we can achieve it, but the fact is there are more ways for data to go wrong than we can anticipate or correct.
That doesn’t mean you can’t manage your data well enough to make it useful. You may never achieve perfection, but with a disciplined approach to data management, you can produce quality data that will support good business decisions, make operations more efficient and effective, and lower the cost of maintaining useful data streams.
Before we discuss how to get control of data quality, it may help to understand why you have bad data.
Why We Produce Bad Data
Almost all of the data we collect and maintain results from data entry controlled by humans. It’s easy to understand that a person typing data into an online form can make mistakes, or that a website visitor can intentionally mislead.
Machines also produce data, but in every case, what is produced depends on a decision made or an algorithm designed by people. As we used to tell our colleagues in HR, “Computers are wonderful. They enable us to make mistakes with astounding speed and replicate them exponentially.”
Another way we produce bad data is by failing to manage it. Without a common understanding and a set of guidelines, people in different parts of an organization, even in adjacent offices or shops, will handle data differently.
Why Good Data Goes Bad
The operations we perform on data can cause data to degrade. Arkady Maydanchik, in his 2007 book Data Quality Assessment, described many of the ways we introduce errors into our data.
- Conversions and consolidations, batch feeds, and real-time data streams all add errors, often because we assume the source data is correct.
- Upgrades and process automation cause data decay. The usefulness of data can also decay when expertise about it departs the organization, or when new uses for data arise.
- Data processing, cleansing, and purging can also introduce errors. A cleansing or purging error can make data disappear permanently.
No one is immune from data errors. What matters is how well you manage them. Before you can manage your errors, you need to know how big the problem is.
How Much Does It Cost?
Some companies, when they realize they need to clean up their data, begin by acquiring software tools, only to see the software sit idle. Others complain that they can’t get funding for the tools. Profiling and cleansing tools are useful, but they aren’t magic. Before you can use them, you need a strategy and a plan. And before you can start a discussion about strategy and planning, you need to know the size and scope of the problem.
The fastest way to your CFO’s heart is cost reduction, and business impact gets your CEO’s attention. You can build the foundation for your argument with a cost analysis to estimate how much bad data costs.
Data guru Thomas C. Redman, writing recently in Harvard Business Review, offers a four-step method any manager can use to calculate the cost of errors in any data source. His method samples recent transactions and estimates the cost of the mistakes they contain.
The approach is simple.
- Gather the last 100 records of the process. Make a short list of critical data elements and lay them out in a spreadsheet or on paper.
- Bring a small group of people together for a two-hour meeting.
- Have your team mark the critical errors.
- Add a “Perfect Record?” column to your worksheet and summarize the result.
The result will be an error rate you can apply to the total cost of transactions for a period. Use the rule of ten, which assumes it takes ten times as much effort to correct bad work as it does to produce perfect work in the first place.
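The arithmetic behind the estimate is simple enough to sketch in a few lines of code. This is only an illustration of the calculation described above; the record counts, per-record cost, and monthly volume below are hypothetical figures, not figures from Redman’s article.

```python
def estimate_error_cost(records, cost_per_record, period_volume):
    """Estimate the cost of data errors for a period.

    records: list of booleans, True where a sampled record was
             marked as a "Perfect Record" in the review meeting.
    cost_per_record: cost to produce one record correctly.
    period_volume: number of transactions in the period.

    Applies the rule of ten: correcting a bad record takes ten
    times the effort of producing a good one.
    """
    error_rate = records.count(False) / len(records)
    rework_cost = error_rate * period_volume * cost_per_record * 10
    return error_rate, rework_cost

# Hypothetical example: of 100 sampled records, 18 had critical
# errors; each record costs $5 to produce; 40,000 transactions
# are processed per month.
sample = [True] * 82 + [False] * 18
rate, cost = estimate_error_cost(sample, 5.00, 40_000)
print(f"Error rate: {rate:.0%}")
print(f"Estimated monthly rework cost: ${cost:,.0f}")
```

With these assumed numbers, an 18% error rate turns a modest $5 per-record cost into an estimated six-figure monthly rework bill, which is exactly the kind of figure that gets a CFO’s attention.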
You now have a cost estimate you can work with, and one you can use in making a case for a data management effort.
A Catalyst for Change
We also expect that the cost estimates will open eyes around your organization about how expensive data errors can be. It can be the start of building a culture with a new attitude toward data management.
Watch for our article on creating the strategy and governance that form the foundation for data management in your organization.
1. Maydanchik, Arkady. Data Quality Assessment. Technics Publications, 2007, pp. 5–22.
Pixentia is a full-service technology company dedicated to helping clients solve business problems, improve the capability of their people, and achieve better results.