Efficient Data Preparation Will Not Improve Data Quality

Jan 09, 2017


Last week we wrote about the growth of embedded analytics in HR business platforms and the need to connect those tools to other data in the enterprise. Much of the data HR needs to evaluate the impact of people on the business resides outside of HR, and business planners need HR data to capitalize on opportunities as they arise.

The traditional IT-centric data delivery model produced well-governed single-source data streams, but the process was too cumbersome to respond quickly to new information sources and business needs. Decision-makers miss opportunities when data analysis cannot keep pace with the speed of business.

Companies need to deliver large volumes of data quickly to the point of need. Analytical tools embedded in the business applications people use in their daily work have made that possible, but analysts can still spend up to 80% of their time cleansing and preparing data before they use it.

Drag-and-drop data preparation tools now let analysts quickly create reusable data prep procedures through a simple graphical interface and then apply those procedures to data integrations. What used to take weeks of developing SQL stored procedures or scripts now takes hours or minutes.

If you have a new data source you need to explore to seize an opportunity, it makes sense to use a data preparation tool to achieve those insights quickly. But cleansing data at the point of analysis is the wrong approach over the long term.

  • Self-service data preparation adapts quickly to changing business needs but creates multiple versions of the truth, reporting errors, and inconsistent information. In the past, IT could control data quality and consistency, but it cannot govern the free flow of today’s distributed data and analytics.
  • While data prep tools make the job easier, they are only a way to perform rework a little better. When we get bad data, the natural human tendency is to correct it and move on to the next problem. That approach perpetuates bad data, and rework becomes embedded in the way work gets done.

Correcting and cleansing data is useful when you need to deliver business intelligence quickly, but over the long term, the best way to improve data quality is to stop creating bad data. People create data problems, and people are the solution. You can reduce processing overhead and speed time to value by addressing data quality issues at their source.
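To make "addressing data quality issues at their source" concrete, here is a minimal sketch in Python of checking a record at the moment it is created rather than cleansing it later at the point of analysis. Everything in it is an assumption made for illustration: the field names (employee_id, department, hire_date), the department list, and the functions validate_employee_record and submit_record are hypothetical and do not come from any particular HR platform or data preparation tool.

```python
from datetime import date, datetime

# Hypothetical reference data; a real system would load this from a governed source.
VALID_DEPARTMENTS = {"HR", "Finance", "Sales"}


def validate_employee_record(record):
    """Return a list of problems found in a record; an empty list means it is clean."""
    problems = []

    # Treat a blank or whitespace-only identifier as missing.
    employee_id = str(record.get("employee_id", "")).strip()
    if not employee_id:
        problems.append("employee_id is missing")

    # Only exact matches against the agreed list are accepted.
    if record.get("department") not in VALID_DEPARTMENTS:
        problems.append("department is not a recognized value")

    hire_date = record.get("hire_date")
    if isinstance(hire_date, datetime):
        hire_date = hire_date.date()  # compare calendar dates only
    if not isinstance(hire_date, date):
        problems.append("hire_date is not a date")
    elif hire_date > date.today():
        problems.append("hire_date is in the future")

    return problems


def submit_record(record, store):
    """Save the record only if it is clean; otherwise hand the problems back to its creator."""
    problems = validate_employee_record(record)
    if not problems:
        store.append(record)
    return problems
```

In this sketch a rejected record goes back to the person who entered it, along with the list of problems, so the correction happens once at the source instead of being repeated by every analyst who later consumes the data.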

A data governance program will start you on the right path, but the real solution begins with understanding that every person who creates or consumes data is part of both the problem and the solution. Every person who touches data, or the technology that creates or transforms it, needs to understand the data needs of the people who will consume it. Individuals who consume data should, in turn, have the means to communicate their requirements to data creators.

Efficient data preparation will reduce the time and cost of cleaning up bad data, but it will not improve data quality. Take a different approach. You will be amazed at what can happen when people learn how much what they do matters.


1.  Howson, Cindi. “Embrace Self-Service Data Preparation Tools for Agility, but Govern to Avoid Data Chaos.” Gartner, Inc. May 24, 2016.

2.  Redman, Thomas C. “Data Quality Should Be Everyone’s Job.” Harvard Business Review. May 20, 2016. 

Pixentia is a full-service technology company dedicated to helping clients solve business problems, improve the capability of their people, and achieve better results.

