Manager IT
Pakistan Telecom Mobile Limited (Ufone)
Oct 13 - 19, 2008

Although "data quality" sounds like a simple concept, its meaning is often misunderstood. Unlike the quality of a physical object, data quality cannot be touched or felt. Data Quality is defined as the measure of how suitable data is for its intended purpose. However, it is common in industry to think of Data Quality as "Quality of Data", meaning the degree or grade of excellence of the data itself.

Data is factual information, especially information organized for analysis or used to reason or make decisions. Data is collected as a result of one or more operations of a process. Let's look at the process of building a clay vase. A craftsman follows a process through which clay is molded into a vase. The Quality of a Vase is a measure of how good that particular vase is. Vase Quality, however, is a measurement of the process through which vases are made. Consistently excellent vases reflect good craftsmanship, that is, a high Vase Quality.

Now we need to identify what is meant by "good" or "bad" data. The answer lies in the definition of Data Quality: the measure of suitability of data for its intended purpose. Thus, good data contributes to a better outcome for that purpose. This property is unique, but it is rarely recognized.

In companies, bad data is not an isolated problem; its root lies in the way business processes are built, implemented and executed. Bad data points to failures in process compliance, while good data indicates a well-implemented and well-executed business process. Hence, both good and bad data have business value, and it is important that data is traceable so that the cause of its production can be identified.

If the production of bad data cannot be traced to its source, the risk of bad decisions rises from a narrow, local level to a more significant and strategic one. Thus, the business units and the information systems organization need to identify significant data quality issues in the systems that produce data, determine the business rules and processes involved, and implement the necessary controls and process changes to ensure that the data complies with the business process.

It is commonly assumed that data can simply be cleaned when required. The data can indeed be fixed this way, but this is only a temporary solution. The problem is fixed for a certain batch of data, yet the same problems reappear as new data is generated, and the same cleanup has to be performed again. A better solution is to fix the processes that generate the bad data rather than fixing the data itself. Computer technologies only aid these processes by processing the data; the business that owns the data applies its rules through business processes, and it is these processes that may create quality issues in the data.
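Fixing the process rather than the data usually means placing a control at the point where data is captured. The sketch below illustrates that idea under stated assumptions: the `Subscriber` record, its fields, the 0-120 age range and the `VALID_CITY_CODES` set are all hypothetical, chosen only to show a record being checked against business rules and rejected before it is stored.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hypothetical reference domain; a real system would load it from master data.
VALID_CITY_CODES = {"ISB", "KHI", "LHE"}


@dataclass
class Subscriber:
    # Hypothetical record layout used only for illustration.
    name: str
    age: int
    city_code: str


def validate(sub: Subscriber) -> list[str]:
    """Return the business-rule violations; an empty list means the record passes."""
    errors = []
    if not sub.name.strip():
        errors.append("name is empty")
    if not 0 <= sub.age <= 120:  # assumed plausible age range
        errors.append(f"age {sub.age} outside 0-120")
    if sub.city_code not in VALID_CITY_CODES:
        errors.append(f"unknown city code {sub.city_code!r}")
    return errors


def capture(sub: Subscriber, store: list) -> bool:
    """Store the record only if it passes validation, so bad data never enters."""
    errors = validate(sub)
    if errors:
        print(f"rejected {sub.name!r}: {'; '.join(errors)}")
        return False
    store.append(sub)
    return True


store = []
capture(Subscriber("Ali", 34, "ISB"), store)
capture(Subscriber("Sara", -5, "XYZ"), store)  # rejected: bad age and city code
print(len(store))
```

Because the rule lives in the capture step, every new record is checked once, instead of the same cleanup being repeated over historical data.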

Large amounts of unchecked data in legacy systems are usually due either to data entry errors or to misconfigured operational systems producing the data. It is impractical to find such data by browsing records one by one. Thus, a proper data quality assurance procedure should be followed, which generally begins with an assessment of whether the data in question is suitable for its intended purpose. Depending on that purpose, the user may want to assess accuracy or correctness, consistency, completeness, uniqueness and integrity. These assessments can be carried out with well-established analysis techniques.
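Two of these assessments, completeness and uniqueness, are easy to express as simple measurements over a set of records. The following is a minimal sketch, assuming records are plain dictionaries; the field names (`customer_id`, `name`, `city_code`) and the sample rows are hypothetical.

```python
from collections import Counter


def completeness(records, field):
    """Share of records (0.0 to 1.0) whose field is present and non-empty."""
    if not records:
        return 1.0  # an empty set has no missing values
    filled = sum(1 for r in records if r.get(field) not in (None, ""))
    return filled / len(records)


def duplicate_keys(records, key):
    """Key values that occur more than once, i.e. uniqueness violations."""
    counts = Counter(r.get(key) for r in records)
    return {k: n for k, n in counts.items() if n > 1}


# Hypothetical sample records for illustration only.
records = [
    {"customer_id": 101, "name": "Ali", "city_code": "ISB"},
    {"customer_id": 101, "name": "Ali", "city_code": "ISB"},
    {"customer_id": 102, "name": "", "city_code": "KHI"},
    {"customer_id": 103, "name": "Sara", "city_code": None},
]

print(completeness(records, "name"))           # 0.75
print(duplicate_keys(records, "customer_id"))  # {101: 2}
```

Scores like these give an objective starting point for judging whether the data is fit for its intended purpose before deeper analysis begins.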

Let us look at some common data quality problems and possible techniques for solving them. An attribute may contain negative values, which are invalid if the attribute is, for example, age. Values for city codes or country codes may fall outside their defined domain. Such anomalies can be identified with the "Values Analysis" technique. Similarly, a process may generate referential integrity violations, or an outlier value (a value far different from all the other values) for a given attribute. These anomalies can be revealed using "Frequency Analysis" techniques. Other techniques for common data quality analysis reports include Overlap Analysis, Statistical Analysis, Derive Analysis, etc. These techniques can be implemented by developing in-house data quality processes; alternatively, certain data mining and data quality tools perform these analyses to find potential data quality issues.
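An in-house version of the two named techniques can start very small. The sketch below is one possible interpretation, not a standard implementation: values analysis checks each value against its permitted domain, and frequency analysis counts the distinct values of a field, then compares them with a parent table (to find referential integrity violations) and flags values that occur rarely as outlier candidates. The domains, the `plan_id` field, the sample rows and the "seen at most once" rarity threshold are all hypothetical assumptions.

```python
from collections import Counter

# Hypothetical reference domains; in practice these come from reference tables.
VALID_COUNTRY_CODES = {"PK", "AE", "GB"}
VALID_PLAN_IDS = {1, 2, 3}

# Hypothetical sample records for illustration only.
records = [
    {"age": 34, "country": "PK", "plan_id": 1},
    {"age": -2, "country": "PK", "plan_id": 1},
    {"age": 27, "country": "XX", "plan_id": 1},
    {"age": 45, "country": "PK", "plan_id": 2},
    {"age": 51, "country": "AE", "plan_id": 2},
    {"age": 29, "country": "PK", "plan_id": 9},
]


def values_analysis(rows):
    """Values analysis: (row index, field, value) for values outside their domain."""
    issues = []
    for i, row in enumerate(rows):
        if row["age"] < 0:
            issues.append((i, "age", row["age"]))
        if row["country"] not in VALID_COUNTRY_CODES:
            issues.append((i, "country", row["country"]))
    return issues


def frequency_profile(rows, field):
    """Frequency analysis: how often each distinct value of a field occurs."""
    return Counter(row[field] for row in rows)


def orphan_references(profile, parent_keys):
    """Distinct values with no matching parent key (referential integrity violations)."""
    return sorted(value for value in profile if value not in parent_keys)


def rare_values(profile, max_count=1):
    """Values seen at most max_count times; candidates for outlier review."""
    return sorted(value for value, n in profile.items() if n <= max_count)


profile = frequency_profile(records, "plan_id")
print(values_analysis(records))                    # [(1, 'age', -2), (2, 'country', 'XX')]
print(orphan_references(profile, VALID_PLAN_IDS))  # [9]
print(rare_values(profile))                        # [9]
```

Rarity alone does not prove a value is wrong; it only narrows down which records deserve a closer look, which is the same role the commercial tools play at larger scale.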

Data quality improves business value by patching the leaks in business processes. The goal of becoming an organization that turns its data resources into a source of enterprise knowledge can be achieved, but not without a firm base of high-quality data. Understanding the relationships between business processes and data items is fundamental to understanding the root causes of inconsistent and non-compliant business processes. All business processes need to be designed thoroughly and carefully, with the relevant data quality measures built in. The benefits of achieving data quality include improved customer focus (through retention, customer value and customer acquisition), cost reduction (by reducing rework and shortening delivery and processing cycle times), better decision making (faster and better decisions based on more consistent and accurate data), effective and efficient business processes and, most importantly, trusted and authentic data for all other business processes.