Business Intelligence: The Dirty (and Costly) Little Secret of Bad Data

Poor data quality is insidious. The phrase "garbage in, garbage out" summarizes the problem, but the real-world extent of data corruption and the red-ink trail it leaves behind are not always obvious. Worse, the systems undermined by poor data are often the very ones that mask the problem. While a sophisticated Business Intelligence (BI) system can report in detail what the data says, it cannot tell whether the data is lying. Data quality is a necessary element of successful BI.

Ironically, the enterprise application integration (EAI) that can make BI so effective can also accelerate corruption in databases. For example, Web data is notoriously less accurate and more complex and varied than call center data. Commingling bad data across systems increases corruption in databases that were relatively accurate to begin with. Experts estimate that about two percent of accurate records can become corrupt each month when no data quality system is in place. That is the equivalent of 20,000 corrupt records monthly in a 1-million-record database. With data pulled from the Web, the numbers can grow dramatically. According to some industry experts, including the META Group, some e-businesses estimate that only 10 percent of their records are accurate.
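The two-percent figure can be worked through in a few lines. The sketch below reproduces the 20,000-record monthly figure and then, as an assumption not stated in the estimate itself, applies the same rate to the shrinking pool of accurate records month after month. The function name and defaults are illustrative only.

```python
def accurate_after(months, total=1_000_000, monthly_rate=0.02):
    """Estimate how many records are still accurate after `months`.

    Assumes every record starts out accurate and that the monthly rate
    applies only to records that are still accurate (a simplification).
    """
    accurate = float(total)
    for _ in range(months):
        accurate -= accurate * monthly_rate
    return accurate


total = 1_000_000
first_month_corrupt = round(total - accurate_after(1, total))
still_accurate_after_year = accurate_after(12, total)

print(f"Corrupt after one month: {first_month_corrupt:,}")
print(f"Share still accurate after a year: {still_accurate_after_year / total:.1%}")
```

Under these assumptions, roughly a fifth of the database would be corrupt within a year if nothing were done, which is why a small monthly rate is worth taking seriously.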

Trouble at Every Touch Point
Bad data enters systems in a variety of ways: online customers submit false information in Web forms to protect their privacy, and customer information that was once correct goes out of date. Even systems that prevent strict duplication can be fooled by variations in customer information. Duplicate customer records fracture the customer view, spurring redundant mailings, erroneous analysis and, perhaps most significantly, higher fraud incidence. Only the ability to consistently identify customers between interactions and reconcile varying records into a single, unified customer view can prevent the increasingly high operational costs associated with a fractured view.
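To illustrate how variations slip past a strict duplicate check, here is a minimal sketch that builds a normalized match key from name and address before comparing records. The field names, the abbreviation table and the sample records are hypothetical; a production matching engine would use far richer rules than this.

```python
import re
from collections import defaultdict

# Hypothetical abbreviation table; real systems carry much larger ones.
ABBREVIATIONS = {"st": "street", "ave": "avenue", "rd": "road"}


def normalize(text):
    """Lowercase, drop punctuation, collapse spaces, expand abbreviations."""
    words = re.sub(r"[^a-z0-9 ]", " ", text.lower()).split()
    return " ".join(ABBREVIATIONS.get(word, word) for word in words)


def group_duplicates(records):
    """Return lists of record ids that share the same normalized key."""
    groups = defaultdict(list)
    for rec in records:
        key = (normalize(rec["name"]), normalize(rec["address"]))
        groups[key].append(rec["id"])
    return [ids for ids in groups.values() if len(ids) > 1]


records = [
    {"id": 1, "name": "John Smith", "address": "12 Main St."},
    {"id": 2, "name": "JOHN  SMITH", "address": "12 Main Street"},
    {"id": 3, "name": "Jane Doe", "address": "4 Oak Ave"},
]

print(group_duplicates(records))  # [[1, 2]]
```

An exact string comparison would treat records 1 and 2 as different customers; the normalized key treats them as one.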

Once bad data enters a system, it can be practically impossible to extract. For example, a date entered as 6/1/02 could mean June 1 or January 6, depending on whether the information was input in the U.S. or Europe. The source system would interpret the date correctly, but once the data is integrated into a data warehouse, the date might be subject to different business rules. The potential misinterpretation could dramatically impact analysis by skewing sales reports for all months.
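The date example can be made concrete. The sketch below assumes each record carries a tag identifying its source region (the "US" and "EU" labels are made up for illustration) and converts the ambiguous string to an unambiguous ISO 8601 date before it reaches the warehouse.

```python
from datetime import datetime

# Hypothetical region tags mapped to the date convention used at the source.
FORMATS = {
    "US": "%m/%d/%y",  # month first
    "EU": "%d/%m/%y",  # day first
}


def to_iso(raw_date, source_region):
    """Parse a date using the source's convention and return YYYY-MM-DD."""
    parsed = datetime.strptime(raw_date, FORMATS[source_region])
    return parsed.date().isoformat()


print(to_iso("6/1/02", "US"))  # 2002-06-01
print(to_iso("6/1/02", "EU"))  # 2002-01-06
```

The same six characters land five months apart, so the conversion has to happen while the source context is still known.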

Functional Criteria
For holistic customer knowledge based on diverse customer touch points and channels, the data quality process must address disparate, even incompatible data sources across the enterprise. Empowering BI with data that supports accurate business and analytical processes requires a data quality solution that can meet a variety of key functional criteria:

Accurate, drill-down data analysis and reporting
Real-time processing to ensure incoming data meets organizational data quality standards
Replicable processes to facilitate enterprise roll-outs without substantial services investments
User-customizable business rules to ensure ongoing correlation of the data quality to evolving data processes

These capabilities are critical, because even minor data quality problems can add up to major inefficiencies. Beyond the data profiling, cleansing, re-engineering and relationship identification that make up a robust data quality solution, three key functional criteria are needed to deliver data quality enterprise-wide:

High-performance batch processing creates a single source of high quality data available throughout the enterprise by quickly processing millions of records to create a clean central data file. Real-time online data processing prevents new data from corrupting the reliable data source by cleansing data as it enters the company through various channels.
True international data processing requires both a content- and context-oriented approach to data comprehension. The ability to recognize a writing system (i.e., a character set or script) is obviously a prerequisite for data processing, making Unicode enablement a necessary component of international data quality. Unicode allows software to represent text in the world's major scripts, from Latin to Chinese to Hebrew to Cyrillic and others. But Unicode alone cannot guarantee data comprehension; contextual understanding is equally important. Only through context can a data quality application discern which of seven meanings for a particular Japanese character is correct.
Sophisticated BI systems rely on exceptionally diverse data. Business data processing, the ability to ensure the high quality of data beyond names and addresses, provides a more flexible and complete customer and business understanding. By providing more and more reliable information about customers and accounts, it promotes more granular segmentation and facilitates more revealing analytics.
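As a small illustration of why Unicode handling is a prerequisite rather than a complete answer, the sketch below shows two visually identical names that are stored as different code point sequences and only compare equal after normalization. It uses Python's standard `unicodedata` module; the choice of NFC normalization plus case folding is an assumption made for this example, not a prescription.

```python
import unicodedata

precomposed = "Jos\u00e9"   # "José" with a single precomposed e-acute
decomposed = "Jose\u0301"   # "José" as "e" followed by a combining accent


def canonical(text):
    """Normalize to NFC and case-fold so equivalent strings compare equal."""
    return unicodedata.normalize("NFC", text).casefold()


print(precomposed == decomposed)                        # False
print(canonical(precomposed) == canonical(decomposed))  # True
```

Normalization settles how the characters are encoded. Deciding what they mean in a given field still requires the contextual rules described above.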

The return on investment (ROI) from effective enterprise-wide data quality flows from every point in the enterprise that uses data. A data quality solution keeps customer views aligned with the customers themselves, even as customers, the organization, and its operations and markets change. With a strong data foundation based on accurate records, businesses are empowered to communicate effectively, target accurately and act decisively.

While most companies would agree they need clean, accurate data to populate their business intelligence systems, it is not always easy to understand how to get there. For starters, companies with no data quality systems in place should begin researching a solid solution. It is important to ensure that any solution employed can support enterprise-wide customer data quality. It is also critical to ensure universal access to data and application files so the information can easily be shared throughout the organization.

Another important consideration is whether the system is capable of processing data from multiple channel sources, which allows the company to interact with its customers in the ways they prefer. Also important is the ability to conduct real-time data quality management, allowing companies to identify, standardize, cleanse and enrich data as it arrives and before bad data can enter the corporation's databases. Other considerations include whether the system is flexible and customizable to meet a company's changing needs, and whether it is scalable to handle large and complex data files and process data from multiple parallel data sources. For companies with international operations, international data support is also crucial.
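A real-time quality gate of the kind described here might be sketched as follows. Each incoming record is standardized and then checked against a table of business rules before it is admitted. The field names and the two rules are hypothetical and deliberately simple; the point is that the rules sit in a table a user could change without touching the rest of the code.

```python
import re

# Hypothetical, user-editable business rules: field name -> validity check.
RULES = {
    "email": lambda v: re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", v) is not None,
    "postal_code": lambda v: re.fullmatch(r"\d{5}", v) is not None,
}


def standardize(record):
    """Trim and collapse whitespace in every field value."""
    return {field: " ".join(str(value).split()) for field, value in record.items()}


def admit(record):
    """Return the standardized record and the list of fields that failed."""
    clean = standardize(record)
    failures = [field for field, is_valid in RULES.items()
                if not is_valid(clean.get(field, ""))]
    return clean, failures


good = admit({"email": "  pat@example.com ", "postal_code": "02134"})
bad = admit({"email": "none", "postal_code": "2134"})

print(good)
print(bad)
```

Records with an empty failure list can be loaded directly, while the rest can be routed to a review queue instead of entering the database.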

While it is virtually impossible to eliminate the variation that causes data discrepancies, prevent data-entry errors, or guarantee third-party data quality, it is possible to ensure that all data entering a company's data systems can contribute to the organization's total business value. A robust data quality solution provides the complete, accurate and multi-level customer view that is the basis for realistic decision-making and accurate customer assessments.

About Len Dubois
Len Dubois is Vice President of Marketing for the Trillium Software division of Harte-Hanks. He has been with Harte-Hanks for five years and has more than 10 years of experience selling and marketing high-tech solutions. Len is responsible for the development and execution of worldwide marketing initiatives for Trillium Software and has created the Trillium Software System brand that is recognized as the leading data quality solution for data warehousing, CRM and e-business. For more information visit www.trilliumsoft.com.

Len Dubois, Trillium Software