A data warehouse is a central repository for all, or significant parts of the data that an enterprise of various computer systems collection


Dec 13 - 19, 2004





Data warehousing holds great promise. Technology planners rightly assert that it supports better customer relationships and decision making. But such goals can be achieved only where the application is appropriate and is executed skillfully.

Companies are feeling the pressure to cut expenses and increase profits. In today's business environment, the advantage goes to those companies that can respond to rapidly changing demands. Facilitating the timely access of corporate information to the employees will allow companies the competitive edge.

Today, the most successful way to provide valuable information to users is by modeling operational data into a format that is easy for end users to access and analyze. This approach is referred to as "Data Warehousing".

A data warehouse is a central repository for all, or significant parts of the data that an enterprise of various computer systems collection. Typically, a data warehouse is housed on an enterprise's server, but it can reside with a storage service provider also. Data from various online transaction processing applications and other sources are selectively extracted and organized on the data warehouse database.


The major elements of data warehousing are data acquisition, data modeling and schema, metadata and data management.

Data acquisition involves identifying, capturing, and transforming data in operational systems so the data can be loaded into a data warehouse. This transformation uses application tools that allow data warehouse administrators to apply business rules for integrating data from source systems.

Data modeling is the analysis of data objects used in a business or other context (e.g. a product, a product price, or a product sale) and expressions of the relationships between each of these objects.

A schema, in computer programming, is the organization or structure for a database. The activity of data modeling leads to a schema.

Metadata is a definition or description of data, commonly known as "data about data." It simplifies the process of managing all the processes involved in creating and sustaining a data warehouse.

Data management includes the access and storage mechanisms that support the data warehouse. This is usually a relational, multidimensional, or other specialized database designed to facilitate complex queries.

While data warehousing is a simple, logical concept, technically it is very complex and challenging. A data warehouse cannot be purchased as a completed application; it must be built and assembled differently to meet each organization's unique requirements.




Implementing a data warehousing solution can provide organizations a competitive advantage. It enables them to leverage information residing within their databases to respond quickly to changes in the marketplace and make better business decisions.

ENHANCED CUSTOMER SERVICE: The competitive nature of current business environments forces enhancements in providing improved customer service near the top of every corporation's list of goals.

Improvements in this area translate directly into additional business and cost savings.

BETTER BUSINESS INTELLIGENCE: Information related to organization's financial conditions, customers' vital data and product information etc. already exists within the organization. A data warehouse gives this ability to pull together information from multiple operational systems and combine that with information that may be acquired from outside sources.

SUPPORT BUSINESS PROCESS RE-ENGINEERING: It provides business users unlimited analysis of their business information and often provides insights into the work processes. This will yield breakthrough ideas for the re-engineering of those business processes.

A RETURN ON INVESTMENT: In the beginning, like any other technologies, the business benefits that a data warehouse can bring are intangible; resulting unable to convince executive to get financial approval for data warehousing projects.

A study by International Data Corporation revealed that data warehousing generated an average three-year return on investment (ROI) of 401%. Over 90% of the organizations included in the analysis reported return on investment returns over 40%.


Data warehousing is a technology that is being embraced by all major industries which include DFIs, insurance, manufacturing, retail, healthcare, telecommunications etc. Companies are setting up data warehouses for finance, budgeting, sales & marketing, customer information systems, product/inventory analysis, human resources etc. The trend is moving away from centralized mainframe-oriented data warehouses toward departmental server-based solutions. In some applications single subject-oriented data warehouses (data marts) are being implemented that can provide immediate value to targeted business users.

Trends also indicate the acceptance of data warehouse/data mart solutions that are integrated with application package solutions. A number of strategic relationships are in place between vendors to provide partial or complete data warehouse environments for popular manufacturing, human resources and financial applications.


Steps needed to be identified and integrated to create a data warehouse solution are:

IDENTIFYING SOURCE DATA: The first step in building a data warehouse is to identify the sources, format and location of data required to meet the needs of the business users. Most organizations have data in multiple locations and formats. The challenge for the data warehouse is not only to collect the data but to make the data from many different sources consistent and usable.

TRANSFORMATION AND CLEANSING: Data that is extracted from existing systems or acquired from outside sources must be examined for consistency and quality before it is loaded into the data warehouse.


PERFORMANCE: Data warehouses of more than a terabyte (1000 GB) are becoming more and more common. So, it is essential that the database server not only meet users' performance expectations during analysis but must be capable of handling complex analysis against huge amounts of data.

CAPACITY: The database server and hardware must be capable of storing entire data warehouse, including both relational and multidimensional data.

FLEXIBILITY: Normally, data warehouse implementation starts out as a departmental solution and evolves over time into an enterprise-wide warehouse. It is important that the server should be scalable to handle the requirements of an ever growing amount of data in the data warehouse.


The database server must be a proven as a cost effective platform. All the basic elements, hardware, operating system, database and communications sources should work together to provide an integrated, problem-free data warehouse solution.

In order to achieve optimum performance level of storing and analyzing fairly large amount of data and to meet users' expectations, the database engine must exploit the hardware power to it maximum level. To achieve such high performance, parallel input/output (I/O) processing would be suggested; which means that a single user can submit a query operation against the database and multiple I/O processor tasks are initiated internally to process the request. This capability can result up to a 700% improvement in I/O-bound query performance.

This results in significant savings when comparing costs of ownership for several data warehousing platforms.


The very core of a successful data warehouse is to have a clear understanding of a business problem and the user requirements for information analysis. It is strongly recommended that the team assembled to create a data warehouse must be comprised of IT professionals and business executives. Projects must have a clearly defined scope for managing economic and operational limitations. The process will be highly time-bound as IT experts and end users must work together towards a reasonable aggregation level for data in the warehouse.

The first data warehouse should cover a single subject area and be delivered at a relatively low cost. To minimize the risk, the targeted platform should have developed some IT infrastructure. The level of complexity involved in successfully designing and implementing a data warehouse must not be underestimated. Time must be spent to acquire and develop additional skills for data warehousing developers and users.


Data Warehousing is a complex field, with many vendors striving for market awareness.

The complexity of the technology the interactions between the various tools and the high price points for the products require companies to perform careful technology evaluation before embarking on a warehousing project. However, the potential for enormous returns on investment and competitive advantage make data warehousing difficult to ignore.

The author is Head of Department, Computer Science & IT, Beaconhouse-Informatics.