Manager IT
Pakistan Telecom Mobile Limited (Ufone)
Oct 13 - 19, 2008

Businesses and organizations are progressing and achieving success at an unpredictable pace. It is therefore necessary for organizations to consider a solution that not only fulfills their existing needs but also enables the business to achieve more. This is an era of competition, and only those businesses that are capable of competing can survive. Data is vital to the survival of a business. Processing that data into information gives the business the edge it needs to turn its goals into reality. Realizing this fact leads the business to opt for a data warehousing solution.

Every growing organization is looking for the best solution for its business needs. These needs are met not only by answering specific business questions but also by ad-hoc analysis. They can be met only when enough data is available and the underlying platform is capable of extracting and presenting the information in a timely and useful manner. It is therefore necessary to opt for a solution that is in line with the business vision. Every organization is concerned about which platform to choose. Which RDBMS or data warehousing solution will help it meet its business needs and organize its data so that it not only presents the current picture but also supports the forward-looking ideas of visionaries, management, marketing staff and other departments? In this article I raise some of the vital concerns that can help organizations make the right decision in the early stages of adopting data warehousing and choosing the underlying data warehousing solution for their business needs.

BUSINESS MODEL: The business model is what represents the business. If the model is scattered and does not provide a single integrated view of the business, it becomes difficult to extract accurate and useful information in a timely manner. The business model is thus an important factor to weigh alongside the others. It is equally important that the business model be flexible enough to accommodate and integrate new applications without major changes to the system.

DATA STORAGE: The business grows every day, and its data grows with it. The business must select a solution that can easily accommodate large volumes of data. The secret of success lies in detailed data. The more history one knows, the better positioned one is to give accurate predictions and analysis of the future. Similarly, if a large amount of data is available for analysis, the business is better positioned to find business trends and align them with its future needs and vision.

NUMBER OF CONCURRENT USERS: As the business grows, the number of users querying the data for information and analysis increases. It therefore becomes important to choose a platform that can accommodate the future needs of the users and does not become costly as the number of users rises, in terms of both finances and manageability. Moreover, if a business wants to grow, it has to perform analysis early and involve all the visionaries, marketing staff, managers and other departments in that analysis to be in a better position for competition and growth.

WORKLOAD: The underlying platform must support not only a specific type of workload but also all future applications and analysis. Businesses that are static in nature can never grow. Such businesses may fade away with time and be cited as lessons learned from the past. Over time, a business has to adopt new, business-friendly methods for its platform to get more from its investment. For example, the workload may initially consist only of weekly batches, which, as business requirements, understanding and growth evolve, can move to daily batches, then to ad-hoc queries and near-real-time data feeds, and finally to the destination of active data warehousing. The same cycle holds for reporting and business analysis. It is therefore necessary that the underlying platform have the capability to transform and run hand in hand with the business.

QUERY DATA VOLUME: Ask yourself whether the analysis performed will always return a single data element, or whether it will always return a huge volume of data. The answer to both is certainly no. Query data volume depends on the kind of analysis and on the volume of data available in the data warehouse. It is important that the solution appropriately supports every kind of query data volume and handles it effectively. For example, if only a few rows have to be returned, there should be some mechanism, such as indexes, to access the data efficiently. The concern should therefore be not only how effectively the solution handles large volumes of data but also how efficiently it accesses small volumes of data. The cost in system resources (memory, I/O and so on) should be low.
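The point about indexes can be illustrated with a minimal sketch. It uses SQLite from the Python standard library purely as a stand-in for a warehouse platform; the `sales` table, its columns and the index name `idx_sales_customer` are hypothetical, and a real platform will expose its query plans differently. The sketch asks the engine how it intends to fetch a handful of rows, first without and then with an index.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
)
# Hypothetical data: 10,000 rows spread over 1,000 customers.
conn.executemany(
    "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
    ((i % 1000, float(i)) for i in range(10000)),
)


def query_plan(connection, sql, params):
    """Return the engine's plan for a query as one readable string."""
    rows = connection.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    # The fourth column of each plan row holds the human-readable detail.
    return " | ".join(row[3] for row in rows)


# A query that returns only a few rows out of the whole table.
lookup = "SELECT amount FROM sales WHERE customer_id = ?"

plan_before = query_plan(conn, lookup, (42,))  # no index: whole table is scanned
conn.execute("CREATE INDEX idx_sales_customer ON sales (customer_id)")
plan_after = query_plan(conn, lookup, (42,))   # index: only matching rows are visited

print("without index:", plan_before)
print("with index:   ", plan_after)
```

Without the index the engine reads every row to return ten of them; with it, the engine goes straight to the matching rows, which is the low-cost access to small data volumes described above.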

QUERY COMPLEXITY: Growth leads to an increase in data volume, and data volume leads to more complex business needs. It is relatively easy to start from the ground and achieve rapid success; it is very difficult to maintain that success and still make more progress. As the complexity of the analysis and of the business itself increases, the business needs a solution that supports not only its simple needs but also its complex ones. In the beginning, business users might be interested in only a couple of entities (or joins); after some time, they might need to join more than a dozen entities to deliver new value to the business.

PARALLELISM: The concept is similar to serving customers. A business wants to satisfy all of its customers and give each the level of importance it deserves without delaying any of them. The same holds for queries and other operations in data warehousing: every request should be treated like a customer and not made to wait unnecessarily. This can be achieved only if the underlying platform supports parallelism in the true sense.

Most platforms support all of the above, but that does not mean they all suit the needs of the business. The question is not whether the vendor supports these capabilities; the question is how efficient the solution is when all of them are evaluated simultaneously. Every platform supports large volumes of data, and every platform supports concurrent users, but what are the limitations of the platform when these are required together? What is the threshold when all of them are exercised at the same time? A benchmark is not always the answer, and answers to the individual questions are not always what is required; what matters is how the underlying platform behaves when all of these demands operate at the same time.
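One way to explore combined behavior during an evaluation is to submit different kinds of queries at the same time and record how long each takes. The sketch below is only an illustration of that idea, not a benchmarking methodology: it again uses SQLite as a stand-in, and the `sales` table, the two queries and the figure of eight concurrent users are all hypothetical.

```python
import os
import sqlite3
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def build_db(path, rows=20000):
    """Create a small hypothetical sales table in a file-based database."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE sales (id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL)"
    )
    conn.executemany(
        "INSERT INTO sales (customer_id, amount) VALUES (?, ?)",
        ((i % 500, float(i % 97)) for i in range(rows)),
    )
    conn.commit()
    conn.close()


def timed_query(path, sql, params=()):
    """Run one query on its own connection and return the elapsed seconds."""
    conn = sqlite3.connect(path)
    try:
        start = time.perf_counter()
        conn.execute(sql, params).fetchall()
        return time.perf_counter() - start
    finally:
        conn.close()


def run_mixed_workload(path, users=8):
    """Submit small lookups and full-table aggregations at the same time."""
    point = ("SELECT amount FROM sales WHERE id = ?", (42,))
    scan = ("SELECT customer_id, SUM(amount) FROM sales GROUP BY customer_id", ())
    # Even-numbered users run the lookup, odd-numbered users run the scan.
    jobs = [point if u % 2 == 0 else scan for u in range(users)]
    with ThreadPoolExecutor(max_workers=users) as pool:
        futures = [pool.submit(timed_query, path, sql, params) for sql, params in jobs]
        latencies = [f.result() for f in futures]
    return {
        "point_lookup_max_s": max(latencies[0::2]),
        "full_scan_max_s": max(latencies[1::2]),
    }


with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "warehouse.db")
    build_db(db_path)
    report = run_mixed_workload(db_path)

print(report)
```

Repeating such a run while increasing the data volume, the number of users and the query complexity together gives a rough sense of where a platform's limits lie when the demands are combined.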

It is necessary to make the right decision the first time. The return on investment (ROI) should be calculated to determine the business value of the solution, because migrating to another platform later is more costly and time-consuming. Remember the adage: intelligent people learn from their own mistakes; wise people learn from the mistakes of others. Similarly, the right business requires the right decisions in time.
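The effect of a later migration on ROI can be shown with a short worked sketch. It assumes the common definition of ROI as net benefit divided by total cost; every figure below is hypothetical and chosen only to make the arithmetic easy to follow.

```python
def roi(total_benefit, total_cost):
    """Return on investment as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


# Hypothetical figures for illustration only.
benefit = 1_500_000          # business value delivered by the warehouse
platform_cost = 1_000_000    # cost of the platform chosen at the start
migration_cost = 400_000     # extra cost if the platform must be replaced later

right_first_time = roi(benefit, platform_cost)
migrate_later = roi(benefit, platform_cost + migration_cost)

print(f"right choice first time: {right_first_time:.1%}")
print(f"after a later migration: {migrate_later:.1%}")
```

With these assumed numbers the same business benefit yields a far lower return once the migration cost is added, which is why the choice of platform deserves care at the outset.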