A data warehouse is the main repository of an organization's historical data, its corporate memory. It contains the raw material for management's decision support system. The critical factor leading to the use of a data warehouse is that a data analyst can perform complex queries and analysis, such as data mining, on the information without slowing down the operational systems...
Bill Inmon, an early and influential practitioner, has formally defined a data warehouse in the following terms;
Subject-oriented
The data in the database is organized so that all the data elements relating to the same real-world event or object are linked together;
Time-variant
The changes to the data in the database are tracked and recorded so that reports can be produced showing changes over time;
Non-volatile
Data in the database is never over-written or deleted - once committed, the data is static, read-only, but retained for future reporting; and
Integrated
The database contains data from most or all of an organization's operational applications, and that this data is made consistent.
A data warehouse might be used to find the day of the week on which a company sold the most widgets in May 1992, or how employee sick leave the week before the winter break differed between California and New York from 2001–2005.
While operational systems are optimized for simplicity and speed of modification (see OLTP) through heavy use of database normalization and an entity-relationship model, the data warehouse is optimized for reporting and analysis (online analytical processing, or OLAP). Frequently data in data warehouses are heavily denormalised, summarised or stored in a dimension-based model. However, this is not always required to achieve acceptable query response times.
***********************************************************************************************************************************************************
Data Warehouses are a distinct type of computer database that were first developed during the late 1980s and early 1990s. They were developed to meet a growing demand for management information and analysis that could not be met by operational systems. Operational systems were unable to meet this need for a range of reasons:
* The processing load of reporting reduced the response time of the operational systems,
* The database designs of operational systems were not optimized for information analysis and reporting,
* Most organizations had more than one operational system, so company-wide reporting could not be supported from a single system
* Development of reports in operational systems often required writing specific computer programs which was slow and expensive
As a result, separate computer databases began to be built that were specifically designed to support management information and analysis purposes. These data warehouses were able to bring in data from a range of different data sources, such as mainframe computers, minicomputers, as well as personal computers and office automation software such as spreadsheet, and integrate this information in a single place. This capability, coupled with user-friendly reporting tools and freedom from operational impacts, has led to a growth of this type of computer system.
As technology improved (lower cost for more performance) and user requirements increased (faster data load cycle times and more features), data warehouses have evolved through several fundamental stages:
Off line Operational Databases
Data warehouses in this initial stage are developed by simply copying the database of an operational system to an off-line server where the processing load of reporting does not impact on the operational system's performance.
Off line Data Warehouse
Data warehouses in this stage of evolution are updated on a regular time cycle (usually daily, weekly or monthly) from the operational systems and the data is stored in an integrated reporting-oriented data structure.
Real Time Data Warehouse
Data warehouses at this stage are updated on a transaction or event basis, every time an operational system performs a transaction (e.g. an order or a delivery or a booking etc.)
Integrated Data Warehouse
Data warehouses at this stage are used to generate activity or transactions that are passed back into the operational systems for use in the daily activity of the organization.
***********************************************************************************************************************************************************
Advantages
There are many advantages to using a data warehouse, some of them are:
* Data warehouses enhance end-user access to a wide variety of data.
* Decision support system users can obtain specified trend reports, e.g. the item with the most sales in a particular area within the last two years.
* Data warehouses can be a significant enabler of commercial business applications, particularly customer relationship management (CRM) systems.