BI Resources - BI FAQs
- What Is Business Intelligence?
- What is a Data Warehouse?
- What is Dimensional Modeling?
- What is a Star Schema?
- What is a Snowflake Schema?
- What is ETL?
What is Business Intelligence?
Business Intelligence is a term introduced by Howard Dresner of Gartner Group in 1989. He described Business Intelligence as a set of concepts and methodologies to improve decision making in business through use of facts and fact based systems.
Over time as use of Business Intelligence has become mainstream more definitions of Business Intelligence have emerged.
Business intelligence (BI) is a broad category of application programs and technologies for gathering, storing, analyzing, and providing access to data to help enterprise users make better business decisions. BI applications include the activities of decision support, query and reporting, online analytical processing (OLAP), statistical analysis, forecasting, and data mining. (Definition source: www.sauder.ubc.ca/cgs/itm/itm_glossary.html)
TDWI defines Business Intelligence as The processes, technologies and tools needed to turn data into information, information into knowledge and knowledge into plans that drive profitable business actions. Business intelligence encompasses data warehousing, business analytic tools and content knowledge management.
Suggested Reading:- Business Intelligence: The Savvy Manager's Guide by David Loshlin
What is a data warehouse?
The term data warehouse, its interpretation and use have undergone an evolution over the past more than a decade since the term was coined by William Inmon in 1990. He defined the data warehouse as a subject-oriented, integrated, non-volatile, time-variant collection of data organized to support Management needs.
Ralph Kimball, his book The Data Warehouse Life Cycle Toolkit, 1998, has defined Datawarehouse as 'nothing more than a union of all the constituent data marts'.
A single subject/ business function data warehouse is referred to as a data mart.
A data warehouse is, primarily, a record of an enterprise's past transactional and operational information, stored in a database designed to favor efficient data analysis and reporting (especially OLAP).
Data warehouses often hold large amounts of information which are sometimes subdivided into smaller logical units called dependent data marts.
Usually, two basic ideas guide the creation of a data warehouse:- Integration of data from distributed and differently structured databases, which facilitates a global overview and comprehensive analysis in the data warehouse.
- Separation of data used in daily operations from data used in the data warehouse for purposes of reporting, decision support, analysis and controlling
Periodically, one imports data from enterprise resource planning (ERP) systems and other related business software systems into the data warehouse for further processing. It is common practice to "stage" data prior to merging it into a data warehouse. In this sense, to "stage data" means to queue it for preprocessing, usually with an ETL tool. The preprocessing program reads the staged data (often a business's primary OLTP databases), performs qualitative preprocessing or filtering (including denormalization, if deemed necessary), and writes it into the warehouse. (Definition source http://en.wikipedia.org/wiki/)
Recent trends
Data warehousing was earlier limited to historical data, however the need for real time data warehousing is now encompassing both historical and current, "live" data. With the everf increasing business cycle and 24/7 transaction processing on the web, the data warehouses have no longer the luxury to operate in the batch mode. Hence ability to process data in near real time using streaming ETL processes, automatically refreshing reports and analytical metrics and creating the alerts based on threshold are some of the ways data warehousing is changing. With the deluge of data anticipated to be hurled at the data marts and data warehouses by the advent of RF-ID and other technologies data warehouses are now real-time, 24/7 high performance, response centers for corporate information.
Suggested Reading- The Data warehouse Lifecycle Toolkit: Expert Methods for Designing, Developing and Deploying Data Warehouses by Ralph Kimball, Laura Reeves, Margy Ross and Warren Thornthwaite
- Building the Datawarehouse by William H Inmon
What is dimensional modeling?
Dimensional modeling a discipline of logical data modeling which addresses the data modeling requirements of databases for end-user query and analysis applications. The dimensional modeling has been formalized and popularized by the leading luminary in the field of data warehousing, Ralph Kimball. The focus of dimensional models is end user understandability of data, handling of historical data in the changing business environment, high query performance and consistent query performance in an ad-hoc query environment.
For in-depth discussion of the dimensional modeling technique, refer to the following resources- A Dimensional Modeling Manifesto by Ralph Kimball
- The Datawarehouse Toolkit: The complete guide to Dimensional Modeling By Ralph Kimball and Margy Ross
What is a star schema?
The star schema is the basic schema of a dimensional model. It is called a star schema because the visual representation of this schema resembles a star, with a table in the center and multiple tables connected to it representing points of the star. The center table of the star is a fact table and the points of the star are the dimension tables. The fact table contains the information which is usually numeric in nature e.g. No. of orders, Total sales dollars etc. The dimension tables contain the context information e.g. customer information, product information, region information etc. for the Sales figures in the fact table.
What is snowflake schema?
A snowflake schema is an extension of star schema, where the dimension tables are normalized (or snowflaked in the diagrammatic representation of the schema) to eliminate redundancy of data. Low cardinality attributes of the dimension tables are usually separated out in a different table in a snowflake schema. E.g. a product dimension may be normalized to hold product manufacturing facility details in a separate (snowflake) table.
What is ETL?
ETL is an acronym for a data movement process involving
Extract- Extract the required data components from multiple source systems Transform- Transform/cleanse/ aggregate the data for the target database Load - Load the transformed data into the target database
ETL process is typically used for getting the data from the source systems into the data warehouse. However, in recent times, the ETL process has become part of the larger Data Integration initiatives in organizations.
The ETL processes can be either hand coded or developed using ETL tools from ETL tool vendors like Ab Initio, Data Junction, IBM(Ascential), Informatica. Some of the query and analysis tools also have ETL functionality bundled in. The ETL functionality can also be achieved by adopting an ELT approach where the transformation capabilities of the target database are utilized to achieve the same results.
Suggested reading- Overview of ETL http://www.computerworld.com/databasetopics/businessintelligence/datawarehouse /story/0,10801,89534,00.html
- For more in-depth reading about ETL process design read the book The Data warehouse ETL Toolkit By Ralph Kimball and Joe Caserta

