Do you know what data your company has? What digital footprints are left by website customers? What is in your CRM systems, company spreadsheets, or analytical tools linked to the website?
I bet you or your colleagues monitor this data, conduct analysis and are guided by the numbers when making business decisions. But do you conduct research in silos, or do you look at the company data as a whole, full of dependencies?
An example. A retail company is present in several European countries. It has physical shops, an e-commerce service, and a mobile app. The company plans to enter a new market. To develop a strategy that takes into account actions that have been successful in other countries, it would be good to collate data from many databases – including information on costs, stationary sales, online sales, data on conducted marketing campaigns or user reactions to promotional actions. Manual analysis of each element will take a lot of time and may lead to wrong conclusions.
This data, which comes from various sources, can be put together in a single system – a data warehouse that will allow analysis to be carried out within minutes and present the results on an easy-to-read dashboard.
Data warehouse definition
A data warehouse is an analytical system that enables comprehensive data analysis from various sources. It supports business intelligence processes, which involve extracting relevant information to make data-driven decisions. The system takes data from different databases, cleans and unifies it. The data warehouse stores current data as well as historical data. The data warehouse does not affect the functioning of the used database systems or the content of the data – the information is extracted into the warehouse and then only available for inspection (it cannot be modified). The data is used to generate reports or make predictions, and the results can be presented in graphical visualisations.
An analysis result showed on interactive panels on the Looker platform
Benefits of having a data warehouse
Data insights. Looking at data from a broader perspective allows you to search for correlations between areas and pick out trends by analysing historical data. The warehouse also will enable you to analyse slices of data and create detailed reports.
Forecasting capability. The data warehouse can be used as part of a decision support system (DSS), in which business scenarios are simulated using machine learning models.
Data centralisation. The data warehouse makes it possible to combine all the company data previously dispersed among many databases, external systems or spreadsheets in one place.
Archiving. The data warehouse stores the most recent as well as historical data. A robust data analysis system enables petabytes of information to be stored.
Work efficiency. Employees of a company with a data warehouse have convenient, almost instant access to up-to-date information from their area and other departments. Shortened information paths and the ability to visually present data allow them to perform daily tasks more efficiently and to search for opportunities to optimise processes.
How does it work? A data warehouse architecture
A data warehouse consists of four elements:
- the data source,
- the ELT software,
- the proper warehouse,
- and analytical applications.
First, the data is taken from external sources. This could be a relational or non-relational database, Google Analytics, a CRM system or a company spreadsheet.
Then the records are unified – processed, cleaned and brought into the same format. Such integrated data is sent to the proper data warehouse – an application that stores the data and allows analytics to be carried out. This process of extracting, transforming and loading is called ETL, after the first letter of each stage.
The warehouse stores standardised data – both current and historical. This is where the analysis takes place in response to a query sent by the user.
The analysis results can then be presented using data visualisation programmes or uploaded to BI applications.
Database vs data warehouse. OLTP and OLAP
What is the difference between a data warehouse and a database? Although these two systems work with data, they perform completely different roles. In a nutshell, databases are used for fast, ongoing processing of small slices of data when the purpose of a data warehouse is to conduct complex analytics based on vast sets of information.
OLTP – on-line transactional processing
In the case of databases, we talk about OLTP (on-line transactional processing) – a type of data processing that involves the simultaneous execution of multiple transactions, for example in online banking, e-commerce portals or online booking services.
Databases hold information about a particular slice of business – for example, the status of products in an online shop. Hundreds, thousands of customers shop on an e-commerce site daily – and the database receives hundreds or thousands of requests and updates.
The database stores the latest information because its performance would be significantly reduced if the system had to maintain terabytes of historical data (a backup is stored in another location). OLTP environments process large volumes of simple queries well but are not capable of deep analytics – instead, they are a good source of information that a data warehouse can process.
Examples of OLTP applications:
- an e-commerce site creating an order for a sold product,
- airline portal for online booking,
- a hospital patient registration system,
- a bank adding an ATM cash withdrawal transaction to an account.
OLAP – on-line analytical processing
A data warehouse is an OLAP (on-line analytical processing) system. It is an environment that, unlike OLTP, is dedicated to handling less frequent, more complex queries on large amounts of data. The speed of execution is not so crucial, so data analysis conducted in a data warehouse can take a few seconds to even a few hours.
OLAP collects data from various sources and processes it during the ELT process. It also stores historical data for comprehensive analysis and reporting. The information that goes into the data warehouse cannot be modified – it is only available for inspection.
Examples of OLAP usage:
- segmenting customers into groups taking into account previous interactions with the company to provide tailored content and promotions,
- analysing the effectiveness of marketing activities among particular groups of customers,
- predicting the likelihood of churn based on sales data in recent years,
- trend analysis, forecasting demand and supply in the company’s business area.