Do you know what data your company has? What digital footprints are left by website customers? What is in your CRM systems, company spreadsheets, or analytical tools linked to the website?
I bet you or your colleagues monitor this data, conduct analysis and are guided by the numbers when making business decisions. But do you conduct research in silos, or do you look at the company data as a whole, full of dependencies?
An example. A retail company is present in several European countries. It has physical shops, an e-commerce service, and a mobile app. The company plans to enter a new market. To develop a strategy that takes into account actions that have been successful in other countries, it would be good to collate data from many databases – including information on costs, stationary sales, online sales, data on conducted marketing campaigns or user reactions to promotional actions. Manual analysis of each element will take a lot of time and may lead to wrong conclusions.
This data, which comes from various sources, can be put together in a single system – a data warehouse architecture that will allow analysis to be carried out within minutes and present the results on an easy-to-read dashboard.
Data warehouse definition
A cloud data warehouse is an analytical system that enables comprehensive data analysis from various sources. It supports business intelligence processes, which involve extracting relevant information to make data-driven decisions. The system takes data from different databases, cleans and unifies it. The data warehouse stores current data as well as historical data.
The data warehouse does not affect the functioning of the used database systems or the content of the data – the information is extracted into the warehouse and then only available for inspection (it cannot be modified). The data is used to generate reports or make predictions, and the results can be presented in graphical visualisations.
Benefits of having a data warehouse
Data insights. Looking at data from a broader perspective allows you to search for correlations between areas and pick out trends by analysing historical data. The warehouse also will enable you to analyse slices of data and create detailed reports.
Forecasting capability. The data warehouse can be used as part of a decision support system (DSS), in which business scenarios are simulated using machine learning models.
Data centralisation. The data warehouse makes it possible to combine all the company data previously dispersed among many databases, external systems or spreadsheets in one place.
Archiving. The data warehouse stores both current and historical data. A robust data analysis system enables petabytes of information to be stored.
Work efficiency. Employees of a company with a data warehouse have convenient, almost instant access to up-to-date information from their area and other departments. Shortened information paths and the ability to visually present data allow them to perform daily tasks more efficiently and to search for opportunities to optimise processes.
Cloud data warehouse vs data lake
Data lakes and data warehouses are both used to store and manage large amounts of data, but they differ in their approach and purpose. A data warehouse is a structured repository that is optimised for storing and querying structured data, such as transactional data from an organization’s operational systems. Data warehouses are designed for high-performance analytics and reporting, making it easier for business users to access and analyze data.
In contrast, a data lake is a more flexible and open-ended storage solution that is optimised for storing and processing large amounts of both structured and unstructured raw data. Data lakes are often used for exploratory data analysis and data science projects, where data is collected from various sources and transformed for analysis. The main difference between a data lake and a data warehouse is the level of structure and the intended use case of the data. While data warehouses are ideal for structured data used in reporting, data lakes are more versatile and designed for big data applications that require flexibility and agility.
What are enterprise data warehouses?
An enterprise data warehouse (EDW) is a centralized repository for all the data that an organization collects from various sources, such as operational systems, third-party vendors, and other data sources. The EDW is designed to support the reporting, analysis, and decision-making needs of the entire organisation. It provides a single source of truth for data, which helps to eliminate data silos and ensures that everyone in the organization is working with the same data.
Subsets of an enterprise data warehouse are data marts, designed to serve the needs of a specific business unit, department, or group within an organization. A data mart is essentially a smaller, more focused version of the EDW, with data that is tailored to the needs of a particular group of users.
Data marts are typically created to meet the specific needs of the intended users. This can include aggregating data, creating new dimensions, or reformatting data to match specific reporting requirements. Once the data mart is created, it can be accessed by users who have been granted the appropriate permissions.
How does it work? A data warehouse architecture
A data warehouse consists of four elements:
- the data source,
- the ELT software,
- the proper warehouse,
- and analytical applications.
First, the data is taken from external sources. This could be a relational or non-relational database, Google Analytics, a CRM system or a company spreadsheet.
Then the records are unified – processed, cleaned and brought into the same format. Such integrated data is sent to the proper data warehouse – an application that stores the data and allows analytics to be carried out. This process of extracting, transforming and loading is called ETL, after the first letter of each stage.
The warehouse stores standardised data – both current and historical. This is where the analysis takes place in response to a query sent by the user.
The analysis results can then be presented using data visualisation programmes or uploaded to BI applications.
Database vs data warehouse. OLTP and OLAP
What is the difference between data warehouses and a database? Although these two systems work with data, they perform completely different roles. In a nutshell, databases are used for fast, ongoing processing of small slices of data when the purpose of a data warehouse is to conduct complex analytics based on vast sets of information.
OLTP – on-line transactional processing
In the case of databases, we talk about OLTP (on-line transactional processing) – a type of data processing that involves the simultaneous execution of multiple transactions, for example in online banking, e-commerce portals or online booking services.
Databases hold information about a particular slice of business – for example, the status of products in an online shop. Hundreds, thousands of customers shop on an e-commerce site daily – and the database receives hundreds or thousands of requests and updates.
The database stores the latest information because its performance would be significantly reduced if the system had to maintain terabytes of historical data (a backup is stored in another location). OLTP environments process large volumes of simple queries well but are not capable of deep analytics – instead, they are a good source of information that a data warehouse can process.
Examples of OLTP applications:
- an e-commerce site creating an order for a sold product,
- airline portal for online booking,
- a hospital patient registration system,
- a bank adding an ATM cash withdrawal transaction to an account.
OLAP – on-line analytical processing
A data warehouse is an OLAP (on-line analytical processing) system. It is an environment that, unlike OLTP, is dedicated to handling less frequent, more complex queries on large amounts of data. The speed of execution is not so crucial, so data analysis conducted in a data warehouse can take a few seconds to even a few hours.
OLAP collects data from various sources and processes it during the ELT process. It also stores historical data for comprehensive analysis and reporting. The information that goes into the data warehouse cannot be modified – it is only available for inspection.
Examples of OLAP usage:
- segmenting customers into groups taking into account previous interactions with the company to provide tailored content and promotions,
- analysing the effectiveness of marketing activities among particular groups of customers,
- predicting the likelihood of churn based on sales data in recent years,
- trend analysis, forecasting demand and supply in the company’s business area.