Posted on: 3rd November 2020, 4:12 pm
The ETL process is a commonly used procedure in data warehousing and business intelligence and refers to Extracting data from one or more sources, Transforming the data into a structured format fit for analysis and finally, Loading it into a target destination for the end user to consume. When you don’t have data warehouses and dedicated ETL tools to assist you in this process, the functionality in Qlik Sense can serve as an alternative to manage that ETL process and create a data store.
I recently got a question from a client whether it’s always necessary to set up several ETL applications in Qlik. Isn’t it enough to do all of the data extracts and transformations in one application, if not directly in the end user application? As always, it depends on the specific use case, but it is best practice to use a stepwise ETL procedure in Qlik Sense and this blog post explains the advantages with it.
1. Performance and scalability
To extract and transform only a selective set of data straight from a database or an API, and then load it all in the same application as the end-user will use for analysis will most likely affect the performance negatively compared to doing it in several steps. It is typically faster to extract and transform data in separate applications and storing it to Qlik’s powerful and native file format QVD. The QVD file is 10-100 times faster than reading from other data sources and has up to 10 times the compression of a normal text file. By extracting entire tables without any formatting and storing them out to QVD-files will also reduce the workload and traffic on the data source you are querying.
Besides the improved performance, using a QVD library allows you to access and reuse the data in more than one application, rather than having to extract the same table for every application it is used in.
2. Managing increasing data volumes
As the number of rows in your data source tables grow for every day, the heavier the applications will get. Even if you limit the analysis to only look at data from the last couple of years or so, the data may increase on other fronts. The number of systems and tools used by the business to collect data may increase, or the number of fields and tables in the existing data sources may grow. This complexity, or big data challenges, is a reality for many organisations today.
The ETL procedure helps you manage the increase in data volumes in a controlled and governed environment. Even when the reload times start to increase in your ETL process, you can make use of incremental data loads where you only extract new data and add it to the existing tables, rather than retrieving the entire data set every time you need to update the data.
3. Minimising the risks by not “putting all eggs in one basket”
To do all of the steps in your ETL process in one application is risky. If the script execution stops for whatever reason, you need to go through the whole journey again after fixing the error. By splitting up the extracts and transformations into several applications, you not only minimise the risk of a single error to break the update of the entire data library, but also make it easier to detect and resolve errors. By having several smaller applications typically means smaller scripts to debug and shorter reload times to execute when testing.
4. Data Quality
The ETL procedure allows you to focus on the quality of your data rather than the quantity. By removing the challenges associated with increasing data volumes, you can steer the attention to improve the quality of the data. When you only extract tables once per reload cycle and build a compact QVD library, you can then shift the effort to selecting only the data you need from that storage. Moreover, you can move on and create steps for data cleansing and data profiling to make the data more usable and narrow it down even further.
5. The visual interface of Qlik Sense
Last but not least, one of Qlik Sense’s overlooked advantages is that it provides a user-friendly visual interface for data management; from data analysis and data modelling to troubleshooting on a granular level. Traditional query-based database and data warehouse tools do not always offer the same visual capabilities.
From ETL to a Data Integration platform
There are many advantages with preparing your data for analysis by setting up an ETL process. The overall reason is to think about your data in the long term and having a strategy that reduces risk and increases efficiency. With the recent additions of big data and ETL automation tools in Qlik’s data integration platform, this process is bound to be even simpler and require less manual elements.
By Mats Severin