Posted on: 21st June 2021, 3:10 pm
As of this writing, in the spring of 2021, the data cataloguing tools have just been integrated into Qlik Sense SaaS. These new features allow users to profile and govern datasets within the cloud hub, making it easier to evaluate which data is best to use in a new or existing app. In this blog post, we will take a closer look at the data catalog and answer some questions about it.
What is the Data Catalog?
The Catalog tools let users view metadata about the data files that have been stored in the cloud account. Users can see where the data comes from, what type of data it is, and how it can best be analysed and used. There is also functionality for categorising the data. The catalog tools currently available are as follows:
1. Dataset detail viewer: Provides an information overview of your data. User-defined properties and tags are included along with metadata such as size, owner, file type, and created and last modified timestamps.
2. Tagging data: Lets you apply filterable metatags that help you locate and organise the data. This is useful when you need to filter for particular types of data assets, and it makes the data easier to search and categorise.
3. Data profiling: This feature profiles your datasets with statistics such as data type, preview of sample values, most common values, value frequency and number of distinct values. This is useful when you need to analyse the data and plan your visualisations before creating an app.
4. Dataset properties: Properties can be applied to the data to associate it with specific data compliance standards. This is useful for data protection and for complying with regional and industry-specific privacy requirements. Examples include GDPR (the EU General Data Protection Regulation) and PII (personally identifiable information).
5. Create an app from your data: This feature presents an option to create a new app directly from an uploaded data file. This is a key element of Qlik’s ‘raw to analytics-ready’ workflow.
Where can I find it?
The Catalog tools are available for new and existing data files that a user has access to in their personal and shared spaces. You can access the datasets from two sections:
1. To view all data files that are available in the hub, go to the Explore section and then the Data tab.
2. To view only your uploaded data files, go to the Collections section and then the Generated tab, and click on Your Data.
Once you have located a data file, click the dots in the bottom-right corner of the file to open a drop-down menu, from which you can select any of the catalog tools.
Navigation option 1. Explore > Data
Navigation option 2. Collections > Generated > Your Data
How does it work?
Dataset details and Data Profiling
These tools simply display information about the data and can be opened from the drop-down menu of a data file.
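To make the profiling statistics less abstract, here is a minimal local sketch that computes the same kinds of figures the data profile reports: data type, sample values, most common values with their frequencies, and the number of distinct values. This sketch is not how Qlik computes its profile. The function names `infer_type` and `profile_csv` are hypothetical, and the type inference is deliberately crude. It uses only the Python standard library.

```python
import csv
import io
from collections import Counter


def infer_type(values):
    """Crude type guess for non-empty string values: integer, number or text."""
    if not values:
        return "empty"
    try:
        for v in values:
            int(v)
        return "integer"
    except ValueError:
        pass
    try:
        for v in values:
            float(v)
        return "number"
    except ValueError:
        return "text"


def profile_csv(text, sample_size=3, top_n=3):
    """Hypothetical per-column profile of CSV text (not Qlik's implementation)."""
    reader = csv.DictReader(io.StringIO(text))
    rows = list(reader)
    profile = {}
    for column in reader.fieldnames or []:
        # Missing trailing fields come back as None; treat them as empty.
        values = [row[column] or "" for row in rows]
        non_empty = [v for v in values if v != ""]
        counts = Counter(non_empty)  # value -> frequency
        profile[column] = {
            "type": infer_type(non_empty),
            "sample": non_empty[:sample_size],
            "most_common": counts.most_common(top_n),
            "distinct": len(counts),
            "empty": len(values) - len(non_empty),
        }
    return profile


data = "country,sales\nSweden,100\nNorway,250\nSweden,75\nDenmark,\n"
for column, stats in profile_csv(data).items():
    print(column, stats)
```

Looking at figures like these before building an app helps you decide which fields are worth visualising. The catalog's profile gives you that overview without any code.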
In the same drop-down menu, click Edit to add tags to your data. From here, you can also edit the file name and add a description.
Selecting Properties in the data file menu takes you to the Data Properties page, where you can add new properties or edit existing ones. For example, you can add the GDPR property to indicate that the dataset contains personal data and that the GDPR framework applies. Once saved, the properties associated with a dataset are shown in its dataset details. Besides the preset properties, you can add a custom subject area, which is a free-text field.
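Conceptually, tags and properties work as filters. You ask for every dataset carrying a given tag or compliance property, such as all sales data marked GDPR, or everything flagged as PII. The sketch below models that idea with a hypothetical `Dataset` class and `find` function. It is not Qlik's data model or API; in the hub you do this through the search and filter options in the UI.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Dataset:
    """Hypothetical catalog entry; not Qlik's actual data model."""
    name: str
    owner: str
    tags: set[str] = field(default_factory=set)
    properties: set[str] = field(default_factory=set)


def find(datasets, tags=(), properties=()):
    """Return datasets that carry every requested tag and every requested property."""
    wanted_tags = set(tags)
    wanted_props = set(properties)
    return [
        d for d in datasets
        if wanted_tags <= d.tags and wanted_props <= d.properties
    ]


# Illustrative entries only; file names and owners are made up.
catalog = [
    Dataset("customers.csv", "anna", {"sales", "crm"}, {"GDPR"}),
    Dataset("budget_2021.xlsx", "erik", {"finance"}),
    Dataset("leads.csv", "anna", {"sales"}, {"GDPR", "PII"}),
]

print([d.name for d in find(catalog, tags={"sales"}, properties={"GDPR"})])
print([d.name for d in find(catalog, properties={"PII"})])
```

Requiring all requested tags (a subset test) narrows the results as you add filters. This is one reasonable choice, assumed here; matching any tag instead would widen them.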
When and why should I use Data Catalog?
The data profiling feature is a good option to use before you start developing an application. When you need to understand and evaluate a dataset, the data profile provides a user-friendly overview. This removes the previous need to load the data into an application before you can analyse it. The other metadata options, such as tags and properties, become more useful as the number of files in your account grows. The cloud hub does not provide the hierarchical folder structure you may be used to on a local machine, so as your data grows, it becomes increasingly important to categorise it. Doing so makes it easier to find and helps you keep track of sensitive data.