SAP HANA - SQL Data Profiling

SQL Data Profiling task is used to understand and analyze data from multiple data sources. It is used to remove incorrect, incomplete data and prevent data quality problems before they are loaded in Data warehouse.

Here are the benefits of SQL Data Profiling tasks −

It helps is analyzing source data more effectively.
It helps in understanding the source data better.
It remove incorrect, incomplete data and improve data quality before it is loaded into Data warehouse.
It is used with Extraction, Transformation and Loading task.

The Data Profiling task checks profiles that helps to understand a data source and identify problems in the data that has to be fixed.

You can use the Data Profiling task inside an Integration Services package to profile data that is stored in SQL Server and to identify potential problems with data quality.

Note − Data Profiling Task works only with SQL Server data sources and does not support any other file based or third party data sources.

Access Requirement

To run a package contains Data Profiling task, user account must have read/write permissions with CREATE TABLE permissions on the tempdb database.

Data Profiler Viewer

Data Profile Viewer is used to review the profiler output. The Data Profile Viewer also supports drilldown capability to help you understand data quality issues that are identified in the profile output. This drill down capability sends live queries to the original data source.

Data Profiling Task Setup and Reviewing

Setting up the Data Profiling Task

It involves execution of a package that contains Data Profiling task to compute the profiles. The task saves the output in XML format to a file or a package variable.

Reviewing the Profiles

To view the data profiles, send the output to a file and then use the Data Profile Viewer. This viewer is a stand-alone utility that displays the profile output in both summary and detail format with optional drilldown capability.

Data Profiling − Configuration Options

The Data Profiling task has these convenient configuration options −

Wildcard columns

While configuring a profile request, the task accepts ‘*’ wildcard in place of a column name. This simplifies the configuration and makes it easier to discover the characteristics of unfamiliar data. When the task runs, the task profiles every column that has an appropriate data type.

Quick Profile

You can select Quick Profile to configure the task quickly. A Quick Profile profiles a table or view by using all the default profiles and settings.

The Data Profiling Task can compute eight different data profiles. Five of these profiles can check individual columns and the remaining three analyze- multiple columns or relationships between columns.

Data Profiling − Task Outputs

The Data Profiling task outputs the selected profiles into XML format that is structured like DataProfile.xsd schema.

You can save a local copy of the schema and view the local copy of the schema in Microsoft Visual Studio or another schema editor, in an XML editor or in a text editor such as Notepad.