The Query transform is the most commonly used transform in Data Services.
Because it is used so often, a shortcut for it is provided in the tool palette.
To add a Query transform, follow the step given below −
Step 1 − Click the Query transform icon in the tool palette, then click anywhere in the data flow workspace. Connect the transform to its inputs and outputs.
When you double-click the Query transform icon, the Query editor opens, where you define the query operations.
The following areas are present in Query transform −
The input and output schemas contain columns, nested schemas, and functions. Schema In and Schema Out show the schema that is currently selected in the transform.
To change the output schema, select the schema in the list, right-click it, and select Make Current.
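Conceptually, the Query editor maps columns from Schema In to Schema Out and can restrict which rows pass through. The sketch below is only a plain-Python analogy of that idea, not Data Services code: the function `query_transform`, the mapping format, and all column names are hypothetical.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


def query_transform(
    rows: Iterable[Row],
    mapping: Dict[str, Callable[[Row], object]],
    where: Callable[[Row], bool] = lambda row: True,
) -> List[Row]:
    """Build output rows by evaluating each output-column expression
    against every input row that passes the filter."""
    return [
        {out_col: expr(row) for out_col, expr in mapping.items()}
        for row in rows
        if where(row)
    ]


# Hypothetical input schema: CUST_ID, FIRST_NAME, LAST_NAME, COUNTRY
schema_in = [
    {"CUST_ID": 1, "FIRST_NAME": "Ana", "LAST_NAME": "Ruiz", "COUNTRY": "ES"},
    {"CUST_ID": 2, "FIRST_NAME": "Tom", "LAST_NAME": "Lee", "COUNTRY": "US"},
]

# Hypothetical output schema: CUSTOMER_ID, FULL_NAME (US customers only)
schema_out = query_transform(
    schema_in,
    mapping={
        "CUSTOMER_ID": lambda r: r["CUST_ID"],
        "FULL_NAME": lambda r: f'{r["FIRST_NAME"]} {r["LAST_NAME"]}',
    },
    where=lambda r: r["COUNTRY"] == "US",
)
print(schema_out)
```

In the real tool, these mappings and the filter condition are defined interactively in the Query editor rather than written as code.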
Data Quality transforms cannot be connected directly to an upstream transform that contains nested tables. To connect them, add a Query transform or an XML Pipeline transform between the transform that produces the nested table and the Data Quality transform.
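To illustrate why an intermediate step is needed, the following sketch shows what flattening a nested table means: each nested child row is repeated with its parent's columns so that a downstream step receives only flat rows. It is a plain-Python analogy under assumed data, not Data Services syntax; the `unnest` helper and the column names are hypothetical.

```python
from typing import Dict, List


def unnest(rows: List[Dict], nested_key: str) -> List[Dict]:
    """Return one flat row per nested child, copying the parent columns."""
    flat = []
    for row in rows:
        # Parent columns are everything except the nested table itself.
        parent = {k: v for k, v in row.items() if k != nested_key}
        for child in row.get(nested_key, []):
            flat.append({**parent, **child})
    return flat


# Hypothetical nested input: each order carries a nested ITEMS table.
nested_orders = [
    {
        "ORDER_ID": 10,
        "CUSTOMER": "Ana Ruiz",
        "ITEMS": [{"SKU": "A1", "QTY": 2}, {"SKU": "B7", "QTY": 1}],
    },
]

flat_rows = unnest(nested_orders, "ITEMS")
for flat_row in flat_rows:
    print(flat_row)
```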
Step 1 − Go to Object Library → Transform tab
Step 2 − Expand the Data Quality transform and add the transform or transform configuration that you need to the data flow.
Step 3 − Draw the data flow connections. Double-click the name of the transform to open the transform editor. In the input schema, select the input fields that you want to map.
Note − To use the Associate transform, you can add user-defined fields to the Input tab.
The Text Data Processing transform allows you to extract specific information from large volumes of text. You can search for facts and entities, such as customers, products, and financial facts, that are specific to an organization.
This transform also identifies relationships between entities and allows them to be extracted. The data extracted using text data processing can be used in business intelligence, reporting, query, and analytics.
In Data Services, text data processing is done with the Entity Extraction transform, which extracts entities and facts from unstructured data.
This involves analyzing and processing large volumes of text data, searching for entities, assigning them to the appropriate type, and presenting the metadata in a standard format.
The Entity Extraction transform can extract information from any text, HTML, XML, or certain binary-format (such as PDF) content and generate structured output. You can use the output in several ways based on your work flow. You can use it as an input to another transform or write to multiple output sources such as a database table or a flat file. The output is generated in UTF-16 encoding.
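As a rough illustration of turning free text into structured output, the sketch below pulls two kinds of entities out of a sentence with regular expressions and encodes the result as UTF-16. This is a deliberately simplified stand-in: the actual transform relies on linguistic analysis and dictionaries rather than two regexes, and the entity types, patterns, and the `extract_entities` function are assumptions made for this example.

```python
import json
import re
from typing import Dict, List

# Hypothetical entity types and patterns, for illustration only.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+"),
    "CURRENCY": re.compile(r"\$\d+(?:,\d{3})*(?:\.\d{2})?"),
}


def extract_entities(text: str) -> List[Dict[str, object]]:
    """Return one record per match, ordered by position in the text."""
    entities = []
    for entity_type, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            entities.append(
                {"type": entity_type, "text": match.group(), "offset": match.start()}
            )
    return sorted(entities, key=lambda e: e["offset"])


text = "Contact ana@example.com about the $1,250.00 refund."
entities = extract_entities(text)

# Encode the structured output as UTF-16, mirroring the encoding noted above.
payload = json.dumps(entities).encode("utf-16")
print(entities)
```

The resulting records could then be written to a database table or a flat file, or passed to a further processing step.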
The Entity Extraction transform can be used in the following scenarios −
Finding specific information in a large volume of text.
Finding structured information in unstructured text and combining it with existing information to make new connections.
Reporting and analysis for product quality.
Text data processing is used to find relevant information in unstructured text data, whereas data cleansing is used to standardize and cleanse structured data.
| Parameters | Text Data Processing | Data Cleansing |
|---|---|---|
| Input Type | Unstructured data | Structured data |
| Input Size | More than 5 KB | Less than 5 KB |
| Input Scope | Broad domain with many variations | Limited variations |
| Potential Usage | Discovering potentially meaningful information in unstructured data | Improving the quality of data before storing it in the repository |
| Output | Creates annotations in the form of entities, types, etc.; the input is not changed | Creates standardized fields; the input is changed |
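The difference in output can be sketched in a few lines of Python. In this hedged example, `annotate` returns annotations about a piece of text and leaves the text itself alone, while `cleanse` returns standardized field values for a structured record. Both functions, the phone pattern, and the sample data are hypothetical and only illustrate the contrast.

```python
import re
from typing import Dict, List


def annotate(text: str) -> List[Dict[str, str]]:
    """Text-processing style: describe what was found; the text is untouched."""
    return [
        {"type": "PHONE", "text": m.group()}
        for m in re.finditer(r"\d{3}-\d{3}-\d{4}", text)
    ]


def cleanse(record: Dict[str, str]) -> Dict[str, str]:
    """Cleansing style: return the record with standardized field values."""
    return {
        "name": " ".join(record["name"].split()).title(),  # trim and fix casing
        "phone": re.sub(r"\D", "", record["phone"]),  # keep digits only
    }


note = "Customer called from 555-010-2000 about a late delivery."
annotations = annotate(note)

raw = {"name": "  ana   RUIZ ", "phone": "(555) 010-2000"}
clean = cleanse(raw)

print(annotations)
print(clean)
```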