What are the services of data transformation?

In data transformation, the data are transformed or linked into forms applicable for mining. Data transformation can contains the following −

  • Smoothing − It can work to remove noise from the data. Such techniques include binning, regression, and clustering.

  • Aggregation − In aggregation, where summary or aggregation operations are applied to the data.

  • Generalization − In Generalization, where low-level or “primitive” (raw) data are restored by larger-level concepts through the need of concept hierarchies.

There are various services of data transformation which are as follows −

Integration − Integration includes generating surrogate keys, mapping keys from one scheme to another, and mapping programs into complete descriptions. There is an implicit authority to support a master key lookup table behind this transformation.

Slowly changing dimension maintenance − It can be identifying changed values and creating surrogate keys is a tricky process, but it is not space-warp mathematics. If you buy a data staging tool, it should have the algorithms for managing slowly changing dimensions built-in.

Denormalization and renormalization − Denormalizing a hierarchy of separate tables into a dimension is a standard warehouse transformation process. Some of the data staging tools offer a star schema feature that automatically performs this function. Moreover, some denormalization takes place in the fact table phase.

For example, a financial schema may have a dimension that is the amount type, with the values Actual, Budget, or Forecast. It is depending on the level of detail these records contain, it may make a lot of sense to pivot this column out into a single row with three columns of dollar amounts, one for each amount type.

Cleansing, deduping, merge/purge − This is a big problem for many data warehouses, especially those concerned with external entities like customers, businesses, doctors, and patients. It is a complex process, but several vendors offer tools and services specifically for this problem.

Data type conversion − This involves lower-level transformations converting one data type or format to another. This ranges from converting IBM’s mainframe character set EBCDIC to ASCII, to converting date, numeric, and character representations from one database to another.

Calculation, derivation, allocation − These are transformations to use the business rules it is recognized during the requirements phase. Make sure the tool it can choose has a complete set of functions available, including string manipulation, date and time arithmetic, conditional statements, and basic math.

Aggregation − Aggregation can be managed in some element of the load process, based on which resources are accessible at which stage. If it can compute aggregations as an element of the extract or transformation process, it is possible to use tools like Syncsort directly to the flat records. These utilities are developed for sorting and reporting and are completely good at it.