data transformation

Data transformation is the process of converting data from one format, such as a database file, XML document or Excel spreadsheet, into another. Transformations often involve converting a raw data source into a cleansed, validated and ready-to-use format. The process of data transformation can also be referred to as extract/transform/load (ETL) and is crucial to data integration, data management and creating timely business insights.

Transforming data has become increasingly important as the volume of data available to and about any organization has proliferated. Therefore, harnessing this data requires a robust data transformation strategy that focuses on business users’ needs. When data transformation is done properly, data is easy to access, consistent, secure and considered trustworthy by the intended users.

The data transformation process

The process of data transformation can be described as a two-stage process, with a few steps in each stage.

  1. The first stage involves data discovery for identifying data sources and types, determining the structure of transformations that need to occur and data mapping for defining how fields will be changed or aggregated.
  2. The second stage involves extracting data from the original source, performing transformations and sending data to the target storage, such as a database or data warehouse. Locations that extractions can occur from include structured sources, streaming sources or log files from web applications.

Since data transformation requires the use of a program that is able to read the data’s original base language, it may need to be translated prior to being usable. Commonly used transformational languages include Perl, TXL, Awk and XSLT.

Reasons for transforming data

Reasons a data transformation might need to occur include making it compatible with other data, moving it to another system, comparing it with other data or aggregating information in the data. For example, databases might need to be combined following a corporate acquisition, transferred to a cloud data warehouse or merged for analysis. Data transformation may also be an approach to removing null or duplicated values from data.

Although data transformations typically involve converting documents, data conversions sometimes involve the conversion of a program from one computer language to another to enable the program to run on a different platform. This type of data migration is usually done when a new system is adopted.

Benefits of data transformation

  • Higher data quality.
  • Reduced number of mistakes like missing values.
  • Faster queries and retrieval times.
  • Less resources needed to manipulate data.
  • Better data organization and management.
  • More usable data, especially for advanced business intelligence or analytics purposes.
This was last updated in April 2019

Continue Reading About data transformation

Dig Deeper on MDM best practices