Data transformation is the process of converting data from one format, such as a database file, XML document or Excel spreadsheet, into another. Transformations often involve converting a raw data source into a cleansed, validated and ready-to-use format. The process of data transformation can also be referred to as extract/transform/load (ETL) and is crucial to data integration, data management and creating timely business insights.
Transforming data has become increasingly important as the volume of data available to and about any organization has proliferated. Therefore, harnessing this data requires a robust data transformation strategy that focuses on business users’ needs. When data transformation is done properly, data is easy to access, consistent, secure and considered trustworthy by the intended users.
The data transformation process
The process of data transformation can be described as a two-stage process, with a few steps in each stage.
- The first stage involves data discovery for identifying data sources and types, determining the structure of transformations that need to occur and data mapping for defining how fields will be changed or aggregated.
- The second stage involves extracting data from the original source, performing transformations and sending data to the target storage, such as a database or data warehouse. Locations that extractions can occur from include structured sources, streaming sources or log files from web applications.
Since data transformation requires the use of a program that is able to read the data’s original base language, it may need to be translated prior to being usable. Commonly used transformational languages include Perl, TXL, Awk and XSLT.
Reasons for transforming data
Reasons a data transformation might need to occur include making it compatible with other data, moving it to another system, comparing it with other data or aggregating information in the data. For example, databases might need to be combined following a corporate acquisition, transferred to a cloud data warehouse or merged for analysis. Data transformation may also be an approach to removing null or duplicated values from data.
Although data transformations typically involve converting documents, data conversions sometimes involve the conversion of a program from one computer language to another to enable the program to run on a different platform. This type of data migration is usually done when a new system is adopted.