Cross-platform integration, data preparation process grows in cloud

While cloud computing may be convenient and more cost-effective for users, it can also lead to new challenges and requirements in regard to integrating and processing data.

The economics of cloud computing enable many organizations to invest in the IT initiatives and business applications that remained tantalizingly out of their reach when their only option was on-premises deployment. But cloud environments often also increase the number of systems being used and, in turn, the need for a cross-platform integration and data preparation process to pull together their data.

In essence, using the cloud transforms IT costs from capital expenditures on hardware and software to ongoing operational expenses. Cloud computing also streamlines cash flow and potentially lowers costs by enabling a company to pay only for the technology that it really needs and to expand its IT systems -- and budget -- only when necessary. In addition, the organization need not worry about its hardware becoming outdated, as the cloud platform provider can be tasked with continually upgrading the systems within its environment.

Organizations attracted by the promise of those benefits are using cloud computing technologies in a number of different ways, primarily centered on the following three use cases.

Using cloud services as a straightforward replacement for on-premises IT systems. In this scenario, the IT team retains responsibility for the end-to-end design, development, testing, implementation and management of cloud-based applications. This reduces technology acquisition outlays, while allowing IT to keep full control of the application platform.

Using software as a service (SaaS) applications, such as those from Salesforce. In addition to reducing capital equipment costs, the SaaS approach simplifies the implementation and management of the application software supporting key corporate functions such as sales, marketing, customer service, finance and human resources.

Using fully managed platform as a service (PaaS) environments. In PaaS setups, the cloud service provider also handles the design, deployment and administration of back-end processing and data management resources for its customers.

Data, data, everywhere in the cloud

Despite all of the benefits the cloud offers, though, there's a significant potential drawback: the proliferation of platforms, applications, tools and locations in which corporate data resides. While cloud systems provide increased convenience, lower costs and faster time to value for users, they also establish a pattern of data distribution that not only spans different systems, but also crosses organizational and administrative boundaries. Big data platforms, increasingly deployed in the cloud themselves, add a further twist -- and bigger challenges, given the amount of data they typically contain.

This data diffusion leads to a number of questions about managing and using data in the cloud. For starters, what kind of control do you have over the data models and metadata for the various data sets that are being managed in cloud-based systems? Going further, how can all of that data be accessed? And what are the synchronization requirements for enabling information in different data sets to be used in a coordinated way, no matter where it's located?

Such questions are particularly pertinent for business intelligence, reporting and analytics applications. Methods must be implemented to facilitate the data integration and data preparation process across different cloud platforms, applications and data stores, as well as in on-premises systems. Meanwhile, you must also provide a workable user interface for business analysts, data scientists and other BI and analytics users looking to find, prepare and analyze relevant sets of information.

A way across the data divide

In fact, that effectively defines one possible solution to the problem: software products that support cross-platform data integration and preparation. These tools, which also encompass things such as self-service data preparation software, provide connectors to mainstream relational database management systems and newer NoSQL databases. The tools can also link to Hadoop clusters and data lakes to access information in the Hadoop Distributed File System and the related data repositories.

In addition, these cross-platform tools can ingest unstructured text files and structured XML and JSON documents, plus streaming data from sources such as social networks, website clickstream logs and stock market data feeds. And, yes, they can connect to SaaS applications and cloud services to pull together data generated there, combine it with other information as needed and automate the data preparation process.
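The combining step is easier to picture with a small example. The Python sketch below is a minimal illustration, not how any particular product works. It assumes a hypothetical JSON export of account records from a SaaS application and a hypothetical CSV extract of invoices from an on-premises database. The field names `account_id`, `name`, `region` and `invoice_total` are made up. Commercial tools do this through prebuilt connectors rather than hand-written code.

```python
import csv
import io
import json

# Hypothetical export from a SaaS CRM: a JSON array of account records.
SAAS_JSON = """[
  {"account_id": "A1", "name": "Acme Corp", "region": "EMEA"},
  {"account_id": "A2", "name": "Globex", "region": "AMER"}
]"""

# Hypothetical extract from an on-premises billing database, as CSV text.
ONPREM_CSV = """account_id,invoice_total
A1,1200.50
A2,830.00
A1,99.50
"""


def load_saas_accounts(text: str) -> dict[str, dict]:
    """Parse the JSON export and index account records by ID."""
    return {rec["account_id"]: rec for rec in json.loads(text)}


def total_invoices(text: str) -> dict[str, float]:
    """Sum invoice totals per account from the CSV extract."""
    totals: dict[str, float] = {}
    for row in csv.DictReader(io.StringIO(text)):
        acct = row["account_id"]
        totals[acct] = totals.get(acct, 0.0) + float(row["invoice_total"])
    return totals


def combine(saas_text: str, onprem_text: str) -> list[dict]:
    """Join both sources on account_id into one analysis-ready list."""
    accounts = load_saas_accounts(saas_text)
    totals = total_invoices(onprem_text)
    return [
        {**acct, "invoice_total": round(totals.get(acct_id, 0.0), 2)}
        for acct_id, acct in sorted(accounts.items())
    ]


if __name__ == "__main__":
    for row in combine(SAAS_JSON, ONPREM_CSV):
        print(row)
```

Accounts with no matching invoices get a total of 0.0 instead of being dropped. This is one reasonable choice for a left-style join and an assumption of the sketch.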

Cross-platform tools possess three other key attributes. First, they're able to direct data to any selected platform, a big difference from traditional data integration tools, which pull data from source systems into a single staging area. Second, they support easy access to data via end-user BI and data visualization tools, no matter where the required data resides. Third, cross-platform tools provide semantic cataloguing of available data sets; corresponding business metadata that provides details about data elements, definitions and structures; and associated business rules needed to enable data integration processes.
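The third attribute, semantic cataloguing, can also be sketched. The Python example below is a minimal sketch under stated assumptions. It shows what a catalog entry might hold: a data set's platform and location, its business definition, per-element definitions and integration rules. It also shows how an analyst could search the catalog by a business term. The `CatalogEntry` schema, the sample entries and `find_by_term` are all hypothetical. They do not reflect any vendor's metadata model.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data set as a cross-platform catalog might describe it (hypothetical schema)."""
    name: str
    platform: str      # where the data physically lives
    location: str      # platform-specific path or table name
    definition: str    # business meaning of the data set
    elements: dict[str, str] = field(default_factory=dict)  # column -> business definition
    rules: list[str] = field(default_factory=list)          # integration rules, as plain text


# Hypothetical sample entries spanning a SaaS app and an on-premises database.
CATALOG = [
    CatalogEntry(
        name="customer_accounts",
        platform="SaaS CRM",
        location="Account",
        definition="Active customer accounts managed by the sales team",
        elements={"account_id": "Unique customer identifier", "region": "Sales region"},
        rules=["account_id matches billing.invoices.account_id"],
    ),
    CatalogEntry(
        name="invoices",
        platform="On-premises RDBMS",
        location="billing.invoices",
        definition="Issued customer invoices",
        elements={"account_id": "Customer identifier", "invoice_total": "Invoice amount"},
        rules=["invoice_total must be non-negative"],
    ),
]


def find_by_term(catalog: list[CatalogEntry], term: str) -> list[str]:
    """Return names of data sets whose definition, element names or element definitions mention the term."""
    term = term.lower()
    return [
        e.name
        for e in catalog
        if term in e.definition.lower()
        or any(term in col.lower() or term in desc.lower() for col, desc in e.elements.items())
    ]
```

A search such as `find_by_term(CATALOG, "customer")` matches both entries, even though they live on different platforms. A user therefore finds related data by meaning rather than by where it is stored.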

All of this indicates that cross-platform data integration and data preparation tools are more than just souped-up extract, transform and load software mapped to a mix of internal and external data sources, both on-premises and in the cloud. The emerging technologies blend a variety of features to provide a uniform way to access, prepare, query and visualize disparate data. Cloud environments with widely dispersed data sets may have met their data management match.
