One day, early in the millennium, it was my job to predict the future. Of course, it wasn't as fancy as that may sound. I was a member of an oversight board planning a technical conference built around XML.
XML was the basis for platform-agnostic data exchange, and the greatest thing since sliced bread at the time. But had I been able to see then into what would become Amazon Web Services history, I could have seen how the future was really germinating.
Yes, we'd already changed the name of the conference to include web services -- an obvious evolutionary step for XML. As I gathered with others -- journalists, analysts and practicing technologists -- the question was a common one in tech circles: "What is the next big thing?" I'm sure we had some notion of what was to come, but I don't think we fully figured that one out.
Now, I've seen the future, and it's one where scale-out distributed architecture is the imperative and organizations are freed from reliance on monolithic and proprietary software. It was on display at AWS re:Invent 2016, which was a culmination of sorts of the Amazon Web Services history that began 10 years ago. The technologies highlighted at the conference -- including new data management and analytics offerings -- set the stage for AWS to continue to grow rapidly in the cloud, despite the protestations of Oracle's Larry Ellison and others.
Web services at your service
It all started with web services, and looking at the course things have taken is somewhat mind-boggling. While such web services originally had little impact on data architecture, that's changing. As big data pipelines get built out more broadly, data management and analytics are taking on the look of a services architecture, at Amazon and elsewhere.
Amazon, the e-commerce company, grew up in the web services era, and its technical management worked to ensure that teams exposed data through services interfaces. While not a rah-rah member of the open source software community, it enthusiastically embraced the use of RESTful services over HTTP, which became a hallmark of open source style. There was no rest when it came to production: The company moved quickly, in a style that came to be known as DevOps.
So-called REST also turned out to be very compatible with cloud computing architecture on massively scaled commodity computers. When Amazon realized what a lead it had over incumbent technology vendors in highly distributed computing, it started to sell cloud compute time. It did so at a discount -- just as it had sold books, CDs and laundry detergent.
It has been adding up -- Amazon claims the AWS business is set to reap $13.8 billion in revenue this year. It has gotten there in increments, and many of those increments are beginning to involve data management.
The data side of the cloud
Amazon got its start in the data sphere with interesting elements like the NoSQL database DynamoDB and the Amazon Relational Database Service. But when it released its Redshift massively parallel cloud data warehouse in 2013, the level of technology was startling. The very idea of moving your data warehouse off-premises was disruptive to the status quo.
There was more to come. Redshift was followed by Aurora, a MySQL-compatible relational database. The company is in a constant state of development, and in recent months, Aurora updates have included support for invoking its Lambda "serverless" functions from stored procedures, data ingestion from Amazon Simple Storage Service (S3) and console cluster views.
At the recent re:Invent conference, Amazon previewed an upcoming PostgreSQL-compatible version of Aurora. Aurora was also joined in the data lineup by Amazon Athena, which performs ad hoc SQL queries on data inside S3, and AWS Glue, an upcoming service for cataloging data and transforming data formats.
To hear Amazon CTO Werner Vogels tell it, the data management tsunami the company has ridden through Amazon Web Services history to date was driven by users' unhappiness with vendors, their need to move quickly and their need to learn to scale. These, as much as XML or web services, paved the way to the future -- a future that is data-driven.
Back up the truck
At re:Invent, Vogels emphasized the importance of cloud approaches to data going forward. He said cloud has democratized access to IT resources, but greater attention to data quality is in store.
"It's no longer the case that, because you were able to spend a lot of money on your data warehouse, you'd actually get a competitive advantage," Vogels said. "Everybody can have a data warehouse today."
As well, everyone will have similar access to the same level of processing power. That being the case, he continued, the quality of the data and how it is managed will be the differentiator among companies.
Amazon has been transferring data from willing IT shops to its cloud for a number of years. At re:Invent, it vowed to accelerate the process. In fact, an odd harbinger of that appeared at the conference -- it took the form of AWS Snowmobile, a 45-foot-long shipping container pulled by a semitrailer truck, all set to provide very large-scale data transfer.
Unlike the Amazon delivery vans we see dropping off packages everywhere these days, the truck picks up your data and takes it to an AWS facility, where it's moved to the Amazon cloud. Like a few other things noted here, this just wasn't something we saw coming back on that long-ago day when we peered into the foggy cloud of the web services future. Go figure!