
Geospatial data is on the map for Hadoop, Spark

Software architect Mansour Raad is at the center of activity as geospatial data melds with Hadoop -- and soon, Spark.

As people accumulate big data and then look to do something useful with it, one of the first things they do is put it on a map. It's a simple first step, but it can be a challenging one. We asked Mansour Raad to shed some light on geographic data as it applies in the days of big data. Raad is a senior software architect at ESRI, and as such, he is at the center of a lot of activity. He walks geospatial software veterans through the ins and outs of Hadoop, and he walks Hadoop users through the ins and outs of geospatial data.

Time was when geospatial data seemed to be about searching for oil or studying water use. Now, the systems are popping up everywhere. What has changed?


Mansour Raad: Well, no pun intended, but geographic things are now on the map. Everything is becoming geographically relevant. What happens in Boston is different than what happens in Georgia, for example.

Location is very important, and people are connecting data from all over the place. If you are a retailer, people are coming to your website, and implicitly or explicitly, you know where they are and you find that there is a certain product that is relevant in that certain market. Making the association between that product and that location is starting to become very important.

How well do new technologies like Hadoop and NoSQL work in the domain of geography?

Raad: First of all, I am seeing something of a sunset for Hadoop because of Spark and because of different ways of storing data. Hadoop has a role, but some of the components are being phased out. In geographic data, there are other ways to store these things.  

The biggest problem we see is that traditional Hadoop and the key value stores, such as HBase, Cassandra and Accumulo, and all their friends, are relying on a sequential ordering of things -- that is, one-dimensional ordering. That makes them amazingly fast, but that creates a single-dimension style of search. The problem with geospatial data is that it is not about a single dimension. It is multidimensional. Things that are in space are not sequentially ordered. To solve that, people are introducing an abstraction layer. They take something that is multidimensional and they turn it into a single dimension. People discovered the underlying math for this a long time ago -- what's old is new again.
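Raad does not name the math he has in mind. One long-established technique that fits his description is a space-filling curve, such as the Z-order (Morton) curve, which interleaves the bits of two coordinates into one sortable number. The sketch below is only an illustration under that assumption; the function names, the 16-bit grid resolution and the longitude/latitude scaling are hypothetical choices, not anything taken from ESRI or from a particular key-value store.

```python
def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x and y into one integer.

    Bits of x land in even positions and bits of y in odd positions,
    which is the classic Z-order (Morton) layout.
    """
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def morton_key(lon: float, lat: float, bits: int = 16) -> int:
    """Map a longitude/latitude pair to a single one-dimensional key.

    Assumption for this sketch: the world is cut into a 2**bits by 2**bits
    grid, and each point is replaced by the index of the cell it falls in.
    """
    cells = 1 << bits
    # Scale each coordinate to a cell index, clamping the upper edge.
    col = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
    row = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
    return interleave_bits(col, row, bits)
```

A store that keeps rows sorted by such a key will tend to place points that are close on the map close together on disk, although the curve has jumps, so neighbors on the map are not always neighbors in key order. Real systems typically break a bounding-box query into several key ranges for that reason.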

More important, perhaps, is the work people are doing to take geospatial processing algorithms, and to turn them from a serial way of programming to a distributed, parallel, shared-nothing architecture. That is very hard, honestly. That is because we have been trained to think differently. Taking existing algorithms and turning them into distributed algorithms takes work.
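To make the shift from serial to shared-nothing thinking concrete, here is a minimal sketch of one very simple geospatial task, counting points per grid cell, written so that each partition of the data is processed on its own and only small partial results are merged. It is not Raad's code or an ESRI API; the function names, the one-degree cell size and the sample coordinates (rough locations for Boston and Atlanta) are assumptions for illustration, and the partitions are processed in a single process rather than on a real cluster.

```python
from collections import Counter
from functools import reduce


def cell_of(lon, lat, size_deg=1.0):
    """Return the (column, row) index of the grid cell containing a point."""
    return (int((lon + 180.0) // size_deg), int((lat + 90.0) // size_deg))


def count_partition(points, size_deg=1.0):
    """'Map' step: count points per cell using only this partition's data."""
    return Counter(cell_of(lon, lat, size_deg) for lon, lat in points)


def merge_counts(left, right):
    """'Reduce' step: combine two partial results; order does not matter."""
    return left + right


def distributed_cell_counts(partitions, size_deg=1.0):
    """Run the map step per partition, then fold the partial counts together.

    On a cluster, each count_partition call could run on a different node,
    because no call reads another partition's points.
    """
    partials = (count_partition(p, size_deg) for p in partitions)
    return reduce(merge_counts, partials, Counter())


# Two hypothetical partitions of (longitude, latitude) points.
partitions = [
    [(-71.06, 42.36), (-71.10, 42.40)],
    [(-71.50, 42.90), (-84.39, 33.75)],
]
counts = distributed_cell_counts(partitions)
```

Counting is easy to split this way because the merge step is a plain sum. Algorithms that need to look at neighboring cells or at pairs of points are harder to partition, which is part of why the conversion Raad describes takes work.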

Can we expect big data and geospatial data technology to cross-pollinate in the years ahead? What is a geospatial data scientist's take on the "three Vs" of big data: volume, velocity and variety?

Raad: For me, big data is not about volume, variety and velocity as much as it is about meeting the needs you have to do your job. You move to new technology when your traditional means are causing you to fail at your job.

But the fact is that putting dots on a map gives you back relevance. We humans are visual people. With maps, you can visualize things that were formerly obscure. What really becomes essential, however, is use of geostatistics. That gives you the confidence you need to make comparisons between locations, and to make decisions based on trends you see. 

That happens in telecommunications now, for example, when you study 'dropped calls' [on a network] and have to decide to assign crews to update equipment based on what you see on the maps.

A challenging example would be something like the Tohoku earthquake a couple of years ago, where the tsunami came in and plumes of radiation were released.

If you had to evacuate villages, what would be your confidence level in evacuating one or the other? The challenge people face is this: The window of opportunity you may have to respond to a scenario may be very small. Geostatistics applies math to the trending, to clusters of numbers, and that gives you confidence on how things are actually trending.
