Curves in Open Data

Last week I talked about data formats and we continued the conversation on Twitter.

No curves. It’s a good point. GeoJSON and TopoJSON don’t support curves, but neither do Shapefiles. All three formats are meant to handle simple features: points, lines and polygons. While TopoJSON handles topology, it still can’t draw true curves. So what’s the implication here? To share data that requires curves (an edge case, but still an important one), do you have to use a proprietary format? Enter WKT. Well-known text supports many more geometry types than the previous three, including curves such as CIRCULARSTRING and COMPOUNDCURVE. Following up on sharing data in common file formats, WKT fits the bill perfectly. Share your data as GeoJSON/TopoJSON, KML and Shapefile if needed, then use WKT for complex features. It is still completely open and is well supported by most open and proprietary software packages.
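To make the gap concrete: WKT can describe a true arc as a CIRCULARSTRING (start point, a point on the arc, end point), while GeoJSON can only hold a straight-segment approximation of it. Below is a minimal Python sketch, standard library only, of that conversion. The function names (`parse_circularstring`, `linearize_arc`, `circularstring_to_geojson`) and the segment count are assumptions made for illustration, not part of any library, and the sketch only handles 2D arcs (not the closed full-circle case).

```python
import json
import math


def parse_circularstring(wkt):
    """Parse a 2D 'CIRCULARSTRING(x y, x y, ...)' into a list of (x, y) tuples."""
    head, _, body = wkt.strip().partition("(")
    if head.strip().upper() != "CIRCULARSTRING" or not body.endswith(")"):
        raise ValueError("expected a 2D CIRCULARSTRING")
    points = [tuple(float(v) for v in pair.split()) for pair in body[:-1].split(",")]
    if len(points) < 3 or len(points) % 2 == 0:
        raise ValueError("a circular string needs an odd number of points, at least 3")
    return points


def circle_center(p1, p2, p3):
    """Center of the circle passing through three non-collinear points."""
    (x1, y1), (x2, y2), (x3, y3) = p1, p2, p3
    d = 2 * (x1 * (y2 - y3) + x2 * (y3 - y1) + x3 * (y1 - y2))
    if abs(d) < 1e-12:
        raise ValueError("points are collinear; no unique circle")
    s1, s2, s3 = x1**2 + y1**2, x2**2 + y2**2, x3**2 + y3**2
    cx = (s1 * (y2 - y3) + s2 * (y3 - y1) + s3 * (y1 - y2)) / d
    cy = (s1 * (x3 - x2) + s2 * (x1 - x3) + s3 * (x2 - x1)) / d
    return cx, cy


def linearize_arc(p1, p2, p3, segments=16):
    """Approximate the arc from p1 through p2 to p3 with straight segments."""
    cx, cy = circle_center(p1, p2, p3)
    radius = math.hypot(p1[0] - cx, p1[1] - cy)
    a1 = math.atan2(p1[1] - cy, p1[0] - cx)
    a3 = math.atan2(p3[1] - cy, p3[0] - cx)
    # The sign of the cross product tells us which way the arc turns.
    cross = (p2[0] - p1[0]) * (p3[1] - p1[1]) - (p2[1] - p1[1]) * (p3[0] - p1[0])
    if cross > 0:  # counterclockwise
        sweep = (a3 - a1) % (2 * math.pi)
    else:          # clockwise
        sweep = -((a1 - a3) % (2 * math.pi))
    pts = [
        (cx + radius * math.cos(a1 + sweep * i / segments),
         cy + radius * math.sin(a1 + sweep * i / segments))
        for i in range(segments + 1)
    ]
    pts[0], pts[-1] = p1, p3  # keep the original endpoints exact
    return pts


def circularstring_to_geojson(wkt, segments=16):
    """Turn a WKT circular string into a GeoJSON LineString geometry (a dict)."""
    pts = parse_circularstring(wkt)
    coords = [list(pts[0])]
    # Consecutive arcs share an endpoint, so step through the points two at a time.
    for i in range(0, len(pts) - 2, 2):
        arc = linearize_arc(pts[i], pts[i + 1], pts[i + 2], segments)
        coords.extend(list(p) for p in arc[1:])
    return {"type": "LineString", "coordinates": coords}


geometry = circularstring_to_geojson("CIRCULARSTRING(0 0, 1 1, 2 0)", segments=8)
print(json.dumps(geometry))
```

The three numbers pairs in the WKT are the whole curve; the GeoJSON version needs nine vertices to fake it, and still isn't a true arc. That is the trade-off you accept when you leave WKT behind.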

Sometimes you need to use curves and generally it does work out.

3D Underground

There are plenty of 3D globes for desktop and for web that support above ground objects (mostly buildings) on the globe but there are few that support features underground (such as wells). The only one that really has good support is Esri’s CityEngine. You can render scenes such as this in the browser.

Now the problem is that this all requires CityEngine, which is neither inexpensive nor easy to use. I’ve got a great database of wells with GeoJSON attributes that I’d love to map in a 3D browser view, but most of the efforts so far have been put into 2.5D solutions. Most of my current project work is 3D but underground, which means that I can’t view it in Google Earth or other web solutions.

I get all excited to map wells and then disaster strikes.

Cuban Baseball Stadiums

You may or may not have seen, but there is a Cuban/Tampa Bay Rays game going on today. Given the love of baseball between the two countries I’m sure we’ll see much more Cuban baseball over the next couple years. It just so happens that the GeoJSON-Ballparks project has all the professional baseball stadiums in Cuba already mapped in GeoJSON format, including Estadio Latinoamericano where the game is being played today. Enjoy!

Cloud Agnostic

Last week Apple announced it was moving some of its iCloud storage to Google’s cloud from Amazon. Earlier this month Dropbox moved from AWS to their own cloud hardware. The motives for both these moves are to get off of someone else’s cloud and on to their own [1]. But it does bring up a good point for all hosted development: try to be as cloud agnostic as possible, because you never know what you might have to do.

Early on in WeoGeo’s life, we had a product called WeoCEO. You won’t find too much about it, but it was WeoGeo’s way of managing S3 and EC2 instances and balancing traffic. WeoGeo created this because Amazon’s EC2 tools weren’t robust enough to handle a business yet and WeoGeo didn’t know how AWS might turn out. The point of WeoCEO was that it could be staged in any CentOS environment and handle the WeoGeo database, geoprocessing and websites. In theory that would have allowed WeoGeo to easily move off of AWS and on to Azure, Google or Rackspace. WeoGeo abandoned WeoCEO and started using AWS’s native tools because it made it easier to work with new technology such as CloudFront, new storage solutions and database options. While there was zero chance WeoGeo would move off of AWS, having such a tool could have made it easier for the WeoGeo platform to be integrated into other technology.

All this got me thinking about my current hosted GIS systems. I’ve got some on AWS and some on Azure. Could I move from one provider to the other, and how much work would it be? Most of my stack isn’t proprietary to one or the other system [2]. Node.js, PostgreSQL and other open technologies run really well on just about any hosted system out there. But there is somewhat proprietary cloud technology out there that you can lock yourself into.

I don’t think that everyone needs to develop their own WeoCEO type management system [3], but making pragmatic choices about how to deploy cloud applications can pay for itself in spades. I have clients who want their applications on AWS or Azure, and I can deploy the same application on either with little effort, but keeping it that way requires planning and a will to be cloud agnostic. I’ve always liked the term and I’ve always tried to prototype and develop applications that aren’t locked into one infrastructure. I’d love to keep it that way and you should too.
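One pragmatic way to stay cloud agnostic is to keep provider-specific calls behind a small interface that the application owns. The sketch below is an assumption about how that could look, not how WeoCEO or any of my deployments actually work: `BlobStore`, `LocalDiskStore`, `make_store`, `publish_layer` and the `BLOB_BACKEND` / `BLOB_ROOT` settings are all hypothetical names. Only a local-disk backend is shown; an S3 or Azure Blob backend would be another class with the same two methods.

```python
import os
import tempfile
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """The only storage operations the application is allowed to rely on."""

    def put(self, key: str, data: bytes) -> None: ...

    def get(self, key: str) -> bytes: ...


class LocalDiskStore:
    """Backend that writes to a directory; handy for tests and any plain VM."""

    def __init__(self, root: str) -> None:
        self.root = Path(root)

    def put(self, key: str, data: bytes) -> None:
        path = self.root / key
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key: str) -> bytes:
        return (self.root / key).read_bytes()


def make_store(env=os.environ) -> BlobStore:
    """Pick a backend from configuration (hypothetical setting names)."""
    backend = env.get("BLOB_BACKEND", "local")
    if backend == "local":
        return LocalDiskStore(env.get("BLOB_ROOT", tempfile.gettempdir()))
    # A cloud backend would be registered here; none is implemented in this sketch.
    raise ValueError(f"unsupported BLOB_BACKEND: {backend!r}")


def publish_layer(store: BlobStore, name: str, geojson_text: str) -> str:
    """Application code: it never mentions AWS or Azure, only the interface."""
    key = f"layers/{name}.geojson"
    store.put(key, geojson_text.encode("utf-8"))
    return key


with tempfile.TemporaryDirectory() as root:
    store = make_store({"BLOB_BACKEND": "local", "BLOB_ROOT": root})
    key = publish_layer(store, "wells", '{"type": "FeatureCollection", "features": []}')
    round_trip = store.get(key).decode("utf-8")

print(key, round_trip)
```

Moving providers then means writing one new backend class and changing a setting, rather than hunting provider calls through the whole application.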

  1. In Dropbox’s case, they have already done so 

  2. Even my Azure stuff is all PostGIS and Node.js 

  3. Clearly even we didn’t stick with it 

Data Formats and the Datastore

Yesterday’s post generated some email, mostly in agreement, but I wanted to highlight one question.

Finding data for me is only half the problem, it’s formats that come and go. I’m archiving data in formats that I have no idea if they’ll be supported or not. What’s the point of indexing if you can’t view the file?

That’s a big issue of course. Think about it this way: what if I saved out my site plan in Manifold GIS format 5 years ago and wanted to open it today? Either I find a copy of Manifold or I don’t use the file. Unfortunately, the solution isn’t as easy as you might think. Software such as Safe Software FME can rescue many file formats, but you run the risk of hoping that Safe will add your format. One thing I try to do is save data in common format types. While I might use SDE and FGDB in production, I make an effort to save these layers off as Shapefiles and TIFF. Over beers a couple years ago, we termed these “pragmatic file formats”. SHP, KML, TIFF, JPG and GeoJSON were all mentioned as ones that we thought were widely supported [1]. At WeoGeo, while we could support the 250+ file formats that FME supports, we left it at about 10 of the most requested formats.

But that brings up one thing we pushed with the WeoGeo Library [2]. You could load a WeoGeo supported file format, even a “pragmatic file format” type, and because we used FME on the backend, know that it would be usable in the future. That’s a true “Library” environment, one where you can not only find what you are looking for, but know that it will be readable.

GIS by its very nature is file format verbose [3] and we have to deal with this more than other professions. My recommendation today, as it has been for years, is to try to do the following:

  1. Save in common file formats over niche proprietary ones
  2. Invest in Safe FME; it pays for itself
  3. Index your data. If you know what you have, you know what you need to support [4]

Simple enough, right? Don’t get turned upside down with your file formats.
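The indexing step doesn’t have to start with anything fancy. Here is a minimal sketch, assuming your data lives in a directory tree: count files by extension and flag anything outside a list of “pragmatic” formats. The `COMMON` set is my reading of the formats named above plus the usual Shapefile sidecar files, and the function names and sample file names are hypothetical. It is deliberately naive; a file geodatabase, for example, is a folder of many files and would need its own rule.

```python
import tempfile
from collections import Counter
from pathlib import Path

# Formats treated as widely supported in this sketch (an assumption, adjust to taste).
# .dbf/.shx/.prj are included because they travel with every .shp.
COMMON = {".shp", ".dbf", ".shx", ".prj", ".kml", ".tif", ".tiff",
          ".jpg", ".jpeg", ".geojson"}


def index_formats(root):
    """Count the files under root by lowercase extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


def at_risk(counts, common=COMMON):
    """Extensions present in the index that are not in the common set."""
    return sorted(ext for ext in counts if ext not in common)


# Demo on a throwaway directory with made-up file names.
with tempfile.TemporaryDirectory() as root:
    for name in ("parcels.shp", "parcels.dbf", "site_plan.map", "ortho.TIF"):
        (Path(root) / name).touch()
    counts = index_formats(root)
    risky = at_risk(counts)

print(dict(counts))
print("formats to plan for:", risky)
```

The list of extensions that falls out of `at_risk` is the list of formats you need to keep software around for, or convert while you still can.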

  1. though you could argue GeoJSON back then wasn’t exactly supported well 

  2. now WeoGeo is Trimble Data 

  3. right now I can look at this MXD in front of me and see with 8 layers, I have 6 format types 

  4. that is why I have a copy of Corel Draw 

Ninety Percent of the World’s Data Has Been Generated Over the Last Two Years

Tucked into a company blog post about Smart Cities was this statement that caught my eye.

Ninety percent of the world’s data has been generated over the last two years.

Unlike “80% of Data is Spatial”, I have to admit this is totally believable and I can find the source. Most of this data is pure junk, but the biggest problem with it is that it is literally unsearchable. Even in the age of Google, we can’t even begin to aggregate this data and sort through it.

On the BLM GPM project that I was part of at AECOM/URS, we teamed with Voyager to attempt to find all their spatial data and share it. The good news is that I hear the BLM Navigator will be rolling out soon, so at least we can know that the BLM is indexing their data and attempting to share it. But that is one organization out of billions.

This unaccounted-for data can’t be leveraged by users and goes to waste. We all know GIS is great for making informed decisions about just about anything, yet we are most likely uninformed ourselves because the data just doesn’t happen to be at our fingertips. We’re a society that loves to create data, but not one that likes to organize data. If we’re truly going to change the world with GIS, we need to make sure we have all the information available to do so. Smart Cities, GeoDesign and all the rest are big data use cases. Let’s figure out how to start pumping them full of it.