Data Formats and the Datastore

Yesterday’s post generated some email, mostly in agreement, but I wanted to highlight one question.

Finding data for me is only half the problem, it’s formats that come and go. I’m archiving data in formats that I have no idea if they’ll be supported or not. What’s the point of indexing if you can’t view the file?

That’s a big issue, of course. Think about it this way: what if I had saved my site plan in Manifold GIS format 5 years ago and wanted to open it today? Either I find a copy of Manifold or I don’t use the file. Unfortunately, the solution isn’t as easy as you might think. Software such as Safe Software FME can rescue many file formats, but you run the risk of hoping that Safe will add yours. One thing I try to do is save data in common formats. While I might use SDE and FGDB in production, I make an effort to save these layers off as Shapefiles and TIFFs. Over beers a couple of years ago, we termed these “pragmatic file formats”. SHP, KML, TIFF, JPG, and GeoJSON were all mentioned as ones we thought were widely supported[1]. At WeoGeo, while we could have supported the 250+ file formats that FME supports, we left it at about 10 of the most requested formats.

But that brings up one thing we pushed with the WeoGeo Library[2]. You could load any WeoGeo-supported file format, even a “pragmatic file format”, and because we used FME on the backend, know that it would be usable in the future. That’s a true “Library” environment: one where you can not only find what you are looking for, but also know that it will be readable.

GIS by its very nature is file format verbose[3], and we have to deal with this more than other professions do. My recommendation today, as it has been for years, is to try to do the following:

  1. Save in common file formats over niche proprietary ones
  2. Keep a translation tool on hand; Safe FME pays for itself
  3. Index your data. If you know what you have, you know what you need to support[4]
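
As a rough illustration of the third point, here is a minimal sketch of a format inventory: it walks a folder, counts files by extension, and lists everything outside a "pragmatic" set. The function names, the contents of the PRAGMATIC set, and the decision to treat Shapefile sidecar files as safe are all my assumptions for illustration, not anything WeoGeo or FME does. Folder-based formats such as FGDB will show up as their internal files, so treat the output as a starting point rather than a real catalog.

```python
from collections import Counter
from pathlib import Path

# Assumed "pragmatic" extensions, based on the formats named in this post,
# plus the usual Shapefile sidecar files. Adjust to your own shop.
PRAGMATIC = {
    ".shp", ".shx", ".dbf", ".prj",   # Shapefile and its sidecars
    ".kml", ".tif", ".tiff", ".jpg", ".jpeg", ".geojson",
}


def inventory_formats(root):
    """Count every file under root by lowercase extension."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            counts[path.suffix.lower() or "(none)"] += 1
    return counts


def at_risk(counts):
    """Return (extension, count) pairs outside the pragmatic set, most common first."""
    return sorted(
        ((ext, n) for ext, n in counts.items() if ext not in PRAGMATIC),
        key=lambda item: (-item[1], item[0]),
    )


if __name__ == "__main__":
    # "." is a placeholder; point this at your own archive folder.
    for ext, n in at_risk(inventory_formats(".")):
        print(f"{ext}: {n} file(s) to keep an eye on")
```

Anything this prints is a format you need to keep software around for, or convert while you still can.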

Simple enough, right? Don’t get turned upside down with your file formats.

  1. though you could argue GeoJSON back then wasn’t exactly supported well 

  2. now WeoGeo is Trimble Data 

  3. right now I can look at this MXD in front of me and see that, with 8 layers, I have 6 format types 

  4. that is why I have a copy of Corel Draw