SpatialTau v1.2 - Tilting at the Shapefile

SpatialTau is my weekly newsletter that goes out every Wednesday. The archive shows up in my blog a month after the newsletter is published. If you’d like to subscribe, please do so here.


Tilting at the Shapefile

Now I’m sure if I went back to my blog and searched for how many times I’ve tried to kill off the shapefile, even I would be surprised at how many times I’ve blogged about it.  Thus it seems about right for the second newsletter I’ve ever written to focus on the “Shapefile Problem.”

The Problem

So what exactly is this problem?  I mean what is so bad about a well supported, somewhat open file format?  I’ve told this story before but it never hurts to repeat.  My dad was borrowing my laptop a couple years ago and commented about all these DBF files all over my desktop.  He wondered why on earth I would have a format that he used in the late 80’s and outgrew because of its limitations.  Well I proceeded to explain to him the shapefile and how it worked and he just laughed.  That’s right, my 72 year old dad laughs at us wankers and our shapefile.  The DBF is only half the problem with the shapefile.  It doesn’t understand topology, it only handles simple features (ever try and draw a curve in a shapefile?), it has a puny 2GB file size limitation, and not to mention you can’t combine points, polygons and lines in one file (hence every shapefile name has the word point, line or poly in it).

Oh and it’s anywhere between 3 and 15ish file types/extensions.  Sure 3 are required (.shp, .shx and .dbf) but the rest just clutter up your folders.  I love the .shp.xml one especially because clearly they thought so much about how to render metadata.  If I had a penny for every time someone emailed me just the .shp file without the other two I’d be a rich man.  Heck just the other day I got the .shp and .dbf but not the .shx.  Just typing that sentence makes me cringe.
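Since the missing-sidecar email is such a recurring theme, here is a minimal sketch of a check you could run before hitting send.  It only assumes the three required extensions (.shp, .shx, .dbf); the function name missing_sidecars and the file name roads.shp are made up for illustration, and it doesn’t validate the contents of any file.

```python
from pathlib import Path

# The three files every shapefile needs; everything else is optional.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_sidecars(shp_path):
    """Return the required extensions that are absent next to shp_path.

    Hypothetical helper: it only looks at file names in the same folder,
    compared case-insensitively (ROADS.DBF counts as roads.dbf).
    """
    shp = Path(shp_path)
    siblings = {p.name.lower() for p in shp.parent.iterdir()}
    return [
        ext
        for ext in REQUIRED_EXTENSIONS
        if (shp.stem + ext).lower() not in siblings
    ]
```

Calling missing_sidecars("roads.shp") in a folder that holds only roads.shp and roads.dbf would return [".shx"], which is exactly the email I got the other day.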

The Contenders

  1. The File Geodatabase (FGDB):  Esri’s default format for their tools.  It’s a spatial database in a folder format.  The less mentioned about the Personal Geodatabase, the better.  But unlike what most companies have built in the past 5 years, it isn’t built on SQLite but on Esri’s proprietary geodatabase format.  There isn’t anything inherently wrong with Esri taking this path but it means you’re stuck using their software or their APIs to access the file format.  To me this severely limits the FGDB as an interchange file format and I think that is perfectly fine with Esri as they don’t really care too much if the FGDB doesn’t work with others’ software.  I’d link to an Esri page that describes the FGDB but there isn’t one.  It’s a secret proprietary format that even Esri doesn’t want to tell you about.
  2. SpatiaLite: SpatiaLite has everything going for it.  It’s a spatial extension to SQLite which means at its core it’s open.  It’s OGC Simple Features compliant.  It is relatively well supported by GIS software (even Esri technically can support it with the help of Safe Software).  Plus it supports all those complex features that the shapefile can’t.  Heck, OGC even chose it as the reference implementation for the GeoPackage (assuming people still care about that).  Heck, it supports rasters too!  But honestly, SpatiaLite was released in 2008 and hasn’t really made a dent in the market.  I can’t ever remember downloading or being sent a SpatiaLite file.  I’m guessing you can’t either.  I mean we all want a format that is similar to PostGIS and easily transferable (one file).  On paper that’s SpatiaLite.  But I think we have to chalk this up to Esri not supporting the format, so it is relegated to niche use.
  3. GML/KML: Ron Lake probably loves that I grouped these together but honestly they’re so similar in basic structure that I’ve just left them together.  My company uses KML quite a bit to share georeferenced photos.  That’s about it, pretty low use.  There is a ton of KML out there but it is mostly points.  There might be a ton of GML out there but I’m not Ron Lake.  KML is nice in the sense it has visualization included in the spec (you can make a line yellow) but it isn’t enough to get excited about.  It’s an OGC standard but as with SpatiaLite that doesn’t really seem to matter in the real world.  Don’t even try and use a different projection.  They have their use in specific cases but the limits of the formats mean you’ll never see them being an interchange format.  Plus XML?  Oh and feel free to email me how GML is powerful because it supports OGC Simple Features, I’ll still include it with KML.
  4. GeoJSON: It’s an open standard, so open in fact that OGC isn’t involved.  That’s a huge plus because mostly what standards organizations do is make complex file formats for simple uses.  That’s not what GeoJSON is.  It can be in many types of projections, it can be points, polygons and lines (with multi-part variations of each), it supports topology with the TopoJSON format and it’s JSON so it’s human readable.  But alas it isn’t supported by Esri so we run into the same problem as SpatiaLite.  BUT, Esri has shown interest in GeoJSON so there is hope that it will be well supported soon.  As with the shapefile/KML and unlike SpatiaLite it won’t support curves and other complex geometry or rasters and never will.  Thus it is not well suited as a shapefile replacement.
  5. Well Known Text (WKT): This comes out of the OGC and is used by software such as PostGIS for storage.  WKT supports lots of geometric objects (curves!) and TINs.  I’ve never been limited by WKT for vector files (you can almost feel where the end of this is going though) and many spatial databases from PostGIS and Oracle to SpatiaLite and SQL Server use the WKB (Well Known Binary) equivalent to store information.  But alas, we still don’t support rasters.  It’s a vector format for vector data.  SpatiaLite and the File Geodatabase both support rasters.
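To make the GeoJSON and WKT comparison a bit more concrete, here is a rough sketch that turns a simple GeoJSON geometry into its WKT equivalent.  The function geojson_to_wkt and its helpers are my own illustration, not part of any library, and it only handles Point, LineString and Polygon.  That limitation sort of makes the point above: WKT has curve types such as CIRCULARSTRING, and GeoJSON has nothing to map them to.

```python
def _position(position):
    # One coordinate pair (or triple), space separated: "x y"
    return " ".join(str(c) for c in position)


def _positions(positions):
    # A run of coordinates, comma separated: "x1 y1, x2 y2"
    return ", ".join(_position(p) for p in positions)


def geojson_to_wkt(geometry):
    """Convert a GeoJSON geometry dict to a WKT string.

    Illustrative only: supports Point, LineString and Polygon and
    raises ValueError for anything else.
    """
    kind = geometry["type"]
    coords = geometry["coordinates"]
    if kind == "Point":
        return "POINT (" + _position(coords) + ")"
    if kind == "LineString":
        return "LINESTRING (" + _positions(coords) + ")"
    if kind == "Polygon":
        rings = ", ".join("(" + _positions(ring) + ")" for ring in coords)
        return "POLYGON (" + rings + ")"
    raise ValueError("unsupported geometry type: " + kind)


point = {"type": "Point", "coordinates": [-112.07, 33.45]}
print(geojson_to_wkt(point))  # POINT (-112.07 33.45)
```

Same geometry, two spellings.  For points, lines and polygons there really isn’t much between the two formats; the difference only shows up once you need the complex stuff.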

There are many other formats but I think these are the only ones that really have any traction.  I could list formats such as GeoTIFF and say you could use that for rasters but you are limited to 4GB of data.  The vector guy in me wants to just say the heck with it all and use GeoJSON and WKT to solve the problem but given I’m still writing about this subject in December 2014 neither is a good solution.  We’re left with one simple truth…

The Verdict

The shapefile will outlive us all.  Unless Esri stops supporting it with their software at the same time as QGIS, Autodesk, etc., it will continue to be the format that everyone uses.  In 2014 I’d wager 80% of all production geospatial data (I’m sandbagging here, probably this number is 95%) is stuck in the shapefile format where it resides comfortably.  Personally I’m a big fan of GeoJSON but I’ve started to get back into WKT lately and love the complex geometry support.  If there is one thing I’ve learned in the past 20 years of professional GIS I’ve done, it’s that the shapefile is king.

December 3, 2014 file geodatabase geojson GML kml shapefile spatialite Thoughts wkt






SpatialTau v1.1 - Why a Newsletter?

SpatialTau is my weekly newsletter that goes out every Wednesday. The archive shows up in my blog a month after the newsletter is published. If you’d like to subscribe, please do so here.

Why a Newsletter?

Earlier this month I turned off Planet Geospatial. It had been in operation for almost 10 years but honestly it peaked about 4 years ago and has been in a very slow decline. Blogs, while still critically important to our communicating with others, have taken a back seat to Twitter, Facebook and Tumblr. Heck, even I have made my blog dormant and moved my posting to Tumblr.

But Tumblr has taught me one thing: a need for longer form writing. Tumblr, much like Twitter and Facebook, is really meant for short quick thoughts that you want to get out fast. I originally thought I could move back to my old blog but the whole format seems limiting for me. Clearly what I really need is a format where I can write and get a bit deeper into my thoughts. Oddly enough the format I kept coming back to was a weekly newsletter. It’s a more relaxed format where I can take time to formulate my thoughts on a subject or subjects without that need to hit the publish button on a blog post.

So this is SpatialTau, my weekly Spatial IT newsletter. It goes out every Wednesday and will be more in line with my older blog posts where I had more time to write and share my thoughts. I hope you enjoy it and share it with friends and colleagues.

“What do you do?”

Remember this question? I used to get it all the time and it was so hard to explain. I’d go into maps, databases and then the Internet. People sort of nod and seem to agree they understand just so you’ll stop talking about intersecting polygons and buffering the result. Then when Google Earth exploded on the scene, I used to just always say, “You know, like Google Earth…” and the other person would get all excited and say they looked up their hometown and saw their elementary school and how awesome it was that Google could find it.

My fiancée’s mother asked me last week what I did. I started to go in with #opendata, #opengovernment (explaining hashtags along the way of course) and visualization. Unlike that Google Earth moment, lots of what we do is still very difficult for most people to really get their heads around. Sure they understand what it means to share data and make it open, but the process is still so difficult. I mean geospatial data is still locked up in that crazy File Geodatabase format which my fiancée’s mother would never begin to grasp. I was lucky enough to have some data I was working with in Google Drive so I showed her a spreadsheet view of it and she sort of got the idea. But going through the workflow of how I got it there is very foreign to everyone.

I’m not pretending to say that spatial is special again, just that I think we’ve let the technology get ahead of the story. Even sharing a great blog post by the Sunlight Foundation about government data still gets that look that we used to get explaining an intersection of polygons. What really gets me is that when you back into what we do from an open government perspective, allowing people to grasp the point of data being free and open, they start getting excited. But the tools we use are still very niche, very technical and very difficult to share. Rather than sharing how we do something, we need to be sharing why we do something. It’s the why that gets people’s attention. It’s the reason why we do what we do that interests people. Then you can gauge how much “what” they can take and decide if sharing the OpenLayers vs Leaflet.js debate is worth it. It’s hard for technologists to “break things down” because the excitement they feel is in the touch and feel of how we accomplish things. But the why is really the sexy part of our jobs.

I really think we’re so lucky that Spatial IT has moved from the backrooms of GIS and into the front and center of the open data and open government movement. But we can’t lose sight of the fact that the world couldn’t care less about that great NPM module you wrote to massage spatial data. My soon to be mother-in-law gets the picture now and understands why what we do is so important. One person at a time.

If you were lucky enough to be forwarded this newsletter from a friend, you can sign up here and get it delivered weekly to your inbox.

November 26, 2014 newsletter Thoughts






World Champs Again

Of course, it is an even year. The Giants win again. See you in 2016!

October 30, 2014 sf giants Thoughts






San Francisco Giants to World Series - Again…

It’s an even year so that means one thing: the San Francisco Giants win the World Series. Terrible that we have to wait until Tuesday for Game 1 but it will be here soon.

October 17, 2014 sf giants Thoughts