GIS Data Formats and My Stubborn Opinions

Taking this break, I’ve been looking over my spatial data and trying to figure out how best to organize it. The largest public project I manage is GeoJSON Ballparks, and this one is easy to manage as it is just a Git repository with text files. GeoJSON makes sense here because it is a very simple dataset (x/y) and it has mostly been used for mapping projects, which makes the GeoJSON format perfect. I used to maintain a Shapefile version of it in that repository but nobody ever downloaded it so I eventually killed it.

But my other data projects, things I’ve mapped or worked on in the past, are in a handful of formats:


  • Shapefile
  • File Geodatabase
  • Personal Geodatabase
  • GeoJSON
  • KML
  • SpatiaLite


  • TIFF (mostly GeoTIFF)
  • Esri Grid

Now you can tell from some of these formats that I haven’t touched these datasets in a long time. Being Mac centric, the Personal Geodatabase is dead to me, and given the modification dates on that stuff are 2005-2007 I doubt I’ll need it anytime soon. But it does bring up the question of archival. Clearly PGDB isn’t the best format for this and I probably should convert it soon to some other format. Bill Dollins would tell me GeoPackage would be the best, as Shapefile would cause me to lose data given the limits of DBF, but I’m not a big fan of the format, mostly because I’ve never needed to use it. Moving the data to GeoJSON would be good because who doesn’t like text formats, but GeoJSON doesn’t handle curves, and while it might be fine for the Personal Geodatabase data, it doesn’t make a ton of sense for more complex data.

This is as close to a shapefile icon as I could find, tells you everything doesn’t it?

I’ve thought about WKT as an archival format (specifically WKB), which might make sense for me given the great WKT/WKB support in databases. But again, could I just be making my life harder than it needs to be just to avoid using the GeoPackage? Still, there is something about WKT/WKB that makes me comfortable storing data for a long time, given the long term support of the standard among so many of those databases. The practical method might be to put everything in GeoJSON except curves, and those can go into WKT/WKB.
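The "GeoJSON except curves" split could be sketched roughly like this. It is only an illustration, assuming each geometry is already available as a WKT string; the function names are hypothetical, and the curve keywords are the ones defined by the SQL/MM and OGC WKT grammar.

```python
import re

# WKT keywords for geometry types that GeoJSON cannot represent directly.
# MULTISURFACE may hold only flat polygons, but it is routed to WKT here
# to stay on the conservative side.
CURVE_KEYWORDS = {
    "CIRCULARSTRING",
    "COMPOUNDCURVE",
    "CURVEPOLYGON",
    "MULTICURVE",
    "MULTISURFACE",
}


def has_curves(wkt: str) -> bool:
    """Return True if any curve keyword appears anywhere in the WKT text."""
    words = {word.upper() for word in re.findall(r"[A-Za-z]+", wkt)}
    return bool(words & CURVE_KEYWORDS)


def archive_format(wkt: str) -> str:
    """Pick the archive target: 'wkt' for curved geometry, else 'geojson'."""
    return "wkt" if has_curves(wkt) else "geojson"


if __name__ == "__main__":
    for sample in ("POINT (1 2)", "CIRCULARSTRING (0 0, 1 1, 2 0)"):
        print(sample, "->", archive_format(sample))
```

Scanning for keywords rather than only the leading type name means a curve nested inside a GEOMETRYCOLLECTION still gets routed to WKT.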

Raster is much easier given most of that data is in two fairly open formats. GeoTIFF or TIFF will probably be around longer than you or I, and Esri grid formats have been well supported through the years, making both fairly safe. What are some limits to data formats that I do worry about?

  1. File size, do they have limits to how large they can be (e.g. TIFF and 32-bit limit)
  2. File structure, do they have limits to what can be stored (e.g. GeoJSON and curves)
  3. File format issues (e.g. everything about the Shapefile and dbf)
  4. OS centric formats (PGDB working only on Windows)
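For the file size worry, a quick check against the known ceilings might look like the sketch below. The limits table is an assumption based on the commonly cited numbers (4 GB for classic TIFF because of its 32-bit offsets, 2 GB per shapefile component), and the function names are made up for this example.

```python
from pathlib import Path

# Commonly cited size ceilings, in bytes, keyed by file extension.
LIMITS = {
    ".tif": 2**32,   # classic (non-BigTIFF) TIFF uses 32-bit byte offsets
    ".tiff": 2**32,
    ".shp": 2**31,   # shapefile components are usually capped at 2 GB each
    ".dbf": 2**31,
}


def fraction_used(suffix: str, size_bytes: int):
    """Return size / limit for a known extension, or None if no limit is listed."""
    limit = LIMITS.get(suffix.lower())
    return None if limit is None else size_bytes / limit


def files_near_limit(root, threshold=0.8):
    """Yield (path, fraction) for files at or above the threshold of their limit."""
    for path in Path(root).rglob("*"):
        if path.is_file():
            used = fraction_used(path.suffix, path.stat().st_size)
            if used is not None and used >= threshold:
                yield path, used


if __name__ == "__main__":
    for path, used in files_near_limit("."):
        print(f"{path}: {used:.0%} of format limit")
```

Anything this flags is a candidate for BigTIFF or for splitting before it gets archived.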

I think the two biggest fears of mine are the last two, because the first two can be mitigated fairly easily. My plan is the following: convert all vector data into GeoJSON, except where curves are required. I’m punting on curves right now because I only have 3 datasets that require them, and I’ll leave them in their native formats for now. The raster data is fine, TIFF and grid are perfect and I won’t be touching them at all. The other thing I’m doing is documenting the projects and data so that future James (or whoever gets this hard drive eventually) knows what the data is and how it was used. So little of what I have has any documentation; at least I’m lucky enough that the file names make sense and the PDFs help me understand what the layers are used for.

One thing I’ve ignored through this: what to do with those MXDs that I cannot open at all? While I do have PDF versions of those MXDs, I have no tool to open them on Mac and even if I could, the pathing is probably a mess anyway. It brings up the point that the hardest thing to archive is cartography, especially if it is locked in a binary file like an MXD. At least in that case, it isn’t too hard to find someone with a license of ArcMap to help me out. But boy, it would be nice to have a good cartography archival format that isn’t some CSS thing.

Italian Baseball Stadiums in GeoJSON

I’ve been working at cleaning up all the GeoJSON-Ballparks records this past month. While the MLB stadiums and many of the AAA minor league teams have been updated, the international and small market teams have not. Some were out of date by almost 5 years. Long tail baseball stadiums are what they are and I’m working on automating much of this moving forward. The last two leagues that I’m updating are the Italian Baseball League and the German Bundesliga League. I hope to finish Germany tonight but I did post Italy yesterday.

Ballparks of the Italian Baseball League in GeoJSON

While Italy can’t go out and enjoy baseball just yet, at least their top tier baseball league has been mapped. If you’re looking for some live baseball, check out the streaming Korean KBO League (I’ve been watching the Giants of course). The next live stream will be on March 25th at approximately 7:40pm PDT.

Spring Cleaning During Spring Training

GeoJSON-Ballparks is my favorite data project I’ve been part of. Probably because not only is it the best sport ever, but it is great keeping track of all the changes at ballparks through the years. MLB teams have mostly stopped building new ball parks so the changes are generally just updates to their names. This year the only new name was Truist Park. Oakland Coliseum reverted back from RingCentral which it never was able to become because of shenanigans. We do bring on a new ballpark in Arlington which is named almost the same as the old ballpark (Globe Life Field vs the old Globe Life Park in Arlington). Apparently the old stadium has been renovated to XFL standards so we should probably not call it a ballpark anymore. I just removed the old one since it is no longer a baseball stadium. I did the same thing with Turner Field.

I plan to review all the Spring Training Facilities of the Cactus League and the Grapefruit League and then review the AAA stadiums. We’ll have to see what happens with the MLB/MiLB negotiations. While it doesn’t affect the actual stadium points (at least in the short term, some of the fields could go away because of lack of support), the alignment of teams in leagues could be changed. So stay tuned and if you want to help out with the AAA stadiums, just create a pull request, would be greatly appreciated!

The iPhone U1 UWB Chip, Digital Twins and Data Collection

Oddly enough the biggest news this week from the iPhone 11 introduction by Apple barely got any play. In fact, on the iPhone 11 Pro website, you have to scroll past Dog Portrait mode to get any information about it. Apple describes the U1 chip thusly:

The new Apple‑designed U1 chip uses Ultra Wideband technology for spatial awareness — allowing iPhone 11 Pro to understand its precise location relative to other nearby U1‑equipped Apple devices. It’s like adding another sense to iPhone, and it’s going to lead to amazing new capabilities. With U1 and iOS 13, you can point your iPhone toward someone else’s, and AirDrop will prioritize that device so you can share files faster. And that’s just the beginning.

Makes sense right? A better way to AirDrop. But there is so much more there: “precise location relative to other nearby U1-equipped Apple devices”. But what is UWB and why does it matter? The UWB Alliance says:

UWB is a unique radio technology that can use extremely low energy levels for short-range, high-bandwidth communications over a large portion of the radio spectrum. Devices powered by a coin cell can operate for a period of years without recharge or replacement. UWB technology enables a broad range of applications, from real-time locating and tracking, to sensing and radar, to secure wireless access, and short message communication. The flexibility, precision and low-power characteristics of UWB give it a unique set of capabilities unlike any other wireless technology.

So that’s really interesting, low energy use, high bandwidth and is very secure. I thought Jason Snell did a great job looking into the U1 on Six Colors:

From raw data alone, UWB devices can detect locations within 10 centimeters (4 inches), but depending on implementation that accuracy can be lowered to as much as 5 millimeters, according to Mickael Viot, VP of marketing at UWB chipmaker Decawave.

That’s pretty amazing. Basically it takes what makes Bluetooth LE great for discovery, secures it and then makes it faster and more accurate. So we can see the consumer use cases for UWB, sharing files and finding those tiles we’ve heard so much about. But where this gets very interesting for our space is for data collection and working inside digital twins. You can already see the augmented reality use case here. A sensor has gone bad in a building, I can find it now with millimeter accuracy. But it’s not just what direction, it’s how far. UWB uses “time of flight” to pinpoint location (measuring the travel time of a signal to gauge distance), enabling it to know how far away it is. Just knowing a sensor is ahead of you is one thing, but knowing it is 20 feet away, that’s really a game changer.
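The time-of-flight idea boils down to distance = speed of light × travel time. Here is a small sketch of that arithmetic. The two-way variant is a common ranging technique and is shown as an assumption, not as a description of how the U1 chip actually works; the function names are made up for this example.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre


def one_way_distance(time_of_flight_s: float) -> float:
    """Distance in metres covered by a radio signal in the given time."""
    return SPEED_OF_LIGHT_M_PER_S * time_of_flight_s


def two_way_distance(round_trip_s: float, reply_delay_s: float) -> float:
    """Distance from a round-trip measurement.

    The responder's known processing delay is subtracted and the
    remaining time is halved, since the signal travels out and back.
    """
    return SPEED_OF_LIGHT_M_PER_S * (round_trip_s - reply_delay_s) / 2


if __name__ == "__main__":
    # One nanosecond of flight time is roughly 30 cm of distance.
    print(f"{one_way_distance(1e-9):.3f} m per nanosecond")
```

Since a nanosecond is worth about 30 cm, centimeter-level ranging needs timing resolution well below a nanosecond.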

You can see this through a little known app Apple makes called Indoor Survey. Small side note, back in late 2015 I blogged about Apple’s Indoor Positioning App which ties into all this. Where you really see this use is when you go to the signup page and see how data is brought into this app using a standard called Indoor Mapping Data Format. Indoor Mapping Data Format (IMDF) provides a generalized, yet comprehensive data model for any indoor location, creating a basis for orientation, navigation and discovery. IMDF is output as an archive of GeoJSON files. Going to the IMDF Sandbox really shows you what this format is about.

Apple’s IMDF Sandbox

Basically you see a map editor that allows you to really get into how interiors are mapped and used. So Apple iPhone 11 UWB devices can help place themselves more accurately on maps and route users around building interiors. Smart buildings get smarter by the devices talking to each other. Oh and IMDF, Apple says, “For GIS and BIM specialists, there is support for IMDF in many of your favorite tools.” I will need to spend a bit more time with IMDF but it’s basically GeoJSON objects so we already know how to use it.
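Since it is GeoJSON underneath, the standard library is enough to start poking at a file. The sketch below just counts features by geometry type. The sample collection and its property names are invented for illustration and are not taken from the IMDF specification.

```python
import json
from collections import Counter

# Hypothetical sample data: a tiny FeatureCollection with made-up properties.
SAMPLE = """
{
  "type": "FeatureCollection",
  "features": [
    {"type": "Feature",
     "properties": {"name": "Room A"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[0, 0], [4, 0], [4, 3], [0, 3], [0, 0]]]}},
    {"type": "Feature",
     "properties": {"name": "Room B"},
     "geometry": {"type": "Polygon",
                  "coordinates": [[[4, 0], [8, 0], [8, 3], [4, 3], [4, 0]]]}},
    {"type": "Feature",
     "properties": {"name": "Sensor 1"},
     "geometry": {"type": "Point", "coordinates": [2, 1.5]}}
  ]
}
"""


def geometry_type_counts(geojson_text: str) -> Counter:
    """Count features in a FeatureCollection by geometry type."""
    data = json.loads(geojson_text)
    return Counter(
        feature["geometry"]["type"]
        for feature in data.get("features", [])
        if feature.get("geometry")  # features may carry a null geometry
    )


if __name__ == "__main__":
    print(geometry_type_counts(SAMPLE))
```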

The thing about GPS data collection is it works great outdoors, but inside it is much harder to get accuracy, especially when you need it. With Indoor Survey, devices can collect data much more accurately indoors because they know exactly where they are. If you’ve ever used Apple Maps in an airport and seen how it routes you from gate to gate, you get an idea how this works. But with UWB, you go from foot accuracy to sub centimeter. That’s a big difference.

Now we’re a long way away from UWB being ubiquitous like Bluetooth LE is. Right now as far as I can tell, only Apple has UWB chips in their devices and we don’t know how compatible this all is yet. But you can see how the roadmap is laid out here. UWB, GeoJSON and an iPhone 11. Devices help each other get better location and in turn make working with Digital Twins and data collection so much easier.

Curves in Open Data

Last week I talked about data formats and we continued it on Twitter.

No curves. It’s a good point. GeoJSON and TopoJSON don’t support curves. But neither do Shapefiles. All three formats are meant to handle simple features: points, polygons and lines. While TopoJSON handles topology, it still can’t draw true curves. But what’s the implication here? To share data that requires curves (it’s an edge case but still an important one) you have to use a proprietary format? Enter WKT. Well-known text supports many more vector types than those formats, including curves. Following up on sharing data in common file formats, WKT fits the bill perfectly. Share your data as GeoJSON/TopoJSON, KML and Shapefile if needed, then use WKT for complex features. It is still completely open and it is well supported by most open and proprietary software packages.
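To make the "no curves" limitation concrete: a true arc in WKT is three points, while GeoJSON can only approximate it with a run of straight segments. The sketch below densifies an arc into a GeoJSON LineString and reports how far the chords stray from the real curve. The function names and the center/radius/angle parameterization are choices made for this example.

```python
import math

# A half circle as a true curve in WKT: start, a point on the arc, end.
HALF_CIRCLE_WKT = "CIRCULARSTRING (1 0, 0 1, -1 0)"


def arc_to_linestring(cx, cy, radius, start_deg, end_deg, segments=16):
    """Approximate a circular arc with straight segments as a GeoJSON LineString."""
    if segments < 1:
        raise ValueError("segments must be at least 1")
    coords = []
    for i in range(segments + 1):
        angle = math.radians(start_deg + (end_deg - start_deg) * i / segments)
        coords.append([cx + radius * math.cos(angle), cy + radius * math.sin(angle)])
    return {"type": "LineString", "coordinates": coords}


def max_chord_error(radius, start_deg, end_deg, segments):
    """Largest gap between the true arc and its chords (the sagitta)."""
    step = math.radians(abs(end_deg - start_deg)) / segments
    return radius * (1 - math.cos(step / 2))


if __name__ == "__main__":
    line = arc_to_linestring(0, 0, 1, 0, 180, segments=8)
    print(len(line["coordinates"]), "vertices instead of 3 points")
    print(f"max error: {max_chord_error(1, 0, 180, 8):.4f} map units")
```

More segments shrink the error but bloat the file, and the result is still never the exact curve.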

Sometimes you need to use curves and generally it does work out.

Cuban Baseball Stadiums

You may or may not have seen, but there is a Cuban/Tampa Bay Rays game going on today. Given the love of baseball between the two countries I’m sure we’ll see much more Cuban baseball over the next couple years. It just so happens that the GeoJSON-Ballparks project has all the professional baseball stadiums in Cuba already mapped in GeoJSON format, including Estadio Latinoamericano where the game is being played today. Enjoy!

Rendering Spatial Data Without Having to Generalize Beforehand

Yesterday I posted about Chris Hogan’s walk-through of generalizing data in PostGIS to make it usable in a web app.  Basically he went through the process of finding out what is the sweet spot of quality vs speed.  But there are other ways to accomplish this.  Mapbox happened to post about a new library called geojson-vt.

Let’s see if Mapbox GL JS can handle loading a 106 MB GeoJSON dataset of US ZIP code areas with 33,000+ features shaped by 5.4+ million points directly in the browser (without server support):

Mapbox GL JS and GeoJSON-VT from Mapbox on Vimeo.

Wait, what?! A few seconds loading the data, and you can browse the whole data set smoothly and seamlessly. But how exactly does that work? Let’s find out

So that’s actually pretty amazing.  We all know what GeoJSON does in the browser and how it impacts the speed of maps drawing.  100 MB+ data rendering so quickly?  Impressive.  Read the whole post to see how they do it and the details on how to start using it.  The only limitation is that it requires mapbox-gl-js or Mapbox Mobile (which is actually a big limitation if you think about it).
 UPDATE: Per Tom MacWright:


Still this comes down to using tools that make your mapping products better.  Maybe Mapbox does that cheaper and quicker than you could on your own.  This kind of on-the-fly simplification is what we’ve all been asking for and Mapbox is really pushing the envelope.  This could be what gets people to start using their platform.
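To give a feel for what simplification involves, here is a sketch of the classic Douglas-Peucker algorithm. This is not geojson-vt’s actual code; it is a well-known technique swapped in for illustration, with made-up function names, operating on plain (x, y) tuples.

```python
from math import hypot


def point_segment_distance(p, a, b):
    """Shortest distance from point p to the segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return hypot(px - ax, py - ay)
    # Project p onto the segment and clamp to its endpoints.
    t = ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)
    t = max(0.0, min(1.0, t))
    return hypot(px - (ax + t * dx), py - (ay + t * dy))


def simplify(points, tolerance):
    """Drop vertices that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    for i in range(1, len(points) - 1):
        dist = point_segment_distance(points[i], first, last)
        if dist > max_dist:
            index, max_dist = i, dist
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest vertex and simplify both halves around it.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right


if __name__ == "__main__":
    line = [(0, 0), (1, 0.01), (2, 0), (3, 5), (4, 0)]
    print(simplify(line, 0.1))
```

A larger tolerance at low zoom levels throws away more vertices, which is the quality versus speed trade-off described above.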

SpatialTau v1.2 – Tilting at the Shapefile

SpatialTau is my weekly newsletter that goes out every Wednesday. The archive shows up in my blog a month after the newsletter is published. If you’d like to subscribe, please do so here.

Tilting at the Shapefile

Now I’m sure if I went back to my blog and searched for how many times I’ve tried to kill off the shapefile, even I would be surprised at how many times I’ve blogged about it.  Thus it seems about right for the second newsletter I’ve ever written to focus on the “Shapefile Problem”.

The Problem

So what exactly is this problem?  I mean what is so bad about a well supported, somewhat open file format?  I’ve told this story before but it never hurts to repeat.  My dad was borrowing my laptop a couple years ago and commented about all these DBF files all over my desktop.  He wondered why on earth I would have a format that he used in the late 80’s and outgrew because of its limitations.  Well I proceeded to explain to him the shapefile and how it worked and he just laughed.  That’s right, my 72 year old dad laughs at us wankers and our shapefile.  The DBF is only half the problem with the shapefile.  It doesn’t understand topology, only handles simple features (ever try and draw a curve in a shapefile?), has a puny 2GB file size limitation and, not to mention, you can’t combine points, polygons and lines in one file (hence every shapefile name has the word point, line or poly in it).

Oh and it’s anywhere between 3 and 15ish file types/extensions.  Sure 3 are required but the rest just clutter up your folders.  I love the *.shp.xml one especially because clearly they thought so much about how to render metadata.  If I had a penny for every time someone emailed me just the *.shp file without the other two I’d be a rich man.  Heck just the other day I got the *.shp and *.dbf but not the *.shx.  Just typing the sentence makes me cringe.
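A check for the three required pieces is easy enough to script before hitting send. This is a minimal sketch with a made-up function name; it only looks for lowercase extensions sitting next to the *.shp file.

```python
from pathlib import Path

# The three files every shapefile needs to be readable.
REQUIRED_EXTENSIONS = (".shp", ".shx", ".dbf")


def missing_parts(shp_path):
    """Return the required extensions that are missing next to shp_path."""
    path = Path(shp_path)
    return [
        ext
        for ext in REQUIRED_EXTENSIONS
        if not path.with_suffix(ext).exists()
    ]


if __name__ == "__main__":
    import sys

    for name in sys.argv[1:]:
        missing = missing_parts(name)
        status = "ok" if not missing else "missing " + ", ".join(missing)
        print(f"{name}: {status}")
```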

The Contenders

  1. The File Geodatabase (FGDB):  Esri’s default format for their tools.  It’s a spatial database in a folder format.  The less mentioned about the Personal Geodatabase, the better.  But unlike what most companies have built in the past 5 years, it isn’t built on SQLite, but on Esri’s proprietary geodatabase format.  There isn’t anything inherently wrong with Esri taking this path but it means you’re stuck using their software or their APIs to access the file format.  To me this severely limits the FGDB as an interchange file format and I think that is perfectly fine with Esri as they don’t really care too much if the FGDB doesn’t work with others’ software.  I’d link to an Esri page that describes the FGDB but there isn’t one.  It’s a secret proprietary format that even Esri doesn’t want to tell you about.
  2. SpatiaLite: SpatiaLite has everything going for it.  It’s a spatial extension to SQLite which means at its core it’s open.  It’s OGC Simple Features compliant.  It is relatively well supported by GIS software (even Esri technically can support it with the help of Safe Software).  Plus it supports all those complex features that the shapefile can’t.  Heck, OGC even chose it as the reference implementation for the GeoPackage (assuming people still care about that).  Heck, it supports rasters too!  But honestly, SpatiaLite was released in 2008 and hasn’t really made a dent in the market.  I can’t ever remember downloading or being sent a SpatiaLite file.  I’m guessing you can’t either.  I mean we all want a format that is similar to PostGIS and easily transferable (one file).  On paper that’s SpatiaLite.  But I think we have to chalk this up to Esri not supporting the format, so it is relegated to niche use.
  3. GML/KML: Ron Lake probably loves that I grouped these together but honestly they’re so similar in basic structure I’ve really just left them together.  My company uses KML quite a bit to share georeferenced photos.  That’s about it, pretty low use.  There is a ton of KML out there but it is mostly points.  There might be a ton of GML out there but I’m not Ron Lake.  KML is nice in the sense it has visualization included in the spec (you can make a line yellow) but it isn’t enough to get excited about.  It’s an OGC standard but as with SpatiaLite that doesn’t really seem to matter in the real world.  Don’t even try and use a different projection.  They have their use in specific cases but the limits of the formats mean you’ll never see them being an interchange format.  Plus XML?  Oh and feel free to email me how GML is powerful because it supports OGC Simple Features, I’ll still include it with KML.
  4. GeoJSON: It’s an open standard, so open in fact that OGC isn’t involved.  That’s a huge plus because mostly what standards organizations do is make complex file formats for simple uses.  That’s not what GeoJSON is.  It can be many types of projections, it can be points, polygons and lines (with variations of many), it supports topology with the TopoJSON format and it’s JSON so it’s human readable.  But alas it isn’t supported by Esri so we run into the same problem as SpatiaLite.  BUT, Esri has shown interest in GeoJSON so there is hope that it will be well supported soon.  As with the shapefile/KML and unlike SpatiaLite, it won’t support curves and other complex geometry or rasters and never will.  Thus it is not well suited as a shapefile replacement.
  5. Well Known Text (WKT): This comes out of the OGC and is used by software such as PostGIS for storage.  WKT supports lots of geometric objects (curves!) and TINs.  I’ve never been limited by WKT for vector files (you can almost feel where the end of this is going though) and many spatial databases from PostGIS and Oracle to SpatiaLite and SQL Server use the WKB (Well Known Binary) equivalent to store information.  But alas, we still don’t support rasters.  It’s a vector format for vector data.  SpatiaLite and the File Geodatabase both support rasters.
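For anyone who hasn’t looked inside WKT and WKB, the sketch below writes the same 2D point both ways. It covers only the plain OGC encoding of a point (a byte-order flag, a geometry type code, then two doubles) and not the extended variants some databases use internally. The function names are made up for this example.

```python
import struct

WKB_LITTLE_ENDIAN = 1  # byte-order flag: 1 = little endian, 0 = big endian
WKB_POINT = 1          # geometry type code for a 2D Point


def point_to_wkt(x: float, y: float) -> str:
    """Text form: human readable, easy to diff."""
    return f"POINT ({x} {y})"


def point_to_wkb(x: float, y: float) -> bytes:
    """Binary form: 1 flag byte + uint32 type + two float64 = 21 bytes."""
    return struct.pack("<BIdd", WKB_LITTLE_ENDIAN, WKB_POINT, x, y)


def point_from_wkb(data: bytes):
    """Decode a 2D point, honouring the byte-order flag."""
    order = "<" if data[0] == WKB_LITTLE_ENDIAN else ">"
    geom_type, x, y = struct.unpack(order + "Idd", data[1:21])
    if geom_type != WKB_POINT:
        raise ValueError(f"not a 2D point (type code {geom_type})")
    return x, y


if __name__ == "__main__":
    wkb = point_to_wkb(-122.5, 37.75)
    print(point_to_wkt(-122.5, 37.75))
    print(wkb.hex())
    print(point_from_wkb(wkb))
```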

There are many other formats but I think these are the only ones that really have any traction.  I could list formats such as GeoTIFF and say you could use that for rasters but you are limited to 4GB of data.  The vector guy in me wants to just say the heck with it all and use GeoJSON and WKT to solve the problem but given I’m still writing about this subject in December 2014 neither is a good solution.  We’re left with one simple truth…

The Verdict

The shapefile will outlive us all.  Unless Esri stops supporting it with their software at the same time as QGIS, Autodesk, etc., it will continue to be the format that everyone uses.  In 2014 I’d wager 80% of all production geospatial data (I’m sandbagging here, probably this number is 95%) is stuck in the shapefile format where it resides comfortably.  Personally I’m a big fan of GeoJSON but I’ve started to get back into WKT lately and love the complex geometry support.  If there is one thing I’ve learned in the past 20 years of “professional GIS” I’ve done, it’s that the shapefile is king.

Baseball Ballparks in GeoJSON

Update: Mexican League and Eastern League are done. That means we’ve got all the Majors, Triple-A and a third of the Double-A stadiums mapped.

I’ve been working on getting all the MLB and AAA baseball ballparks in GeoJSON on GitHub. MLB [1] parks are done and I think I’ve got all the AAA parks thanks to Wikipedia but I’m still missing most of the Mexican League which is a AAA league. I’m also hoping to complete the AA and A ballparks as well. If you can help, just fork the repo and submit a pull request with the new ballpark.
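If you want to build a new ballpark record by script rather than by hand, a GeoJSON point feature is about as simple as it gets. The sketch below is only an illustration: the property names and the placeholder park are made up, so check the existing files in the repo for the fields they actually use.

```python
import json


def ballpark_feature(name, lon, lat, **properties):
    """Build a GeoJSON Point feature; GeoJSON order is [longitude, latitude]."""
    if not (-180 <= lon <= 180 and -90 <= lat <= 90):
        raise ValueError("expected lon in [-180, 180] and lat in [-90, 90]")
    return {
        "type": "Feature",
        "properties": {"name": name, **properties},
        "geometry": {"type": "Point", "coordinates": [lon, lat]},
    }


if __name__ == "__main__":
    # Placeholder values, not a real ballpark.
    feature = ballpark_feature("Example Park", -112.0, 33.4, league="Example League")
    print(json.dumps(feature, indent=2))
```

The range check catches the most common mistake, which is swapping latitude and longitude.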

  1. Ignoring the fact I had the Nationals still playing at RFK