The GIS Database

I’ve been thinking about GIS data a bit lately, mostly because I’m cleaning off old hard drives I’ve had in my possession to try to consolidate my data (or at least not lose it). Typically GIS data was accessed one of two ways: either from a server through some endpoint or via a local file store. I can’t look at these old ArcGIS Desktop MXDs anymore, but I recall most of the work we did used a local file store. You know, sitting on the “P drive” and referenced via a file path. We all remember opening up projects and seeing those red exclamation points telling us that data was moved (or the project file was).

It is very easy in retrospect to go back and call yourself batshit crazy for storing data this way (backed up, hopefully, every night on a DLT tape). I mean, think about this for a minute: nothing was versioned. We live in this world of git where everything I do (including this blog) is stored in a database where I can track changes and revert if need be. Now I’m not using this post to talk about the need for GeoGig or whatever that project is called these days (I’m not even sure it still exists), but about the realization that GIS over the years has been such a workgroup discipline.

I worked for AECOM, the largest AEC firm in the world. We did some amazing enterprise projects but GIS was never one of them. It was a small group of GIS “pros”, “doing” GIS to support some enterprise project that changed the world. Tacked on, if you will, and it’s not just AECOM that worked that way. Every organization views GIS this way, like “graphics”. Why is this? Because GIS “pros” have let it be this way.

I’m not trying to come up with a solution here because I don’t think there is one. GIS is just very small minded compared to other professions in the tech space. Even the word “enterprise” has been appropriated to mean something totally different. Just having a web map does not make GIS “enterprise”; in fact all you’re doing is taking workgroup and making it worse. It is easy to pick on Esri (as I did above) but they’re not the big problem. It’s the implementations that lead Esri to use such terminology. That is, it is the GIS “pros” who bring these problems on themselves. Who is to fault Esri for trying to make a buck?

I have made it my professional career to fix broken GIS systems. People always ask me, “What madness you must see trying to undo broken GIS systems” but the reality is I see some amazing work. Just small minded implementations. It is easy to make fun of ArcObjects or GML but they are just libraries that people use to create tools.

This isn’t a call to arms or a reminder that you’re doing GIS wrong; it’s just thoughts on a plane headed across the country where I’m looking at data that I created as a workgroup project. I’m sure there are people cleaning up work that I implemented in the past, and I can tell you there are some bad choices in that work. Technology has caused many of us to lose our humility. And that results in only one thing: bad choices. In the end this is my reminder to be humble. The good thing is I have no shapefiles anywhere on this laptop. That’s a start.

BIM Database Long Tail

In the GIS world the database part of GIS files is the power. I would wager the average GIS Analyst spends more time editing, calculating and transforming the GIS database than editing the points/lines/polygons. The first thing I do when working with GIS files is open the table to see what I have (or don’t have) for data.
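That “open the table first” habit is easy to script. Here is a minimal sketch, assuming the layer has been saved as GeoJSON so Python’s standard library is enough. The sample layer, its field names and the `summarize_attributes` helper are all made up for illustration; they don’t come from any particular GIS package.

```python
import json
from collections import Counter

# Hypothetical sample data: a tiny GeoJSON FeatureCollection standing in
# for a real layer. The field names are invented for this example.
SAMPLE = """
{"type": "FeatureCollection", "features": [
  {"type": "Feature",
   "geometry": {"type": "Point", "coordinates": [-112.07, 33.45]},
   "properties": {"name": "Well 1", "depth_m": 120, "owner": null}},
  {"type": "Feature",
   "geometry": {"type": "Point", "coordinates": [-111.93, 33.42]},
   "properties": {"name": "Well 2", "depth_m": null, "owner": null}}
]}
"""


def summarize_attributes(geojson_text):
    """Return {field: (populated_count, feature_count)} for a FeatureCollection."""
    features = json.loads(geojson_text)["features"]
    populated = Counter()
    fields = []  # keeps fields in the order they are first seen
    for feature in features:
        # "properties" may legitimately be null in GeoJSON, so fall back to {}.
        for key, value in (feature.get("properties") or {}).items():
            if key not in fields:
                fields.append(key)
            # Treat null and empty string as "no data".
            if value is not None and value != "":
                populated[key] += 1
    return {field: (populated[field], len(features)) for field in fields}


summary = summarize_attributes(SAMPLE)
for field, (have, total) in summary.items():
    print(f"{field}: {have}/{total} populated")
```

It only answers the first question I ask of any table: which fields exist and how many of them actually hold data.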

One of the key aspects of BIM is the database. In the hands of an Architect the database takes a back seat, but tools such as Revit make sure that everything that is placed has detailed information about it stored in a database. It isn’t just Revit, though; IFC, CityGML and other formats treat the database as an important part of a BIM model. But when we share BIM models, the focus is always on the exterior of the model and not the data behind it.

Aqua Tower, Chicago, IL inside Cityzenith SmartWorld

One thing I’ve focused on here at Cityzenith since I joined as the CTO is pulling the power out of BIM models and exposing it to users. As someone who is used to complex GIS databases, I’m amazed at how much great data is locked in these BIM formats, unable to be used by planners, engineers and citizens. I talked last week about adding a command line to Cityzenith so that users can get inside datasets, and getting access to BIM databases is no exception.

That’s why we’re going to expose BIM databases the same way we expose SQL Server, Esri ArcGIS and other database formats. When you drag and drop BIM models that have databases attached to them into Cityzenith, you will be prompted to transform them with our transformation engine. BIM has always been treated as a special format that is locked up and kept only in the hands of special users. That’s going to change: we are going to break BIM out of its protected silo and expose the longest of long tails in the spatial world, the BIM database.

I’ve always said Spatial isn’t Special and we can also say BIM isn’t Special.

Data Formats and the Datastore

Yesterday’s post generated some email, mostly in agreement, but I wanted to highlight one question.

Finding data for me is only half the problem, it’s formats that come and go. I’m archiving data in formats that I have no idea if they’ll be supported or not. What’s the point of indexing if you can’t view the file?

That’s a big issue of course. I mean think about it this way: what if I saved out my site plan in Manifold GIS format 5 years ago and wanted to open it today? Either I find a copy of Manifold or I don’t use the file. The solution isn’t as easy as you might think, unfortunately. Software such as Safe Software FME can rescue many file formats, but you run the risk of hoping that Safe will add your format. One thing I try to do is save data in common format types. While I might use SDE and FGDB in production, I make an effort to save these layers off as Shapefiles and TIFF. Over beers a couple of years ago we termed these “pragmatic file formats”. SHP, KML, TIFF, JPG and GeoJSON were all mentioned as ones that we thought were widely supported [1]. At WeoGeo, while we could support the 250+ file formats that FME supports, we left it at about 10 of the most requested formats.

But that brings up one thing we pushed with the WeoGeo Library [2]. You could load a WeoGeo supported file format, even a “pragmatic file format” type, and because we used FME on the backend, know that it would be usable in the future. That’s a true “Library” environment, one where you can not only find what you are looking for, but know that it will be readable.

GIS by its very nature is file format verbose [3] and we have to deal with this more than other professions. My recommendation today, as it has been for years, is to try to do the following:

  1. Save in common file formats over niche proprietary ones
  2. Safe FME pays for itself
  3. Index your data. If you know what you have, you know what you need to support [4]
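The third item is the easiest one to start on. Below is a rough sketch of a format index, assuming the file extension is a good enough proxy for the format. The `index_formats` function and the sidecar list are my own illustration, not part of FME or any other tool, and the extension rules are guesses you would tune for your own drives.

```python
import os
from collections import Counter

# Assumption: these are companion files to a .shp, so a shapefile is counted
# once via its .shp. A standalone .dbf table would be missed by this rule.
SIDECAR_EXTENSIONS = {".dbf", ".shx", ".prj", ".cpg", ".sbn", ".sbx"}


def index_formats(root):
    """Walk `root` and count datasets by file extension."""
    counts = Counter()
    for dirpath, dirnames, filenames in os.walk(root):
        # A file geodatabase is a folder ending in .gdb: count it as one
        # dataset and do not descend into its internal files.
        for dirname in list(dirnames):
            if dirname.lower().endswith(".gdb"):
                counts[".gdb"] += 1
                dirnames.remove(dirname)
        for filename in filenames:
            extension = os.path.splitext(filename)[1].lower()
            if not extension or extension in SIDECAR_EXTENSIONS:
                continue
            counts[extension] += 1
    return counts
```

Pointing `index_formats` at an old project folder and reading `.most_common()` on the result shows which formats you are committed to supporting, before you decide what to convert.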

Simple enough, right? Don’t get turned upside down with your file formats.

  1. though you could argue GeoJSON back then wasn’t exactly supported well 

  2. now WeoGeo is Trimble Data 

  3. right now I can look at this MXD in front of me and see with 8 layers, I have 6 format types 

  4. that is why I have a copy of Corel Draw