The Matrix of Spatial Data

I was thinking this morning about how much of my professional life has been about vector data. From the moment I started using Macromedia Freehand in college in the early 90s (before I had ever heard of GIS) to make maps, to the 3D work I'm doing with Unity and Cityzenith, I've used vector data. I wasn't really introduced to raster data until I started using ArcInfo 5 at my first internship and working with grids, and even then it was still about coverages and typing "build" or "clean" again and again. We did a bunch of raster analysis with Arc, but mostly it was done in Fortran by others (I was never able to pick up Fortran for some reason, probably for the best in the long run).

It's easy to see and use vectors in professional spatial work, for sure. I always feel like Neo from The Matrix: I look at features in the world and mentally classify them as vectors:

  • Bird -> point
  • Electrical transmission line -> line
  • House -> polygon

Heck, think of how a bird might be a point (sighting), a line (migratory pattern) or a polygon (range). So damn nerdy, and my wife fails to see the fun in any of this. Again, like Neo when he finally sees the world as the Matrix truly is, we see things as the basic building blocks of vector data.

As I fly to Chicago this morning and stare out the window of the airplane, though, I can't help but think of rasters. Sort of like that hybrid background we throw on maps, the world beneath me is full of opportunities to create vectors. Plus I bet we could run some robust agriculture analysis (assuming I even knew what that was) to boot. The world is not full of 1s and 0s, but full of rasters and vectors.

As I’m a point, traveling a line on my way to a polygon, I can’t help but appreciate the spatial world that has been part of my life for over 20 years. I can’t help but think the next 20 is going to be amazing.

Focus on Data

When you think geospatial, you think data, right? You imagine GIS professionals working their butts off making normalized datasets with wonderful metadata. Nah, that's just some slide at the Esri UC where "best practices" become the focus of a week away from the family in the Gaslamp. For some reason, GIS has become more about how we do something and less about why we do something. I guess all that "hipster" and "technologist" thinking that goes into these "best practices" loses focus on why we do what we do: the data.

At Cityzenith, the first question a customer asks me is what data we have available. See, that's because they aren't GIS technologists; they're just working folk who have to solve a problem. That problem requires the same thing an accountant requires: accurate data. The last question these people care about is "Should I script this with JavaScript, Python or Ruby?". They're just looking for data they can combine with their proprietary company data to make whatever decisions they need to make.

Finding Data is Hard

So much of what we do in our space is wasted on the tools to manage the data. Sure, in the 90s we needed to create these tools, or improve them, so we could rely on them enough to get our work done. But the analysis libraries are basically a commodity at this point. I could probably find 100 different ways to perform a spatial selection on GitHub. Personally, I can't even recall the last time I opened ArcGIS or QGIS to solve a problem. There just isn't a need to do so anymore. These tools have become so prevalent that we don't need to fight battles over which one to use.
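To make the commodity point concrete, here is a minimal sketch of a spatial selection in Python using Shapely, which is just one of those freely available libraries; the parcel and sighting coordinates are made up for illustration.

    # A tiny spatial selection with Shapely -- one of many commodity libraries.
    # The geometries here are invented purely for the example.
    from shapely.geometry import Point, Polygon

    # A hypothetical parcel boundary and a handful of bird sightings
    parcel = Polygon([(0, 0), (0, 10), (10, 10), (10, 0)])
    sightings = [Point(2, 3), Point(12, 5), Point(7, 7)]

    # "Spatial selection": keep only the points that fall inside the polygon
    inside = [p for p in sightings if parcel.contains(p)]
    print(f"{len(inside)} of {len(sightings)} sightings fall inside the parcel")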

Your TIGER WMS is available

Thanks to Google and OpenStreetMap, base maps are now commoditized to the point that we rarely pay for them. At least there we can be sure we've got the best data. (Disclosure: Cityzenith uses Mapbox for our base mapping.) But everything else is still lacking. I won't pick on any data vendor, but generally it works the same way: you either subscribe to a WMS/WFS feed (or worse, some wacky ArcGIS Online subscription) or, if you're "lucky", you get a downloaded zip file of shapefiles. Neither lends itself to how data is managed or used in today's companies.

Back to our customers: they expect a platform that can visualize data and one that is easy to use. But I know the first question they ask before signing up for our platform is, "What data do you have?". They want to know more about our IoT data, data from our other partners (traffic, weather, demographics, etc.) and how they can combine it with their own data. They will ask about our tech stack from time to time, or how we create 3D worlds in the browser, but that is rare. It's:

  1. What do you have?
  2. Where do you have it?

There are so many choices people have for how they can perform analysis on data. Pick and choose; it's all personal preference. What matters is access to the most up-to-date, normalized, indexed and available data for their area of interest. That's why our focus has been partnering with data providers who have the datasets people need and presenting them to our users in formats and ways that are useful to them. Nobody wants a shapefile. Get over it. They want data feeds that they can bring into their workflows that have no GIS software in them whatsoever.
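For what that "no GIS software" workflow can look like, here is a rough sketch that pulls a GeoJSON feed with nothing but the Python standard library. The URL is a made-up placeholder, not a real Cityzenith or partner endpoint.

    # Pull a (hypothetical) GeoJSON feed and work with it as plain JSON --
    # no GIS software involved anywhere.
    import json
    from urllib.request import urlopen

    FEED_URL = "https://example.com/feeds/traffic.geojson"  # placeholder feed

    with urlopen(FEED_URL) as resp:
        feed = json.load(resp)

    # Each feature is plain JSON: attributes anyone can join to their own data
    for feature in feed["features"][:5]:
        print(feature["properties"])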

As I sit and watch the news from the Esri UC, it is a stark reminder that the future of data isn't in the hands of niche geospatial tools; it's in the hands of everyone. That's what we're doing at Cityzenith.

Data Formats and the Datastore

Yesterday’s post generated some email, mostly in agreement, but I wanted to highlight one question.

Finding data for me is only half the problem; it's formats that come and go. I'm archiving data in formats that I have no idea will still be supported or not. What's the point of indexing if you can't view the file?

That's a big issue, of course. Think about it this way: what if I had saved out my site plan in Manifold GIS format 5 years ago and wanted to open it today? Either I find a copy of Manifold or I don't use the file. The solution isn't as easy as you might think, unfortunately. Software such as Safe Software FME can rescue many file formats, but it's a risk you run hoping that Safe will add your format. One thing I try to do is save data as common format types. While I might use SDE and FGDB in production, I make an effort to save these layers off as Shapefiles and TIFF. Over beers a couple of years ago we termed these "pragmatic file formats". SHP, KML, TIFF, JPG and GeoJSON were all mentioned as ones that we thought were widely supported1. At WeoGeo, while we could support the 250+ file formats that FME supports, we left it at about 10 of the most requested formats.
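Here is one way (of many) to keep those pragmatic copies alongside production data, sketched in Python with GeoPandas; the paths and layer name are assumptions, and GeoPandas/GDAL is simply my pick, not something prescribed above.

    # Read a layer out of a production file geodatabase and save pragmatic
    # archival copies as Shapefile and GeoJSON. Paths/layer are hypothetical.
    import geopandas as gpd

    # Production copy (FGDB) -- readable via GDAL's OpenFileGDB driver
    gdf = gpd.read_file("production.gdb", layer="site_plan")

    # Pragmatic archival copies
    gdf.to_file("archive/site_plan.shp")                        # Shapefile
    gdf.to_file("archive/site_plan.geojson", driver="GeoJSON")  # GeoJSON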

But that brings up one thing we pushed with the WeoGeo Library2. You could load a WeoGeo-supported file format, even a "pragmatic file format" type, and because we used FME on the backend, know that it would be usable in the future. That's a true "Library" environment, one where you can not only find what you are looking for, but know that it will be readable.

GIS by its very nature is file-format verbose3, and we have to deal with this more than other professions do. My recommendation today, as it has been for years, is to try to do the following:

  1. Save in common file formats over niche proprietary ones
  2. Safe Software FME pays for itself
  3. Index your data (see the sketch below). If you know what you have, you know what you need to support4
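On that third point, a quick-and-dirty index can be as simple as walking a directory and counting formats. The directory and the extension list below are assumptions for the sake of the sketch.

    # Walk an archive directory and tally the spatial formats found there.
    # Directory path and extension list are assumptions, not a standard.
    from collections import Counter
    from pathlib import Path

    SPATIAL_EXTS = {".shp", ".kml", ".tif", ".tiff", ".jpg", ".geojson", ".gdb", ".mxd"}

    counts = Counter(
        p.suffix.lower()
        for p in Path("/data/archive").rglob("*")
        if p.suffix.lower() in SPATIAL_EXTS
    )

    for ext, n in counts.most_common():
        print(f"{ext}: {n} file(s)")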

Simple enough, right? Don’t get turned upside down with your file formats.

  1. though you could argue GeoJSON back then wasn’t exactly supported well 

  2. now WeoGeo is Trimble Data 

  3. right now I can look at the MXD in front of me and see that with 8 layers, I have 6 format types 

  4. that is why I have a copy of Corel Draw