GIS for Math

There was great reflection over Thanksgiving at my house.

Well, maybe that is hyperbole, but I was asked how the heck I got myself where I am today. I think I’ve told this story many times before on this blog, but one more time won’t hurt. I was working toward a degree in Economics when statistics classes hit my schedule. I really took to these and tried to take as many as I could before I graduated. One of them was given by the Geography Department at Arizona State University. The name of the course has been lost to time, but I do recall they used SPSS, which I despised. The kicker, though, was that the TA for that class introduced me to Perl, and that was my introduction to the freedom that open scripting tools can give you.

Maps have been something I loved since I was a kid (like you, I read the atlas and the Thomas Brothers Guide), but math and statistics are what drew me to GIS. SPSS and Perl are no longer part of my toolset (thank god, honestly), but the skills I learned back then still make calculations in GIS analysis much easier for me. Cartography is the tip of the iceberg with GIS; the math is what makes it sing. Don’t forget that.

The GIS Database

I’ve been thinking about GIS data a bit lately, mostly because I’m cleaning off old hard drives I’ve had in my possession to try to consolidate my data (or at least not lose the data on those old drives). Typically GIS data was accessed one of two ways: either from a server through some endpoint or via a local file store. I can’t open these old ArcGIS Desktop MXDs anymore, but I recall most of the work we did was against a local file store. You know, sitting on the “P drive” and referenced via a file path. We all remember opening up projects and seeing those red exclamation points telling us the data had been moved (or the project file had been).

It is very easy in retrospect to go back and call yourself batshit crazy for storing data this way (backed up, hopefully, every night to a DLT tape). I mean, think about this for a minute: nothing was versioned. We live in a world of git where everything I do (including this blog) is stored in a database where I can track changes and revert if need be. Now, I’m not using this post to talk about the need for GeoGig or whatever that project is called these days (I’m not even sure it still exists), but about the realization that GIS has remained such a workgroup discipline over the years.

I worked for AECOM, the largest AEC firm in the world. We did some amazing enterprise projects, but GIS was never one of them. It was a small group of GIS “pros” “doing” GIS to support some enterprise project that changed the world. Tacked on, if you will, and it’s not just AECOM that worked that way. Every organization views GIS this way, like “graphics”. Why is this? Because GIS “pros” have let it be this way.

I’m not trying to come up with a solution here because I don’t think there is one. GIS is just very small-minded compared to other professions in the tech space. Even the word “enterprise” has been appropriated to mean something totally different. Just having a web map does not make GIS “enterprise”; in fact, all you’re doing is taking workgroup and making it worse. It is easy to pick on Esri (as I did above), but they’re not the big problem. It’s the implementations that give Esri such terminology in the first place. That is, it is the GIS “pros” who bring these problems on themselves. Who can fault Esri for trying to make a buck?

I have made it my professional career to fix broken GIS systems. People always ask me, “What madness you must see trying to undo broken GIS systems,” but the reality is I see some amazing work. Just small-minded implementations. It is easy to make fun of ArcObjects or GML, but they are just libraries that people use to create tools.

This isn’t a call to arms or a reminder that you’re doing GIS wrong; it’s just thoughts on a plane headed across the country while I look at data that I created as a workgroup project. I’m sure there are people cleaning up work that I implemented in the past, and I can tell you there are some bad choices in that work. Technology has caused many of us to lose our humility, and that results in only one thing: bad choices. In the end, this is my reminder to be humble. The good thing is I have no shapefiles anywhere on this laptop. That’s a start.

Natural Language Processing is All Talk

I’ve talked about Natural Language Processing (NLP) before and how it is beginning to change the BIM/GIS space. But NLP is just part of the whole solution to change how analysis is run. I look at this as three parts:

  1. Natural Language Processing
  2. Curated Datasets
  3. Dynamic Computation

NLP is about understanding ontologies more than anything else. When I ask how “big” something is, what do I mean by that? Let’s abstract this a bit.

How big is Jupiter?

One could look at this a couple of ways. What is the mass of Jupiter? What is the diameter of Jupiter? What is the volume of Jupiter? Being able to figure out the intent of the question is critical to making everything else work. We all remember Siri and Alexa when they first started. They were pretty good at figuring out the weather, but once you got outside those canned queries, all bets were off. It is the same with using NLP with BIM or GIS. How long is something? Easy! Show me all mixed-use commercial zoned space near my project? Hard. Do we know what mixed-use commercial zoning is? Do we know where my project is? That’s why we need to know more about the ontology of our domain. And how do we learn about our domain? We need lots of data to teach the NLP, and then we run it through a Machine Learning (ML) tool such as Amazon Comprehend to figure out the context of the data and structure it in a way that the NLP can use to understand our intents.
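
As a toy illustration, here is what that disambiguation step might look like. To be clear, this is not how Amazon Comprehend works internally; the ontology, the entity table, and the function are all invented for the sketch.

```python
# Toy sketch of intent disambiguation for "how big is X?". A real system
# would learn this ontology from data; here it is hand-coded to show the idea.

# Each domain term maps the vague word "big" onto concrete measurements.
ONTOLOGY = {
    "planet":   ["mass", "diameter", "volume"],
    "parcel":   ["area", "frontage"],
    "building": ["height", "floor_area"],
}

# Which entities we recognize, and what kind of thing each one is.
ENTITY_TYPES = {"jupiter": "planet", "my project": "parcel"}

def candidate_intents(question: str) -> list[str]:
    """Return the concrete measurements a vague 'how big' question could mean."""
    q = question.lower()
    for entity, etype in ENTITY_TYPES.items():
        if entity in q:
            return [f"get_{measure}({entity})" for measure in ONTOLOGY[etype]]
    return []

print(candidate_intents("How big is Jupiter?"))
# ['get_mass(jupiter)', 'get_diameter(jupiter)', 'get_volume(jupiter)']
```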

As discussed above, curated data matters for figuring out the ontology, but it’s also important for helping users run analysis without having to source what they need themselves. Imagine using Siri, but you had to provide your own weather service to find out the current temperature. While I have many friends who would love to do this, most people just don’t care. Keep it simple and tell me how warm it is. Same with this knowledge engine we’re talking about. If I want to know zoning for New York City, it should be available and ready to use. Not only that, it should be curated so it is normalized across geographies. Asking a question in New York or Boston (while there are unique rules in every city) shouldn’t be difficult. Having this data isn’t as sexy as the NLP, but it sure as heck makes that NLP so much better and smarter. Plus, who wants to worry about whether they have the latest zoning for a city? It should always be available and on demand.
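
A crude sketch of what “normalized across geographies” could mean in practice: map each city’s local zoning codes onto one shared vocabulary. The code mappings below are invented for illustration, not real zoning tables.

```python
# Hypothetical lookup a curated data repository might maintain: local zoning
# codes (which differ in every city) mapped onto one normalized ontology.
LOCAL_TO_NORMALIZED = {
    ("new_york", "C4-6"): "mixed_use_commercial",
    ("boston",   "MXD"):  "mixed_use_commercial",
    ("chicago",  "DX-5"): "mixed_use_commercial",
}

def normalize_zone(city: str, local_code: str) -> str:
    """Translate a city-specific zoning code into the shared vocabulary."""
    return LOCAL_TO_NORMALIZED.get((city, local_code), "unknown")

# Now "show me mixed-use commercial" means the same thing in every city.
print(normalize_zone("boston", "MXD"))  # mixed_use_commercial
```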

Lastly, once we understand the context of the natural language query and have data to analyze, we need to run the algorithms on the question. This is what we typically think of as GIS. Rather than manually running that buffer and identity, we use AI/ML to figure out the intent of the user via the ontology and grab the data for the analysis from the curated data repository. This used to be something very special; you needed a monolithic tool such as ArcGIS or MapInfo to accomplish the dynamic computation. But today these algorithms are open and available to anyone. Natural language lets us figure out what the user is asking and then run the correct analysis, even if they call it something different from what a GIS person might.
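
Here is a minimal sketch of that dynamic computation step with open libraries (geopandas), the kind of code an NLP layer could invoke. The file names, the column name, the projected CRS, and the 500 m definition of “near” are all my assumptions for illustration.

```python
# "Show me all mixed-use commercial zoned space near my project" as a
# classic buffer + identity, run from code instead of a desktop package.
import geopandas as gpd

projects = gpd.read_file("projects.geojson").to_crs(epsg=26918)  # meters
zoning = gpd.read_file("zoning.geojson").to_crs(epsg=26918)

nearby = projects.copy()
nearby["geometry"] = nearby.geometry.buffer(500)  # "near" -> 500 m buffer
mixed_use = zoning[zoning["zone_class"] == "mixed_use_commercial"]

# Identity overlay: keep the buffered areas, tagged with the zoning they hit.
answer = gpd.overlay(nearby, mixed_use, how="identity")
print(answer.head())
```
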
The “Alexa-like” natural language demos where the computer talks to users are fun, but much like the AR examples we see these days, not really useful in real-world contexts. Who wants their computer talking to them in an open office environment? But giving users who don’t know anything about structured GIS analysis the ability to perform complex GIS analysis is the game changer. It isn’t about how many seats of some GIS program are on everyone’s desk but how easily these NLP/AI/ML systems can be integrated into existing workflows or websites. That’s where I see 2019 going: GIS everywhere.

The Matrix of Spatial Data

I was thinking this morning about how much of my professional life has been about vector data. From the moment I started using Macromedia Freehand in college in the early 90s (before I had heard of GIS) to make maps, to the 3D work I’m doing with Unity and Cityzenith, I’ve used vector data. I wasn’t genuinely introduced to raster data until I started using ArcInfo 5 at my first internship and working with grids, and even then it was still about coverages and typing “build” or “clean” again and again. We did a bunch of raster analysis with Arc, but mostly it was done in Fortran by others (I was never able to pick up Fortran for some reason, probably for the best in the long run).

It’s easy to see and use vectors in professional spatial work, for sure. I always feel like Neo from The Matrix: I look at features in the world and mentally classify them as vectors:

  • Bird -> point
  • Electrical transmission line -> line
  • House -> polygon

Heck, consider how you might think of a bird as a point (sighting), a line (migratory pattern), or a polygon (range). So damn nerdy, and my wife fails to see the fun in any of this. Again, like Neo when he finally sees the Matrix for what it truly is, we see things as the basic building blocks of vector data.
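
In code, that mental classification is only a few lines. Here is the same bird as all three vector primitives, sketched with shapely and made-up coordinates:

```python
# One bird, three geometries: sighting, migratory pattern, range.
from shapely.geometry import Point, LineString, Polygon

sighting = Point(-112.07, 33.45)                                   # a single observation
migration = LineString([(-112.07, 33.45), (-111.9, 40.8), (-110.0, 48.2)])
range_ = Polygon([(-115, 30), (-105, 30), (-105, 50), (-115, 50)])

for geom in (sighting, migration, range_):
    print(geom.geom_type)
# Point, LineString, Polygon
```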

As I fly to Chicago this morning and stare out the window of the airplane, I can’t help but think of rasters, though. Sort of like that hybrid background we throw on maps, the world beneath me is full of opportunities to create vectors. Plus, I bet we could run some robust agriculture analysis (assuming I even knew what that was) to boot. The world is not full of 1s and 0s but full of rasters and vectors.

As I’m a point, traveling a line on my way to a polygon, I can’t help but appreciate the spatial world that has been part of my life for over 20 years. I can’t help but think the next 20 is going to be amazing.

Focus on Data

When you think geospatial, you think data, right? You imagine GIS professionals working their butts off making normalized datasets with wonderful metadata. Nah, that’s just some slide at the Esri UC, where “best practices” become the focus of a week away from the family in the Gaslamp. For some reason, GIS has become more about how we do something and less about why we do something. I guess all that “hipster” and “technologist” thinking that goes into these “best practices” loses focus on why we do what we do: the data.

At Cityzenith, the first question a customer asks me is what data we have available. See, that’s because they aren’t GIS technologists; they’re just working folk who have to solve a problem. That problem requires the same thing an accountant’s problem requires: accurate data. The last question these people care about is “Should I script this with JavaScript, Python or Ruby?”. They’re just looking for data that they can combine with their proprietary company data to make whatever decisions they need to make.

Finding Data is Hard

So much of what we do in our space is wasted on the tools to manage the data. Sure, in the 90s we needed to create these tools, or improve them until we could rely on them enough to get our work done. But the analysis libraries are basically a commodity at this point. I can probably find 100 different ways to perform a spatial selection on GitHub. Personally, I can’t even recall the last time I opened ArcGIS or QGIS to solve a problem. There just isn’t a need to do so anymore. These tools have become so prevalent that we don’t need to fight battles over which one to use.
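
For example, a spatial selection that once meant opening a desktop GIS is now a couple of lines of geopandas. The file and column names here are placeholders:

```python
# Select every parcel that intersects a flood zone -- commodity GIS.
import geopandas as gpd

parcels = gpd.read_file("parcels.shp")
flood_zone = gpd.read_file("flood_zone.shp")

selected = gpd.sjoin(parcels, flood_zone, predicate="intersects")
print(len(selected), "parcels in the flood zone")
```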

Your TIGER WMS is available

Thanks to Google and OpenStreetMap, base maps are now commoditized to the point that we rarely pay for them. On that front, we can be sure we’ve got the best data. (Disclosure: Cityzenith uses Mapbox for our base mapping.) But everything else is still lacking. I won’t pick on any vendor of data, but generally it works the same way: you either subscribe to a WMS/WFS feed (or worse, some wacky ArcGIS Online subscription) or, if you’re “lucky”, download a zip file of shapefiles. Neither lends itself to how data is managed or used in today’s companies.
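
Here is roughly what “subscribing” to that TIGER WMS looks like with OWSLib. The endpoint URL and layer id are my assumptions, so check the Census TIGERweb documentation for the current ones:

```python
# Pull a map image from a (hypothetical) TIGERweb WMS endpoint.
from owslib.wms import WebMapService

wms = WebMapService(
    "https://tigerweb.geo.census.gov/arcgis/services/TIGERweb/tigerWMS_Current/MapServer/WMSServer",
    version="1.3.0",
)
img = wms.getmap(
    layers=["2"],                      # hypothetical layer id
    srs="EPSG:4326",
    bbox=(-125.0, 24.0, -66.0, 50.0),  # roughly the lower 48
    size=(1024, 512),
    format="image/png",
)
with open("tiger.png", "wb") as f:
    f.write(img.read())
# Note what you got: a picture of data, not the data. That is the problem.
```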

Back to our customers: they expect a platform that can visualize data and one that is easy to use. But I know the first question they ask before signing up for our platform is, “What data do you have?”. They want to know more about our IoT data, data from our other partners (traffic, weather, demographics, etc.), and how they can combine it with their own data. They will ask about our tech stack from time to time, or how we create 3D worlds in the browser, but that is rare. It’s:

  1. What do you have?
  2. Where do you have it?

There are so many choices people have for how they can perform analysis on data. Pick and choose; it’s all personal preference. But access to the most up-to-date, normalized, indexed, and available data for their area of interest is what actually matters. That’s why our focus has been partnering with data providers who have the datasets people need and presenting them to our users in formats and ways that are useful to them. Nobody wants a shapefile. Get over it. They want data feeds that they can bring into workflows that have no GIS software in them whatsoever.
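
Something like this is all they want: a plain GeoJSON feed over HTTP, pulled straight into an ordinary data workflow with no GIS software involved. The URL is a placeholder for whatever provider you use.

```python
# Fetch a GeoJSON feed and flatten it into a DataFrame an analyst can
# join against proprietary company data.
import requests
import pandas as pd

resp = requests.get("https://data.example.com/zoning/nyc.geojson", timeout=30)
features = resp.json()["features"]

df = pd.DataFrame(f["properties"] for f in features)
print(df.head())
```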

As I sit and watch the news from the Esri UC it is a stark reminder that the future of data isn’t in the hands of niche geospatial tools, it’s in the hands of everyone. That’s what we’re doing at Cityzenith.

GIS Software has to be Hard to Use

Seriously though, right? GIS has been defined by those who create much of it as “scientific software”. Because of that, it needs to be:

  1. Expensive
  2. Difficult to use
  3. Poorly documented
  4. Buggy
  5. Slow
[Image: ArcGIS toolbars, captioned “Professional GIS”]

GIS software is literally the kitchen sink. Most GIS software started out as a project for some company and then morphed into a product. It is a collection of tools created for specific projects, duct-taped together and sold as a subscription. We’ve talked about re-imagining how we work with spatial data, but we rarely turn the page. The GIS Industrial Complex (open source and proprietary; everything is awful) is built upon making things hard to do. There have been attempts to solve the problem, but they themselves are usually built for a project rather than a product. Somewhat cynical, but you have to wonder if this is true.

Tools such as Tableau are the future, and as they add more spatial capability, GIS specialists will be out of a job. Being a button pusher seems more and more like a dead-end job.

GIS and the Keyboard

I think you can usually tell when a GIS professional learned GIS by how they use their keyboard. Those who learned on UNIX command-line programs such as ArcInfo or GDAL seem to go out of their way to type commands, either through keystrokes or scripting, while those who learned in the GUI era, on ArcView 3.x or ArcGIS Desktop, prefer to use a mouse. Now, generalizing is always dangerous, but it highlights something about how GIS analysis is done.

GUI GIS

I almost feel like Yakov Smirnoff saying “What a country!” when I realize that most of the complicated scripting commands of the 90s can be completed almost perfectly by dropping a couple of GIS layers on a wizard and clicking Next. Esri should be commended for making these tools drop-dead simple to use. But it raises the question: does anyone understand what is going on with these tools when they run them? Let’s take a simple example: Intersect.

[Image: the Esri Intersect tool]

So simple, right? You just take your input features, choose where the output feature goes, and hit OK. Done. But what about those optional items below? How many people ever actually set those? Not many, of course, and many times you don’t need to set them, but not understanding why they are options makes it dangerous: you might not perform your analysis correctly. I’ll say you don’t understand how to run a GIS command unless you understand not only what the command does but all of its options.
You don’t have to learn Python to be a GIS analyst; running ModelBuilder or just the tools from ArcCatalog is good enough. But if you find yourself not even seeing those options at the bottom, let alone understanding what they are and why they are used, you aren’t anything more than a button pusher. And button pushers are easily replaced. The Esri Intersect tool has many options, and using it like the image below will only give you minimal power and a minimal understanding of how GIS works.

[Image: the Esri Intersect tool with blinders on]
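
For contrast, here is the same Intersect spelled out in arcpy with every optional parameter made explicit. The paths and layer names are placeholders, and the values are examples, not recommendations:

```python
# The dialog above, typed out: no option left to a default you never noticed.
import arcpy

arcpy.analysis.Intersect(
    in_features=[["parcels", 1], ["flood_zone", 2]],  # inputs with ranks
    out_feature_class=r"C:\gis\output.gdb\parcels_flood",
    join_attributes="ALL",            # carry every attribute, not just FIDs
    cluster_tolerance="0.5 Meters",   # the old fuzzy tolerance, by any name
    output_type="INPUT",              # same geometry type as the inputs
)
```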

In the old days of keyboards, you had to type commands out and know what each one did. In fact, many commands wouldn’t run unless you put an option in. Part of it is that when you type the words “fuzzy_tolerance” enough times, you want to know what the heck it is. I think keyboard GIS connected users to the commands and concepts of GIS more than wizards do. Much like working with your hands connects people to woodworking, working with your keyboard connects people to GIS.