Tag: data

  • Long Term Storage of Spatial Data

    Following on with yesterday’s blog post, I’m also concerned about where I’m storing the data. Until this month I stored the data in Dropbox. I can’t recall when I signed up for Dropbox, but I’ve probably paid them over $1,000 for the privilege of using their service. As with most SaaS products, they started out trying to help consumers and then pivoted to enterprise. That’s what Dropbox is doing and I’m tired of it. Their client software is just a hack, and there are too many other solutions that fit my budget needs better than a standalone cloud storage service.

    So as of May 2020, I no longer pay Dropbox $99/year. I’ve moved all my data to iCloud because I already pay for 2TB of storage there (Family plan) and it integrates better with my workflows. I could have put it in Google Drive too, but I’ve never liked how it works, which is a shame because it is easy to share with other users. But this isn’t archival by any means. All I’m doing is putting data on a hard drive, albeit a virtual hard drive in the cloud. It gets backed up, sure, but there isn’t any check to make sure my daughter doesn’t drag the data to the trash and click empty. A true archival service is one that makes the data much safer than just storing it in a folder.

    Now back in the old days, we used to archive off to DLT tapes and then send those offsite to a place like Iron Mountain. Eventually you’d realize you needed a restoration, and the IT guy would request that the tape or tapes come back from offsite and restore them to a folder you could access. Hopefully they were in a format you could read, but generally that wasn’t too much of a problem; there is a reason, though, we kept a Sun workstation around in case we needed to restore data from ARC/INFO on Solaris. The good thing about this was that the data was always a copy: sure, the tape could get damaged, but it was offsite and not prone to being messed with. If I needed data from October 2016, I could get it. Of course, eventually, old tapes were destroyed because of space needs, but generally it was a great system.

    I’m doing the math in my head as to the cost of those DLT tapes

    Now I’m not thinking I need to get a DLT tape drive and pay Iron Mountain for this privilege, but I do need to get data offsite, and by offsite I mean off my easy-to-access cloud services (iCloud, Google Drive, AWS S3, etc.). I have been working with Amazon S3 Glacier and it has been a really great service. I’ve been moving a ton of data there not only to clean up my local drives and iCloud storage, but also to ensure that the data is backed up and stored in a way that makes it much safer than having it readily available. Now Glacier is easy enough to use, especially if you are familiar with S3, but you don’t want to throw data in there that you need very often because of how it is priced. Uploading is free, and storage is $0.004 per GB per month, which is insanely low. Retrieval is $0.03 per GB, which is reasonable, and after 90 days you can delete data for free.
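    If you’re curious what that workflow looks like in practice, here is a minimal sketch using boto3. The bucket and file names are made up for illustration; the only real trick is setting the colder storage class on upload, and remembering that retrieval is a two-step request-then-wait process.

    ```python
    import boto3

    # Glacier-class objects live in a normal S3 bucket; they just use a colder storage class.
    s3 = boto3.client("s3")

    # Hypothetical archive and bucket names, for illustration only.
    s3.upload_file(
        "projects-2016.tar.gz",
        "my-archive-bucket",
        "gis/projects-2016.tar.gz",
        ExtraArgs={"StorageClass": "GLACIER"},  # or DEEP_ARCHIVE for even cheaper, slower storage
    )

    # Retrieval later: request a restore, then wait (hours) before the object is downloadable again.
    s3.restore_object(
        Bucket="my-archive-bucket",
        Key="gis/projects-2016.tar.gz",
        RestoreRequest={"Days": 7, "GlacierJobParameters": {"Tier": "Standard"}},
    )
    ```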

    Glacier isn’t new by any means; I had been using it to archive my hard drives with Arq, but not specifically for projects like this. I’ve just started doing this over the weekend, so we’ll see how it goes, but I like that the data is in a deep freeze, ready to be retrieved if needed but not taking up space where it isn’t needed. I’ve also set a reminder for two years from now to evaluate the data storage formats and ensure they are still the best method moving forward. If I do decide to change formats, I’ll continue to keep the original files in there just in case the archival formats turn out to be a bad decision down the road. Storing this all in Glacier means that space is cheap, and I can keep two copies of the data without problems.

  • The Matrix of Spatial Data

    I was thinking this morning about how much of my professional life has been about vector data. From the moment I started using Macromedia FreeHand in college in the early 90s (before I had heard about GIS) to make maps, to the 3D work I’m doing with Unity and Cityzenith, I’ve used vector data. I wasn’t genuinely introduced to raster data until I started using ArcInfo 5 at my first internship and working with grids, and even then it was still about coverages and typing “build” or “clean” again and again. We did a bunch of raster analysis with Arc, but mostly it was done in Fortran by others (I was never able to pick up Fortran for some reason, probably for the best in the long run).

    It’s easy to see and use vectors in professional spatial work, for sure. I always feel like Neo from The Matrix: I look at features in the world and mentally classify them as vectors:

    • Bird -> point
    • Electrical transmission line -> line
    • House -> polygon

    Heck, think of how you might represent a bird as a point (a sighting), a line (a migratory pattern) or a polygon (a range). So damn nerdy, and my wife fails to see the fun in any of this. Again, like Neo when he finally sees the world as the Matrix truly is, we see things as the basic building blocks of vector data.
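    If you want that classification written down, it maps directly onto GeoJSON geometry types. A toy sketch of the same bird three ways, with entirely made-up coordinates:

    ```python
    # The same bird, three ways: a sighting, a migration, and a range (coordinates invented).
    bird_sighting = {"type": "Point", "coordinates": [-87.6, 41.9]}

    bird_migration = {
        "type": "LineString",
        "coordinates": [[-87.6, 41.9], [-90.2, 38.6], [-95.4, 29.8]],
    }

    bird_range = {
        "type": "Polygon",
        "coordinates": [
            [[-98.0, 26.0], [-80.0, 26.0], [-80.0, 45.0], [-98.0, 45.0], [-98.0, 26.0]]
        ],
    }
    ```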

    As I fly to Chicago this morning and stare out the window of the airplane, though, I can’t help but think of rasters. Sort of like that hybrid background we throw on maps, the world beneath me is full of opportunities to create vectors. Plus I bet we could run some robust agriculture analysis (assuming I even knew what that was) to boot. The world is not full of 1s and 0s, but full of rasters and vectors.

    As I’m a point, traveling a line on my way to a polygon, I can’t help but appreciate the spatial world that has been part of my life for over 20 years. I can’t help but think the next 20 is going to be amazing.

  • Focus on Data

    When you think geospatial, you think data, right? You imagine GIS professionals working their butts off making normalized datasets with wonderful metadata. Nah, that’s just some slide at the Esri UC where “best practices” become the focus of a week away from the family in the Gaslamp. For some reason, GIS has become more about how we do something and less about why we do something. I guess all that “hipster” and “technologist” thinking that goes into these “best practices” loses focus on why we do what we do: the data.

    At Cityzenith, the first question a customer asks me is what data we have available. See, that’s because they aren’t GIS technologists; they’re just working folks who have to solve a problem. That problem requires the same thing an accountant requires: accurate data. The last question these people care about is “Should I script this with JavaScript, Python or Ruby?”. They’re just looking for data they can combine with their proprietary company data to make whatever decisions they need to make.

    Finding Data is Hard

    So much of what we do in our space is still wasted on the tools to manage the data. Sure, in the 90s we needed to create these tools, or improve them until we could rely on them enough to get our work done. But the analysis libraries are basically a commodity at this point. I can probably find 100 different ways on GitHub to perform a spatial selection. Personally, I can’t even recall the last time I opened ArcGIS or QGIS to solve a problem. There just isn’t a need to do so anymore. These tools have become so prevalent that we don’t need to fight battles over which one to use anymore.
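    To put a point on how commoditized that has become: a spatial selection that used to require a desktop GIS is a couple of lines with open source libraries. A rough sketch with GeoPandas and Shapely, using hypothetical file names and coordinates:

    ```python
    import geopandas as gpd
    from shapely.geometry import box

    # Hypothetical parcels layer and a bounding box of interest (coordinates invented).
    parcels = gpd.read_file("parcels.shp")
    area_of_interest = box(-112.10, 33.40, -112.00, 33.50)

    # Select every parcel that intersects the area of interest.
    selected = parcels[parcels.intersects(area_of_interest)]
    print(f"{len(selected)} of {len(parcels)} parcels intersect the AOI")
    ```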

    Your TIGER WMS is available

    Thanks to Google and OpenStreetMap, base maps are now commoditized to the point that we rarely pay for them. On that front we can be sure we’ve got the best data. (Disclosure: Cityzenith uses Mapbox for our base mapping.) But everything else is still lacking. I won’t pick on any data vendor, but generally it works the same way: you either subscribe to a WMS/WFS feed (or worse, some wacky ArcGIS Online subscription) or, if you’re “lucky”, get a downloadable zip file of shapefiles. Neither lends itself to how data is managed or used in today’s companies.
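    For what it’s worth, that subscription model usually boils down to an HTTP request. A hedged sketch of pulling features from a hypothetical WFS endpoint as GeoJSON; the URL and layer name are placeholders, and not every server supports JSON output:

    ```python
    import requests

    # Placeholder endpoint and layer name; a real feed would document both.
    WFS_URL = "https://data.example.com/geoserver/wfs"

    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": "demo:parcels",
        "outputFormat": "application/json",  # GeoJSON, if the server supports it
        "count": 100,
    }

    response = requests.get(WFS_URL, params=params, timeout=30)
    response.raise_for_status()
    features = response.json()["features"]
    print(f"Fetched {len(features)} features")
    ```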

    Back to our customers: they expect a platform that can visualize data and one that is easy to use. But I know the first question they ask before signing up for our platform is, “What data do you have?”. They want to know more about our IoT data, data from our other partners (traffic, weather, demographics, etc.) and how they can combine it with their own data. They will ask about our tech stack from time to time, or how we create 3D worlds in the browser, but that is rare. It’s:

    1. What do you have?
    2. Where do you have it?

    There are so many choices people have in how they can perform analysis on data. Pick and choose; it’s all personal preference. But access to the most up-to-date, normalized, indexed and available data for their area of interest is what matters. That’s why our focus has been partnering with data providers who have the datasets people need and presenting them to our users in formats and ways that are useful to them. Nobody wants a shapefile. Get over it. They want data feeds they can bring into workflows that have no GIS software in them whatsoever.

    As I sit and watch the news from the Esri UC, it is a stark reminder that the future of data isn’t in the hands of niche geospatial tools; it’s in the hands of everyone. That’s what we’re doing at Cityzenith.

  • Data Formats and the Datastore

    Yesterday’s post generated some email, mostly in agreement, but I wanted to highlight one question.

    Finding data for me is only half the problem, it’s formats that come and go. I’m archiving data in formats that I have no idea if they’ll be supported or not. What’s the point of indexing if you can’t view the file?

    That’s a big issue, of course. Think about it this way: what if I had saved out my site plan in Manifold GIS format 5 years ago and wanted to open it today? Either I find a copy of Manifold or I don’t use the file. The solution isn’t as easy as you might think, unfortunately. Software such as Safe Software’s FME can rescue many file formats, but you’re taking a risk hoping that Safe supports your format. One thing I try to do is save data in common formats. While I might use SDE and FGDB in production, I make an effort to save these layers off as Shapefiles and TIFF. Over beers a couple of years ago we termed these “pragmatic file formats”. SHP, KML, TIFF, JPG and GeoJSON were all mentioned as ones we thought were widely supported1. At WeoGeo, while we could have supported the 250+ file formats that FME supports, we left it at about 10 of the most requested formats.
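    In practice, saving those production layers off to a pragmatic format is a line or two per layer with open source tools. A sketch with GeoPandas, assuming a file geodatabase and layer name that are made up here; a real archive job would loop over every layer:

    ```python
    import geopandas as gpd

    # Hypothetical file geodatabase and layer name.
    layer = gpd.read_file("production.gdb", layer="site_plan")

    # Write out pragmatic copies alongside the production data.
    layer.to_file("archive/site_plan.shp", driver="ESRI Shapefile")
    layer.to_file("archive/site_plan.geojson", driver="GeoJSON")
    ```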

    But that brings up one thing we pushed with the WeoGeo Library2. You could load any WeoGeo-supported file format, even a “pragmatic file format”, and because we used FME on the backend, know that it would be usable in the future. That’s a true “Library” environment, one where you can not only find what you are looking for, but know that it will be readable.

    GIS by its very nature is file-format verbose3, and we have to deal with this more than other professions. My recommendation today, as it has been for years, is to do the following:

    1. Save in common file formats over niche proprietary ones
    2. Safe FME pays for itself
    3. Index your data. If you know what you have, you know what you need to support4 (a rough sketch of such an index follows below)
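    A minimal sketch of what that index could look like: walk the archive, record each file’s format and size into a CSV you can search later. The archive path here is a placeholder.

    ```python
    import csv
    from pathlib import Path

    # Hypothetical archive root; point this at wherever the data actually lives.
    ARCHIVE = Path("archive")

    with open("data_index.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["path", "format", "size_bytes"])
        for item in sorted(ARCHIVE.rglob("*")):
            if item.is_file():
                writer.writerow([item, item.suffix.lstrip(".").lower(), item.stat().st_size])
    ```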

    Simple enough, right? Don’t get turned upside down with your file formats.

    1. though you could argue GeoJSON back then wasn’t exactly supported well 

    2. now WeoGeo is Trimble Data 

    3. right now I can look at the MXD in front of me and see that, with 8 layers, I have 6 format types 

    4. that is why I have a copy of Corel Draw 

  • Google Maps API Data Quality Issues

    Link – GMaps API data quality deteriorating?

    the basemap data you get via the API is only from TeleAtlas, but if you look at the maps through Google’s branded gateway, they are enhanced with NavTech data too. As rich pointed out, there’s a long discussion about this on the Google Maps API Google Group, or Usenet group as it was once known.

    We heard a little bit about this a couple of weeks ago. I’ve been saying since day one that the problem with a free API for web mapping is that you need to either reduce your costs as low as possible or have another revenue stream (advertisements). One of the biggest arguments for a paid service like ArcWeb is that you get a great choice of data. We’ve already seen that the satellite imagery in ArcWeb is much better than Google Maps’, and I don’t think people mind paying for a service if the quality is better.

    Conspiracy theories fly! Do people really care enough about very high quality base maps to pay for a premium API service? Or are geodata licensing costs driving this decision on the part of GMaps? If quality of service continues to deteriorate, will this provide a boon to collaborative mapping in the land of the free geodata, augmenting the accuracy and currency that Google’s maps may be losing?

    So there is an opportunity for ArcWeb 2005. The question is how soon we’ll hear about or see it (with a name like ArcWeb 2005, you’d think we’d see it soon). I’ll tell you this: as soon as the new ArcWeb 200X is out, I’m going to replace my blog map with it.

  • The Battle to be Google Maps’s Data Provider

    Link – Google Maps and Their Data Providers

    NAVTEQ spends a lot of money to get the most accurate data on the streets and roads, and they make most of their money selling routing (directions) through in-car navigation systems. I bet NAVTEQ wish they had a dollar for every time a prospective customer came to them expecting Google Maps-style driving directions to be free. Oh wait, they do. Every set of driving directions you get from Google Maps (or Yahoo! Maps, or MapQuest) represents real money in the pocket of NAVTEQ; they charge per route. Google, Yahoo!, MapQuest, and others are all eating those charges when they offer the service to you for free, planning to make it back on advertising and related travel services.

    It is pretty easy to see who is being squeezed here. The data providers hold all the cards, but competition is driving the market and no one wants to be left holding the bag. One almost has to wonder, though, when these data providers might strike back at Google/MapQuest/Yahoo! and start charging them more. For now they seem willing to undercut each other, and maybe that business model will work. Still, I have to wonder if NAVTEQ, TeleAtlas and others might look toward the fight that the RIAA is having with Apple and start wondering if they should control more of the delivery of their datasets.