Cloud Agnostic
Last week Apple announced it was moving some of its iCloud storage from Amazon to Google’s cloud. Earlier this month Dropbox moved from AWS to its own cloud hardware. The motive behind both of these moves is to get off of someone else’s cloud and on to their own1. But it does bring up a good point for all hosted development: try to be as cloud agnostic as possible, because you never know what you might have to do.
Early on in WeoGeo’s life, we had a product called WeoCEO. You won’t find too much about it, but it was WeoGeo’s way of managing S3, EC2 instances and balancing traffic. WeoGeo created it because Amazon’s EC2 tooling wasn’t yet robust enough to run a business on, and WeoGeo didn’t know how AWS might turn out. The point of WeoCEO was that it could be staged in any CentOS environment and handle the WeoGeo database, geoprocessing and websites. In theory that would have allowed WeoGeo to easily move off of AWS and on to Azure, Google or Rackspace. WeoGeo eventually abandoned WeoCEO and started using AWS’s native tools because that made it easier to work with new technology such as CloudFront, new storage solutions and database options. While there was zero chance WeoGeo would move off of AWS, having such a tool could have made it easier for the WeoGeo platform to be integrated into other technology.
All this got me thinking about the GIS systems I currently host. I’ve got some on AWS and some on Azure. Could I move from one provider to the other, and how much work would it be? Most of my stack isn’t proprietary to one or the other system2. Node.js, PostgreSQL and other open technology run really well on just about any hosted system out there. But there is plenty of somewhat proprietary cloud technology out there that you can lock yourself into.
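To make that concrete, here’s a minimal sketch of what I mean, in Python rather than anything pulled from a real project: hide object storage behind a tiny interface of your own so the rest of the application never talks to a provider SDK directly. It assumes the boto3 and azure-storage-blob packages, and the bucket and container names are made up.

```python
# A minimal sketch of keeping object storage swappable, assuming the boto3
# and azure-storage-blob packages. Bucket/container names are hypothetical.
from abc import ABC, abstractmethod


class ObjectStore(ABC):
    """The only storage API the rest of the application is allowed to use."""

    @abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abstractmethod
    def get(self, key: str) -> bytes: ...


class S3Store(ObjectStore):
    def __init__(self, bucket: str):
        import boto3  # AWS SDK for Python
        self._s3 = boto3.client("s3")
        self._bucket = bucket

    def put(self, key: str, data: bytes) -> None:
        self._s3.put_object(Bucket=self._bucket, Key=key, Body=data)

    def get(self, key: str) -> bytes:
        return self._s3.get_object(Bucket=self._bucket, Key=key)["Body"].read()


class AzureBlobStore(ObjectStore):
    def __init__(self, connection_string: str, container: str):
        from azure.storage.blob import BlobServiceClient  # Azure Storage SDK
        service = BlobServiceClient.from_connection_string(connection_string)
        self._container = service.get_container_client(container)

    def put(self, key: str, data: bytes) -> None:
        self._container.upload_blob(name=key, data=data, overwrite=True)

    def get(self, key: str) -> bytes:
        return self._container.download_blob(key).readall()


def make_store(provider: str) -> ObjectStore:
    # Swapping providers is a configuration change, not a rewrite.
    if provider == "aws":
        return S3Store(bucket="my-gis-data")  # hypothetical bucket
    return AzureBlobStore("<azure-connection-string>", "gis-data")  # hypothetical container
```

The exact code doesn’t matter; the point is that switching providers becomes a change in one place instead of a rewrite everywhere storage is touched.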
I don’t think that everyone needs to develop their own WeoCEO-type management system3, but making pragmatic choices about how you deploy cloud applications can pay off in spades. I have clients who want their applications on AWS or Azure, and I can deploy the same application to either with little effort, but keeping it that way requires planning and a will to be cloud agnostic. I’ve always liked the term and I’ve always tried to prototype and develop applications that aren’t locked into any one provider’s infrastructure. I’d love to keep it that way and you should too.
Data Formats and the Datastore
Yesterday’s post generated some email, mostly in agreement, but I wanted to highlight one question.
Finding data for me is only half the problem, it’s formats that come and go. I’m archiving data in formats that I have no idea if they’ll be supported or not. What’s the point of indexing if you can’t view the file?
That’s a big issue of course. Think about it this way: what if I had saved out my site plan in Manifold GIS format five years ago and wanted to open it today? Either I find a copy of Manifold or I don’t use the file. Unfortunately, the solution isn’t as easy as you might think. Software such as Safe Software’s FME can rescue many file formats, but you’re taking a risk hoping that Safe supports yours. One thing I try to do is save data in common formats. While I might use SDE and FGDB in production, I make an effort to save these layers off as Shapefiles and TIFF. Over beers a couple of years ago we termed these “pragmatic file formats”. SHP, KML, TIFF, JPG and GeoJSON were all mentioned as ones that we thought were widely supported1. At WeoGeo, while we could have supported the 250+ file formats that FME handles, we left it at about 10 of the most requested formats.
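To put the “saving off” step in concrete terms, here’s a rough sketch using GDAL’s ogr2ogr and gdal_translate command-line tools as a stand-in for FME; the source file names and the “parcels” layer are hypothetical.

```python
# A rough sketch of saving production layers off to "pragmatic" formats using
# GDAL's ogr2ogr and gdal_translate command-line tools (a stand-in for FME).
# The source file names and the "parcels" layer are hypothetical.
import os
import subprocess

VECTOR_SOURCE = "production.gdb"   # e.g. a file geodatabase
RASTER_SOURCE = "elevation.img"    # e.g. an ERDAS IMAGINE raster

os.makedirs("archive", exist_ok=True)

# Vector: write every layer out as Shapefiles, plus a GeoJSON copy of one layer.
subprocess.run(
    ["ogr2ogr", "-f", "ESRI Shapefile", "archive/shapefiles", VECTOR_SOURCE],
    check=True,
)
subprocess.run(
    ["ogr2ogr", "-f", "GeoJSON", "archive/parcels.geojson", VECTOR_SOURCE, "parcels"],
    check=True,
)

# Raster: write a plain GeoTIFF copy.
subprocess.run(
    ["gdal_translate", "-of", "GTiff", RASTER_SOURCE, "archive/elevation.tif"],
    check=True,
)
```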
But that brings up one thing we pushed with the WeoGeo Library2. You could load a WeoGeo-supported file format, even a “pragmatic file format” type, and because we used FME on the backend, know that it would be usable in the future. That’s a true “Library” environment: one where you can not only find what you are looking for, but know that it will be readable.
GIS by its very nature is file-format verbose3 and we have to deal with this more than other professions do. My recommendation today, as it has been for years, is to try to do the following:
- Save in common file formats over niche proprietary ones
- Safe FME pays for itself
- Index your data. If you know what you have, you know what you need to support4 (see the sketch after this list)
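None of this has to be fancy, either. Here’s a trivial sketch of a format inventory, nothing more than a directory walk over a data share (the root path is hypothetical):

```python
# A trivial sketch of "index your data": walk a data share and tally which
# formats are actually sitting out there. The root path is hypothetical.
import os
from collections import Counter

DATA_ROOT = "/data/gis"   # hypothetical data share

counts = Counter()
for dirpath, _dirnames, filenames in os.walk(DATA_ROOT):
    for name in filenames:
        ext = os.path.splitext(name)[1].lower() or "(no extension)"
        counts[ext] += 1

# The extensions that show up the most are the formats you need a plan for.
for ext, count in counts.most_common(20):
    print(f"{ext}\t{count}")
```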
Simple enough, right? Don’t get turned upside down with your file formats.
1. though you could argue GeoJSON back then wasn’t exactly supported well
2. now WeoGeo is Trimble Data
3. right now I can look at this MXD in front of me and see that with 8 layers, I have 6 format types
4. that is why I have a copy of Corel Draw
Ninety Percent of the World’s Data Has Been Generated Over the Last Two Years
Tucked into a company blog post about Smart Cities was this statement that caught my eye.
Ninety percent of the world’s data has been generated over the last two years.
Unlike the “80% of Data is Spatial” claim, I have to admit this one is totally believable, and I can find the source. Most of this data is pure junk, but the biggest problem with it is that it is effectively unsearchable. Even in the age of Google, we can’t even begin to aggregate this data and sort through it.
On the BLM GPM project that I was part of at AECOM/URS, we teamed with Voyager to attempt to find all of the BLM’s spatial data and share it. The good news is that I hear the BLM Navigator will be rolling out soon, so at least we know that the BLM is indexing its data and attempting to share it. But that is one organization out of billions.
This unaccounted-for data can’t be leveraged by users and goes to waste. We all know GIS is great for making informed decisions about just about anything, yet we are most likely uninformed ourselves because the data just doesn’t happen to be at our fingertips. We’re a society that loves to create data, but not one that likes to organize it. If we’re truly going to change the world with GIS, we need to make sure we have all the information available to do so. Smart Cities, GeoDesign and all the rest are big data use cases. Let’s figure out how to start pumping them full of it.
The New Federal Source Code Policy
The White House released a draft policy yesterday for sharing source code among federal agencies, including a pilot program that will make portions of federal code open source.
This policy will require new software developed specifically for or by the Federal Government to be made available for sharing and re-use across Federal agencies. It also includes a pilot program that will result in a portion of that new federally-funded custom code being released to the public.
The policy outlined here highlights a draft policy proposal of a pilot program requiring covered agencies to release at least 20 percent of their newly-developed custom code, in addition to the release of all custom code developed by Federal employees at covered agencies as part of their official duties, subject to certain exceptions as noted in the main body of the policy.
Many Federal GIS consultants just had a bad morning.