Happy Friday everyone. These weeks just fly by when you are locked in your house looking out your front window for the Instacart delivery from the grocery store. I just wanted to remind everyone that I’ve got a weekly newsletter where I do some deep dives into things that are on my mind related to GIS, BIM, Smart Cities and technology.
Just sign up below and you’ll get a newsletter in your inbox each Wednesday (or maybe Thursday LOL).
I’ve spent years trying to build worldwide building datasets for Smart City and Digital Twin applications. I’ve tried building them using off-the-shelf data providers that give you COLLADA files, I’ve tried using APIs such as the Mapbox Unity SDK and buying buildings one by one to fill in gaps. None of these solutions have the resolution needed to perform the types of analysis needed to make better choices for cities and development potential. How to create real 3D cities with enough resolution has been out of our grasp until now.
I’ve been following Pixel8 for a while now and it is clear that crowdsourcing these models is going to be the only way forward. Over 10 years ago, Microsoft actually had this figured out with their Photosynth tool but they were never able to figure out what to do with it. Only today are we seeing startups attack this problem with a solution that has enough resolution and speed that we can start seeing cities build highly detailed 3D models that have actual value.
It is still early days with these point cloud tools, but at the speed they’ve improved over the last year, we should be seeing their use more and more. Mixing data from smartphones, lidar and satellite imagery makes it possible to map large areas of cities in 3D with high accuracy. Pixel8 isn’t the only company attempting this so we should see real innovation over the next year. Stay tuned!
Technology is verbose. There is no shortage of superlatives that help define the solution. It becomes almost noise when you are trying to work out what the solution truly solves, or whether it even has a problem defined. Just drop something in something and then something could happen. I’ve spent a career trying to help fight through this noise and in the end one question should always come up.
And then what?
Yes you can spend millions of dollars on what seems like the perfect application, workflow or cloud-based solution, but after you get it all done, what then? We deal with this all the time on our own. Part of why I’ve left digital note taking is that the “And then what?” of putting all that effort into getting text into a smartphone is either non-existent, worthless, or unneeded.
So much money has been wasted on “solutions” that “revolutionize” the “process”. Being able to answer the question above is how you get the best value out of the proposal. Dead projects, software not being used, databases withering on the server all happen because the users of the tools have no idea what to do with them when they are done. Time for this madness to stop.
Uber Technologies Inc. announced that it has entered into a Google master agreement under which the ride-hailing company will get access to Google Maps platform rides and deliveries services.
I mean today Uber uses Google Maps with their app, even on iOS. This is basically a continuation of the previous agreement with some changes that better align with how Uber does business. Rather than the number of requests that Uber makes for Google Maps services, pricing is based on billable trips that are booked using Uber, a much more manageable deal for Uber. Last year, it came out that Uber paid Google $58 million over the past 3 years for access to Google Maps. This quote really strikes me as bold:
“We do not believe that an alternative mapping solution exists that can provide the global functionality that we require to offer our platform in all of the markets in which we operate. We do not control all mapping functions employed by our platform or Drivers using our platform, and it is possible that such mapping functions may not be reliable.”
For as much money as Uber has invested in mapping, they don’t believe their technology is reliable enough to roll out to the public. That is mapping services in a nutshell: when your business is dependent on the best routing and addressing, you pick Google every time. All that time and effort to build a mapping platform and they still pay another company tens of millions of dollars.
I’ve read so much about how Uber is about ready to release their own mapping platform run on OSM. But in the end the business requires the best mapping platform and routing services and clearly nobody has come close to Google in this regard. Google Maps is not only the standard but almost a requirement anymore.
A couple months ago I made one last attempt to enjoy taking notes digitally. I used a combination of GitHub, Microsoft VS Code and Vim to make my notes shareable and archivable across multiple platforms. As I expected, it failed miserably. It isn’t to say that GitHub doesn’t do a good job of note taking, just that the workflow is wonky because that is what technologists do: make things harder for themselves because they can.
The thing is though, I find myself taking fewer notes now than before, because of the workflow. Just because I can do something doesn’t mean I should. Moving back to analog is usually a good choice. How often do I need to search my notes? Rarely, I mostly look at the dates and then go from there.
My workflow has now standardized to using the Studio Neat Totebook which I enjoy because it is thin, has the dot grid that gives note taking flexibility and has archival stickers so I can put them on the shelf like my old Field Notes. Why did I not go back to them? I find their size for normal note taking too constrictive, but that’s just me. The size of the Totebook is just right, small enough to not be too big, but big enough to not be too small.
I still use the same pens I’ve been using for years, I feel like they don’t smear and don’t cost a bundle if you lose them and have a bit of friction on the writing that makes it much easier to control. Pens are more a personal preference, it’s hard to move between them as easily as paper. Find a pen you like and stick with it.
I’m just done with Evernote, Bear, OneNote and all the rest of the platforms I’ve spent years trying to adapt to.
Following on with yesterday’s blog post, I’m also concerned about where I’m storing the data. Until this month I stored the data in Dropbox. I can’t recall when I signed up for Dropbox, but I’ve probably paid them over $1,000 for the privilege of using their service. As with most SaaS products, they start trying to help consumers and then they pivot to enterprise. That’s what Dropbox is doing and I’m tired of it. Their client software is just a hack and there are too many other solutions that better fit with my budget needs than a standalone cloud storage solution.
So as of May 2020, I no longer pay Dropbox $99/year. I’ve moved all my data to iCloud because I do pay for 2TB of storage there (Family plan) and it integrates better with my workflows. I could have put it in Google Drive too, but I’ve never liked how it works which is a shame because it is easy to share with other users. But this isn’t archival by any means. All I’m doing is putting data on a hard drive, though a virtual hard drive in the cloud. It gets backed up sure, but there isn’t any check to make sure my daughter doesn’t drag the data to the trash and click empty. A true archival service is one that makes the data much safer than just storing it in a folder.
Now back in the old days, we used to archive off to DLT tapes and then send those offsite to a place like Iron Mountain. Eventually you’d realize you needed a restoration and the IT guy would request the tape/tapes come back from offsite and restore them to a folder that you could access. Hopefully they were in a format you could read, but generally that wasn’t too much of a problem, though there is a reason we kept a Sun workstation around in case we needed to restore data from ARC/INFO on Solaris. The good thing about this is that the data was always a copy. Sure, the tape could get damaged, but it was offsite and not prone to being messed with. If I needed data from October 2016, I could get it. Of course, eventually, old tapes were destroyed because of space needs but generally it was a great system.
Now I’m not thinking I need to get a DLT tape drive and pay Iron Mountain for this privilege, but I do need to get data off site and by offsite I mean off my easy to access cloud services (iCloud, Google Drive, AWS S3, etc). I have been working with Amazon S3 Glacier and it has been a really great service. I’ve been moving a ton of data there to not only clean up my local drives and iCloud storage, but ensure that the data is backed up and stored in a way that makes it much safer than having it readily available. Now Glacier is easy enough to use, especially if you are familiar with S3, but you don’t want to throw data in there that you need very often because of how it is costed. Uploading is free, and they charge you $0.004 per GB/mo which is insanely low. Retrieval is 3 cents per GB which is reasonable and after 90 days you can delete data for free.
Glacier isn’t new by any means, I had been using it to archive my hard drives using Arq but not specifically for projects. I’ve just started doing this over the weekend so we’ll see how it goes but I like that the data is in a deep freeze, ready to be retrieved if needed but not taking up space where it isn’t needed. I’ve also set a reminder in 2 years to evaluate the data storage formats to ensure that they are still the best method moving forward. If I do decide to change formats, I’ll continue to keep the original files in there just in case the archival formats are a bad decision down the road. Storing this all in Glacier means that space is cheap, and I can keep two copies of the data without problems.
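To put those prices in perspective, here is a rough back-of-the-envelope calculation in Python. It is only a sketch: the rates are the ones quoted above ($0.004 per GB per month for storage, $0.03 per GB for retrieval, free uploads), the function name is made up, and the 500 GB archive is a hypothetical figure for illustration. It ignores request fees, early deletion charges and regional price differences.

```python
# Rates as quoted in the post; real AWS pricing varies by region and over time.
STORAGE_PER_GB_MONTH = 0.004  # USD per GB per month
RETRIEVAL_PER_GB = 0.03       # USD per GB retrieved
UPLOAD_PER_GB = 0.0           # uploads are free


def glacier_cost(size_gb, months, retrieved_gb=0.0):
    """Estimate the total cost in USD of archiving size_gb for some months."""
    storage = size_gb * STORAGE_PER_GB_MONTH * months
    retrieval = retrieved_gb * RETRIEVAL_PER_GB
    upload = size_gb * UPLOAD_PER_GB
    return round(storage + retrieval + upload, 2)


if __name__ == "__main__":
    # Hypothetical archive: 500 GB kept for two years, 50 GB restored once.
    print(glacier_cost(500, 24, retrieved_gb=50))
```

At these rates a year of storing that hypothetical 500 GB works out to about $24, which is why the retrieval charge, not the storage charge, is the thing to watch for data you touch often.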
Taking this break I’ve been looking over my spatial data and trying to figure out how to best organize it. The largest public project I manage is the GeoJSON Ballparks and this one is easy to manage as it is just a Git repository with text files. GeoJSON makes sense here because it is a very simple dataset (x/y) and it has been used for mapping projects mostly which makes the GeoJSON format perfect. I used to maintain a Shapefile version of it in that repository but nobody ever downloaded it so I just killed it eventually.
But my other data projects, things I’ve mapped or worked on in the past, are in a couple of formats:
TIFF (mostly GeoTIFF)
Now you can tell from some of these formats, I haven’t touched these datasets in a long time. Being Mac centric, the Personal Geodatabase is dead to me and given the modification dates on that stuff are 2005-2007 I doubt I’ll need it anytime soon. But it does bring up the question of archival, clearly PGDB isn’t the best format for this and I probably should convert it soon to some other format. Bill Dollins would tell me GeoPackage would be the best as Shapefile would cause me to lose data given limits of DBF, but I’m not a big fan of the format mostly because I’ve never needed to use it. Moving the data to GeoJSON would be good because who doesn’t like text formats, but GeoJSON doesn’t handle curves and while it might be fine for the Personal Geodatabase data, it doesn’t make a ton of sense for more complex data.
I’ve thought about WKT as an archival format (specifically WKB) which might make sense for me given the great WKT/WKB support in databases. But again, could I be just making my life harder than it needs to be just to not use the GeoPackage? But there is something about WKT/WKB that makes me comfortable for storing data for a long time given the long term support of the standard among so many of those databases. The practical method might be everything in GeoJSON except curves and those can get into WKT/WKB.
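To show how close the two text formats are for simple geometry, here is a minimal sketch that turns a GeoJSON geometry into WKT using only the Python standard library. The function names are hypothetical, it only handles Point, LineString and Polygon, and it drops any Z values. In practice a library such as Shapely or GDAL/OGR would be the safer route; this is just to illustrate the mapping.

```python
def _positions(coords):
    # Render a list of [x, y] positions as "x y, x y, ..." (extra values such as Z are ignored).
    return ", ".join(f"{x} {y}" for x, y, *_ in coords)


def geojson_to_wkt(geom):
    """Convert a GeoJSON Point, LineString or Polygon dict to WKT text."""
    kind = geom["type"]
    coords = geom["coordinates"]
    if kind == "Point":
        return f"POINT ({_positions([coords])})"
    if kind == "LineString":
        return f"LINESTRING ({_positions(coords)})"
    if kind == "Polygon":
        # Each ring gets its own parentheses: exterior ring first, then holes.
        rings = ", ".join(f"({_positions(ring)})" for ring in coords)
        return f"POLYGON ({rings})"
    raise ValueError(f"unsupported geometry type: {kind}")


if __name__ == "__main__":
    print(geojson_to_wkt({"type": "Point", "coordinates": [-112.0667, 33.4453]}))
```

Going the other direction is where curves become the sticking point: WKT has curve types such as CIRCULARSTRING, and GeoJSON has no geometry type to receive them.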
Raster is much easier given most of that data is in two fairly open formats. GeoTIFF or TIFF probably will be around longer than you or I and Esri grid formats have been well supported through the years, making both fairly safe. What are some limits to data formats that I do worry about?
File size, do they have limits to how large they can be (e.g. TIFF and 32-bit limit)
File structure, do they have limits to what can be stored (e.g. GeoJSON and curves)
File format issues (e.g. everything about the Shapefile and dbf)
OS centric formats (PGDB working only on Windows)
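The first item on that list can be checked mechanically. Classic TIFF uses 32-bit offsets, which is where the roughly 4 GB ceiling comes from, while BigTIFF uses 64-bit offsets. The two are distinguished by the number stored in bytes 2-3 of the file header (42 for classic TIFF, 43 for BigTIFF). A minimal sketch in Python, with hypothetical function names:

```python
import struct


def tiff_variant(header: bytes) -> str:
    """Classify the first four bytes of a file as classic TIFF, BigTIFF, or neither."""
    if len(header) < 4:
        return "not-tiff"
    # Bytes 0-1 give the byte order: "II" is little-endian, "MM" is big-endian.
    order = {b"II": "<", b"MM": ">"}.get(header[:2])
    if order is None:
        return "not-tiff"
    # Bytes 2-3 hold the version number in that byte order.
    (magic,) = struct.unpack(order + "H", header[2:4])
    return {42: "classic", 43: "bigtiff"}.get(magic, "not-tiff")


def tiff_variant_of_file(path):
    """Read just the header of the file at path and classify it."""
    with open(path, "rb") as fh:
        return tiff_variant(fh.read(4))
```

Anything that reports "classic" is subject to the 32-bit limit, which only matters if a raster is likely to grow past it.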
I think the two biggest fears of mine are the last two, because the first two can be mitigated fairly easily. My plan is the following: convert all vector data into GeoJSON, except where curves are required. I’m punting curves right now because I only have 3 datasets that require them and I’ll leave them in their native formats for now. The raster data is fine, TIFF and grid is perfect and I won’t be touching them at all. The other thing I’m doing is documenting the projects and data so that future James (or whoever gets this hard drive eventually) knows what the data is and how it was used. So little of what I have has any documentation, at least I’m lucky enough the file names make sense and the PDFs help me understand what the layers are used for.
One thing I’ve ignored through this, what to do with those MXDs that I cannot open at all? While I do have PDF versions of those MXDs, I have no tool to open them on Mac and even if I could, the pathing is probably a mess anyway. It brings up the point that the hardest thing to archive is cartography, especially if it is locked in a binary file like an MXD. At least in that case, it isn’t too hard to find someone with a license of ArcMap to help me out. But boy, it would be nice to have a good cartography archival format that isn’t some CSS thing.
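A plan like this could be turned into a simple inventory that doubles as documentation. The sketch below is hypothetical: the extension-to-action mapping is an assumption made for illustration (for instance, treating every .mdb as a Personal Geodatabase), and directory-based formats such as Esri grids would need separate handling since they are folders rather than single files.

```python
import os

# Hypothetical mapping from file extension to the action described in the plan.
PLAN = {
    ".shp": "convert to GeoJSON (unless it needs curves)",
    ".mdb": "convert to GeoJSON (unless it needs curves)",
    ".geojson": "keep as is",
    ".tif": "keep as is",
    ".tiff": "keep as is",
}


def plan_for(filename):
    """Return the planned action for a file based on its extension."""
    ext = os.path.splitext(filename)[1].lower()
    return PLAN.get(ext, "review by hand")


def inventory(root):
    """Walk root and return (relative path, size in bytes, planned action) rows."""
    rows = []
    for folder, _dirs, files in os.walk(root):
        for name in sorted(files):
            path = os.path.join(folder, name)
            rows.append((os.path.relpath(path, root), os.path.getsize(path), plan_for(name)))
    return rows
```

Writing the rows out to a text file next to the data would give whoever opens the archive later a starting point, even before any project notes are added.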
I’ll be honest, I really don’t follow Esri as closely as I used to. Not so much in that I don’t care to learn about what they are working on, more just that they do so many more things these days. It’s honestly hard to follow along sometimes, but every once in a while I see something that catches my eye.
Esri is now offering a new option in our Community Maps Program for contributors to have Esri share their data with selected Esri partners and other organizations (e.g. OpenStreetMap) that maintain popular mapping platforms for businesses and consumers. If contributors choose to share their data with others, Esri will aggregate the data and make it available to these organizations in a standardized way to make the data more easily consumable by them and accessible to others. It will be up to those organizations whether they choose to include the data in their mapping platforms. Where the data is used, attribution will be provided back to Esri Community Maps Contributors and/or individual contributing organizations.
I have to admit this intrigues me. Not so much that Esri is trying to insert themselves into a process, but that it makes sharing data easier for users of Esri software. In the end that’s probably more important than philosophical differences of opinion about closed fists and such. The data is shared via the CC BY 4.0 license that Esri uses for the Community Maps AOIs. I really like this; anything that makes sharing data easier is a good thing for everyone, including OpenStreetMap. I’m sure we’ll hear more about this during the Esri UC later this month but it’s still a great announcement. I’ve always been a big user of OSM and getting more organizations to update their data in OSM is a huge win in my book.