Trying to unlock transit data

Something that has been bothering me these last few months is transit data and how we as a community can get access to it. Here in Phoenix, we are unlucky enough not to have access to Google Transit. Valley Metro is locked into the 1990s with their system, and you should see how bad the mobile version is. So many of us here have been trying to free the Valley Metro data to get it incorporated into OpenStreetMap or even Google Transit. Since I’m not blogging about how successful that effort has been, you can guess that we are still stuck in the last decade.

But that isn’t the whole story, right? Just getting transit data into Google Transit is great for users of Google services, but not good for the community at large. Google has been really good at getting at data that is locked up behind government bureaucracy, but they’ve done nothing to help free this data beyond exposing it through their own APIs. At the same time, is this Google’s responsibility? Google seems to say they try to get people to share it, but it isn’t their job. I tend to agree with them. If an organization is willing to share data with Google, why isn’t it willing to share that data with the public? That is where these organizations fall down, and public transit appears to be where the problem is biggest.

Public transit organizations are looking at protecting their data from being used by others. It isn’t every organization, but it does send a message: if you as a member of the riding public want to access public data, you need to do so through an application approved by that transit organization. I find it amazing that the DMCA can be applied to public data, and it should send chills through anyone who wants to use data they as a taxpayer are funding. Google is able to negotiate deals because they have the money and the eyeballs that transit organizations want. I just don’t like the path we seem to be headed down, where if I want to access public data that I helped create, I need to do so using a commercial API and possibly have to pay a vendor for the right to use what should be free (paying twice is a tax on a tax).

Keep away from that slope, you won't like where it will lead!

Don’t Give Away the Farm!

So Google and ESRI will allow indexing of ArcGIS Server services by Google (and anyone else who crawls the web). So what does that mean moving forward? It really isn’t big news if you think about it, because this “feature” (service description) is already enabled in ArcGIS Server 9.3. The problem is that no one has really been thinking about what this means for everyone. If you expose these metadata pages to the Google Bot, you’ll be opening up your services to the world.

Now don’t get me wrong, this is a great thing.  As a user of data, I’m always wishing that I could search datasets using Google rather than the haphazard way we do it today (luck has more to do with it than anything), but data providers will lose control of their datasets.  Plus how do you monetize your information in such a world?

There are two types of organizations on the Internet: those who want to work with Google and those who don’t. A great example of a company that isn’t allowing Google to index their pages (well, beyond the White House) is Facebook. You never see Facebook results on the web and that is probably why they have been so successful. Giving away your data to Google can be dangerous to your business model.


Make sure you hoard your geospatial data

That said, I’d like for everyone to expose all their data on the Google so I can do my job much more easily. Maybe I’ll be surprised and there will be millions of new datasets available from ESRI servers by the end of the year, but I’m not holding my breath.

The GIS Interchange File

All too often we have to ask people to resend datasets because they get blocked by email, one important file gets left off, or systems just don’t recognize a file type. I’ve run into a problem today where a company FTP site is rejecting a shapefile because it doesn’t recognize the .shp, .shx, and .dbf extensions. I thought I could get around it by zipping the data, but the site appears to scan the zip file for extension types. So the “solution” was to zip the shapefile, change its extension to .doc and tell the recipient that they need to change the extension back to .zip.
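
For what it's worth, the zipping step is easy to script so that no component gets left off. What follows is only a rough sketch in Python, not a tool I actually used: the function name `bundle_shapefile` and the list of optional sidecar extensions are my own assumptions, and the output name can carry whatever extension the far end will accept (.doc in my case).

```python
import zipfile
from pathlib import Path

# The three files every shapefile needs to be readable.
REQUIRED = (".shp", ".shx", ".dbf")
# Common sidecar files; this list is an assumption and is not exhaustive.
OPTIONAL = (".prj", ".cpg", ".sbn", ".sbx")


def bundle_shapefile(shp_path, out_path):
    """Zip a shapefile with its sidecar files (hypothetical helper).

    Raises FileNotFoundError if a required component is missing, so the
    recipient never gets a half-complete dataset.
    """
    shp = Path(shp_path)
    missing = [ext for ext in REQUIRED if not shp.with_suffix(ext).exists()]
    if missing:
        raise FileNotFoundError(f"{shp.stem} is missing: {', '.join(missing)}")

    # The archive is a normal zip no matter what extension out_path has.
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for ext in REQUIRED + OPTIONAL:
            part = shp.with_suffix(ext)
            if part.exists():
                archive.write(part, arcname=part.name)
    return Path(out_path)
```

Calling `bundle_shapefile("roads.shp", "roads.doc")` would write a zip archive named roads.doc next to the data. The recipient still has to rename it back to .zip, so this automates the workaround rather than fixing the real problem.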

This kind of stuff happens way too often. Personal Geodatabases have the problem of the .mdb extension, which is rejected outright by most email systems, and other formats aren’t readily usable on other folks’ systems. The “old days” were easy because we all used coverages and shared them via the .e00 format, which was almost always accepted by everyone. Amazing how we take such steps back over time; you’d think data sharing would be easier than it was in 1995.

How do you folks share data? KML, GML, Etch A Sketch, e00, zip, web services, etc?

Update: Jason Birch has some ideas about using SQLite as an interchange format. Well worth the read.
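
To give a feel for why a single SQLite file is attractive here, this is a minimal sketch in Python using the standard `sqlite3` module. It is not Jason's proposal, which I'm not reproducing here. The `features` table layout, storing geometry as WKT text, and the hard-coded SRID of 4326 are all assumptions made up for illustration.

```python
import sqlite3


def write_interchange(db_path, features):
    """Write (name, wkt) pairs into one SQLite file (hypothetical layout)."""
    conn = sqlite3.connect(db_path)
    with conn:  # commits the transaction on success
        conn.execute(
            "CREATE TABLE IF NOT EXISTS features ("
            "id INTEGER PRIMARY KEY, name TEXT, srid INTEGER, wkt TEXT)"
        )
        conn.executemany(
            "INSERT INTO features (name, srid, wkt) VALUES (?, ?, ?)",
            # SRID 4326 (WGS 84) is assumed for every feature in this sketch.
            [(name, 4326, wkt) for name, wkt in features],
        )
    conn.close()


def read_interchange(db_path):
    """Read the features back as (name, srid, wkt) tuples in insert order."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT name, srid, wkt FROM features ORDER BY id"
    ).fetchall()
    conn.close()
    return rows
```

The appeal is that attributes and geometry travel together in one file, so there is no sidecar to forget. Whether a mail or FTP filter would let that file through is a separate question.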