Data.gov is already broken — just like everything before it

Like most people (I assume), I was doing a little GIS project Super Bowl morning. Needing some data, the first place I thought of going was the new [Data.gov] site. After a quick and simple search, I had the [dataset I wanted](http://www.data.gov/details/12) ready to download. But as with every government data repository before it, it is broken. Download links for posted datasets are often 404:

![Broken download](http://images.spatiallyadjusted.com/fws-data-flyways.png "FWS data is 404")

It isn’t just the download, but the [metadata](http://www.fws.gov/data/migflyway.html) as well. I know, some datasets still work and who knows, maybe this one will again one day. But for [Data.gov] to be valuable it needs to ping the data sources and let users know when they are down (and for web services, [what percentage](http://registry.fgdc.gov/statuschecker/wmsResultsReport.php?catalog=gos) of the time they are down). It also wouldn’t hurt to let the owners of the data know that their datasets are no longer linked correctly on the Data.gov website. Otherwise we’ll just get [link rot](http://en.wikipedia.org/wiki/Link_rot) and that can kill a project.

If projects are going to be built on data discovered with Data.gov, much more has to be done to ensure that this data is available consistently, not when people get around to updating broken links. If things don’t change it is another waste of taxpayer money and we’d just have been better off sticking with the [previous government data boondoggle](http://gos2.geodata.gov/wps/portal/gos).


About James Fee
Chief Evangelist for WeoGeo.com

9 Responses to Data.gov is already broken — just like everything before it

  1. Link rot is like foot fungus. Potential is always there, sure, but the cause is poor hygiene. (@sgillies).

  2. Archie Belaney says:

    Last spring, agencies threw whatever was handy onto Data.gov to answer the mail from the newbie politicals. They hated doing it, that’s why most of the content was/is marginally interesting.

    You know what comes next?

    “It’s not my job”

    Sigh.

    BTW – let’s ask OMB how much money has gone into GOS in the past years, and what the hit rate is on that dreadful cobweb. Might be fun to file a FOIA request to the USGS for hit statistics, normalize out the bots and see what’s left.

    Betcha it’s tens of hits from ‘real’ users every month and tens of thousands of hits from Google’s bots looking to scrape content.

  3. wow says:

    wow – not getting any action james? – one broken link and you pronounce the end of data.gov – the cloud – soa and the whole internet thing… how about just sending an email letting them know there is a broken link.

    • Archie Belaney says:

      Huh.

      Wow, apparently you’re not an active user of either data.gov or GOS. Have you ever used either? You also appear not to have understood the post…the point is, Wow, that the site should have been set up so folks in the user community don’t have to send emails to prod the admin to keep it running right.

      Data.gov is just another lipstick-slathered, porcine PR blast that’s another DC-irrelevancy-in-the-making. Honestly, expecting a government agency to post information that would make them accountable and be shared with others…

      Methinks thou doth protest too much, Wow.

      Now go fix those links.

  4. AlbertW says:

    I was working on a project last month and ran into a couple broken links (csv data, not shp). I can’t find them in my browser cache so I’ll see if I can find them when I get into work tomorrow.

    I hadn’t heard of the term link rot before, but I think this is going to be a huge problem with Data.gov moving forward.

  5. Dave Smith says:

    Federal agencies need coherent policies, and a consistent means of publishing, organizing, accessing and cataloging data and metadata. These efforts, GOS and Data.gov, are works in progress, presumably to be combined at some point and further improved.
    With GOS, the service status checkers are supposed to ping the assets to check them. That capability can and should be bolstered – along with better ways to manage data and metadata lifecycle.
    Did you post a suggestion to have data.gov periodically check the status and availability of posted assets to the ideascale site that was set up to solicit comments and suggestions? → http://datagov.ideascale.com/

    • Lefty says:

      Dave, other than your link to that ideascale.com site, how would an average user know to go there to give feedback?

      • Dave Smith says:

        I knew about the IdeaScale from working as a federal contractor involved in agencies, with feds who are working on getting their data submitted to Data.gov – I’d agree, it probably wasn’t as broadly communicated as it should be, it’s likely just the metadata wonks like me who have been tracking it – though the feds have used various IdeaScale sites for soliciting various comments and ideas for a year or so now. The IdeaScale site should be posted on their main Data.gov site and communicated via other means as well. That’s another suggestion that needs to be posted…

      • Dave Smith says:

        In checking the Data.gov site, there is in fact a link to the ideascale site – on the right hand of the Data.gov page, it says:

        Join the Dialogue

        When Data.gov launched last May, we promised to evolve this initiative with the public. We are taking that step, asking for your review and input on the design document and soliciting your ideas. Read More.. 

        Clicking that link, in turn, takes you to the data.gov ideascale site.