Let’s Save Metadata

Metadata

When you see the word metadata, I'm sure you begin to sweat. You get that lump in your throat, and suppressed memories bubble to the surface (none of which are good).

They can get you at any time

Now it isn’t hard to figure out why: metadata, as we’ve been exposed to it, is just not human readable and thus barely human usable. Working in the government sector as a consultant exposed me to the two worst words any DoD consultant can hear: “metadata required”.

We deal with four-letter acronyms all the time, right? FGDC. Even their website is built on Plone, which of course feels more like an Ivy League research project than the traditional SharePoint site we’d all expect from a government agency. Navigating it and trying to find information is intimidating. Anyway, what is it about metadata as we’ve been using it (FGDC or ISO) that is so painful?

Machine Readable vs. Human Readable

So FGDC and ISO metadata are complex, but there may be good reasons for that. Both try to address every conceivable thing that might need describing in geo-data. If they had been designed primarily to let servers talk to each other, I’m not sure any of us would have a problem with them (nor would we really be looking at them). But servers rarely read and write metadata on their own without human interaction. Thus the reality of the situation is that we poor humans have to ingest and parse metadata regularly.

Yikes

Well, this brings me to what I see as the biggest problem with metadata: it is almost always in XML format. Now don’t get me wrong, XML does have its purpose. In fact I could probably list thousands of cases where XML is the right answer. Sometimes it works and works well; other times you end up with a whole bunch of brackets and text that blends together. With a good eye you can parse out what you need, but there is so much noise that it almost feels like a “Where’s Waldo” exercise. XML does a good job of organizing data for machines, but it doesn’t do it in a way that humans can easily read.

What Human Readable Metadata Should Focus on

So someone sends you a dataset for a project you are working on. There are some questions you want answered before you commit to using it:

  1. Who is responsible for the dataset?
  2. What is the dataset representing?
  3. When was it created?
  4. Where are its extents (projection, datum, etc.)?
  5. How was it created?
  6. Why was it created?

The problem with metadata today is that those questions are hard to dig out of the metadata itself. If you know what to search for you might find them relatively quickly, but the simple fact is that if I want those answers for a dataset, they should be exposed to me first.
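To make that digging concrete, here is a minimal sketch in Python of what answering those six questions against a raw FGDC (CSDGM) XML record looks like today. The element paths are the usual CSDGM short names, but real records vary quite a bit, and the file name is hypothetical, so treat this as an illustration of the effort involved rather than a robust reader.

```python
# Pulling the six answers out of a raw FGDC (CSDGM) record by hand.
# Paths are common CSDGM short names; the file name is hypothetical.
import xml.etree.ElementTree as ET

def grab(root, path):
    """Return the text at an element path, or a placeholder if it's missing."""
    text = root.findtext(path, default="(not found)")
    return text.strip() if text else "(not found)"

root = ET.parse("dataset_metadata.xml").getroot()

answers = {
    "who":   grab(root, "idinfo/citation/citeinfo/origin"),
    "what":  grab(root, "idinfo/descript/abstract"),
    "when":  grab(root, "idinfo/citation/citeinfo/pubdate"),
    "where": " / ".join(
        grab(root, "idinfo/spdom/bounding/" + side)
        for side in ("westbc", "eastbc", "northbc", "southbc")
    ),
    "how":   grab(root, "dataqual/lineage/procstep/procdesc"),
    "why":   grab(root, "idinfo/descript/purpose"),
}

for question, answer in answers.items():
    print(f"{question}: {answer}")
```

That is a lot of tribal knowledge about element paths just to answer six basic questions, which is exactly the point.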

Metadata Style Sheets

One way people have tried to make FGDC metadata (and ISO to some extent) more readable is through the use of style sheets. Many ESRI users are exposed to this inside ArcCatalog: that drop-down list that lets you choose different ways of viewing the metadata is a style sheet selector. It takes that ugly XML metadata and renders it in ways that are easier to read. I’ve not seen much in the way of usability improvements on this front. At WeoGeo we offer human readable metadata on our dataset information pages. Others are doing it as well, but there is really no standard for how it should be organized.
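For the curious, a style sheet in this context is just an XSLT transform applied to the XML record. Here is a minimal sketch of that idea, assuming lxml is installed and using hypothetical file names for the record and the stylesheet; it is the same general kind of transform that happens when you pick an entry from that drop-down.

```python
# A metadata "style sheet" is an XSLT transform: raw XML record in,
# something readable (usually HTML) out. File names are hypothetical.
from lxml import etree

record = etree.parse("dataset_metadata.xml")   # the raw FGDC/ISO XML
stylesheet = etree.parse("fgdc_to_html.xsl")   # a human readable view of it
transform = etree.XSLT(stylesheet)

html = transform(record)
with open("dataset_metadata.html", "wb") as out:
    out.write(etree.tostring(html, pretty_print=True))
```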

So Who Cares About FGDC/ISO?

Honestly, you really shouldn’t care. You should care, though, about getting information that describes the data you are working with. I think most of the issue with both metadata standards is that they are just too hard to put information into and too hard to get the relevant information out of. Committee-designed standards such as these always end up being way too much for real-world use. We need to make sure we capture the who, what, when, where, how, and why of the dataset, and to do that we need to look at the geo-data creation tools and how they help us input metadata. Data creators should have an easy time filling out those six things about their data; the issues are in the weeds of the metadata standards.

Out on the fringes of the metadata requirements, creation tools (such as ArcCatalog) can help us manage things. Databases should be tracking who created the data (their name, address, etc.), when it was last modified, any lookup tables, aliases for field names, links to additional information, and anything else being used with that dataset. Not having to track all of that down frees the data creator to make the who, what, when, where, how, and why so much better than they could if they had to enter everything by hand.
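As a sketch of what that kind of auto-capture could look like (this is an illustration, not any existing tool’s workflow: it assumes Fiona is available to read the data’s extents and CRS, and the file names, prompts, and field names are all made up), the tool fills in everything it already knows and asks the human only for the what, how, and why.

```python
# Auto-capture the bookkeeping (who, when, extents, CRS) and leave only
# the three questions a human can answer. Illustrative only; assumes
# Fiona is installed, and all file names and prompts are hypothetical.
import getpass
import json
from datetime import datetime, timezone

import fiona

def auto_captured(path):
    """Fields the tool can fill in without asking the user anything."""
    with fiona.open(path) as src:
        bounds = src.bounds        # (west, south, east, north)
        crs_wkt = src.crs_wkt      # projection/datum as WKT text
    return {
        "who":   getpass.getuser(),
        "when":  datetime.now(timezone.utc).isoformat(timespec="seconds"),
        "where": {"bounds": bounds, "crs": crs_wkt},
    }

def ask_the_human():
    """The answers only the data creator can give."""
    return {
        "what": input("What does this dataset represent? "),
        "how":  input("How was it created? "),
        "why":  input("Why was it created? "),
    }

if __name__ == "__main__":
    record = {**auto_captured("parcels.shp"), **ask_the_human()}
    with open("parcels.metadata.json", "w") as out:
        json.dump(record, out, indent=2)
```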

And on the display end of things, I’d like to see UI experts work on creating better human readable metadata style sheets that hide the details you don’t need to see right away and expose what we as users of the data need at first glance. It is easy enough to let the details expand “below the fold” of a metadata page.

What Now?

It is up to all of us. We are stuck with the metadata standards, so changing them at this point isn’t feasible. At WeoGeo we’re committed to bringing complex, detailed FGDC/ISO metadata to users in easy-to-digest ways. What I’d like to hear, though, is from others trying to crack this same nut, so we can collaborate on it and, in this age of NSDIs, still end up with metadata that people can actually use to make decisions.