Don’t Give Away the Farm!

So Google and ESRI will allow indexing of ArcGIS Server services by Google (and anyone who crawls the web). So what does that mean moving forward? It really isn’t big news if you think about it, because this “feature” (service description) is already enabled in ArcGIS Server 9.3. The problem is that no one has really been thinking about what this means for everyone. If you expose these metadata pages to the Googlebot, you’ll be opening up your services to the world.

Now don’t get me wrong, this is a great thing.  As a user of data, I’m always wishing that I could search datasets using Google rather than the haphazard way we do it today (luck has more to do with it than anything), but data providers will lose control of their datasets.  Plus how do you monetize your information in such a world?

There are two types of organizations on the Internet: those who want to work with Google and those who don’t. A great example of a company that isn’t allowing Google to index their pages (well, beyond the White House) is Facebook. You never see Facebook results on the web and that is probably why they have been so successful. Giving away your data to Google can be dangerous to your business model.


Make sure you hoard your geospatial data

That said, I’d like for everyone to expose all their data on the Google so I can do my job much more easily. Maybe I’ll be surprised and there will be millions of new datasets available from ESRI servers by the end of the year, but I’m not holding my breath.

About James Fee
Chief Evangelist for WeoGeo.com

38 Responses to Don’t Give Away the Farm!

  1. Barry says:

    We’ve been thinking the same thing since I saw the video. What is in it for us trying to make some money off the data we create? I’d love to expose our datasets via Google, but we need to be helped. Google ads aren’t how we make our money.

  2. Organized_Chaos says:

    This seems to be a good opportunity for ESRI to step up to the plate with an ‘ArcWeb Services-like’ model of the kind they use with ArcWeb Services data providers. ESRI has a ‘per-click’ model in place now to handle access to premium data.
    This could be expanded and then exposed via Google to the clamoring hordes…
    Sounds Interesting.

  3. Allan Doyle says:

    What’s the #1 geo question of all time? “Where can I find data to help me do such-and-so” would be way up there. Why shouldn’t I be able to use Google to find data, even if it returns links to non-freely available data? When I’m looking for a new external hard drive, I google for what I need and don’t expect to simply download the drive. I expect I’ll have to order one.

    The other day there was a story on National Public Radio about a medical researcher who found some new treatment or another. During the interview, he said the first place he looks for drug information is Google.

    Why should spatial data be any different?

  4. Frank Warmerdam says:

    > You never see Facebook results on the
    > web and that is probably why they have
    > been so successful. Giving away your
    > data to Google can be dangerous to
    > your business model.

    James,

    I think you are missing the point here. Facebook data is kept out of Google because Facebook honours people’s desire for privacy or at least controlled disclosure. There are clearly a lot of people who want that limited disclosure. I don’t believe it is about Facebook controlling access to data they consider proprietary.

    So the analog in the geospatial space is that folks need to consider whether the data they are considering publishing has privacy issues associated with it. If yes, then they can no longer depend on privacy-through-obscurity since their datasets are likely to start turning up in Google searches. OK, we really never should have depended on that.

  5. James Fee says:

    You sure about that, Frank? LinkedIn shows up in Google search, and probably because they are far down on the social networking scale. If they had the kind of views that Facebook has, they’d lock out Google as well.

    Facebook doesn’t let you export your contact lists. They may call this “privacy”, but I call it controlling their proprietary data.

    We’ll see, Frank, how big a deal this turns out to be moving forward. I suspect that we won’t be seeing much change and silos such as the Geospatial One Stop will continue to exist.

  6. Matt Giger says:

    Finding data and having the license to use it are two very different things. I could really use some of the great ESRI datasets (like country borders), but unless one purchases a re-distribution license or even a license to display the data in a derivative work, it is useless to know it is there. The same holds true for all other non-public data out there.

  7. Lefty says:

    Our organization had a meeting about this last week. The directors saw the ESRI/Google video and freaked out. I told them there is no difference between stopping Google from scraping our webpages and stopping Google from scraping our geospatial data. As a data provider, I suspect we’ll block Google because it is hard enough to compete against them already given how much “free” data is available on Google Earth/Maps. I’m beginning to see the writing on the wall with our business model. Selling data is only possible if you are the one selling to Google. Everyone else expects it to be free. Shame really because it devalues expertise.

    What was interesting is that one of the directors asked why we have to opt out of Google. He thinks it should be the opposite (that folks need to opt in to Google search).

  8. Kevin says:

    Someone correct me if I’m wrong, but I thought Facebook and Google DO work together if you set it up in your profile. This was news a few months ago.

    I’m locked out of FB right now from work, but I thought there was a setting in your profile policy where you check it off, and someone searching for “Bob Nobody” in Google will come across your Facebook page… of course, based on other settings, they’ll either have to join Facebook or be a friend of the person to see their page (unless they’ve opened up their profile to everyone).

  9. Lefty says:

    Kevin, you can see someone is on Facebook, but not any of the information that is of value. I think this is the same thing we are dealing with today with geospatial information.

    You can search for it, but it isn’t indexed in Google Maps/Earth.

  10. Bill says:

    Sounds like the OGC Catalogue Service just became obsolete.

  11. Sean Fulton says:

    This is, essentially, a non-issue.

    “how do you monetize your information in such a world?”

    The same way people do with any other kind of content.

    1. Charge for access. As someone already noted, having your data search-able doesn’t mean you have to give away access.

    2. Sell advertising. If you don’t then someone else will come up with a way to sell ads that go along with the data. Picture that Google Earth demo with a big Nike swoosh laid on top.

    3. As a sort of loss leader/traffic driver/advertisement for your services. You put out a little data for free to lure them in to your consulting business or web-site.

    4. You don’t. Government agencies and other orgs that give data away for free should jump on this. They could probably get usage statistics that will help get further funding for more data.

    For most media, the you’re-giving-away-your-content-for-free! scare died somewhere in the late ’90s – early ’00s as soon as someone figured out how to make money giving it away. The geospatial field hasn’t caught up yet but it will. Someone will come up with a business plan that makes it work. Once they do, many will follow.

    Sean

  12. Frank Warmerdam says:

    Lefty,

    I think you should tell your director that he opted in to Google when material was published on the public internet. If you want to make it available only to some users of the internet, set up account-based access control!

    James,

    re: Facebook – I’m not sure of all the details, but my understanding is that most Facebook pages are only visible to established friends, other than some absolute bare minimum information. They are essentially not public. And of course, it is also possible that, via robots.txt, Facebook asked Google to exclude large parts of their site. LinkedIn obviously takes a different approach and is less interested in protecting privacy than it is in promoting members and itself.
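
For anyone wondering what a robots.txt exclusion looks like in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt contents, the `/arcgis/rest/services/` path, and the `example.com` host are assumptions made up for illustration, not something taken from Facebook or from an actual ArcGIS Server install. Keep in mind that robots.txt is only a request that well-behaved crawlers honour; it is not access control.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: ask Googlebot to stay out of the service
# directory while leaving the rest of the site open to every crawler.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /arcgis/rest/services/

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    service_url = "https://example.com/arcgis/rest/services/Parcels/MapServer"
    print(can_crawl(ROBOTS_TXT, "Googlebot", service_url))
    print(can_crawl(ROBOTS_TXT, "Googlebot", "https://example.com/about.html"))
```

Anything that has to stay private still needs account-based control on the server itself, since a crawler (or a person) can simply ignore the file.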

  13. Barry says:

    “1. Charge for access. As someone already noted, having your data search-able doesn’t mean you have to give away access.”

    Maybe I’m missing something, but how can you expose your ArcGIS Server service and still restrict access to the data? Everything I’ve seen so far says either it is exposed or it isn’t. No middle ground.

    “2. Sell advertising. If you don’t then someone else will come up with a way to sell ads that go along with the data. Picture that Google Earth demo with a big Nike swoosh laid on top.”

    Sell ads, is that what we’ve come to these days?

    “3. As a sort of loss leader/traffic driver/advertisement for your services. You put out a little data for free to lure them in to your consulting business or web-site.”

    We’ve been squeezed by Google (and Microsoft) with their free data layers already. There isn’t any juice left to give away. Plus those who provide data to us to resell will probably just go the Google route anyway and take Google’s money over ours.

    “4. You don’t. Government agencies and other orgs that give data away for free should jump on this. They could probably get usage statistics that will help get further funding for more data.”

    We’ll see, they should jump on this because their data is public domain.

    “For most media, the you’re-giving-away-your-content-for-free! scare died somewhere in the late ’90s – early ’00s as soon as someone figured out how to make money giving it away.”

    Sure, those who have enough eyeballs get to sell ads. Our market is so narrow that ads wouldn’t pay for the time it took to put them in there.

    “The geospatial field hasn’t caught up yet but it will. Someone will come up with a business plan that makes it work. Once they do, many will follow.”

    So professionals selling data to professionals is out the window? The time is for VCs with their money to fund non-sustainable business plans with cash?

  14. Lefty says:

    @Frank: Yeah, I explained it all to him. I said if you leave your front door open and ask everyone to come in, what do you expect?

    I think they’d just rather shut down the internet tubes. ;)

  15. Ben R. says:

    “Plus how do you monetize your information in such a world?”

    As mentioned above, you don’t. The sustainable model in the digital age is in offering expert, specialized services rather than trying to produce artificial scarcity via DRM or hiding yourself entirely behind a firewall.

    Go on any given piracy website and prepare to find the latest TomTom or other proprietary maps/data. Undoubtedly there are more specialized and discreet sites out there just for geodata (or if there aren’t, then there will be when the demand exists).

    This kind of thing can be mitigated by making your product very accessible (easy to buy) and superior to the free alternatives. Working with Google or other distributors to set up a one-click purchase system for geodata together with free samples and an open format. Continue to improve by offering semantic and interest associations (people who bought this also bought this, etc).

    Young people – time rich and cash poor – will still pirate the heck out of it. But when they join the business world they will be pitching your data to their bosses/clients. And they’ll be buying it themselves because they will eventually be relatively poor in time and rich in cash.

    Selling to businesses will still be perfectly viable though (they assess the risk of piracy a little differently than individual actors), but the mass market is a different animal.

  16. shawno says:

    Nice thread James.

    The portrayed, orthorectified, seamed data is one use case for the consumer market. It’s really only a visualization product made from the imagery. As Frank has pointed out, it highlights the requirement for service security. The majority of data providers DO NOT participate in Google’s imagery mosaics, only some large ones with large coverage. They also still have a robust business model and SELL data. It’s questionable if the contributors are getting the value from this “partnership” that they were hoping for, but that’s a different thread.

    They would essentially go out of business with the premise presented in this thread, then we would have no data.

    A “Google” search engine access is great for publicly published data! In no way would I call it an SDI. It’s a good way to consolidate the rat’s nest of data that is available but difficult to find, but it leaves many use cases for SDIs unsatisfied. The majority of data is not public and will not be published for free….ever.

    It provides a pretty picture…but very limited ability to provide determinative answers, even in the GIS world.

  17. Lefty says:

    “As mentioned above, you don’t. The sustainable model in the digital age is in offering expert, specialized services rather than trying to produce artificial scarcity via DRM or hiding yourself entirely behind a firewall.”

    Depends on what you are selling. If you fly aerial imagery, who pays the pilots for their service or the fuel to put them in the air? Google I guess….

    “Working with Google or other distributors to set up a one-click purchase system for geodata together with free samples and an open format.”

    There is the problem: who wants Google involved with everything? Not I. So give the keys and the lock to Google? Yikes!

  18. Ben R. says:

    “Depends on what you are selling. If you fly aerial imagery, who pays the pilots for their service or the fuel to put them in the air? Google I guess….”

    They seem to be doing okay so far in providing it free. The intention is to sell something else entirely: sponsored ads (though I’m sure they make a bit back with selling Pro/Enterprise).

    There wouldn’t be anything stopping someone from selling imagery in the method I suggested, they just need to scale their costs depending on how well they can control distribution. I would recommend dirt cheap individual licenses, particularly for students who are time rich/cash poor, which would be somewhat subsidized by business customers in order to breed new business customers down the road.

    This isn’t a new idea at all. It is how Photoshop has become the standard (and verb!) for image manipulation.

    If they are smart, they will get ahead of the issue before the whole population is used to getting the best data for free forever from file sharing networks.

    “There is the problem, who wants Google involved with everything, not I? So give the keys and the lock to Google? Yikes!”

    Google was just a convenient example. It could easily be VE or something cobbled together by the providers/creators themselves (see Hulu). An alliance of providers could set their own terms – the chance to be the iTunes of geodata is something few capable companies would refuse.

  19. anon says:

    Always got to put a turd in the punchbowl, eh James?

    Personally, I fear my new Google overlords.

  20. shawno says:

    There’s a major concern here that is fundamental to any “Enterprise”-class software…SECURITY.

    I dove head first into defining “Enterprise Software” on my blog:

    http://owston.blogspot.com/2008/02/enterprise-software.html

    No security is a gaping, massive, profound enterprise no-no. It’s a fundamental base requirement.

    The race for data providers to produce higher fidelity data more frequently for much larger coverage areas is also increasing. Correlate this with the sale of sensors and the number of satellites going into orbit. The quantity of data will increase massively over the next 5 years and there is no sign of consolidation to a single vendor (i.e. Google) or for “free”.

    If your business model doesn’t look good because of Google, then a new 5 year plan needed to be created like 2 years ago.

    I usually avoid commenting to avoid the “spam” chastising, but I thought it would be important to dispel the paranoia. ;)

  21. anon says:

    “An alliance of providers could set their own terms – the chance to be the iTunes of geodata is something few capable companies would refuse.”

    Like WeoGeo does?

  22. MTBMaven says:

    I’m not much of a blogger (as one can tell by the number of posts on my blog) but I posted something today with my thoughts on this issue. Watching the presentation by John and Jack at the Where 2.0 Conference, along with my recent experiences with Google, has got me thinking a lot about this issue. More can be found here: http://giscogitations.blogspot.com/2008/05/government-gis-going-public-20.html

  23. FantomPlanet says:

    Why don’t you form a farmer’s co-op for geodata? You can have data elevators, etc. Some VC can be the bank.

  24. ChrisW says:

    Sorry, but I don’t see the problem – this is mostly just a standard data management/security issue, overlaid with a lot of hype. My bank has lots of data, but they don’t put all of it online, thankfully. Google Earth displays lots of online satellite imagery, but if I want the real stuff I still have to pay Eurimage or whoever. Or go to Google Scholar, another catalogue service providing access to a whole lot of specialised information, but you’ll find that you often only get access to a summary and have to pay if you want the detailed stuff.

    Even access to Google Maps is controlled through the API key. They could start charging for a licence any time if they wanted.

    For comparison, the UK government also has lots of data (and wants more), and they do sometimes put it online accidentally, because they’re idiots. So don’t be an idiot.

    Unfortunately for those of us who want to use UK geodata, the Ordnance Survey is not quite so stupid. And if the OS catches me swapping OS data with my unlicenced pals, then I guess their lawyers will dump on us from a great height.

    Online access to data and services is a great idea, although I wonder how far people will want to put complex GIS processing services online (how many professional-grade, publicly accessible WPS services do you know of?), or how far other people will want to rely on them. If you’re making your data/services public, make sure your legal insurance is up to date.

    As for data, if you don’t want to put your data online, don’t. And hang some garlic over your door. Then Google will be unable to steal your soul.

  25. AlbertW says:

    @ChrisW It won’t be a problem because no one will be putting their data online to be indexed. Who in their right mind is going to give up their data to Google?

  26. MarkB says:

    “Who in their right mind is going to give up their data to Google?”

    At least both of the following requirements must be met:

    1. Liability isn’t an issue or that the associated risks are outweighed by other benefits.

    2. Bottom line positively tethered towards wider use (free or otherwise). Either profit driven (sell more services) or mission driven (have greater impact) bottom lines or some combination thereof.

    Much of government and non-profits will be in this camp. Though wider use will likely have a net gain in funding for these entities (not entirely independent of monetary issues).

  27. Paul Bissett says:

    I have read with great interest the comments on this post. It seems to me that there may be two separate issues being discussed. The first is search and discover and the second is monetization of content. While related, they are not necessarily the same.

    The ESRI/Google deal did nothing more than enable indexing of exposed ArcGIS Server services. This is definitely an advantage for anyone wishing to peer into the available services that are currently exposed to the web. If you don’t want your content “discovered” don’t publish the metadata HTML page. I am not an ArcGIS Server expert, but I would be curious if you could still publish a service without this metadata page. This would allow you to have an open feed, but not be indexed by Google.

    The other issue is monetization. This is the tricky one. Google’s (and other search providers’) revenue model is built on advertising. Vincent Tao from Microsoft’s VE said at the last Location Intelligence meeting that the ad market is ~$250 billion, and that all current internet advertising combined accounts for maybe 10% of this market. The current yellow pages market is approximately equal to this value and is inherently a location based service. One of their strategies is to double their available ad market by taking over the current yellow pages business.

    What this means is that these guys are going to give away location based services in order to secure yellow pages type advertising. If your geo-content can be turned into a reasonably demanded service that can be coupled to location based ads, consider Google and Microsoft your new competitors.

    However, what are your monetization options if you create quality geo-content that can be distributed via services or file delivery, but may be beyond those related to location based ad revenues? This is the theme that ChrisW above discussed, as well as mentioned by James, Ben R., and anon. This is the exact problem we at WeoGeo are trying to solve. I’ll not spam this list by talking any more about our solutions, but feel free to contact me directly if you wish (through my blogs).

  28. Adam Conner says:

    I think the people who are trying to close their doors to being indexed/searched are only limiting their potential revenues. It seems like quality free data is a great way to advertise. If your company is the definitive source for a dataset, any user who wants to do analysis with it, or include it in a custom application will come to you for those added services.

  29. Paul Bissett says:

    @Adam – While I agree in principle with you, we have discovered that a large amount of survey, engineering, and architecture work is accomplished under professional services agreements. The ownership of that content does not typically reside with the content generator. In some of these cases, it is unclear if the content generator can even acknowledge that the work was done.

    The historical professional services contract just does not deal well with easily distributed digital content. I think the construction, survey, and engineering content providers are going to have to re-think some of their business models as we move into a new digital mapping age.

  30. shawno says:

    I would estimate that the “free data” from Google was generated from less than 0.00001% of the imagery and vectors available today. Imagery and vectors also constitute less than 0.01% of Geospatial Information (GI). Somebody smarter than me with a lot more time on their hands can do the “commercial” imagery calculation and come up with a more accurate number (any takers?).

    GE is a great visualization product! It has also brought into focus the consumer market’s better understanding of spatial phenomena and the wealth of spatial information “at their fingertips”.

    I think an alternative argument can be made that it has “expanded” the geospatial market into a larger community and a new realized customer base.

    “Your” data is very important! It has a documented “accuracy” (I hope), can be used for more than visualization (analysis), provides a greater time extent and is yours to distribute under your rules.

    The only people losing in this are the makers of competitive visualization clients. They don’t meet the expected performance watermark set by GE or the ability to scale to the mass of users. This gap will be closing quickly as well.

    In the end, I predict an increase in Pro Services for spatial technology. I also predict an increase in spatial data products available to the consumer (new sensors, new data types, new business models for getting access to data). I also predict a massive increase in “disruptive” technologies. This is GREAT for the market as a whole and our industry.

  31. MTBMaven says:

    @Adam Conner, re: “I think the people who are trying to close their doors to being indexed/searched are only limiting their potential revenues.”

    Not all of us with geographic content are in the business of selling our data, nor can we.

    Municipal governments in particular are subject to public information laws, which can prohibit us from selling data owned by the public. (This is not the place to discuss the public’s right to government spatial data except to acknowledge this issue is not a hard and fast rule).

  32. SEWilco says:

    Maybe you need to look more closely at how other people are providing data to Google. Many publish everything. Some allow the googlebots to crawl everything, but other visitors get a login or summary page. Some allow the googlebots to crawl the first paragraph or a summary page but not the entire data. Over in Google Books, publishers are making many pages of books available but visitors are limited to browsing a few pages (and some pages aren’t available at all in the Google Books preview).

    What is needed is similar levels of authorization and permission as a web server provides. Then geo servers could be told to let the googlebots access only the first 100 records, give visitors only the metadata, let visitors browse half of Manhattan but no more, etc.
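
As a rough illustration of the tiered permissions described above, here is a minimal Python sketch. Everything in it is hypothetical: the `AccessPolicy` class, the three tiers, the 100-record cap, and the layer layout are made up for the example and are not features of ArcGIS Server or any other geo server. It also trusts the User-Agent header, which is trivially spoofed, so a real server would have to verify crawlers some other way.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AccessPolicy:
    max_records: Optional[int]  # None means no limit
    metadata_only: bool


# Hypothetical tiers: crawlers see a sample, anonymous visitors see
# only the metadata, paying subscribers see everything.
CRAWLER = AccessPolicy(max_records=100, metadata_only=False)
ANONYMOUS = AccessPolicy(max_records=0, metadata_only=True)
SUBSCRIBER = AccessPolicy(max_records=None, metadata_only=False)


def policy_for(user_agent: str, is_subscriber: bool) -> AccessPolicy:
    """Pick a tier; the User-Agent check is naive and can be spoofed."""
    if is_subscriber:
        return SUBSCRIBER
    if "googlebot" in user_agent.lower():
        return CRAWLER
    return ANONYMOUS


def respond(layer: dict, policy: AccessPolicy) -> dict:
    """Build the response a given tier is allowed to see."""
    response = {"metadata": layer["metadata"]}
    if not policy.metadata_only:
        records = layer["records"]
        if policy.max_records is not None:
            records = records[:policy.max_records]
        response["records"] = records
    return response


if __name__ == "__main__":
    layer = {"metadata": {"title": "Parcels"}, "records": list(range(250))}
    sample = respond(layer, policy_for("Googlebot/2.1", False))
    print(len(sample["records"]))
```

A spatial limit such as "half of Manhattan" would need a bounding-box filter on top of the record cap, which this sketch leaves out.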

  33. Paul Bissett says:

    @MTBMaven – I think you have hit on the crux of the Google/ESRI “deal”. A large fraction of ESRI’s revenue is through government services. Nearly every government office that I have been associated with has an ESRI product. ESRI’s focus in this deal is not necessarily to enhance the revenue stream of content providers. Instead, it is to provide feature enhancement and functionality to a major customer base.

    This is a smart move by ESRI. It keeps their public service customers happy by allowing them to provide enhanced index, search, and viewing capabilities, without any new major investments. It’s a good deal for Google because it gets them a new free data feed from tens of thousands of quality controlled geo-content servers.

    However, there is little here for commercial content producers other than perhaps enhanced marketing of their products. Jack’s statement in the presentation was that this new search capability allowed quality content to be mashed up for new value-enhanced products. He did not mention who was going to pay for this new value-added mash up…

  34. ChrisW says:

    It’s not just ESRI, GeoServer 1.7 is aiming to make itself crawlable for Google (sounds kind of kinky…):

    http://blog.geoserver.org/2008/05/13/geoserver-and-googles-geo-search/

  35. Powered by Wires says:

    Data doesn’t keep the doors open and the lights on. It’s what you do with the data that makes it valuable. Remember also that we don’t live in a static world. Data becomes outdated and has to be constantly updated to stay relevant and useful, just like a blog. I see this as an opportunity rather than a nail in the coffin. What “data availability” via a machine like Google really means is a new power to combine and analyze data sets that were never available across different industries. It’s about having the resources to create new solutions. Have no fear. Besides, if you do a Google search for “GIS data” right now, you will end up with 3.7 million results. 3.7 million. If this is your worst nightmare, it’s already happened. Just count your lucky stars that most people have no idea how to dive into a pool that large and return with something meaningful.

  36. ChrisW says:

    Here’s another angle on this debate ( http://www.guardian.co.uk/technology/2008/may/22/freeourdata ) : apparently the EU’s INSPIRE directive requires government agencies etc to make their spatial metadata/catalogues visible and interoperable, although it seems there will still be scope to charge for access to the data itself. The UK is still trying to catch up with the US on the idea of geoportals for public data, but I wonder how far the Google/ESRI/Geoserver etc approach to searchable spatial data/services will play a part in future developments here in Europe?

  37. shawno says:

    ChrisW…nice that you bring up the INSPIRE directive, because it’s a “cataloging” service in its first phases, not a “mapping” service. There is a big difference here. It’s based on the CS-W OGC standard. It’s an interoperable Web Service for Search, Update and Retrieval of a cataloged hierarchy of Aggregate and Dataset metadata, not the services to access the “data” itself. Step one is to deliver the ability to “discover” the appropriate data that you are trying to find.

    The EU sensor processing stations will “auto” populate a centralized catalog for all the sensors into a single, searchable and secure Geospatial Information Catalog. Direct from the sensor -> Catalog -> “Pay Per Access” Interoperable Web Services Model.

    This provides a very “detailed” discovery capability, with the ability to search any ISO metadata entity for the information regarding the data collected by the sensor (spatial search, Time Extent, Processing Lineage, Sensor Information, cloud cover, processing level, anything really!!).

    This is leagues ahead of simply creating search results based on the “layer metadata” exposed by a mapping service. It will essentially provide “access” to hundreds of petabytes worth of distributed data, in real time.

    It’s really innovative and a totally different approach, emphasizing Internationally accepted standards and interoperability.

    The “search” is free, the access to the data is of course a revenue business model.

  38. ChrisW says:

    shawno: Thanks for the extra info on INSPIRE. This looks like a good time for a European like me to be getting into GIS. :-)