Super fast geospatial analysis

Peter Batty poses an interesting question:

If you could do geospatial analysis 50 to 100 times faster than you can today, what compelling new things would this enable you to do? And yes, I mean 50 to 100 times faster, not 50 to 100 percent faster!

Just think about that for a minute; it blows one’s mind. I’m pretty sure someone reading this blog has a good case study for Peter (below, Peter says this isn’t hypothetical, so if you’ve got a great need for such processing, email him your needs).

Wouldn’t it be better to be the Road Runner instead of being Wile E. Coyote when running spatial analysis?

About James Fee
Chief Evangelist for WeoGeo.com

29 Responses to Super fast geospatial analysis

  1. Lewis says:

    Turtles are more restful

  2. Ed says:

    Well, it would certainly cut into my coffee breaks, and that’s not cool.

    Perhaps ArcGIS Server would actually work as advertised?

  3. n314 says:

    Nothing else than being successful with a dissolve operation that can’t be processed without (that’s the ESRI support response) a machine with 16 GB of RAM…

  4. Peter Batty says:

    Hi James, thanks for the link! I thought I would just add a couple of things here that I have also added to my original post and comments. One is just to assure people that this is not just a hypothetical question, though I can’t go into more details just yet. And secondly to clarify that I’m really talking about database-centric analysis – the types of problem where a large portion of the analysis could be formulated in spatial SQL (spatial and non-spatial selection, buffers, intersection, aggregation, etc, etc). And I’m looking for specific applications involving large datasets (millions or billions of records) where there would be a high business value in being able to run these types of analysis 50 to 100 times faster than you could using current database technology (or perhaps where you wouldn’t even try to ask the question today!).
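    As a rough illustration of the kind of query described above (spatial selection by buffer, attribute selection, then aggregation), here is a minimal pure-Python sketch. The records, field layout and function names are all hypothetical, and a real system would express this in spatial SQL against an indexed database rather than a loop.

```python
import math
from collections import defaultdict

# Hypothetical records: (x, y, segment, annual_spend), planar coordinates in metres.
CUSTOMERS = [
    (100.0, 100.0, "retail", 1200.0),
    (150.0, 120.0, "retail", 800.0),
    (900.0, 900.0, "retail", 500.0),
    (110.0, 90.0, "wholesale", 4000.0),
]


def within_buffer(px, py, cx, cy, radius):
    """Spatial predicate: is (px, py) inside a circular buffer around (cx, cy)?"""
    return math.hypot(px - cx, py - cy) <= radius


def spend_by_segment(records, cx, cy, radius, min_spend=0.0):
    """Attribute selection + spatial selection + aggregation in one pass."""
    totals = defaultdict(float)
    for x, y, segment, spend in records:
        if spend >= min_spend and within_buffer(x, y, cx, cy, radius):
            totals[segment] += spend
    return dict(totals)


result = spend_by_segment(CUSTOMERS, cx=100.0, cy=100.0, radius=100.0)
print(result)
```

    The loop touches every record once, which is exactly the part that becomes painful at millions or billions of rows.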

  5. Lefty says:

    A google search on the terms Peter Batty is throwing out results in:

    http://www.vertica.com/

    Are we talking about some sort of database appliance here?

  6. jxn says:

    This is what I do every day: database-centric analysis instead of analysis within a GIS package.

    160 million records enough?

  7. Dimitri says:

    That’s a very interesting question, and within, say, the next six months we will see real life examples of what people do with such capabilities.

    That’s about when the next wave of NVIDIA CUDA incorporations will come out of Manifold. For now, CUDA stuff within Manifold is limited to several dozen functions in surface processing. The next release will extend that throughout the system, including spatial SQL, database processing and, of course, vector processing.

    Given the companion extension of Manifold to multi hundred gigabyte data stores (if not terabyte storages) during this summer’s campaign we will see what happens in real life when people can do spatial analytics 50 to 100 times faster within very large databases, or for that matter, within GIS data in general whether it be vector, raster, database, etc.

    This should also be interesting because the extraordinarily low cost of NVIDIA hardware required to get such supercomputer performance makes it accessible to just about anyone. For a hardware cost under $1000 for 256 processors and a software cost for Manifold under $300, the very low cost of doing such work 50 to 100 times faster will allow very many people to try all sorts of different things that would not be economic to try if it cost more, like, say, tens of thousands of dollars. It will be interesting.

    My own view is that for all of the business benefit of doing spatial SQL 50 to 100 times faster in huge databases, which I do not deny will be useful, I think that such applications will be fewer in number than the many hundreds of thousands of unit volume opportunities to do more ordinary, but still highly useful, GIS tasks 50 to 100 or even 1000 times faster on a day to day basis, which the low cost of CUDA opens up.

    The data sets that ordinary folks doing GIS every day work with are getting bigger, with multi-hundred megabyte vector data sets becoming rather typical for drawings and gigabyte images routine. The ability to just open, edit, re-project, render and so forth such things 50 to 100 to perhaps even 1000 times faster will mean a lot to GIS operators working away in a county parcel office, drawing a new flood map, analyzing real estate locations for a commercial firm, etc., especially if getting that capability costs them just a fraction of what they are currently spending on annual maintenance alone.

  8. artlembo says:

    Just some experience out in the field with Manifold and the CUDA stuff Dimitri referenced. I just got a Shuttle computer (quad, 8GB RAM, nVidia 8800). I started playing with what I thought was a “big” dataset of 5M pixels. I quickly realized that was too small, so I had my students create a mosaic of 43M pixels. Here are the numbers for creating slope in Manifold with the 43M pixel DEM:

    64-bit CUDA: 4.6 seconds

    64-bit No CUDA: 11.2 seconds

    32-bit CUDA: 6.4 seconds

    32-bit No CUDA: 23.65 seconds

    So, we decided to make a bigger surface with a 143M pixel DEM and ran the slope on it. The results were:

    64-bit CUDA: 14.8 seconds
    32-bit CUDA: 20 seconds
    32-bit No CUDA: 89 seconds

    That’s still too fast, so we are working on a 750M pixel DEM. When I get those results, I’ll let people know.
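    The timings above are easier to compare as speedup ratios against the slowest configuration. A small sketch follows; only the numbers come from the comment, and the dictionary layout and function names are made up for illustration.

```python
# Timings in seconds, copied from the figures quoted above.
TIMINGS = {
    "43M": {
        "64-bit CUDA": 4.6,
        "64-bit no CUDA": 11.2,
        "32-bit CUDA": 6.4,
        "32-bit no CUDA": 23.65,
    },
    "143M": {
        "64-bit CUDA": 14.8,
        "32-bit CUDA": 20.0,
        "32-bit no CUDA": 89.0,
    },
}


def speedup(baseline_seconds, accelerated_seconds):
    """How many times faster the accelerated run is than the baseline."""
    return baseline_seconds / accelerated_seconds


def speedups_vs(baseline_label, runs):
    """Speedup of every other configuration relative to the named baseline."""
    base = runs[baseline_label]
    return {
        label: round(speedup(base, secs), 2)
        for label, secs in runs.items()
        if label != baseline_label
    }


for dem, runs in TIMINGS.items():
    print(dem, speedups_vs("32-bit no CUDA", runs))
```

    On these figures the best case is about 5x for the 43M pixel DEM and about 6x for the 143M pixel DEM, both relative to 32-bit without CUDA.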

  9. Lefty says:

    Dimitri, artlembo: I think Peter is talking about terabytes of information. If I’m right about the type of company he is “representing” here, we are talking about big bucks for super fast analytic processes.

    Megabytes of data would be too small to run on such a system (and probably too cost prohibitive). I suspect this kind of data analysis is for reinsurance companies who need to recalculate their exposure to storms that creep up at the last minute.

    Running surface analysis just isn’t mission critical enough or complex enough to pay hundreds of thousands of dollars for such analytical processing. The CUDA stuff is probably more applicable to what most people who read this blog are involved with, but probably irrelevant to what Peter has gotten himself involved in. (Maybe he’s working with Teradata for all I know.) As nice as shrinking 90 seconds into 15 seconds is, we are talking about shrinking days and weeks into hours here.

    It’s way over my head and what my organization needs/uses.

  10. artlembo says:

    Lefty,

    You could be correct. However, Peter talked about millions or billions of records, so perhaps he isn’t talking about the terabyte level. Even so, those kinds of improvements will probably require a number of convergent activities (e.g. multi-processor, specialized indexes, etc.). Cool stuff, and kudos to Peter – it will be fun to see what he comes up with.

    Having said that, we’re certainly seeing some cool results using the nVidia card with tens of millions of records. In about a week we’ll be at the 3/4 billion mark, so it will be interesting to see what comes of it. I’ll let you know. I wish I had money for more RAM :-)

    As for surface analysis, sure, it’s not totally mission critical, but it is a start, and a glimpse of what we may see in the future. My nVidia card has around 120 streams. If those things get extended to vector analysis, then we are looking at some really cool opportunities.

    A lot of the watershed work I used to do was hampered by the fact that we really couldn’t bring in a DEM for an entire watershed – oftentimes, it was just too large. So, many of our results for hydrologically sensitive areas were suspect. Now, it looks as though we can make short work of this with 64-bit and CUDA.

    I’m hoping that if nothing else, the work we are doing now, and plan to write out, will motivate companies like Manifold to continue what they are doing, and cause other companies like ESRI and MapInfo to consider the next re-engineering efforts to include “true” multi-processor support – that is, of course if what we explore proves out to be of great benefit.

  11. Lefty says:

    artlembo: I don’t disagree with what you are saying. I’m just reading between the lines here and it feels like presentations we’ve gotten in the past from these “database appliance” companies who sell stuff at a large cost, but they provide performance that just blows you away. For most people this just isn’t cost effective.

    You are right though, we have a FORTRAN developer on staff to handle some of our GRID analysis because we can’t rely on “ordinary” GIS tools to provide our analysis. The simple fact that we pay a guy to run FORTRAN for us says there is a huge problem with GIS analysis.

  12. James Fee says:

    Lefty and Artlembo: I seem to be having database issues again so if your comment doesn’t show up right away, just bear with me and I’ll approve it.

    Both your comments showed up as written by me for some reason.

  13. Lefty says:

    Thanks James, I saw that after I hit submit it says you wrote that comment.

  14. James Fee says:

    It only shows up that way here on the page. In the database it shows up as your comment. It appears that it is working again.

    At least it isn’t as bad as it was last week when everything went missing. That made my heart skip a beat. :)

  15. Peter Batty says:

    Lefty, you are thinking along the right lines broadly speaking in terms of the type of things I’m looking at here – I’m definitely looking at pretty high end complex applications. Though with the current trend towards offering services “in the cloud”, there could be interesting opportunities for people to build large online databases of say detailed demographic information, and provide service offerings around doing analysis against those (combined with a user’s own data as appropriate), which obviously could open this type of technology up to a much broader market.

  16. Dimitri says:

    Hi guys,

    I think Peter is talking about terabytes of information. If I’m right about the type of company he is “representing” here, we are talking about big bucks for super fast analytic processes.

    Terabytes is harder, but it is not out of reach, perhaps not even this summer. We’ll have to see how it goes.

    Lefty, you are right about data sizes and scaling. Art’s comments point this out as well.

    As interesting as routine 4x to 10x performance gains are with the use of CUDA for surface transforms there are two constraints revealed by both Lefty’s and Art’s comments:

    1. The highly limited case of a few dozen functions being implemented for surfaces is just the beginning. That has to be extended to vector processing, spatial SQL, and both local and remote DBMS. In progress.

    2. The biggest problem with using hordes of CUDA processors is keeping them fed with data. The computation part happens just about instantly – it is getting data out of storage and to and from those processors that is the bottleneck. That requires very large rewrites in Manifold to implement a completely new-from-the-ground up storage and data access architecture. It is about a million and a half lines of code and is also in progress. You need this to get from 10x gains to 50x or 100x gains.

    You need both of the two above to break the bottleneck and to also extend capability to the sort of analytics being discussed.
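    One way to see why data movement caps the gain is Amdahl’s law: only the share of runtime spent computing gets faster. The sketch below is a simplified model, not a description of Manifold’s internals, and the 10% data-movement share and 1000x compute speedup are hypothetical numbers chosen for illustration.

```python
def overall_speedup(data_fraction, compute_speedup):
    """Amdahl's law: only the compute share (1 - data_fraction) is accelerated."""
    compute_fraction = 1.0 - data_fraction
    return 1.0 / (data_fraction + compute_fraction / compute_speedup)


def max_data_fraction(target_speedup, compute_speedup):
    """Largest share of the original runtime that data movement may take
    while still reaching target_speedup overall.

    Solves 1 / (f + (1 - f) / s) = t for f.
    """
    s, t = compute_speedup, target_speedup
    return (1.0 / t - 1.0 / s) / (1.0 - 1.0 / s)


# Hypothetical: data movement is 10% of the original runtime, compute gets 1000x faster.
print(overall_speedup(0.10, 1000.0))

# How small must the data-movement share be to reach 50x overall?
print(max_data_fraction(50.0, 1000.0))
```

    Under these assumptions a 10% data-movement share limits the overall gain to just under 10x, and reaching 50x requires that share to fall below about 2%.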

    Between the above two factors together there is no reason one could not get 50x to 100x increase in the speed of highly sophisticated spatial analytics [if you've seen the sorts of things people discuss in forum.manifold.net regarding spatial SQL in Manifold there is nothing that is more sophisticated than that stuff, not for any size enterprise] on very large databases, in the hundreds of gigabytes range. That covers an awful lot of customers.

    I grant you that getting into the multiple terabyte range requires much more thought about hardware and overall architecture than working in the hundreds of gigabyte range. But it’s getting very close and it could well be that this summer even the terabyte stuff will be as routine as the hundreds of gigabyte case.

    Initial performance results from the new software approaches are very encouraging, and it is clear that invoking parallelism and things like CUDA in non-traditional ways has so much potential that we are just scratching the surface now. Every day it seems there are even newer and better ideas for leveraging this stuff.

    We’ll see how it goes. What’s been delivered so far with CUDA is not talk, but there is still a lot of work to do in the months ahead before any of us can see for ourselves what a 50x or 100x performance gain with really big data makes possible.

    By the way, the shrinking days and weeks into hours bit really strikes home in that no matter what kind of performance increase you give people they get used to it instantly.

    The CUDA thing has been particularly instructive in that because, say, shrinking a 10 hour task into 1 hour is the sort of thing that is astonishing the first time it runs but then by the next day the customer has gotten totally used to it and wants you to shrink that 1 hour into 1 minute. :-)

  17. artlembo says:

    The biggest problem with using hordes of CUDA processors is keeping them fed with data

    Yes, I’ve noticed this when you open up the Task Manager. Since I have a quad-core, for the first several seconds you see 3 cores that are flat, and one that is really busy (this must be the dividing-up-the-data part you talk about). Then, a few seconds later you hear this loud whirring in the computer, all 4 cores show major activity, and within seconds the entire process is done.

    It’s pretty slick to watch :-) So yeah, I’ve noticed some of these things.

  18. Dimitri says:

    Art: That’s a good observation, because in all this excitement about CUDA with 128 or more stream processors per board it is good to remember that there is a proliferation of cores on the motherboard as well.

    Intel’s “Skull Trail” board now delivers at very low cost two quad-core sockets to enable eight very fast 64-bit cores with outstanding bandwidth to memory and to disk. Consider, say, 10 such systems in a cluster, each with fast arrays of terabyte disks and there you have *eighty* fast processors with, say, 80 terabytes of fast disk storage that is “cached” with 80 gigabytes of RAM. And all that for “consumer” prices, somewhere between $75,000 and $100,000 for the cluster depending on how well you can negotiate quantity discounts. Even a single such machine with 8 terabytes of disk and 8 GB RAM and 8 processors (especially if you add 512 stream processors) for under $10K is a fairly righteous computing engine.

    There is a lot of appeal to writing parallelized code that in an intelligent way can distribute function throughout such a cluster for DBMS and spatial analytic functioning. In fact, if your metric for current spatial analytic performance is something like SDE, well I can see how such an approach (assuming a modern upgrade throughout, of course, such as use of modern algorithms) could achieve 50x performance improvement or more all on its own.

    Skull Trail and similar also provides four simultaneous full-speed slots for CUDA, so you can install 512 stream processors per system at very low cost. That’s well over a teraflop, bona fide supercomputer performance. A cluster of ten such systems delivers well over 10 teraflops with 5120 stream processors supporting the 80 main processors [as Intel would put it... NVIDIA would say the 80 main processors are the ones "supporting" the 5120 stream processors... :-) ]. Those 5120 stream processors will cost you an additional $12,000 at today’s prices given typical quantity discounts for CUDA boards. $12K for 10 teraflops… not a bad deal!
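    The cluster figures above can be checked with a few lines of arithmetic. This sketch uses only the counts and the $12,000 figure quoted in the comment; the variable names are illustrative, and 128 stream processors per board is taken from the “128 or more” mentioned earlier.

```python
CORES_PER_SYSTEM = 8                 # two quad-core sockets
STREAM_PROCESSORS_PER_BOARD = 128    # "128 or more" per board
BOARDS_PER_SYSTEM = 4                # four full-speed slots
SYSTEMS = 10                         # machines in the cluster
STREAM_PROCESSOR_COST_USD = 12_000   # quoted for all boards in the cluster

stream_processors_per_system = STREAM_PROCESSORS_PER_BOARD * BOARDS_PER_SYSTEM
cluster_cores = CORES_PER_SYSTEM * SYSTEMS
cluster_stream_processors = stream_processors_per_system * SYSTEMS
cost_per_stream_processor = STREAM_PROCESSOR_COST_USD / cluster_stream_processors

print(stream_processors_per_system, cluster_cores, cluster_stream_processors)
print(round(cost_per_stream_processor, 2))
```

    That works out to 512 stream processors per system, 80 main cores and 5120 stream processors per cluster, at a little over two dollars per stream processor.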

    As a software guy I’m the last to say that software doesn’t matter. Sure, there is a lot of art to using the above hardware resources. No doubt about that. But revolutionary changes in price/performance in hardware will guide what is the best approach in software.

    A few years ago teraflop computational capacity cost big bucks. It now costs under $2000 quantity one. A few years ago terabyte spindles were for the super-rich. Terabyte hard disks are now consumer commodities. A few years ago 8 processors meant eight expensive machines. Now you get 8 processors on a single desktop motherboard in two inexpensive sockets. A few years ago 8 gigabytes of RAM cost more than a Mercedes. Now, 8 gigabytes of RAM can be had for well under $200. A few years ago spatial SQL cost tens of thousands for a limited system. Now, the most sophisticated spatial SQL ever seen costs under $250. Under the pressure of such revolutionary changes in price/performance it makes sense to me to implement the architecture of spatial analytics using a multi-tiered approach, to layer parallelism in processing and data access to utilize hordes of stream processors, gangs of main processors, arrays of inexpensive memory and as many machines containing those resources as you care to cluster.

    I think Peter put his finger right on the trend when he asks what sort of applications become possible when performance in spatial DBMS analytics increases by 50x to 100x. However, that is only half of the question, because the question cannot be answered without knowing what the price is for that.

    Price determines the answer because that guides the sorts of applications that can utilize such performance increases. If Peter is talking about a solution that is so costly that only nation-states or the super-rich can afford it, well, that changes things. It is like the old days when no one could afford a supercomputer so people developed centralized architectures to time-share the few supercomputers that funds allowed to be built.

    In contrast, if Peter is talking about an applications architecture more like I am, one that utilizes new possibilities for price performance, then you get the much more interesting situation like the revival in supercomputing made possible by teraflops on the desktop for a few thousand dollars. Suddenly, anyone and any organization can afford their own supercomputer and the resultant range of applications becomes very much broader.

    I can understand that taking a traditional line of reasoning might say that anyone who can afford to maintain terabytes of data can afford it (whatever “it” is). But these days private individuals accumulate terabytes of data and employ sophisticated spatial analytics, so it is a much wider constituency than before.

    That wider constituency also enables great variety in culture: in the old days when you rented time on a supercomputer within some time-shared supercomputer consortium no one used supercomputers to routinely do CGI rendering for Hollywood movies. Now that anyone can roll their own supercomputer clusters the rendering of Hollywood thriller eye-candy has become possibly the largest consumer of supercomputing cluster cycles. :-)

    Peter, to guide the sorts of applications you think might be relevant, do you have any projected price for the 50x to 100x performance increase product you have in mind? Perhaps a ball park like more or less than $10K or more or less than $100K? It could be that depending upon the price, whatever emerges as the hottest new application such performance gains will enable could be as unexpected within the spatial analytic business as the notion of supercomputers “going Hollywood” were to the supercomputer fraternity!

  19. Peter Batty says:

    Hi Dimitri, sounds like you are doing some cool stuff and it will be exciting to see how that develops. I think that as Lefty says though, we probably are talking about different scales of problem right now. The solution I am talking about is working with multiple terabytes today and that is where it is especially strong, though I expect that applications with somewhat smaller datasets but heavy analysis requirements may well find it compelling too. The system cost we are looking at is higher end than what you are talking about, but not “so costly that only nation-states or the super-rich can afford it”!

    But we are talking about large enterprises or government agencies in general. In cases where we have talked to people trying to run these types of applications today, our hardware is typically cheaper than the servers they are using to run their existing databases – in terms of orders of magnitude you are talking about systems starting in the six figure range. So I am looking for applications where this type of performance improvement would have business benefits in that range also. So far we have had strong interest from organizations like large retail, insurance and telecom companies, and we see strong potential in areas like intelligence, criminal investigation, and emergency planning and response.

    I smiled at your comment about “the old days when no one could afford a supercomputer so people developed centralized architectures to time-share the few supercomputers that funds allowed to be built”. Have you used Google recently? I don’t think you could afford their hardware, but that doesn’t mean you can’t take advantage of their capabilities. I think there will also be interesting opportunities for people to provide services based on the type of technology I am talking about, which obviously would broaden its reach significantly.

    Anyway, as someone commented over on my blog, it’s great that we are starting to see more in the way of serious computer science innovations applied to geospatial applications, from multiple directions, now that we are moving much more into the mainstream of IT.

  20. Paul says:

    Two words: Congressional Redistricting.

    Objective: optimized area allocation into ‘n’ voter districts for either competitive or non-competitive election cycles (depending on your view of the democratic process).
    Secondary by-product of above: model trend impacts based on media buy cost to optimize campaign spending and general predictive spatial consumer behavior based on “injected” catalysts.

    Input 1: partisan voting history by precinct areas (discrete partisan votes totals with precinct x precinct ratios for turnout, total registered voters, total eligible to register, and total population)

    Input 2: population migration (historical, current estimated, and predicted) by census blockgroup or even census block if you have the computational horses.

    Input 3: aggregated consumer behavior records by multiple dimensions – usually aggregated by zipcode or block group (not a fan of zipcode area analysis – but that’s usually where the biz data is scaled to)

    Input 4: discrete and aggregated campaign contributions from the FEC database

    Input 5: other spatial market segmentation data to control for on-demand measures (crime, education, vehicle type, etc.).

    Input 6: political subdivision boundary data to constrain allocation model behavior and enforce current county and municipal communities of interest.

    The exponential nature of allocation modeling at the state level for sub-county areas has generally been categorized as computationally “NP-hard” … ergo the value of Mr. Batty’s “machine”.

    If this working “on-demand” model were available it would be a gold-mine for political interest groups and consumer marketing firms.

    Of course there are models out there, but they frequently abstract or overly generalize the spatial dimension — this model would be spatially accurate to the neighborhood level at each given point in time loaded into the model. And the real value is in the predictive ability by location based on a combination of “injects” into the consumer environment (incl. gas prices, severe weather, approval ratings, etc.).

    I imagine it like a giant climate or weather model only the gradients represent a given spatial behavior across time instead of temperature, precipitation, etc.

    If you could put a simple GUI on the front end and have it ready by July or August (in time to have fun before the Nov. election) – that would be great! :)

  21. Ralphie says:

    Paul: good example, two equations and six unknowns.

    Bad deadline, however. Redistricting won’t take place until 2012.

  22. Paul says:

    @ Ralphie —

    Actually, the US Supreme Court says congressional redistricting can occur whenever the state assembly wants it to.

    As far as “unknowns” —I would kindly offer that you vastly underestimate retail market data. Do you own a GM with OnStar or use MasterCard / Visa? Supermarket club card? There are no unknowns – only the un-calculated (apologies to Webster’s and my 7th grade English teacher).

  23. Dimitri says:

    Peter,

    I smiled at your comment about “the old days when no one could afford a supercomputer so people developed centralized architectures to time-share the few supercomputers that funds allowed to be built”. Have you used Google recently? I don’t think you could afford their hardware, but that doesn’t mean you can’t take advantage of their capabilities.

    That’s an analogy that cuts both ways. Sure, I agree that Google is a perfect example of a centralized architecture that time-shares a very expensive resource that otherwise no one could afford to use. But even Google agrees that is not right for everyone and for all applications.

    For example even Google understands that time sharing the service it provides is not right all the time, so Google sells their search appliances directly to those users who want to and who can afford to run their own search infrastructure. These are typically large organizations and especially large organizations dealing with sensitive data they would prefer not be accessible, not in any way, to an outsider, not even to a “trusted” outsider.

    In general, the more proprietary the data and algorithms the more an organization wants to be sure that no outsider can get their hands either on the data or on the algorithms.

    Historically, that’s been one of the reasons for the shift from time sharing of supercomputers to organizations running their own supercomputers. If you are a drug company with billions of dollars invested in your portfolio of molecules, or an investment house invested in computational finance, or a political party investing a hundred million in computational electioneering, you don’t want to risk anyone else finding out what data you think is worth exploring or how you analyze that data or what your results may be.

    Something else to consider is that as a technical matter Google is not very representative of, well, computing. Google does a handful of computationally very simple things on a massive scale. It is not remotely as complex as deep, custom analysis typical of, say, supercomputing or spatial analytics.

    So, sure, if you want a very algorithmically lightweight scan of a snapshot of enormously large data, yeah, Google is your thing. If you want to program a heavyweight algorithm like the election studies example given or, say, a protein folding and interaction algorithm making pairwise comparisons between a few million molecules, then Google is a terrible idea.

    Those latter cases show other examples of why people often prefer to have supercomputing under their own control instead of time sharing: it is not just privacy, it is the ability to attain much stronger computational capacity.

    In a sense, Google is about big disks and small brains. If I understand your target market, you see both big disks and significant brains as well. Historically, people who wanted massive computing intelligence, especially with very fast data access have increased their chances the more that they are able to keep such resources closer to them.

    In the present case, I get the feeling that perhaps technology is moving faster than expectations, for instance:

    we probably are talking about different scales of problem right now. The solution I am talking about is working with multiple terabytes today and that is where it is especially strong

    Well, the $100,000 cluster I advocated earlier was 80 terabytes with 5120 processors so it is true that perhaps I overspecified if you are aiming for smaller tasks, in the multiple terabytes instead of the many tens of terabytes. I apologize if I misunderstood and overshot the requirement.

    If all you need do is multiple terabytes, well, then you could get away with a much smaller cluster, say one or two machines for a total of, say, 8 terabytes and 1024 processors. That would keep it well under $20,000, a much more affordable solution.

    But if the complaint is that what I’m suggesting is so expensive that it must be time shared, well, I’d offer two observations:

    First, the sorts of customers you mention have plenty of money and can easily afford $20,000 or $100,000. They won’t hesitate to spend that even if a time-shared solution is less expensive, because no way, no how will they give up the power of having 1024 to 5120 and more processors all for themselves.

    Second, although I grant you that spending $100,000 to get 5120 processors and 80 terabyte capacity sounds like a lot, that’s only at today’s prices. A year from now it will be $60,000 and the year after that $30,000.

    Large customers tend to have slow procurements so by the time a lot of these folks get going it will probably be down to $25,000 with storage and computing capacities even higher than today. Heck, in two or three years we could be talking 300 terabytes and 15,000 processors for that money! :-)

    So no, I don’t knock the idea of time-sharing an expensive resource to make it available to more people. But I am saying that both security and performance are strong reasons for not time sharing and that the costs of direct, distributed technology are getting so low that it is already probably too inexpensive (as has become the case with supercomputing) to bother time sharing anyway.

    Could you comment whether your plans are for specialized hardware, or is it software not tied to a specific hardware platform?

  24. Peter Batty says:

    Dimitri, I was merely pointing out that you made it sound like “time sharing” was a concept from a historical age, and it isn’t, as we both agree it seems. And I agree with you that of course there are pros and cons to that approach. To reiterate what I said before, our main focus is the high end of the market where people would have their own dedicated system, but I also think that a service offering could be an interesting option for making some of these capabilities available to those who can’t afford their own dedicated system.

    When I said that we are looking at different scales of problem I was simply going by your comment above where you said “I grant you that getting into the multiple terabyte range requires much more thought about hardware and overall architecture than working in the hundreds of gigabyte range. But it’s getting very close and it could well be this summer …”. In a later comment you specified a system with 80TB of disk storage, but that is a completely different question from how you can effectively run complex spatial analysis across that much data. I agree with you again when you said that this “requires much more thought about hardware and overall architecture”. So overall I think we’re agreeing about a lot of things :) .

  25. Dan S. says:

    A belated message to Paul: I hate to be a nit-picker, but a 1000x speed increase, or even 10,000x, is mostly useless for truly solving NP-hard problems. They wouldn’t really be ‘hard’ then, would they?

    (The classic example is the traveling salesperson problem, where nobody has figured out a substantially better way to find the shortest route that visits each city in a list than checking all possible routes between them. The time it takes will grow with the number of possible routes, which is the factorial of the number of cities. Thus if you have 1000 cities, adding a 1001st city will slow the search for the optimal solution by a factor of 1001. It’s clear that constant-multiplier speed improvements, even big ones, are mostly helpless in the face of this sort of performance characteristic.)
    There are many heuristic and approximate ways of tackling these when you don’t need the absolute perfect answer, and faster geoprocessing could be very helpful there…
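    To put numbers on the factorial growth described above, here is a small sketch that asks how many more cities an exhaustive search could handle on a faster machine. It models only the naive check-every-route approach from the comment, and the 12-city baseline is a hypothetical starting point, not a measured figure.

```python
import math


def route_count(cities):
    """Orderings examined by the naive exhaustive search: n!."""
    return math.factorial(cities)


def largest_solvable(budget_routes):
    """Largest number of cities whose n! routes fit within the budget."""
    n = 1
    while route_count(n + 1) <= budget_routes:
        n += 1
    return n


# Hypothetical baseline: today's machine can just exhaust a 12-city problem.
baseline_budget = route_count(12)

for factor in (1, 100, 10_000):
    # Same wall-clock time on a machine that is `factor` times faster.
    print(factor, largest_solvable(baseline_budget * factor))
```

    Under that assumption a 100x faster machine moves the limit from 12 cities to 13, and a 10,000x faster machine only reaches 15.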

    Since you’re interested in redistricting, you might be interested in a similar problem I have some experience in: automated conservation reserve design. You can google up MARXAN for a tool which is widely used in an attempt to identify areas that preserve the most biodiversity while still doing so efficiently. Not very different from trying to capture voting-behavior-similar blocs. (Note: I’m the author of a very similar and now rather out-of-date tool called SPOT.)

    I’m certain that similar methods are used to optimize forestry yield/profits, and probably ditto for other agricultural fields. Actually, similar methods are used all over the place by all kinds of industries.

    As a practical matter, most of the time this sort of thing is tackled by first using geoprocessing and database crunching to boil things down to a Big Fat Matrix which is then fed into custom-written tools.

  26. MTBMaven says:

    I am a mere grasshopper compared to the experts contributing to this very interesting topic. My graduate research revolves around the validation of ESRI’s viewshed algorithm when compared to field derived viewsheds using LiDAR derived DEMs in an urban core.

    In the process of my research I have developed a conceptual algorithm for the calculation of viewsheds on TIN surfaces. To my knowledge no commercial GIS can compute a viewshed on a TIN or return the results as a TIN, yet the literature suggests line of sight calculations (a core function of viewshed calculations) are more accurate when computed on TIN surfaces. My algorithm utilizes brute force line of sight calculations to determine visibility. When conducted on very large TIN surfaces, this would require large amounts of processing power.

    Is this the type of computation you are interested in?
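    For readers unfamiliar with the building block mentioned above, a line-of-sight test can be sketched on a simple elevation profile. This is a deliberately simplified stand-in that works on sampled (distance, elevation) pairs along one sight line; it is not the TIN-based algorithm described in the comment, and the function name, default heights and sample data are assumptions for illustration.

```python
def is_visible(profile, observer_height=1.7, target_height=0.0):
    """Return True if the last sample is visible from the first.

    profile: list of (distance, elevation) pairs sorted by distance,
    starting at the observer and ending at the target.
    """
    (d0, z0), (d1, z1) = profile[0], profile[-1]
    eye = z0 + observer_height
    top = z1 + target_height
    for d, z in profile[1:-1]:
        # Height of the straight sight line above this sample.
        t = (d - d0) / (d1 - d0)
        sight = eye + t * (top - eye)
        if z > sight:
            return False  # terrain rises above the sight line
    return True


flat = [(0.0, 100.0), (50.0, 100.0), (100.0, 100.0)]
hill = [(0.0, 100.0), (50.0, 110.0), (100.0, 100.0)]
print(is_visible(flat), is_visible(hill))
```

    A brute-force viewshed repeats a test like this for every candidate target, which is why the cost climbs so quickly on very large surfaces.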

  27. Mars Sjoden says:

    Oh lord please!

    I might actually have a life if I could run my queries, identities, analysis, and complex statistical computations that fast.

    I’m working on an Ecosystem Based Management plan (very large areas) in northern BC and I sure could use another 10, 20… 100 CPUs running for me.

    *sigh*… whoops, an identity just finished, I better run the next one…

    Yah, I could easily do with some extra horsepower.

  28. Ho Nguyen says:

    Hi all,

    I’m interested in ways to find the shortest route on a map.

    I use MapDotNet and SQL server 2008.

    Do you guys have any ebook or article that talks about this?

    Thanks,
    Ho Nguyen