validate the css validate the xhtml

HackerMoJo.com


Ceci n'est pas une blog
by Glenn Franxman, Django Developer / Stunt Programmer.

Spatial Search

posted: 2004-07-02 21:30:54 perma-link, RSS comments feed

So I'm working on a personal project that involves spatial components to the queries. Right now, it works, but doesn't scale very well, so I'm precomputing all of the possible answers. You can see it here. It will tell you the distance betwen two zip codes.

I'm precomputing all of the zipcodes within 5, 15, 25 and 50 miles of each other. At the current rate, it will take about 40 hours more on this machine. If it reports "1" as the zipcodes near the given zips, that just means that it hasn't gotten around to precomputing it. When it is complete, it should be very fast. Then I'll combine it with my previous work on the inverted text indicies.

The calculations are based upon data from the 2000 census which is last time they mapped gps coordintes to the "centroid" of all of the zipcodes. ( Did you know that of the 100000 possible zipcodes, only 42193 exist? ). As fuzzy as determining the center of gerymandered zip code regions would seem, you might take solice in the math for computing the distance between them being stright forward. Well, you'd be wrong. Tthe math involved in determing the distance between lattitude-longintude pairs involves some pretty nasty trig.

Since the coordinates are given latitude ( angular distance parallel to the center plane at the equator) and longitude ( angular distance around the poles ) we don't have a nice cartesian grid to deal with.

We'll first have to figure out the change in latitude and change in longitude.
We have to convert them to radians because these coordinates are currently in units of degrees, as in 0-360, and we'll want use trig functions that things in fractions of π/2 .

At first, I was converting the positions of the two coordintes to a square space ( x,y,z ) and finding the distance between them, but guess what? That's a stright line. I'm not talking as the crow flies, a straight line between points on the surface of a sphere go through the sphere! To get the distance along the arc of the surface connecting two points you use this equation:

distance = EARTH_RADIUS * 2 * atan2(sqrt(pow(sin(delta_lat / 2.0), 2) + cos(lat1) * cos(lat2) * pow(sin(delta_long / 2.0), 2)),sqrt(1 - pow(sin(delta_lat / 2.0), 2) + cos(lat1) * cos(lat2) * pow(sin(delta_long / 2.0), 2)));

That's pretty good. There won't be a quiz on that one. My first inclination was to try to tune that to run faster ( which is possible, many of those terms are repeated ), but I think this caching scheme will pay off more in the long run.



glenn said on 2004-07-04 14:33:27:
So now, the fast results ( distance as the crow flies across a theoretically smooth ellipsoid ) as reported immediately, as is the much more realistic, but much slower calculation for distance - as a man walks across a theoretically smooth ellipsoid. After looking at the different routes taken by mapquest and yahoo maps, and at the zip code densities of places like Knoxville, Pittsburg, Beverly Hills and New York, I realized there is a strogn correlation between zipcode densities and population densities. I made my own hypothesis that road densities follow population densities, and thus relate zipcode densities to road densities. Picking a path that chooses waypoints based on distance to destiantion and zip density, I get paths that approximate the major highways and correlate with my own driving experiences. Now I need to optimize that process somehow...


glenn said on 2004-07-03 16:02:12:
It could be wrong, but I don't think that it is. The distance calculations should be correct, the real horse power is being used to calculate the neighboring zips. You entered 37932 ( knoxville, tn ) and 15221 ( pittsbug, pa ) Mapquest estimates that drive at 500 miles ( 119 more than as the crow flies ). Yahoo takes a much more western route and spends 553 miles on the trip. In either case, these are driving distances which are much more meandering. Errors creep into my calculations from my assumption that the early is a smooth sphere and that it's radius is 3956 miles. If I knew the min and max radius from equator to pole, then I could possible stretch the sphere into more of an ellipsoid which would be more accurate.


Benz said on 2004-07-03 11:31:24:
Very interesting, though I have to admit you lost me after graph three. I can see definite applications for this. I entered a zip in knoxville and one in pittsburgh and it claimed they were 381 miles apart. Clearly wrong. I'm assuming that's because it's not quite done yet?

Comments

Post a comment


Based upon your reading habits, might I recommend:

Or, you might like:

Copyright © 2003,2004,2005,2006,2007,2008 GFranxman. All Rights Reserved


hosting: slicehost.com. powered by: django. written in: python. controlled by: bzr. monsters by: monsterID.

You've been exposed to: {'Programming': 1}