tom@macwright.org

The answers to spatial problems tend to converge towards certain techniques that everyone uses. Here are a few.

If you’ve used PostGIS, it’s likely you’ve run into the all-important index problem. Adding a spatial index to a table means the difference between multi- and sub-section queries on a table. In software, the same principle goes: indexes are typically required for performance if you’re asking anything about spatial intersections.

Typically spatial indexes are R-trees: the data structure provides a nice balance of space and performance, and is fairly simple to implement.

In JavaScript, use rbush, Vladimir Agafonkin’s library based on recent research. Closer to the metal, spatialindex is essential.

R-Trees contain rectangles and points, not polygons or lines. What if you need a more specific answer? This leads us to an important technique.

A useful and simple technique for asking questions like ‘is this point in any of these 2 million polygons?’ is that you can break it down into steps based on assumptions:

- If a point is in a polygon, it must be in the bounding box for that polygon.
- Therefore, we can answer this question in two steps:
- Which polygon bounding boxes is this point within?
- Of those polygon bounding boxes, how many of the actual polygons does this point fall into?

This takes advantage of the fact that the first question can be extremely fast to answer, thanks to R-Tree, whereas the second question, whether a point is in a specific polygon, usually cannot be fast. The canonical algorithm for point-in-polygon intersections looks at every point in the polygon, even in highly optimized implementations like GEOS.

In JavaScript substack’s point-in-polygon module and my leaflet-pip expose this algorithm. It’s an extremely simple technique: barring naive implementation mistakes, the performance is pretty predictable and scales relative to the number of vertices in the shape.

There are many ways to calculate the area of a polygon on the earth. The most complex and most accurate is rarely used - taking into account the minute surface variations and elevation differences of the geoid.

A much more common approach is to calculate the area of a ring using some simple trigonometry popularized by Robert Chamberlain at NASA. There are many implementations of that approach, like geojson-area and geojson-utils.

One neat side-effect of the algorithm to calculate polygon area is that it also answers the question of the polygon’s winding number: a clockwise loop will yield area greater than or equal to zero, while a counterclockwise loop has negative area.

What about finding the closest points to a certain point? The old-fashioned
answer from my MySQL days would be something like a bounding box query
with `>`

and `<`

: you can do better if you’re writing an application.

Using a k-d tree, we can solve the nearest neighbor problem without having to look at every entry. In JavaScript, sphere-knn is a tested and awesome implementation of this by Dark Sky.

Many questions distill into these: for instance, which shapes interact with a bounding box, a common question in tiled maps, is some combination of indexing and point in polygon checking.

An interesting new approach that we’ve been exploring at Mapbox and that has been implemented in Google’s S2 Geometry Library and Morgan Herlocker’s tile-cover library, is to use cell indexes rather than trees to answer these questions.

It’s a promising approach because it flattens the index structure, making it possible to distribute the index across a network and express spatial queries as range queries. Since the derivation of cells from a shape is very similar to rasterization, we can steal tricks from graphics programming and use ray-tracing techniques to make it super-fast.

Want to read something else I've written about maps and geo?