2dsphere, GeoJSON, and Doctrine MongoDB

It seems that GeoJSON is all the rage these days. Last month, Ian Bentley shared a bit about the new geospatial features in MongoDB 2.4. Derick Rethans, one of my PHP driver teammates and a renowned OpenStreetMap aficionado, recently blogged about importing OSM data into MongoDB as GeoJSON objects. A few days later, GitHub added support for rendering .geojson files in repositories, using a combination of Leaflet.js, MapBox, and OpenStreetMap data. Coincidentally, I visited a local CloudCamp meetup last week to present on geospatial data, and for the past two weeks I’ve been working on adding support for MongoDB 2.4’s geospatial query operators to Doctrine MongoDB.

Doctrine MongoDB is an abstraction for the PHP driver that provides a fluent query builder API among other useful features. It’s used internally by Doctrine MongoDB ODM, but is completely usable on its own. One of the challenges in developing the library has been supporting multiple versions of MongoDB and the PHP driver. The introduction of read preferences last year is one such example. We wanted users to still be able to set the slaveOk bit with older server and driver versions while letting read preferences take effect with newer ones, all without breaking our API or violating semantic versioning. Now, the setSlaveOkay() method in Doctrine MongoDB will invoke setReadPreference() if it exists in the driver, and fall back to the deprecated setSlaveOkay() driver method otherwise.
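To make that fallback concrete, here is a minimal PHP sketch of the feature detection. The classes LegacyDriverCursor, ModernDriverCursor, and CursorWrapper are hypothetical stand-ins, not Doctrine's or the driver's actual classes, and mapping a true slaveOk flag to the secondaryPreferred read preference is an assumption made for illustration.

```php
<?php
// Hypothetical stand-ins for an old and a new driver cursor class. These are
// NOT the real PHP driver classes, just enough surface to show the dispatch.
class LegacyDriverCursor
{
    public $slaveOkay = false;

    public function setSlaveOkay($ok)
    {
        $this->slaveOkay = $ok;
    }
}

class ModernDriverCursor
{
    public $readPreference = 'primary';

    public function setReadPreference($mode)
    {
        $this->readPreference = $mode;
    }
}

// Hypothetical wrapper illustrating feature detection at call time.
class CursorWrapper
{
    private $driverCursor;

    public function __construct($driverCursor)
    {
        $this->driverCursor = $driverCursor;
    }

    public function setSlaveOkay($ok = true)
    {
        if (method_exists($this->driverCursor, 'setReadPreference')) {
            // Newer driver: translate the boolean into a read preference
            // (the secondaryPreferred mapping is an assumption).
            $this->driverCursor->setReadPreference($ok ? 'secondaryPreferred' : 'primary');
        } else {
            // Older driver: fall back to the deprecated slaveOk flag.
            $this->driverCursor->setSlaveOkay($ok);
        }
        return $this;
    }
}
```

Because the check uses method_exists() at call time, the same wrapper code can run against either driver without an explicit version switch.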

Query Builder API

Before diving into the geospatial changes for Doctrine MongoDB, let’s take a quick look at the query builder API. Suppose we had a collection, test.places, with some OpenStreetMap annotations (key=value strings) stored in a tags array and a loc field containing longitude/latitude coordinates in MongoDB’s legacy point format (a float tuple) for a 2d index. Doctrine’s API allows queries to be constructed like so:

$connection = new \Doctrine\MongoDB\Connection();
$collection = $connection->selectCollection('test', 'places');

$qb = $collection->createQueryBuilder()
    ->field('loc')
        ->near(-73.987415, 40.757113)
        ->maxDistance(0.00899928)
    ->field('tags')
        ->equals('amenity=restaurant');

$cursor = $qb->getQuery()->execute();

The example above executes the following query:

{
    "loc": {
        "$near": [-73.987415, 40.757113],
        "$maxDistance": 0.00899928
    },
    "tags": "amenity=restaurant"
}

This simple query will return restaurants within half a kilometer of 10gen’s NYC office at 229 West 43rd Street. If only it were that easy to find good restaurants near Times Square!

Supporting New and Old Geospatial Queries

When the new 2dsphere index type was introduced in MongoDB 2.4, operators such as $near and $geoWithin were changed to accept GeoJSON geometry objects in addition to their legacy point and shape arguments. $near was particularly problematic because of its optional $maxDistance argument. As shown above, $maxDistance previously sat alongside $near and was measured in radians. It now sits within $near and is measured in meters. Using a 2dsphere index and GeoJSON points, the same query takes on a whole new shape:

{
    "loc": {
        "$near": {
            "$geometry": {
                "type": "Point",
                "coordinates": [-73.987415, 40.757113]
            },
            "$maxDistance": 500
        }
    },
    "tags": "amenity=restaurant"
}
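For comparison, the two query shapes can be written as the PHP arrays a driver call would receive. This is only an illustrative sketch; the variable names are arbitrary and the values are copied from the two queries above, so the point is the placement of $maxDistance rather than the numbers themselves.

```php
<?php
// Legacy 2d query: $maxDistance is a sibling of $near.
$legacyQuery = [
    'loc' => [
        '$near' => [-73.987415, 40.757113],
        '$maxDistance' => 0.00899928,
    ],
    'tags' => 'amenity=restaurant',
];

// 2dsphere query: the point becomes a GeoJSON geometry under $geometry,
// and $maxDistance (in meters) moves inside the $near document.
$sphereQuery = [
    'loc' => [
        '$near' => [
            '$geometry' => [
                'type' => 'Point',
                'coordinates' => [-73.987415, 40.757113],
            ],
            '$maxDistance' => 500,
        ],
    ],
    'tags' => 'amenity=restaurant',
];

// Print the 2dsphere version as JSON to compare with the query above.
echo json_encode($sphereQuery, JSON_PRETTY_PRINT), "\n";
```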

This posed a hurdle for Doctrine MongoDB’s query builder, because we wanted to support 2dsphere queries without drastically changing the API. Unfortunately, there was no obvious way for near() to discern whether a pair of floats denoted a legacy or GeoJSON point, or whether a number signified radians or meters in the case of maxDistance(). I also anticipated we might run into a similar quandary with the geoWithin() builder method, which accepts an array of point coordinates.

Method overloading seemed preferable to creating separate builder methods or introducing a new “mode” parameter to handle 2dsphere queries. Although PHP has no language-level support for overloading, it is commonly implemented by inspecting an argument’s type at runtime. In our case, this would necessitate having classes for GeoJSON geometries (e.g. Point, LineString, Polygon), which we could differentiate from the legacy geometry arrays.
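A minimal sketch of that runtime type inspection might look like the following. PointSketch and NearExpr are hypothetical classes invented for illustration (not Doctrine's query builder or the geojson library); they only show how near() can branch on instanceof and how maxDistance() then lands beside or inside $near.

```php
<?php
// Hypothetical minimal stand-in for a GeoJSON Point geometry.
class PointSketch
{
    private $coordinates;

    public function __construct(array $coordinates)
    {
        $this->coordinates = $coordinates;
    }

    public function toArray()
    {
        return ['type' => 'Point', 'coordinates' => $this->coordinates];
    }
}

// Hypothetical builder fragment: near() is "overloaded" by inspecting the
// argument's type, and getQuery() places $maxDistance accordingly.
class NearExpr
{
    private $near;
    private $isGeoJson = false;
    private $maxDistance;

    public function near($x, $y = null)
    {
        if ($x instanceof PointSketch) {
            $this->isGeoJson = true;
            $this->near = ['$geometry' => $x->toArray()];
        } else {
            $this->near = [$x, $y]; // legacy coordinate pair
        }
        return $this;
    }

    public function maxDistance($distance)
    {
        $this->maxDistance = $distance;
        return $this;
    }

    public function getQuery()
    {
        if ($this->isGeoJson) {
            $near = $this->near;
            if ($this->maxDistance !== null) {
                $near['$maxDistance'] = $this->maxDistance; // meters, inside $near
            }
            return ['$near' => $near];
        }
        $query = ['$near' => $this->near];
        if ($this->maxDistance !== null) {
            $query['$maxDistance'] = $this->maxDistance; // legacy units, beside $near
        }
        return $query;
    }
}
```

The same instanceof check would let a geoWithin() method tell a geometry object apart from a plain array of coordinates.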

Introducing a GeoJSON Library for PHP

A cursory search for GeoJSON PHP libraries turned up php-geojson, from the MapFish project, and geoPHP. I was pleased to see that geoPHP was available via Composer (PHP’s de facto package manager), but neither library implemented the GeoJSON spec in its entirety. This seemed like a ripe opportunity to create such a library, and so geojson was born a few days later.

At the time of this writing, 2dsphere support for Doctrine’s query builder is still being developed; however, I envision it will take the following form when complete:

use GeoJson\Geometry\Point;

// ...

$qb = $collection->createQueryBuilder()
    ->field('loc')
        ->near(new Point([-73.987415, 40.757113]))
        ->maxDistance(500) // meters, since the point is GeoJSON
    ->field('tags')
        ->equals('amenity=restaurant');

All of the GeoJson classes implement JsonSerializable, one of the newer interfaces introduced in PHP 5.4, which will allow Doctrine to prepare them for MongoDB queries with a single method call. One clear benefit over the legacy geometry arrays is that the GeoJson library performs its own validation. When a Polygon is passed to geoWithin(), Doctrine won’t have to worry about whether all of its rings are closed LineStrings; the library will catch such an error in the constructor. This separation of concerns makes both libraries easier to maintain.
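The sketch below shows both ideas together under simplifying assumptions: the Point and Polygon classes here are hypothetical, pared-down versions and not the geojson library's real implementation. Each implements JsonSerializable so a consumer can turn it into a plain array with one call, and Polygon rejects unclosed rings in its constructor.

```php
<?php
// Hypothetical, simplified geometry classes (not the geojson library's actual
// code) showing JsonSerializable plus constructor-time validation.
class Point implements JsonSerializable
{
    private $coordinates;

    public function __construct(array $coordinates)
    {
        if (count($coordinates) < 2) {
            throw new InvalidArgumentException('A position needs at least two elements');
        }
        $this->coordinates = $coordinates;
    }

    public function jsonSerialize(): array
    {
        return ['type' => 'Point', 'coordinates' => $this->coordinates];
    }
}

class Polygon implements JsonSerializable
{
    private $rings;

    public function __construct(array $rings)
    {
        foreach ($rings as $ring) {
            // A GeoJSON linear ring needs at least four positions, and its
            // first and last positions must match (i.e. the ring is closed).
            if (count($ring) < 4 || reset($ring) !== end($ring)) {
                throw new InvalidArgumentException('Each ring must be a closed LinearRing');
            }
        }
        $this->rings = $rings;
    }

    public function jsonSerialize(): array
    {
        return ['type' => 'Polygon', 'coordinates' => $this->rings];
    }
}

// A consumer such as a query builder needs only one call to get a plain array.
$point = new Point([-73.987415, 40.757113]);
$query = ['loc' => ['$near' => ['$geometry' => $point->jsonSerialize(), '$maxDistance' => 500]]];
```

Because invalid geometries never get constructed, the code that builds the query can trust whatever jsonSerialize() returns.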

I look forward to finishing up 2dsphere support for Doctrine MongoDB in the coming weeks (things are a bit busy with MongoNYC right around the corner). In the meantime, if you happen to fall in the fabled demographic of PHP developers in need of a full GeoJSON implementation, please give geojson a look and share some feedback.