Designing the Next PHP and HHVM MongoDB Drivers

In the beginning Kristina created the MongoDB PHP driver. Now the PECL mongo extension was new and untested, write operations tended to be fire-and-forget, and Boolean parameters made more sense than $options arrays. And Kristina said, “Let there be MongoCollection,” and there was basic functionality.

Since the PHP driver first appeared on the scene, MongoDB has gone through many changes. Replica sets and sharding arrived early on, but things like the aggregation framework and command cursors were little more than a twinkle in Eliot’s eye at the time. The early drivers were designed with many assumptions in mind: write operations and commands were very different; the largest replica set would have no more than a dozen nodes; cursors were only returned by basic queries. In 2015, we know that these assumptions no longer hold true.

Beyond MongoDB’s features, our ecosystem has also changed. When the PHP driver, a C extension, was first implemented, there wasn’t yet a C driver that we could utilize. Therefore, the 1.x PHP driver contains its own BSON and connection management C libraries. HHVM, an alternative PHP runtime with its own C++ extension API, also did not exist years ago, nor was PHP 7.0 on the horizon. Lastly, methods of packaging and distributing libraries have changed. Composer has superseded PEAR as the de facto standard for PHP libraries, and support for extensions (currently handled by PECL) is forthcoming.

During the spring of 2014, we worked with a team of students from Facebook’s Open Academy program to prototype an HHVM driver modeled after the 1.x API. The purpose of that project was twofold: research HHVM’s extension API and determine the feasibility of building a driver atop libmongoc (our then-new C driver) and libbson. Although the final result was not feature complete, the project was a valuable learning experience. The C driver proved quite up to the task, and HNI, which allows an HHVM extension to be written with a combination of PHP and C++, highlighted critical areas of the driver for which we’d want to use C.

This all leads up to the question of how best to support PHP 5.x, HHVM, and PHP 7.0 with our next-generation driver. Maintaining three disparate, monolithic extensions is not sustainable. We also cannot eschew the extension layer for a pure PHP library, like mongofill, without sacrificing performance. Thankfully, we can compromise! Here is a look at the architecture for our next-generation PHP driver:

[Figure: Driver architecture]

At the top of this stack sits a pure PHP library, which we will distribute as a Composer package. This library will provide an API similar to what users have come to expect from the 1.x driver (e.g. CRUD methods, database and collection objects, command helpers) and we expect it to be a common dependency for most applications built with MongoDB. This library will also implement common specifications, in the interest of improving API consistency across all of the drivers maintained by MongoDB (and hopefully some community drivers, too).

Sitting below that library we have the lower-level drivers (one per platform). These extensions will effectively form the glue between the runtimes (PHP and HHVM) and our system libraries (libmongoc and libbson), and they will expose an identical public API for the most essential and performance-sensitive functionality:

  • Connection management
  • BSON encoding and decoding
  • Object document serialization (to support ODM libraries)
  • Executing commands and write operations
  • Handling queries and cursors

By decoupling the driver internals and a high-level API into extensions and PHP libraries, respectively, we hope to reduce our maintenance burden and allow for faster iteration on new features. As a welcome side effect, this also makes it easier for anyone to contribute to the driver. Additionally, an identical public API for these extensions will make it that much easier to port an application across PHP runtimes, whether the application uses the low-level driver directly or a higher-level PHP library.
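To make the split concrete, here is a minimal sketch of a high-level PHP class written against a small low-level contract. Everything in it is hypothetical and chosen only for illustration: the `LowLevelDriver` interface, its `executeInsert` and `executeQuery` methods, the `InMemoryDriver` stand-in, and the `Collection` wrapper are not the actual API of the extensions or the library. The point is only that the high-level code never needs to know which runtime-specific implementation sits underneath.

```php
<?php

// Hypothetical contract standing in for the API that each extension would expose.
interface LowLevelDriver
{
    public function executeInsert($namespace, array $document);
    public function executeQuery($namespace, array $filter);
}

// In-memory stand-in for an extension (assumption for this sketch);
// a real extension would delegate to libmongoc and libbson instead.
class InMemoryDriver implements LowLevelDriver
{
    private $data = [];

    public function executeInsert($namespace, array $document)
    {
        $this->data[$namespace][] = $document;
    }

    public function executeQuery($namespace, array $filter)
    {
        $documents = isset($this->data[$namespace]) ? $this->data[$namespace] : [];
        $matches = [];
        foreach ($documents as $document) {
            // Keep a document only if every filter field matches exactly.
            $isMatch = true;
            foreach ($filter as $field => $value) {
                if (!array_key_exists($field, $document) || $document[$field] !== $value) {
                    $isMatch = false;
                    break;
                }
            }
            if ($isMatch) {
                $matches[] = $document;
            }
        }
        return $matches;
    }
}

// High-level API in pure PHP; it depends only on the contract above.
class Collection
{
    private $driver;
    private $namespace;

    public function __construct(LowLevelDriver $driver, $database, $collection)
    {
        $this->driver = $driver;
        $this->namespace = $database . '.' . $collection;
    }

    public function insertOne(array $document)
    {
        $this->driver->executeInsert($this->namespace, $document);
    }

    public function find(array $filter = [])
    {
        return $this->driver->executeQuery($this->namespace, $filter);
    }
}

$collection = new Collection(new InMemoryDriver(), 'test', 'users');
$collection->insertOne(['name' => 'Kristina', 'role' => 'author']);
$collection->insertOne(['name' => 'Hannes', 'role' => 'maintainer']);
$found = $collection->find(['role' => 'maintainer']);
echo $found[0]['name'], PHP_EOL;
```

Swapping `InMemoryDriver` for another implementation of the same interface would leave `Collection` untouched, which is the portability property described above.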

GridFS is a great example of why we chose this direction. Although we implemented GridFS in C for our 1.x driver, it is actually quite a high-level specification. Its API is just an abstraction for accessing two collections: files (i.e. metadata) and chunks (i.e. blocks of data). Likewise, all of the syntactic sugar found in the 1.x driver, such as processing uploaded files or exposing GridFS files as PHP streams, can be implemented in pure PHP. Provided we have performant methods for reading from and writing to GridFS’ collections – and thanks to our low-level extensions, we will – shifting this API to PHP is a win-win.
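As a rough illustration of how little GridFS needs beyond plain reads and writes, the sketch below splits a payload into one files document and several chunks documents, then reassembles it. The function names `gridfsStore` and `gridfsRead`, the integer ids, and the use of in-memory arrays in place of real collections are all assumptions made for this example; only the general two-collection layout comes from the description above.

```php
<?php

// Store a payload as one "files" document (metadata) plus N "chunks" documents.
// $files and $chunks are plain arrays standing in for the two collections.
function gridfsStore(array &$files, array &$chunks, $filename, $data, $chunkSize = 261120)
{
    // A real driver would generate an ObjectId; an integer keeps the sketch simple.
    $id = count($files) + 1;

    $files[] = [
        '_id' => $id,
        'filename' => $filename,
        'length' => strlen($data),
        'chunkSize' => $chunkSize,
    ];

    // Guard the empty case, since str_split('') differs across PHP versions.
    $pieces = $data === '' ? [] : str_split($data, $chunkSize);
    foreach ($pieces as $n => $piece) {
        $chunks[] = ['files_id' => $id, 'n' => $n, 'data' => $piece];
    }

    return $id;
}

// Reassemble a payload by collecting its chunks and ordering them by sequence number.
function gridfsRead(array $chunks, $id)
{
    $own = array_filter($chunks, function ($chunk) use ($id) {
        return $chunk['files_id'] === $id;
    });
    usort($own, function ($a, $b) {
        return $a['n'] - $b['n'];
    });

    return implode('', array_column($own, 'data'));
}

$files = [];
$chunks = [];
// A tiny chunk size is used here only so that the payload spans several chunks.
$id = gridfsStore($files, $chunks, 'hello.txt', 'Hello, GridFS!', 5);
echo count($chunks), PHP_EOL;
echo gridfsRead($chunks, $id), PHP_EOL;
```

With real collections, the two array writes become inserts and the filter-and-sort becomes a query on the chunks collection, which is why fast low-level reads and writes are the only part that needs to live in the extension.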

Earlier I mentioned that we expect the PHP library to be a common dependency for most applications, but not all. Some users may prefer to stick to the no-frills API offered by the extensions, or create their own high-level abstraction (akin to Doctrine MongoDB for the 1.x driver), and that’s great! Hannes has talked about creating a PHP library geared for MongoDB administration, which provides an API for various user management and ops commands. I’m looking forward to building the next major version of Doctrine MongoDB ODM directly atop the extensions.

While we will continue to maintain and support the 1.x driver and its users for the foreseeable future, we invite everyone to check out our next-generation driver and consider it for any new projects going forward. You can find all of the essential components across GitHub and JIRA:

Project                   GitHub                       JIRA
PHP Library               mongodb/mongo-php-library    PHPLIB
PHP 5.x Driver (phongo)   mongodb/mongo-php-driver     PHPC
HHVM Driver (hippo)       mongodb/mongo-hhvm-driver    HHVM

The existing PHP project in JIRA will remain open for reporting bugs against the 1.x driver, but we would ask that you use the new projects above for anything pertaining to our next-generation drivers.