Three Underused MongoDB Features

Published: Jan 28, 2021

By Mark Smith


As a Developer Advocate for MongoDB, I have quite a few conversations with developers. Many of these developers have never used MongoDB, and so the conversation is often around what kind of data MongoDB is particularly good for. (Spoiler: Nearly all of them! MongoDB is a general purpose database that just happens to be centered around documents instead of tables.)

But there are lots of developers out there who already use MongoDB every day, and in those situations, my job is to make sure they know how to use MongoDB effectively. I make sure, first and foremost, that these developers know about MongoDB's Aggregation Framework, which is, in my opinion, MongoDB's most powerful feature. It is relatively underused. If you're not using the Aggregation Framework in your projects, then either your project is very simple, or you could probably be doing things more efficiently by adding some aggregation pipelines.

But this article is not about the Aggregation Framework! This article is about three other features of MongoDB that deserve to be better known: TTL Indexes, Capped Collections, and Change Streams.

#TTL Indexes

One of the great things about MongoDB is that it's so easy to store data in it, without first having to go through complex steps to map your data onto a rigid schema.

Because of this, it's quite common to use MongoDB as a cache as well as a database, to store things like session information, authentication data for third-party services, and other things that are relatively short-lived.

A common idiom is to store an expiry date in the document, and then when retrieving the document, to compare the expiry date to the current time and only use it if it's still valid. In some cases, as with OAuth access tokens, if the token has expired, a new one can be obtained from the OAuth provider and the document can be updated.

from datetime import datetime

# An example document with an explicit expiry date:
{
    "name": "Professor Bagura",
    # This document will disappear before 2022:
    "expires_at": datetime.fromisoformat("2021-12-31 23:59:59"),
}

# Retrieve a valid document by filtering on docs where `expires_at` is in the future:
if (doc := coll.find_one({"expires_at": {"$gt": datetime.now()}})) is None:
    # If no valid documents exist, create one (and probably store it):
    doc = create_document()

# Code to use the retrieved or created document goes here.

Another common idiom also involves storing an expiry date in the document, and then running code periodically that either deletes or refreshes expired documents, depending on what's correct for the use-case.

import time
from datetime import datetime

while True:
    # Delete all documents where `expires_at` is in the past:
    coll.delete_many({"expires_at": {"$lt": datetime.now()}})
    time.sleep(60)

An alternative way to manage data that has an expiry, either absolute or relative to the time the document is stored, is to use a TTL index.

To use the definition from the documentation: "TTL indexes are special single-field indexes that MongoDB can use to automatically remove documents from a collection after a certain amount of time or at a specific clock time." TTL indexes are why I like to think of MongoDB as a platform for building data applications, not just a database. If you apply a TTL index to your documents' expiry field, MongoDB will automatically remove the document for you! This means that you don't need to write your own code for removing expired documents, and you don't need to remember to always filter documents based on whether their expiry is earlier than the current time. You also don't need to calculate the absolute expiry time if all you have is the number of seconds a document remains valid!

Let me show you how this works. The code below demonstrates how to create a TTL index on the created_at field. Because expireAfterSeconds is set to 3600 (which is one hour), any document in the collection will be deleted one hour after the date stored in its created_at field.

from datetime import datetime

coll = db.get_collection("ttl_collection")

# Create a new TTL index on `created_at`.
# Each document will be deleted when the current time reaches one hour
# (3600 seconds) after the date stored in its `created_at` field:
coll.create_index([("created_at", 1)], expireAfterSeconds=3600)

coll.insert_one(
    {
        "name": "Professor Bagura",
        "created_at": datetime.now(),  # Document will disappear after one hour.
    }
)

Another common idiom is to store the explicit time at which each document should be deleted. This is done by setting expireAfterSeconds to 0:

from datetime import datetime

coll = db.get_collection("expiry_collection")

# Create a new TTL index on `expires_at`.
# Each document will be deleted when the current time reaches
# the date stored in its `expires_at` field:
coll.create_index([("expires_at", 1)], expireAfterSeconds=0)

coll.insert_one(
    {
        "name": "Professor Bagura",
        # This document will disappear before 2022:
        "expires_at": datetime.fromisoformat("2021-12-31 23:59:59"),
    }
)

Bear in mind that the background process that removes expired documents only runs every 60 seconds, and on a cluster under heavy load it may run even less frequently. So if you're working with documents that have very short-lived expiry durations, this feature probably isn't for you. In practice, a document lingering for a minute or so past its expiry is rarely a problem for data that is valid for many minutes or hours. An alternative is to continue to filter by the expiry field in your queries, giving you fine-grained control over document validity, while letting the TTL expiry service maintain the collection over time by removing documents that have very obviously expired.
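That hybrid approach can be sketched like this (hypothetical helper and collection name; it assumes a TTL index already exists on `expires_at` with expireAfterSeconds=0, as in the example above):

```python
from datetime import datetime


def valid_filter(now=None):
    """Build a query filter that matches only unexpired documents.

    The TTL monitor will eventually delete expired documents, but this
    filter guarantees the application never reads one in the window
    before the background task has run.
    """
    return {"expires_at": {"$gt": now or datetime.now()}}


# Usage, assuming `coll` is a pymongo collection with a TTL index
# on `expires_at`:
# doc = coll.find_one(valid_filter())
```

This way, the query enforces exact validity, and the TTL index just keeps the collection from growing without bound.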

If you're working with data that has a lifespan, then TTL indexes are a great feature for maintaining the documents in a collection.

#Capped Collections

Capped collections are an interesting feature of MongoDB, useful if you wish to efficiently store a ring buffer of documents.

A capped collection has a maximum size in bytes and, optionally, a maximum number of documents. (Whichever limit is reached first applies, so if you want to be able to reach the maximum number of documents, make sure you set the byte size large enough to hold them.) Documents are stored in insertion order, without needing an index to maintain that ordering, so a capped collection can handle higher write throughput than an indexed collection. When the collection reaches either the set byte size or the maximum number of documents, the oldest documents in the collection are purged to make room.

Capped collections can be useful for buffering recent operations, and these can be queried when an error state occurs, in order to have a log of recent operations leading up to the error state.

Or, if you just wish to efficiently store a fixed number of documents in insertion order, then capped collections are the way to go.

Capped collections are created with the create_collection method, by setting the capped, size, and optionally the max parameters:

# Create a collection with a generous size value that will store a max of 3 docs:
coll = db.create_collection("capped", capped=True, size=1000000, max=3)

# Insert 3 docs:
coll.insert_many([{"name": "Chico"}, {"name": "Harpo"}, {"name": "Groucho"}])

# Insert a fourth doc! This evicts the oldest document (Chico) to make space:
coll.insert_one({"name": "Zeppo"})

# Print out the docs in the collection:
for doc in coll.find():
    print(doc)

# {'_id': ObjectId('600e8fcf36b07f77b6bc8ecf'), 'name': 'Harpo'}
# {'_id': ObjectId('600e8fcf36b07f77b6bc8ed0'), 'name': 'Groucho'}
# {'_id': ObjectId('600e8fcf36b07f77b6bc8ed1'), 'name': 'Zeppo'}

If you want a rough idea of how big your BSON documents are in bytes, for calculating the value of size, you can either use Object.bsonsize() in the mongo shell on a document constructed in code, or use MongoDB 4.4's new $bsonSize aggregation operator on documents already stored in MongoDB.
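For documents already stored in MongoDB, the aggregation approach could be sketched like this (hypothetical collection; the query itself is commented out because it needs a live MongoDB 4.4+ connection):

```python
# Project each document's name alongside its exact BSON size in bytes,
# computed server-side by the $bsonSize operator (MongoDB 4.4+):
pipeline = [
    {"$project": {"name": 1, "size_bytes": {"$bsonSize": "$$ROOT"}}}
]

# Assuming `coll` is a pymongo collection on a 4.4+ server:
# for doc in coll.aggregate(pipeline):
#     print(doc["name"], doc["size_bytes"])
```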

Note that the improved efficiency of capped collections comes with some limitations. You cannot explicitly delete a document from a capped collection; documents are only removed as they age out. You can't shard a capped collection, and there are further restrictions on replacing and updating documents and on transactions. Read the documentation for more details.

It's worth noting that this pattern is similar in feel to the Bucket Pattern, which allows you to store a capped number of items in an array, and automatically creates a new document for storing subsequent values when that cap is reached.
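As a minimal sketch of that pattern (hypothetical field names for some sensor data): an update filter that only matches buckets with spare capacity, combined with upsert, makes MongoDB start a fresh bucket document automatically once the current one is full.

```python
# Bucket Pattern sketch: push into the current bucket only while it holds
# fewer than 3 items. Once full, the bucket no longer matches the filter,
# so upsert=True makes MongoDB create a fresh bucket document instead.
bucket_filter = {"sensor_id": 1, "count": {"$lt": 3}}
bucket_update = {"$push": {"readings": {"value": 42}}, "$inc": {"count": 1}}

# Assuming `coll` is a pymongo collection:
# coll.update_one(bucket_filter, bucket_update, upsert=True)
```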

#Change Streams and the watch method

And finally, the biggest lesser-known feature of them all! Change streams are a live stream of changes to your database. The watch method, implemented in most MongoDB drivers, streams the changes made to a collection, a database, or even your entire MongoDB replica set or cluster, to your application in real time. I'm always surprised by how few people have heard of it, given that it's one of the first MongoDB features that really excited me. Perhaps it's just luck that I stumbled across it early on.

In Python, if I wanted to print all of the changes to a collection as they're made, the code would look a bit like this:

with my_database.my_collection.watch() as stream:
    for change in stream:
        print(change)

In this case, watch returns an iterator which blocks until a change is made to the collection, at which point it will yield a BSON document describing the change that was made.

You can also filter the types of events that will be sent to the change stream, so if you're only interested in insertions or deletions, then those are the only events you'll receive.
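As a sketch, this filtering is done by passing an aggregation pipeline to watch; here, a $match stage passes only insert and delete events (the live loop is commented out because it requires a replica set connection, and my_collection is the same hypothetical collection as above):

```python
# Server-side filter: only insert and delete events reach the application.
pipeline = [{"$match": {"operationType": {"$in": ["insert", "delete"]}}}]

# with my_database.my_collection.watch(pipeline) as stream:
#     for change in stream:
#         print(change["operationType"], change.get("fullDocument"))
```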

You could combine this feature in interesting ways with either a TTL index or a capped collection: for example, to be notified when those features remove documents from your collection, so you can copy the data to a different collection for longer-term storage, or notify the application that the data has been evicted.

I've used change streams (which is what the watch method returns) to implement a chat app, where changes to a collection which represented a conversation were streamed to the browser using WebSockets.

But fundamentally, change streams allow you to implement the equivalent of a database trigger, but in your favourite programming language, using all the libraries you prefer, running on the servers you specify. It's a super-powerful feature and deserves to be better known.

#Further Resources

If you don't already use the Aggregation Framework, definitely check out the documentation on that. It'll blow your mind (in a good way)!


If you have questions, please head to our developer community website where the MongoDB engineers and the MongoDB community will help you build your next big idea with MongoDB.
