Schema Performance Tuning in MongoDB
This article by Ricardo Pires, Senior Software Engineer at 3T Software Labs, looks at how to easily make MongoDB schema changes using a tool called Reschema in Studio 3T. Learn what to consider before undertaking a schema change and best practices for using Reschema.
Schema-less doesn’t mean never managing the schema. In the case of MongoDB, there are a number of instances where schema adjustments will significantly improve query performance.
Some schema changes will be straightforward, driven by a functional requirement. You might need to change a field from a double to a decimal128 type, for instance, in order to accommodate financial data, or to allow for more complex computations. You may need to remove a sensitive field to comply with certain policies or regulations. Other schema changes will be less-obvious optimizations intended to improve performance. For example, you might need to keep an eye on document size, splitting off embedded documents into different collections to avoid reaching the 16MB limit.
SEE ALSO: Introduction to NoSQL
These types of tune ups are normally accomplished with a script, often using the update() method. Other types of optimizations, like extracting a particular element inside a nested array, can be complex to write. If several updates are required, they often must be performed in a specific order to prevent breaking the data, unless you have a tool like Reschema.
Reschema is a schema management tool in the MongoDB IDE Studio 3T. There are 10 different operations possible with Reschema, all without writing scripts:
- Add a field
- Remove a field
- Rename a field
- Change a field type
- Move a field up or down
- Embed or move a field to a parent
- Flatten an object/array
- Unwind an object/array
- Extract a field to another collection
- Relate a selected collection with another
Unless you’re building the database from scratch, the schema design and indexing strategy is likely already in place. If this is the case, most schema changes will revolve around improving performance. This is where a little planning and a tool like Studio 3T’s Reschema come in.
Planning which schema changes to make
Knowing which adjustments to make will require first understanding where the bottlenecks are. The first step is to analyze common query patterns.
There are a number of different tools you can use to do this. Studio 3T has built-in visual explain plans you can use to see detailed information about how a query is resolved, or you can use MongoDB’s Query Profiler to identify slow-running queries.
Look especially at queries that use an in-application or in-database $lookup between collections. These are areas where it might be advantageous to merge data between collections or embed the results of frequent sub-queries into the same document for faster retrieval. This can also make it easier to update related data in a single write operation.
Check also for queries that use $unwind to un-pivot array elements into separate documents. These are areas that might benefit from splitting data into multiple documents to reduce document size or to flatten data for faster querying. This can also make it more efficient to update and index the data.
The specific adjustments necessary will depend on the common query patterns. In addition to avoiding resource-intensive operations like $lookup and $unwind, consider the following causes of performance degradation Reschema can address:
- Clean up arrays that were left to grow unbounded
- Fix arrays nested too deeply
- Avoid excessively large documents by splitting data
- Reduce an excessive number of collections by merging data
Configuring your new schema with Reschema
Once you’ve identified what schema changes are needed, you can make them all at once in Reschema. Here’s what you need to know.
First, make sure you have the latest version of Studio 3T installed (if you don’t already use Studio 3T, you can trial the software for 30 days for free, including the Reschema tool). Specifics about how to configure operations can be found in Reschema’s feature documentation.
Changes you make in Reschema won’t be committed to the data until you choose to execute them by pressing the green play button. This provides several advantages.
- You don’t have to worry too much about the order in which you make schema changes. Reschema handles that automatically on execution.
- You can begin to map out schema changes one day, then save your progress as a Task and return to it another day.
Certain types of schema changes will naturally require more than one data source, or more than one target. Reschema can join data from multiple source collections in the same database. It’s also possible to specify more than one target, such as dividing some fields from a source array into one collection, and the remaining fields into another.
Lastly, you’ll need to decide if you want to overwrite an existing target with the data in its new schema, or if you want to copy the data to a new target. Copying data to a new collection (or set of collections, as the case may be), will leave your source data intact. It will also give you an opportunity to thoroughly test the impact of your schema changes before deploying them.
Anticipating future schema changes
Schema management is not a “one and done” activity. As your application continues to evolve, the data will evolve with it, and it may not just be performance problems that motivate your schema changes. You may need to fix mistakes, adapt to new data requirements, or clean up data after a migration. You might upgrade to a new version of MongoDB and need to restructure the data to take advantage of new features.
By using a database schema management tool like Reschema, you gain an incremental and reversible way to apply schema and data changes that keep your MongoDB instances compliant, clean, and most of all, performing optimally.