6 Rules of Thumb for MongoDB Schema Design: Part 2

This is the second stop on our tour of modeling One-to-N relationships in MongoDB. Last time I covered the three basic schema designs: embedding, child-referencing, and parent-referencing. I also covered the two factors to consider when picking one of these designs.

With these basic techniques under our belt, I can move on to covering more sophisticated schema designs, involving two-way referencing and denormalization.


If you want to get a little bit fancier, you can combine two techniques and include both styles of reference in your schema, having both references from the “one” side to the “many” side and references from the “many” side to the “one” side.

For an example, let’s go back to that task-tracking system. There’s a “people” collection holding Person documents, a “tasks” collection holding Task documents, and a One-to-N relationship from Person -> Task. The application will need to track all of the Tasks owned by a Person, so we will need to reference Person -> Task.

On the other hand, in some other contexts this application will display a list of Tasks (for example, all of the Tasks in a multi-person Project) and it will need to quickly find which Person is responsible for each Task. You can optimize this by putting an additional reference to the Person in the Task document.

This design has all of the advantages and disadvantages of the “One-to-Many” schema, but with some additions. Putting the extra ‘owner’ reference into the Task document means that it’s quick and easy to find the Task’s owner, but it also means that if you need to reassign the task to another person, you need to perform two updates instead of just one. Specifically, you’ll have to update both the reference from the Person to the Task document, and the reference from the Task to the Person. (And to the relational gurus who are reading this: you’re right. Using this schema design means that it is no longer possible to reassign a Task to a new Person with a single atomic update. This is OK for our task-tracking system, but you need to consider whether it works for your particular use case.)
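As a rough illustration of the two-update reassignment, here is a minimal sketch that uses plain Python dicts as stand-ins for the ‘people’ and ‘tasks’ collections. The field names (‘tasks’, ‘owner’), the IDs, and the helper `reassign_task` are assumptions made for this example, not part of any MongoDB API.

```python
# Plain dicts stand in for the 'people' and 'tasks' collections.
# All names, IDs, and field names here are illustrative assumptions.
people = {
    "p1": {"_id": "p1", "name": "Alice", "tasks": ["t1", "t2"]},
    "p2": {"_id": "p2", "name": "Bob", "tasks": []},
}
tasks = {
    "t1": {"_id": "t1", "description": "Write report", "owner": "p1"},
    "t2": {"_id": "t2", "description": "Review budget", "owner": "p1"},
}


def reassign_task(task_id, new_owner_id):
    """Move a task to a new owner, touching both sides of the relationship."""
    old_owner_id = tasks[task_id]["owner"]
    # First: fix the Person -> Task reference
    # (remove it from the old owner, add it to the new one).
    people[old_owner_id]["tasks"].remove(task_id)
    people[new_owner_id]["tasks"].append(task_id)
    # Second: fix the Task -> Person reference.
    # Until this line runs, the two sides of the relationship disagree.
    tasks[task_id]["owner"] = new_owner_id


reassign_task("t2", "p2")
print(people["p2"]["tasks"], tasks["t2"]["owner"])
```

Against a real database each of those steps would be a separate write, which is why the reassignment cannot be a single atomic update.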

Beyond just modeling the various flavors of relationships, you can also add denormalization into your schema. This can eliminate the need to perform the application-level join for certain cases, at the price of some additional complexity when performing updates. An example will help make this clear.

For the parts example, you could denormalize the name of the part into the ‘parts[]’ array. For reference, here’s the version of the Product document without denormalization.

Denormalizing would mean that you don’t have to perform the application-level join when displaying all of the part names for the product, but you would have to perform that join if you needed any other information about a part.
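To make the trade-off concrete, here is a small sketch with plain Python dicts standing in for the documents. The document shapes, field names (‘id’, ‘name’, ‘price_cents’), and values are assumptions invented for this example.

```python
# Hypothetical Part documents, keyed by _id (values are made up).
parts = {
    "part-1": {"_id": "part-1", "name": "M4 bolt", "price_cents": 25},
    "part-2": {"_id": "part-2", "name": "M4 washer", "price_cents": 5},
}

# Hypothetical Product document: each entry in 'parts' keeps the reference
# ('id') plus a denormalized copy of the part's name.
product = {
    "_id": "prod-1",
    "name": "Widget",
    "parts": [
        {"id": "part-1", "name": "M4 bolt"},
        {"id": "part-2", "name": "M4 washer"},
    ],
}


def part_names(product_doc):
    """No join needed: the names are already inside the Product document."""
    return [entry["name"] for entry in product_doc["parts"]]


def part_prices(product_doc):
    """Any other field still needs the application-level join."""
    return [parts[entry["id"]]["price_cents"] for entry in product_doc["parts"]]


print(part_names(product))
print(part_prices(product))
```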

Denormalizing saves you a lookup of the denormalized data at the cost of a more expensive update: if you’ve denormalized the Part name into the Product document, then when you update the Part name you must also update every place it occurs in the ‘products’ collection.
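The cost of that update can be sketched with in-memory stand-ins for the two collections. The documents and the helper `rename_part` are hypothetical; against a real database the inner loop would be a multi-document update rather than a Python loop.

```python
# In-memory stand-ins for the 'parts' and 'products' collections.
# All documents and field names are assumptions for illustration.
parts = {"part-1": {"_id": "part-1", "name": "M4 bolt"}}
products = [
    {"_id": "prod-1", "parts": [{"id": "part-1", "name": "M4 bolt"}]},
    {"_id": "prod-2", "parts": [{"id": "part-1", "name": "M4 bolt"},
                                {"id": "part-2", "name": "M4 washer"}]},
    {"_id": "prod-3", "parts": [{"id": "part-2", "name": "M4 washer"}]},
]


def rename_part(part_id, new_name):
    """Rename a part and every denormalized copy of its name.

    Returns the number of Product documents that had to be rewritten.
    """
    # The authoritative copy lives in the Part document.
    parts[part_id]["name"] = new_name
    touched = 0
    for product in products:
        changed = False
        for entry in product["parts"]:
            if entry["id"] == part_id:
                entry["name"] = new_name
                changed = True
        if changed:
            touched += 1
    return touched


touched = rename_part("part-1", "M4 hex bolt")
print(touched)
```

One rename of the Part turns into one write per Product that embeds it, in addition to the write to the Part itself.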

Denormalizing only makes sense when there’s a high ratio of reads to updates. If you’ll be reading the denormalized data frequently, but updating it only rarely, it often makes sense to pay the price of slower, more complex updates in order to get more efficient queries. As updates become more frequent relative to queries, the savings from denormalization decrease.
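A deliberately crude way to see the effect of the read-to-update ratio is to count database operations. The cost model below is an assumption made for this sketch (two operations per read with the join, one without; one extra write per Product that embeds the renamed Part), and the workload numbers are invented.

```python
def total_operations(reads, renames, products_per_part, denormalized):
    """Count database operations under a toy cost model (an assumption)."""
    if denormalized:
        read_cost = 1                          # names are already in the Product
        rename_cost = 1 + products_per_part    # the Part plus every Product copy
    else:
        read_cost = 2                          # the Product, then a query for its Parts
        rename_cost = 1                        # only the Part document
    return reads * read_cost + renames * rename_cost


# Read-heavy workload: 1000 reads, 10 renames, each Part used by 5 Products.
read_heavy = (total_operations(1000, 10, 5, False),
              total_operations(1000, 10, 5, True))
# Update-heavy workload: 10 reads, 1000 renames.
update_heavy = (total_operations(10, 1000, 5, False),
                total_operations(10, 1000, 5, True))

print(read_heavy)    # (normalized, denormalized)
print(update_heavy)
```

Under these made-up numbers the denormalized schema does roughly half the work when reads dominate and several times more work when updates dominate.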

For example: assume the part name changes infrequently, but the quantity on hand changes frequently. This means that while it makes sense to denormalize the part name into the Product document, it does not make sense to denormalize the quantity on hand.

Also note that if you denormalize a field, you lose the ability to perform atomic and isolated updates on that field. Just like with the two-way referencing example above, if you update the part name in the Part document, and then in the Product document, there will be a sub-second interval where the denormalized ‘name’ in the Product document will not reflect the new, updated value in the Part document.

However, if you’ve denormalized the Product name into the Part document, then when you update the Product name you must also update every place it occurs in the ‘parts’ collection. This is likely to be a more expensive update, since you’re updating multiple Parts instead of a single Product. As such, it’s significantly more important to consider the read-to-write ratio when denormalizing in this way.
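A minimal sketch of that fan-out, again with in-memory stand-ins for the collections. The field names ‘product_id’ and ‘product_name’ and the helper `rename_product` are assumptions for this example.

```python
# Hypothetical documents: each Part carries a parent reference ('product_id')
# and a denormalized copy of the Product's name ('product_name').
products = {
    "prod-1": {"_id": "prod-1", "name": "Widget"},
    "prod-2": {"_id": "prod-2", "name": "Gadget"},
}
parts = [
    {"_id": "part-1", "name": "M4 bolt", "product_id": "prod-1", "product_name": "Widget"},
    {"_id": "part-2", "name": "M4 washer", "product_id": "prod-1", "product_name": "Widget"},
    {"_id": "part-3", "name": "M6 nut", "product_id": "prod-2", "product_name": "Gadget"},
]


def rename_product(product_id, new_name):
    """Rename a Product and rewrite the copy stored in each of its Parts.

    Returns how many Part documents were updated.
    """
    products[product_id]["name"] = new_name
    updated = 0
    for part in parts:
        if part["product_id"] == product_id:
            part["product_name"] = new_name
            updated += 1
    return updated


updated = rename_product("prod-1", "Widget Pro")
print(updated)
```

The number of Parts rewritten grows with the size of the “many” side, which is what makes this direction of denormalization more expensive to update.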

You can also denormalize the “one-to-squillions” example. This works in one of two ways: you can either put information about the “one” side (from the ‘hosts’ document) into the “squillions” side (the log entries), or you can put summary information from the “squillions” side into the “one” side.

Here’s an example of denormalizing into the “squillions” side. I’m going to add the IP address of the host (from the ‘one’ side) into the individual log message:
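As a rough sketch of the idea, with plain Python structures standing in for the ‘hosts’ and ‘logmsg’ collections: the field names (‘host’, ‘ipaddr’), the host name, and the address (taken from the documentation-only range 192.0.2.0/24) are assumptions for this example.

```python
from datetime import datetime, timezone

# Stand-in for the 'hosts' collection (the "one" side). Values are made up.
hosts = {
    "host-1": {"_id": "host-1", "name": "web01.example.com", "ipaddr": "192.0.2.10"},
}
# Stand-in for the log message collection (the "squillions" side).
logmsg = []


def make_log_entry(host_id, message, time):
    """Build a log document with a parent reference and a denormalized IP."""
    host = hosts[host_id]
    return {
        "time": time,
        "message": message,
        "host": host_id,            # parent reference to the 'hosts' document
        "ipaddr": host["ipaddr"],   # denormalized copy from the "one" side
    }


def messages_from_ip(ipaddr):
    """Find messages by IP without touching the 'hosts' collection."""
    return [entry["message"] for entry in logmsg if entry["ipaddr"] == ipaddr]


logmsg.append(make_log_entry("host-1", "disk almost full",
                             datetime(2024, 1, 1, tzinfo=timezone.utc)))
print(messages_from_ip("192.0.2.10"))
```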

In fact, if there’s only a limited amount of information you want to store at the “one” side, you can denormalize it ALL into the “squillions” side and get rid of the “one” collection altogether:
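One way this might look, sketched with a plain Python list standing in for the log collection: every entry carries all of the host information, and the list of hosts is derived from the log entries themselves. The field names and values are assumptions for this example.

```python
# Stand-in for the log collection; there is no separate 'hosts' collection.
# Every entry carries all of the host information we care about.
logmsg = [
    {"time": 1, "message": "disk almost full",
     "ipaddr": "192.0.2.10", "hostname": "web01.example.com"},
    {"time": 2, "message": "service restarted",
     "ipaddr": "192.0.2.11", "hostname": "web02.example.com"},
    {"time": 3, "message": "disk full",
     "ipaddr": "192.0.2.10", "hostname": "web01.example.com"},
]


def known_hosts(entries):
    """Derive the distinct hosts from the log entries themselves."""
    return sorted({(entry["ipaddr"], entry["hostname"]) for entry in entries})


print(known_hosts(logmsg))
```

The price is that changing a host's name or address now means rewriting every log entry that mentions it.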

On the other hand, you can also denormalize into the “one” side. Let’s say you want to keep the last 1000 messages from a host in the ‘hosts’ document. You could use the $each / $slice functionality introduced in MongoDB 2.4 to keep that list sorted, and only retain the last 1000 messages:

The log messages get saved in the ‘logmsg’ collection as well as in the denormalized list in the ‘hosts’ document: that way the message isn’t lost when it ages out of the ‘hosts.logmsgs’ array.

Note the use of the projection specification ( {_id:1} ) to prevent MongoDB from having to ship the entire ‘hosts’ document over the network. By telling MongoDB to only return the _id field, I reduce the network overhead down to just the few bytes that it takes to store that field (plus just a little bit more for the wire protocol overhead).
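The overall flow can be approximated in plain Python, with dicts and lists standing in for the collections. This is a sketch of the behavior, not MongoDB code: the sort and the trailing slice imitate what a `$push` with the `$each`, `$sort`, and `$slice` modifiers would do on the server, and `find_host_id` imitates a lookup that returns only the `_id`. The helper names and document values are assumptions.

```python
# In-memory stand-ins for the 'hosts' and 'logmsg' collections.
hosts = {
    "host-1": {"_id": "host-1", "name": "web01.example.com",
               "ipaddr": "192.0.2.10", "logmsgs": []},
}
logmsg = []
MAX_KEPT = 1000  # how many recent messages to keep on the "one" side


def find_host_id(ipaddr):
    """Imitates a find with an {_id: 1} projection: return only the _id."""
    for host in hosts.values():
        if host["ipaddr"] == ipaddr:
            return host["_id"]
    return None


def record_log_message(ipaddr, time, message):
    host_id = find_host_id(ipaddr)
    if host_id is None:
        raise KeyError(f"no host with ipaddr {ipaddr}")
    # Full record goes to the "squillions" side so nothing is lost
    # when it ages out of the embedded list.
    logmsg.append({"time": time, "message": message,
                   "ipaddr": ipaddr, "host": host_id})
    # Denormalized copy goes to the "one" side, kept sorted and capped.
    recent = hosts[host_id]["logmsgs"]
    recent.append({"time": time, "message": message})
    recent.sort(key=lambda entry: entry["time"])   # like $sort: {time: 1}
    del recent[:-MAX_KEPT]                          # like $slice: -1000


for i in range(1005):
    record_log_message("192.0.2.10", i, f"message {i}")

print(len(logmsg), len(hosts["host-1"]["logmsgs"]))
```

After 1005 messages the log collection holds all of them, while the embedded list holds only the newest 1000.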

Just as with denormalizing in the “One-to-Many” case, you’ll want to consider the ratio of reads to updates. Denormalizing the log messages into the Host document makes sense only if log messages are infrequent relative to the number of times the application needs to look at all of the messages for a single host. This particular denormalization is a bad idea if you want to look at the data less frequently than you update it.
