Usage
At its core MongoFrames provides base classes for mapping your MongoDB collections and documents (including embedded documents) to Python classes and instances. For example, let's say we need to store information about a collection of dragons (why not?). We start by defining a Python class to represent the dragon collection:
from mongoframes import *
class Dragon(Frame):
    # The fields each document in our collection will store
    _fields = {
        'name',
        'breed',
        'traits',
        'nemesis'
    }
By default the collection name will be the same as the class name (the collection doesn't need to exist in the database beforehand). If you want to override this behaviour, you can specify the collection name directly using the _collection class attribute.
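The naming rule can be mimicked in a few lines of plain Python. The sketch below is a hypothetical stand-in (the `BaseFrame` and `Knight` names are illustrative, and it is not MongoFrames' implementation); it uses `__init_subclass__` to default each subclass's `_collection` to its class name unless the class declares one itself, and it never touches a database.

```python
# Hypothetical stand-in for a Frame base class; no database involved.
class BaseFrame:

    _collection = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        # Default to the class name unless this class sets _collection itself
        if '_collection' not in cls.__dict__:
            cls._collection = cls.__name__


class Dragon(BaseFrame):
    pass


class Knight(BaseFrame):
    # Explicit override of the default collection name
    _collection = 'knights'


print(Dragon._collection)  # Dragon
print(Knight._collection)  # knights
```

In this sketch a subclass of `Knight` that doesn't set `_collection` gets its own class name rather than inheriting `'knights'`; that is a design choice of the sketch, not a documented MongoFrames behaviour.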
With the Dragon frame defined we can now add documents to our MongoDB collection:
# Add two dragons to the database - Burt and Edison
burt = Dragon(
    name='Burt',
    breed='Fire-drake',
    traits=['Lazy', 'Grouchy']
)
burt.insert()
edison = Dragon(
    name='Edison',
    breed='Ice-drake',
    traits=['Energetic'],
    nemesis=burt
)
edison.insert()
If you haven't defined a connection yet, the above code will raise an error; see Connecting to the database in the Getting started guide.
If we want to retrieve documents from the collection, we can query the database:
# Select all dragons
dragons = Dragon.many()
# Select just one dragon - Burt
burt = Dragon.one({'name': 'Burt'})
# Count the population of fire-drakes
total_fire_drakes = Dragon.count({'breed': 'Fire-drake'})
Now let's make Edison Burt's nemesis in return (it only seems fair):
# We select the documents here using the query helper `Q`
burt = Dragon.one(Q.name == 'Burt')
edison = Dragon.one(Q.name == 'Edison')
burt.nemesis = edison
burt.update('nemesis')
In the code above we pass the name of the field we changed (nemesis) to the update method to indicate that only that field should be sent in the update operation. If you don't specify any fields to update, every field in the document will be sent.
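To make the difference concrete, here is a small sketch of what a field-limited update amounts to at the MongoDB level: only the named fields go into the `$set` document. `build_update` is a hypothetical helper operating on a plain dict, not part of MongoFrames; the returned pair has the shape PyMongo's `Collection.update_one(filter, update)` expects.

```python
def build_update(document, *fields):
    """Return a (filter, update) pair for a document dict (hypothetical helper)."""
    # With no fields named, send every field except _id
    if not fields:
        fields = [f for f in document if f != '_id']
    changes = {f: document[f] for f in fields if f in document}
    return {'_id': document['_id']}, {'$set': changes}


burt = {'_id': 1, 'name': 'Burt', 'breed': 'Fire-drake', 'nemesis': 2}

# Only the nemesis field is sent
print(build_update(burt, 'nemesis'))
# ({'_id': 1}, {'$set': {'nemesis': 2}})

# Every field (apart from _id) is sent
print(build_update(burt))
# ({'_id': 1}, {'$set': {'name': 'Burt', 'breed': 'Fire-drake', 'nemesis': 2}})
```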
Projections
In MongoDB, if you want to return a subset of fields when retrieving documents, you can specify a projection. In MongoFrames, projections can also map references (dereferencing) and embedded documents to their associated Frame and SubFrame classes:
To be clear, when I talk about reference fields in MongoFrames' projections, I am describing a scenario where a field's value consists of one or more reference values (typically ObjectIds) that reference other documents in the same database by their _id field. It is of course perfectly possible in MongoDB for an ObjectId not to be a reference to another document; references in this sense are a construct of MongoFrames, not MongoDB.
# Select all dragons and the name of their nemesis
dragons = Dragon.many(projection={'nemesis': {'$ref': Dragon, 'name': True}})
for dragon in dragons:
    print("{d.name}'s nemesis is {d.nemesis.name}".format(d=dragon))
>> Burt's nemesis is Edison
>> Edison's nemesis is Burt
To include a field you assign it a value of True within the projection (conversely, to exclude a field you assign it a value of False). In MongoFrames, however, we can also assign a field a sub-projection that describes how to map the field's value to a Frame or SubFrame.
A reference can contain a single document ID, a list of document IDs or a dictionary where each value is a document ID.
Sub-projections must contain a special $ref or $sub key which indicates which Python class to map the field's value to. $ref and $sub must be assigned a subclass of Frame and SubFrame respectively.
Projections aren't limited to one tier: a sub-projection can contain further sub-projections, and so on. Here's a more complex example from a project using MongoFrames:
# Select all documents from the Meeting collection
meetings = Meeting.many(projection={
    'division': {'$ref': Division},
    'attendees': {
        '$ref': User,
        'first_name': True,
        'last_name': True,
        'company_name': True
    },
    'room': {
        '$ref': Room,
        'event': {
            '$ref': Event,
            'name': True
        }
    }
})
In the code above we're selecting all meetings, and for each document we select the following referenced information:
- the Division (all fields),
- the attendees (first_name, last_name and company_name fields only),
- the Room (all fields),
- and the Event the Room belongs to (name field only).
This projection generates 5 MongoDB queries: the initial query to select all Meetings, and one for each of the four reference sub-projections. Once the documents for all queries have been retrieved, MongoFrames converts each document into an instance of its associated class (associated by $ref or $sub) and structures the results as per the projection.
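The query count follows directly from the shape of the projection: one query for the Meetings plus one per `$ref` sub-projection, at any depth. The sketch below counts them; the empty classes stand in for the real Frame classes and `count_ref_queries` is a hypothetical helper, not part of MongoFrames. A `$sub` sub-projection adds no query of its own, because embedded documents arrive with their parent, but it is still searched for nested `$ref`s.

```python
# Empty stand-ins for the Frame classes used in the Meeting example
class Division:
    pass


class User:
    pass


class Room:
    pass


class Event:
    pass


def count_ref_queries(projection):
    """Count the `$ref` sub-projections (at any depth) in a projection."""
    count = 0
    for value in projection.values():
        if isinstance(value, dict):
            if '$ref' in value:
                count += 1
            # Look inside both $ref and $sub sub-projections for more $refs
            count += count_ref_queries(value)
    return count


meeting_projection = {
    'division': {'$ref': Division},
    'attendees': {
        '$ref': User,
        'first_name': True,
        'last_name': True,
        'company_name': True
    },
    'room': {'$ref': Room, 'event': {'$ref': Event, 'name': True}},
}

# One query for the meetings themselves plus one per reference
total_queries = 1 + count_ref_queries(meeting_projection)
print(total_queries)  # 5
```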
Including and excluding fields
When you assign one or more fields a value of True in a projection, only those fields are returned and all other fields are excluded. The same is not true for fields assigned a sub-projection: they are included, but they do not cause other fields to be excluded. In this way you can define sub-projections for referenced and embedded documents while still selecting all other fields.
# Select just the start time for all meetings
Meeting.many(projection={'start_time': True})
# Select all fields for all meetings, where division is a reference to another
# collection.
Meeting.many(projection={'division': {'$ref': Division}})
# Select just the division and its name for all meetings
Meeting.many(projection={
    '_id': True,
    'division': {'$ref': Division, 'name': True}
})
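The _id field is a special case because it's always included in a projection unless it is explicitly excluded (e.g. '_id': False). In the last example above, assigning '_id': True is what makes the projection inclusive, so only _id and division are returned.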
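The inclusion rules above can be expressed as a small conversion from a MongoFrames-style projection to the flat projection MongoDB itself receives. This is a sketch of the rules as described here, not MongoFrames' own code; `to_mongo_projection` and the `Division` stand-in are hypothetical. Returning `None` means "no projection", which makes PyMongo's `find` return every field.

```python
class Division:  # stand-in for a Frame subclass
    pass


def to_mongo_projection(projection):
    """Flatten a MongoFrames-style projection for MongoDB (hypothetical helper)."""
    flat = {}
    sub_projected = []
    for field, value in projection.items():
        if isinstance(value, dict):
            # Sub-projections are handled by separate queries or mapping
            sub_projected.append(field)
        else:
            flat[field] = value

    # An inclusive projection (any field set to True) returns only the
    # listed fields, so sub-projected fields must be listed as well
    if any(flat.values()):
        for field in sub_projected:
            flat[field] = True

    # An empty projection means "return every field"
    return flat or None


print(to_mongo_projection({'start_time': True}))
# {'start_time': True}
print(to_mongo_projection({'division': {'$ref': Division}}))
# None
print(to_mongo_projection({'_id': True, 'division': {'$ref': Division, 'name': True}}))
# {'_id': True, 'division': True}
```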
Handling embedded documents
Embedded documents in MongoFrames are represented in Python by SubFrames. To demonstrate, let's imagine we need to model a web order:
class Order(Frame):
    _fields = {
        'items'
    }
    @property
    def total(self):
        return sum([i.total for i in self.items])
class Item(SubFrame):
    _fields = {
        'desc',
        'qty',
        'unit_price'
    }
    @property
    def total(self):
        return self.unit_price * self.qty
Now we'll order some items that will be useful in battling a dragon:
sword = Item(
    desc='Large sword',
    qty=1,
    unit_price=10
)
underwear = Item(
    desc='Fire resistant underwear',
    qty=10,
    unit_price=2
)
order = Order(items=[sword, underwear])
order.insert()
Having created our Order, we select it from the database and print the total:
# If we select the order without a projection, `items` will be a list of
# dictionaries.
order = Order.one()
# This will fail as dictionaries have no `total` attribute
print(order.total)
# To map the items to the `Item` class we need to use a projection
order = Order.one(projection={'items': {'$sub': Item}})
# This will work as `items` now contains a list of `Item` instances
print(order.total)
>> 30
To map the embedded documents within the items field to Item class instances, we need to use a projection that specifies this mapping. If we want to map Items in this way whenever we retrieve an order, we can set a default projection for the Order class:
Order._default_projection = {'items': {'$sub': Item}}
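Conceptually, a `$sub` mapping wraps each embedded dictionary in an instance of the given class so that attribute access (and properties such as `total`) work. The stand-alone sketch below imitates that with hypothetical `SimpleSubFrame` and `apply_sub` helpers; it is an illustration of the idea, not MongoFrames' SubFrame implementation, and it needs no database.

```python
class SimpleSubFrame:
    """Hypothetical stand-in: wraps an embedded document dict."""

    def __init__(self, document):
        self._document = document

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for document fields
        try:
            return self._document[name]
        except KeyError:
            raise AttributeError(name) from None


class Item(SimpleSubFrame):

    @property
    def total(self):
        return self.unit_price * self.qty


def apply_sub(value, cls):
    """Wrap a single embedded dict, or each dict in a list, in `cls`."""
    if isinstance(value, list):
        return [cls(v) for v in value]
    return cls(value)


# The raw `items` data as it would come back without a projection
raw_items = [
    {'desc': 'Large sword', 'qty': 1, 'unit_price': 10},
    {'desc': 'Fire resistant underwear', 'qty': 10, 'unit_price': 2},
]
items = apply_sub(raw_items, Item)
order_total = sum(i.total for i in items)
print(order_total)  # 30
```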
Performance
Having introduced sub-projections and their role in dereferencing ObjectIds to documents, now is a good time to talk a little about performance.
MongoFrames was originally written to resolve performance issues we had using MongoEngine. This is not to say that MongoFrames is a better choice than MongoEngine: MongoEngine is a relatively mature, feature-rich library that we have successfully deployed on a number of projects, and I am very grateful to the developers behind it. Many of the features we love from MongoEngine made their way across to MongoFrames in some form.
Mapping document data to class instances
If a lot of work is done in the __init__ method of the document class, the overhead can quickly become a problem as the number of documents a query selects increases (keep in mind that each document might have a number of referenced documents to initialize too).
By default, initialization in MongoFrames performs only a single assignment, allowing thousands of document instances to be created with minimal overhead.
Short story: on a Tornado project I worked on with MongoEngine, this bottleneck caused significant headaches. Tornado is a non-blocking server that uses a queue to manage requests; if those requests take too long (excluding asynchronous IO), the queue builds and requests take ever longer to process. Initially I suspected that using a synchronous ODM to query the database was causing the backlog, so I considered using MotorEngine (an asynchronous ODM with a similar interface to MongoEngine), but after profiling we discovered the bottleneck was in the Document.__init__ method. Querying through PyMongo instead of MongoEngine eliminated the performance issue (though we were forced to deal with dictionaries). I ended up replacing so much of the project's code to use PyMongo that MongoFrames was conceived.
Dereferencing
References to other documents are stored as ObjectIds, and dereferencing them is the responsibility of the ODM. There are a number of patterns for doing this; MongoFrames uses the following approach:
- For each reference in the projection build a set of document IDs from the reference field.
- Select referenced documents using the ID set and the class defined by the $ref key.
- Map the returned documents back into the reference field's data structure (e.g. a single ObjectId, a list or a dictionary).
For each referenced field (including references within references), the database is queried once using the list of all IDs collected for that field (e.g. {'_id': {'$in': ids}}). In the Meeting example we gave earlier there are 4 referenced fields in the projection, so 5 queries will be executed: the initial query to select the Meetings and then one query for each reference (Division, User, Room and Event).
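The three steps can be sketched without a database by letting a dict keyed by ID stand in for a collection, so that a single dict lookup pass plays the role of one `{'_id': {'$in': ids}}` query. `collect_ids`, `remap`, `dereference` and the sample data are all hypothetical; the sketch also assumes, as a design choice of its own, that missing documents are dropped from lists and dictionaries and become None for single references.

```python
def collect_ids(value):
    """Return the reference IDs held by a field value of any supported shape."""
    if value is None:
        return []
    if isinstance(value, list):
        return list(value)
    if isinstance(value, dict):
        return list(value.values())
    return [value]


def remap(value, by_id):
    """Replace IDs with their documents, keeping the field's structure."""
    if isinstance(value, list):
        return [by_id[i] for i in value if i in by_id]
    if isinstance(value, dict):
        return {k: by_id[i] for k, i in value.items() if i in by_id}
    return by_id.get(value)


def dereference(documents, field, collection):
    # Step 1: build one set of IDs across all documents
    ids = set()
    for doc in documents:
        ids.update(collect_ids(doc.get(field)))

    # Step 2: a single lookup for all IDs (stands in for an $in query)
    by_id = {i: collection[i] for i in ids if i in collection}

    # Step 3: map the results back into each document's field
    for doc in documents:
        if field in doc:
            doc[field] = remap(doc[field], by_id)


users = {1: {'_id': 1, 'name': 'Ann'}, 2: {'_id': 2, 'name': 'Bo'}}
meetings = [
    {'_id': 'm1', 'host': 1, 'attendees': [1, 2], 'roles': {'chair': 2}},
    {'_id': 'm2', 'host': 2, 'attendees': [2]},
]

# One lookup per referenced field, however many meetings there are
for field in ('host', 'attendees', 'roles'):
    dereference(meetings, field, users)

print(meetings[0]['attendees'][1]['name'])  # Bo
print(meetings[0]['roles']['chair']['name'])  # Bo
```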
An alternative approach is to query for referenced documents only when they are accessed (e.g. when meeting.division.name is accessed for the first time on a Meeting instance, the referenced Division is retrieved from the database). In my experience with ORM/ODMs this approach has the following disadvantages:
- A lot of queries can be generated. Consider the example from earlier: if we list 10 Meetings in a template and access the Room and Event documents for each, we'll generate 1 + (10 * 2) = 21 queries (as opposed to just 3 using the ID list approach).
- There's no way to set the projection for a document you dereference via an attribute, meaning that the entire document is selected.
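The arithmetic in the first point can be checked with a tiny simulation. `CountingDB` is a hypothetical in-memory stand-in that counts every query issued; the lazy loop fetches one Room and one Event per meeting, while the ID-list approach issues one query per referenced field.

```python
class CountingDB:
    """In-memory stand-in that counts every query issued against it."""

    def __init__(self, **collections):
        self.collections = collections
        self.queries = 1  # the initial query that selected the meetings

    def find_one(self, name, _id):
        self.queries += 1
        return self.collections[name][_id]

    def find_many(self, name, ids):
        # Equivalent in spirit to {'_id': {'$in': ids}}: one round trip
        self.queries += 1
        return {i: self.collections[name][i] for i in ids}


events = {'e%d' % i: {'_id': 'e%d' % i, 'name': 'Event %d' % i} for i in range(3)}
rooms = {r: {'_id': r, 'event': 'e%d' % (r % 3)} for r in range(10)}
meetings = [{'_id': m, 'room': m} for m in range(10)]

# Lazy loading: each attribute access fetches one document
lazy = CountingDB(rooms=rooms, events=events)
for meeting in meetings:
    room = lazy.find_one('rooms', meeting['room'])
    event = lazy.find_one('events', room['event'])

# ID-list approach: one query per referenced field
batched = CountingDB(rooms=rooms, events=events)
room_map = batched.find_many('rooms', {m['room'] for m in meetings})
event_map = batched.find_many('events', {r['event'] for r in room_map.values()})

print(lazy.queries, batched.queries)  # 21 3
```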
Immediate execution
This isn't strictly performance related, but it's worth discussing and this is as good a place as any. Database methods that you call against MongoFrames classes execute immediately and return results.
It's more common for ORM/ODMs to return an interface that will execute in a lazy fashion, for example when you iterate over it or select a document by index. The difference is subtle but important, consider the following:
# Select the first two dragons (don't do this in MongoFrames)
dragons = Dragon.many()[:2]
In some ORM/ODMs this will execute a single query limited to two results. In MongoFrames, however, it executes a query without any limit and then selects the first two items from the returned list. The end results are identical, but the MongoFrames query will likely take significantly longer. To limit the query to two results in MongoFrames we use the limit keyword argument:
# Select the first two dragons (do this instead)
dragons = Dragon.many(limit=2)
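The cost difference comes from how many documents are fetched before slicing. In the sketch below, `FakeCollection` and `many` are hypothetical stand-ins: `many` is eager (it materialises every result immediately, as the text describes for MongoFrames) and the collection counts the documents it hands back. As in MongoDB and PyMongo, a limit of 0 is treated as "no limit".

```python
class FakeCollection:
    """In-memory stand-in that counts the documents it returns."""

    def __init__(self, docs):
        self.docs = docs
        self.sent = 0

    def find(self, limit=0):
        # As in MongoDB, a limit of 0 means "no limit"
        for i, doc in enumerate(self.docs):
            if limit and i >= limit:
                break
            self.sent += 1
            yield doc


def many(collection, limit=0):
    # Eager: the query runs and every result is fetched right away
    return list(collection.find(limit=limit))


dragons = FakeCollection([{'name': 'Dragon %d' % i} for i in range(1000)])

first_two = many(dragons)[:2]       # fetch everything, then slice
sent_when_slicing = dragons.sent    # 1000

dragons.sent = 0
first_two = many(dragons, limit=2)  # let the query do the limiting
sent_with_limit = dragons.sent      # 2
```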
Events
The following events are triggered against the Frames when documents are inserted, updated or deleted:
- insert
- inserted
- update
- updated
- delete
- deleted
Each event has a before and an after phase, e.g. insert is triggered before the operation and inserted after it. MongoFrames uses the excellent Blinker library to allow you to listen for and react to these events:
# Define a function that will be called every time a dragon is inserted
def on_inserted(sender, frames):
    for frame in frames:
        print(frame.name + ' inserted')
# Bind the `on_inserted` function to the `inserted` event
Dragon.listen('inserted', on_inserted)
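The before/after split matters because a listener on the before phase can still change a frame before it is written, which is why timestamps are attached to insert rather than inserted. The sketch below illustrates that ordering using only the standard library instead of Blinker; `EventFrame`, its `_trigger` method and the `storage` list are hypothetical, and the listener signature `(sender, frames)` follows the examples in this section.

```python
from collections import defaultdict
from datetime import datetime, timezone


class EventFrame:
    """Hypothetical stand-in for a Frame with listen/trigger support."""

    _listeners = None

    def __init_subclass__(cls, **kwargs):
        super().__init_subclass__(**kwargs)
        cls._listeners = defaultdict(list)  # one registry per class

    @classmethod
    def listen(cls, event, func):
        cls._listeners[event].append(func)

    @classmethod
    def _trigger(cls, event, frames):
        for func in cls._listeners[event]:
            func(cls, frames=frames)

    def __init__(self, **fields):
        self.__dict__.update(fields)

    def insert(self, storage):
        self._trigger('insert', [self])    # before phase
        storage.append(dict(self.__dict__))
        self._trigger('inserted', [self])  # after phase


class Dragon(EventFrame):
    pass


log = []


def stamp_created(sender, frames):
    for frame in frames:
        frame.created = datetime.now(timezone.utc)


def on_inserted(sender, frames):
    for frame in frames:
        log.append(frame.name + ' inserted')


Dragon.listen('insert', stamp_created)
Dragon.listen('inserted', on_inserted)

storage = []
Dragon(name='Burt').insert(storage)
print(log)                      # ['Burt inserted']
print('created' in storage[0])  # True
```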
Events can be used to implement a wide variety of behaviours and the Frame class provides a set of class methods for some of the most common ones:
# Setting a created/modified timestamp (class must support created and modified
# fields.)
Dragon.listen('insert', Dragon.timestamp_insert)
Dragon.listen('update', Dragon.timestamp_update)
# For the following examples we imagine that the `nemesis` field now references
# documents in a new collection called `Knight` (as this makes it easier to see
# what's going on).
# Cascading deletes for references
def on_delete(sender, frames):
    Dragon.cascade(Knight, 'nemesis', frames)
Dragon.listen('delete', on_delete)
# Nullifying referenced fields
def on_delete(sender, frames):
    Knight.nullify(Dragon, 'nemesis', frames)
Dragon.listen('delete', on_delete)
# Pulling references from a list field (nemesis is now nemeses)
def on_delete(sender, frames):
    Knight.pull(Dragon, 'nemeses', frames)
ComplexDragon.listen('delete', on_delete)
Reference
Frame
frame = Frame(*args, **kwargs)
Frames allow documents to be wrapped in a class instance, adding support for dot notation access to attributes and numerous short-cut/helper methods.
Frames can be initialized with either a dictionary as the first argument or by specifying keywords, for example these two statements are equivalent:
# Using a dictionary
dragon = Dragon({'name': 'Burt', 'breed': 'Fire-drake'})
# Using keyword arguments
dragon = Dragon(name='Burt', breed='Fire-drake')
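Accepting either a dictionary or keyword arguments also fits the "single assignment" initialization mentioned under Performance: the document dict is stored as-is and fields are read from it on demand. The sketch below is a hypothetical `SketchFrame`, not MongoFrames' Frame class; in this sketch declared but unset fields read as None, which is an assumption rather than documented behaviour.

```python
class SketchFrame:
    """Hypothetical stand-in for a Frame: one assignment in __init__."""

    _fields = set()

    def __init__(self, *args, **kwargs):
        # Use the dict if one was passed, otherwise the keyword arguments
        self._document = args[0] if args else kwargs

    def __getattr__(self, name):
        # Called only when normal lookup fails; expose declared fields
        if name in self._fields:
            return self._document.get(name)
        raise AttributeError(name)

    def __setattr__(self, name, value):
        # Writes to declared fields go into the document dict
        if name in self._fields:
            self._document[name] = value
        else:
            super().__setattr__(name, value)


class Dragon(SketchFrame):
    _fields = {'name', 'breed', 'traits', 'nemesis'}


a = Dragon({'name': 'Burt', 'breed': 'Fire-drake'})
b = Dragon(name='Burt', breed='Fire-drake')
print(a._document == b._document)  # True
print(b.traits)                    # None (declared but not set)

b.breed = 'Ice-drake'
print(b._document['breed'])        # Ice-drake
```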
Class attributes (Frame)
Frame._client
The MongoDB client used to interface with the database. Typically this is manually set against the Frame class and is therefore shared by all classes inheriting from Frame.
# Set up the database client for MongoFrames
from pymongo import MongoClient
Frame._client = MongoClient('mongodb://localhost:27017/mydb')
Until the _client attribute is set, no database operations can be performed.
Frame._collection
The name of the database collection the class represents. If not specified this will default to the name of the class.
Frame._db
The database the collection resides in. By default this is set to None and the default database will be selected.
Frame._default_projection
The default projection used when selecting documents for this class. By default no default projection is set, so all defined fields are selected (without any Frame or SubFrame mappings).
Frame._fields
A set of fields that define the document. An _id field is included automatically (adding one yourself makes no difference).
Any field you define against the class will be accessible as an attribute (e.g using dot notation) against instances of the class.
If you inherit from another Frame-based class and wish to add additional fields, remember that _fields is a set, not a list:
class Child(Parent):
    # WRONG - this won't work (sets don't support +)
    _fields = Parent._fields + {'new_field'}
    # RIGHT - unions are the way to go
    _fields = Parent._fields | {'new_field'}
Frame._private_fields
A set of private fields that will be excluded from the output of to_json_type.
Database access (Frame)
Frame.get_collection()
Return a reference to the database collection for the class.
Frame.get_db()
Return the database for the collection.
Operations (Frame)
Frame.insert_many(documents)
Insert multiple documents in a single operation. The documents argument can be a list of dictionaries, Frame instances, or a mixture of both.
Frame.unset_many(documents, *fields)
Unset one or more fields against multiple documents in a single operation. The documents argument can be a list of dictionaries, Frame instances, or a mixture of both.
Frame.update_many(documents, *fields)
Update multiple documents in a single operation. The documents argument can be a list of dictionaries, Frame instances, or a mixture of both. Optionally a specific list of fields to update can be specified as arguments.
Frame.delete_many(documents)
Delete multiple documents in a single operation. The documents argument can be a list of dictionaries, Frame instances, or a mixture of both.
frame.delete()
Delete the document from the database.
frame.insert()
Insert the document into the database.
frame.unset(*fields)
Unset one or more fields against a document on the database.
frame.update(*fields)
Update the document against the database. Optionally a specific list of fields to update can be specified as arguments.
frame.upsert(*fields)
Insert or update the document depending on whether it exists in the database or not. The value of the _id field is used to determine if the document exists (e.g. if there's no ID defined then it is assumed not to exist).
As with the update method, a specific list of fields to update can be specified as arguments. Inserts ignore the fields arguments and send all data for the document.
This method is not the same as specifying the upsert flag when calling MongoDB. When called for a document with an _id value, this method first queries the database to see if a document with that ID exists: if not, it calls insert; if so, it calls update. This operation is therefore not atomic and is much slower than the equivalent MongoDB operation.
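The decision logic described above can be sketched against an in-memory dict keyed by _id. The `upsert` function here is hypothetical and only mirrors the described flow (existence check, then insert or update); the separate existence check is exactly what makes the real method non-atomic. The incrementing integer IDs are an assumption of the sketch, standing in for ObjectIds.

```python
import itertools

_next_id = itertools.count(1)  # stands in for ObjectId generation


def upsert(collection, document, *fields):
    """Insert or update `document` in a dict keyed by _id (hypothetical)."""
    _id = document.get('_id')
    if _id is not None and _id in collection:  # the extra existence check
        # Update: only the named fields, or everything if none are named
        names = fields or [f for f in document if f != '_id']
        for name in names:
            collection[_id][name] = document[name]
        return 'updated'

    # Insert: ignore `fields` and send the whole document
    if _id is None:
        document['_id'] = next(_next_id)
    collection[document['_id']] = dict(document)
    return 'inserted'


dragons = {}
burt = {'name': 'Burt', 'breed': 'Fire-drake'}
print(upsert(dragons, burt))           # inserted (no _id yet)

burt['breed'] = 'Ice-drake'
burt['name'] = 'Burt II'
print(upsert(dragons, burt, 'breed'))  # updated (only breed is sent)
print(dragons[burt['_id']])
# {'name': 'Burt', 'breed': 'Ice-drake', '_id': 1}
```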
Queries (Frame)
When querying the database using MongoFrames you can pass any keyword argument that PyMongo's find method accepts; these are the most common:
- limit - the maximum number of results to return.
- projection - the fields to return for each document, see projections.
- skip - the number of results to skip.
- sort - the order to apply to the results of the query.
Methods that accept a filter argument (e.g. Frame.many()) will select all documents by default if filter is not specified.
Frame.by_id(id, **kwargs)
Get a document by id.
Frame.count(filter, **kwargs)
Return a count of documents matching the filter.
Frame.ids(filter, **kwargs)
Return a list of IDs for documents matching the filter.
Frame.one(filter, **kwargs)
Return the first document matching the filter.
Frame.many(filter, **kwargs)
Return a list of documents matching the filter.
frame.reload(**kwargs)
Reload the document from the database.
Signals and signal helpers (Frame)
Helper class methods do not themselves trigger events (to prevent infinite loops).
Frame.listen(event, func)
Register a callback function for a named event.
Frame.stop_listening(event, func)
Remove a previously registered callback function for a named event.
Frame.cascade(ref_cls, field, frames)
Apply a cascading delete. Documents in the referenced collection will be deleted where the specified field matches one of the given frames.
Frame.nullify(ref_cls, field, frames)
Nullify a reference field. Documents in the referenced collection will have the specified field's value set to None/null where the value matches one of the given frames.
Frame.pull(ref_cls, field, frames)
Pull references from a list field. Documents in the referenced collection will have the ID of each of the specified frames pulled from the given field.
Frame.timestamp_insert(sender, frames)
Timestamp the created and modified fields for all documents. The class must have created and modified fields.
Frame.timestamp_update(sender, frames)
Timestamp the modified field for all documents. The class must have a modified field.
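The effect of the nullify and pull helpers on the affected documents can be illustrated with plain dicts. The sketch below is hypothetical: the functions mirror the descriptions above rather than MongoFrames' code, and work in memory instead of issuing the database updates (roughly an update_many with `$set` to None, or with `$pull`) that a real implementation would need.

```python
def nullify(ref_docs, field, deleted_ids):
    """Set `field` to None wherever it points at a deleted document."""
    for doc in ref_docs:
        if doc.get(field) in deleted_ids:
            doc[field] = None


def pull(ref_docs, field, deleted_ids):
    """Remove deleted documents' IDs from the list held in `field`."""
    for doc in ref_docs:
        if field in doc:
            doc[field] = [i for i in doc[field] if i not in deleted_ids]


knights = [
    {'_id': 'k1', 'nemesis': 'd1', 'nemeses': ['d1', 'd2']},
    {'_id': 'k2', 'nemesis': 'd2', 'nemeses': ['d2']},
]
deleted = {'d1'}  # IDs of the frames being deleted

nullify(knights, 'nemesis', deleted)
pull(knights, 'nemeses', deleted)
print(knights[0])  # {'_id': 'k1', 'nemesis': None, 'nemeses': ['d2']}
```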
I/O (Frame)
frame.to_json_type()
Return a dictionary for the frame with values converted to JSON safe types.
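As an illustration of what "JSON safe" involves, here is a sketch that converts datetimes to ISO 8601 strings, recurses into nested dicts and lists, and drops private fields at the top level. The conversions are assumptions of the sketch and MongoFrames' exact output may differ; ObjectIds would also need converting (e.g. with str()), which is omitted so the sketch needs only the standard library.

```python
import json
from datetime import datetime


def to_json_type(value, private_fields=frozenset()):
    """Convert a document to JSON-safe types (hypothetical sketch)."""
    if isinstance(value, dict):
        # Private fields are only dropped at the top level here
        return {
            k: to_json_type(v)
            for k, v in value.items()
            if k not in private_fields
        }
    if isinstance(value, list):
        return [to_json_type(v) for v in value]
    if isinstance(value, datetime):
        return value.isoformat()
    return value


doc = {
    'name': 'Burt',
    'password': 'secret',
    'created': datetime(2024, 1, 2, 3, 4, 5),
}
print(json.dumps(to_json_type(doc, {'password'})))
# {"name": "Burt", "created": "2024-01-02T03:04:05"}
```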
SubFrame
sub_frame = SubFrame(*args, **kwargs)
SubFrames allow embedded documents to be wrapped in a class instance, adding support for dot notation access to attributes.
SubFrames can be initialized with either a dictionary as the first argument or by specifying keywords, for example these two statements are equivalent:
# Using a dictionary
sword = Item({'desc': 'Sword', 'qty': 1, 'unit_price': 10})
# Using keyword arguments
sword = Item(desc='Sword', qty=1, unit_price=10)
SubFrame._fields
A set of fields that define the embedded document. Any field you define against the class will be accessible as an attribute (e.g using dot notation) against instances of the class.
SubFrame._private_fields
A set of private fields that will be excluded from the output of to_json_type.
sub_frame.to_json_type()
Return a dictionary for the sub-frame with values converted to JSON safe types.