Contents

Usage

@@ What are factories for, generating data for user testing, code testing, load testing, anonymity.

@@ Recommend reading the tutorial and link to it

Makers, Blueprints and Factories

@@ Describe the hierarchical structure of Factories, Blueprints and Makers.

Importing

Building a set of Blueprints for your database collections will often mean using the majority of Makers and Quotas, so we recommend the following approach for importing them:

from mongoframes.factory import Factory
from mongoframes.factory import blueprints
from mongoframes.factory import makers
from mongoframes.factory.makers import dates as date_makers
from mongoframes.factory.makers import images as image_makers
from mongoframes.factory.makers import numbers as number_makers
from mongoframes.factory.makers import selections as selection_makers
from mongoframes.factory.makers import text as text_makers
from mongoframes.factory import quotas

Piggy backing on faker

We piggy back heavily on the excellent faker package. The Faker maker class allows any faker provider to be used as a maker, for example:

class User(Blueprint):

    first_name = Faker('first_name')
    ...

The faker package features many providers for generating data and can be localised for different countries. There is some overlap between MongoFrames makers and faker providers (in some cases we decided that we wanted a different interface for the maker, e.g. the Lorem maker), but for the most part Makers do not duplicate functionality already provided by faker.

Reference

Factory

factory = Factory()

# Assemble documents for 100-200 Dragons
docs = factory.assemble(DragonBlueprint, quotas.Random(100, 200))

# Finish the documents and populate the database with dragons
dragons = factory.populate(DragonBlueprint, docs)

factory = Factory()

The Factory class is responsible for the production of fake data.

Production of fake data is a two (well three but we'll get to that) stage process:

Stage 1 - Assembly

A Quota of documents is assembled based on a Blueprint.

At this stage the documents contain a mixture of static and dynamic data. Dynamic data is data that will be transformed during finishing, for example a field might contain a value of ['now', 'tomorrow'] which on finish will be converted to a date/time between now and tomorrow.

Once assembled the generated documents are returned as a list and can be either used immediately to populate the database or saved out as JSON for future use (for example when building a set of test data).

It's recommended that data generated in the assembly stage should be JSON safe so that it can be easily stored - all MongoFrames makers return JSON safe data.
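
As a rough illustration of why JSON safe assembly output is convenient, the sketch below stores a list of assembled documents as JSON and loads them back. The documents are hand-written stand-ins for what factory.assemble might return, and the field names are assumptions made for the example.

```python
import json

# Hand-written stand-ins for assembled documents (not real factory output).
# The ['now', 'tomorrow'] value is dynamic data awaiting the finish stage.
assembled = [
    {'name': 'Burt', 'hatched': ['now', 'tomorrow']},
    {'name': 'Fred', 'hatched': ['now', 'tomorrow']},
]

# Because every value is JSON safe the documents can be stored as-is...
stored = json.dumps(assembled)

# ...and loaded later, ready to be passed on to the populate stage.
restored = json.loads(stored)
```

The restored list is equal to the original, which is what makes it practical to keep a fixed set of test data on disk.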

Stages 2 & 3 - Finishing and population

A database is populated based on a Blueprint and preassembled list of documents.

During this stage dynamic data is converted to static data (this process is called finishing) and inserted into the database.

Before populate inserts the finished documents into the database it converts each document into a Frame instance and calls the on_fake method against the Blueprint. After the documents are inserted into the database it calls the on_faked method against the Blueprint.

The Factory class has both a finish and a populate method. However, the populate method calls the finish method, so we lump these two stages together.

factory.assemble(blueprint, quota)

Assemble a quota of documents using the given blueprint.

factory.finish(blueprint, documents)

Finish a list of pre-assembled documents using the given blueprint.

factory.populate(blueprint, documents)

Populate the database with fake data using the given blueprint and preassembled documents.

factory.reassemble(blueprint, fields, documents)

Reassemble the given set of fields for a list of preassembled documents using the specified blueprint.

Reassembly is done in place. If you need to retain the existing documents, it is recommended that you copy them first using copy.deepcopy (the data you send the method should be JSON type safe, so it can be copied reliably).
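
A minimal sketch of the copy-before-reassemble pattern is shown here. The reassemble_in_place function is a hypothetical stand-in for factory.reassemble so that the example runs without a database, and the field names are assumptions.

```python
import copy

def reassemble_in_place(fields, documents):
    """Hypothetical stand-in that overwrites the given fields in place."""
    for document in documents:
        for field in fields:
            document[field] = 'reassembled'

documents = [{'name': 'Burt', 'breed': 'Cold-drake'}]

# Take a deep copy first so the original values are retained
originals = copy.deepcopy(documents)

reassemble_in_place({'breed'}, documents)
```

After the call, documents holds the new value for breed while originals still holds the value from before reassembly.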

blueprints module

Blueprint

Blueprints provide the instructions for producing a fake document for a collection via a Frame class. The Blueprint class should not be used directly but inherited from. In addition, Blueprint classes should not be initialized and should only be used as static classes.

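import os

from mongoframes import Frame
from mongoframes.factory import makers
from mongoframes.factory.blueprints import Blueprint

# Note: encrypt_password is assumed to be a helper defined by your application


class User(Frame):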

    _fields = {
        'first_name',
        'last_name',
        'email',
        'password_hash',
        'password_salt'
        }

    @property
    def password(self):
        return ''

    @password.setter
    def password(self, value):
        self.password_salt = str(os.urandom(64))
        self.password_hash = encrypt_password(value, self.password_salt, 10000)


class UserBlueprint(Blueprint):
    
    _frame_cls = User
    _meta_fields = {'password'}

    first_name = makers.Faker('first_name')
    last_name = makers.Faker('last_name')
    email = makers.Lambda(
        lambda doc: '{first_name}.{last_name}@example.com'.format(**doc))
    password = makers.Static('password')

The _frame_cls property defines which Frame class the Blueprint will generate data for. The _meta_fields property, which should be a set, determines which, if any, of the fields should be set against the Frame instance instead of being included in the document used to initialize it. Any property defined as a Maker is automatically added to a list of instructions for the Blueprint.

Blueprint.get_frame_cls()

Return the Frame class for the Blueprint (defined as _frame_cls against the Blueprint class).

Blueprint.get_instructions()

Return the instructions for the Blueprint (defined by the properties with Maker values set against the Blueprint class).

Blueprint.get_meta_fields()

Return the meta-fields for the Blueprint (defined as _meta_fields against the Blueprint class).

Blueprint.assemble()

Assemble a single document using the Blueprint.

Blueprint.finish(document)

Take an assembled document and convert all assembled values to finished values.

Blueprint.reassemble(fields, documents)

Take a previously assembled document and reassemble the given set of fields for it in place.

Blueprint.reset()

Reset the Blueprint.

Blueprints are typically reset before being used to assemble a quota of documents. Resetting a Blueprint will in turn reset all the Maker instances defined as instructions for the Blueprint, allowing internal counters and the like to be reset.

Blueprint.on_fake(frames)

The on_fake method of the blueprint is called before frames are inserted by the Factory.populate method. It can be overridden to modify frame data before insertion. By default it triggers a fake event against the blueprint's frame_cls.

Blueprint.on_faked(frames)

The on_faked method of the blueprint is called after frames are inserted by the Factory.populate method. It can be overridden to modify frame data after insertion. By default it triggers a faked event against the blueprint's frame_cls.

quotas module

Quota

quota = Quota(quantity)

A base class for implementing variable quotas. The Quota class can be safely used as an argument for Factories and Makers.

# Using a quota to determine the number of documents a factory should assemble
factory.assemble(DragonBlueprint, quotas.Gauss(50, 10))

# Using a quota to determine the number of sentences in a paragraph
Markov('moby_dick', 'paragraph', quotas.Random(3, 10))

The base class provides a fixed value and is no different than using an integer or float value.

Gauss

gauss = Gauss(mu, sigma)

Return a random quota using a Gaussian distribution where mu represents the mean and sigma represents the standard deviation.

Random

random = Random(min_quantity, max_quantity)

Return a random quota between two values (min_quantity and max_quantity).
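
To make the difference between the two quotas concrete, here is a sketch of how each might resolve to a whole number using the standard library. It is not the MongoFrames implementation: the function names, the rounding, the clamp at zero and the inclusive upper bound are all assumptions made for illustration.

```python
import random

rng = random.Random(42)  # seeded so the sketch is repeatable

def gauss_quantity(mu, sigma):
    """Draw a whole number from a Gaussian distribution (assumed >= 0)."""
    return max(0, int(round(rng.gauss(mu, sigma))))

def random_quantity(min_quantity, max_quantity):
    """Draw a whole number between two values (assumed inclusive)."""
    return rng.randint(min_quantity, max_quantity)

gauss_samples = [gauss_quantity(50, 10) for _ in range(1000)]
random_samples = [random_quantity(100, 200) for _ in range(1000)]
```

The Gaussian samples cluster around the mean, whereas the random samples are spread evenly across the range.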

makers module

Maker

A base class for all Maker classes.

maker.__call__(*args)

The __call__ method is how makers are typically interacted with. If no arguments are passed then the _assemble method is called; if an argument is passed then the _finish method is called.

maker.document

The document (dictionary) the maker is currently generating a value for.

maker.reset()

Reset the maker instance.

maker.target(document)

The target method should be used in a with statement; it sets the current document for the maker on entry and unsets it on exit:

with maker.target(document):
    maker()

maker._assemble()

The _assemble method is called during the assemble stage of generating data. It should return a value that is JSON safe.

maker._finish(value)

The _finish method is called during the finish stage of generating data. The value argument contains the value generated by the maker in the assemble stage.
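
The calling protocol described above can be sketched with a small stand-in class. SketchMaker is not the MongoFrames Maker class; it only mirrors the documented behaviour (dispatch in __call__, a target context manager, and the _assemble and _finish hooks), and the values it produces are invented for the example.

```python
from contextlib import contextmanager

class SketchMaker:
    """A stand-in (not the MongoFrames Maker) showing the calling protocol."""

    def __init__(self):
        self.document = None

    def __call__(self, *args):
        # No arguments: assemble stage. One argument: finish stage.
        if args:
            return self._finish(args[0])
        return self._assemble()

    @contextmanager
    def target(self, document):
        # Set the current document, then unset it on exit
        self.document = document
        try:
            yield self
        finally:
            self.document = None

    def _assemble(self):
        # Return a JSON safe value based on the current document
        return 'draft for ' + self.document['name']

    def _finish(self, value):
        # Convert the assembled value into its final form
        return value.upper()

maker = SketchMaker()
document = {'name': 'Burt'}

with maker.target(document):
    assembled = maker()
    finished = maker(assembled)
```

A custom maker would typically only need to supply its own _assemble and _finish methods, leaving the dispatch and document handling to the base class.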

DictOf

maker = DictOf({
    'desc': Lorem('sentence', quotas.Random(1, 5)),
    'worth': 10
    })

...

{'desc': 'Mauris volutpat.', 'worth': 10}
{'desc': 'Gravida sed suscipit sit amet.', 'worth': 10}
{'desc': 'Praesent ut tempus.', 'worth': 10}

dict_of = DictOf(table)

Make a dictionary of key/values where each value is either generated using a maker or is the given non-maker value. The table argument should be a dictionary containing the set of keys to generate for each dictionary and the associated maker/value to assign to each key.

Faker

maker = Faker('name')

...

'Adaline Reichel'
'Noemy Vandervort'
'Gracie Weber'

faker = Faker(provider, assembler=True, locale=None, **kwargs)

Use any faker provider to generate a value (see http://fake-factory.readthedocs.io/).

The assembler argument determines the stage at which the value is generated. If True then the value is returned on assemble, otherwise on finish.

The locale argument is used to configure the locale of the faker.Factory instance created by the maker internally. If not specified it will default to the Faker.default_locale value.

Faker.default_locale

The default locale (en_US) used when generating values using the faker library.

Faker.get_fake(locale=None)

Return a shared faker.Factory instance used to generate fake data for the given locale.

Lambda

maker = Lambda(lambda doc: '{first_name}.{last_name}@example.com'.format(**doc))

...

'Mollie.Mathews@example.com'
'Anne.Lysan@example.com'
'Geoffrey.Dudley@example.com'

lambda_maker = Lambda(func, assembler=True, finisher=False)

Use a function to generate a value. The assembler and finisher arguments determine if the function is called in the assemble and/or finish stage. If the finisher argument is True the assembled value will be passed to the function, for example:

def my_func(doc, *args):
    if len(args):
        # If an additional argument has been sent the function is being called
        # in the finish stage and the argument is the assembled value... 
        return args[0] + ' and finished'
    else:
        # ...otherwise the function is being called in the assemble stage.
        return 'assembled'

maker = Lambda(my_func, finisher=True)

...

'assembled and finished'

ListOf

maker = ListOf(Code(4), 3)

...

['G9FF', '00I3', '0HSZ']
['RQGA', 'BDVF', 'HHFU']
['G39I', '44SY', 'QK2N']

list_of = ListOf(maker, quantity, reset_maker=False)

Generate a list of values of the given quantity using the specified maker. By default the associated maker will be reset only when the parent Blueprint class is reset. However, if the reset_maker argument is set to True then reset will be called against the maker each time the list is generated.

As an example of where setting the reset_maker argument to True can be useful, if you want to generate a list of codes that are unique per document, but not unique across the whole set of documents, you could do the following:

maker = ListOf(Unique(Code(4)), 3, reset_maker=True)

Static

maker = Static('foo')

...

'foo'
'foo'
'foo'

static = Static(value, assembler=True)

A maker that returns a fixed value. Optionally the stage at which the value is generated can be set using the assembler argument. If True then the value is returned on assemble, otherwise on finish.

If the static value isn't a JSON safe type then the assembler argument should be set to False.

SubFactory

class Item(Blueprint):
    
    desc = Lorem('sentence', quotas.Random(1, 5))
    worth = Int(1, 100)

maker = SubFactory(Item)

...

Item(desc='Mauris volutpat.', worth=55)
Item(desc='Gravida sed suscipit sit amet.', worth=13)
Item(desc='Praesent ut tempus.', worth=92)

sub_factory = SubFactory(blueprint)

A maker that generates SubFrames (embedded documents) based on the given blueprint.

sub_factory.reset()

Reset the blueprint for the maker.

Unique

maker = Unique(Code(4))

...

'G9FF'
'00I3'
'0HSZ'

unique = Unique(maker, exclude=None, assembler=True, max_attempts=1000)

Ensure that unique values are generated by a maker. Optionally the exclude argument allows a set of existing values to be specified that do not count as unique.

The assembler argument determines the stage at which the unique test is applied, if True the test is applied on assemble otherwise it is applied on finish.

The max_attempts argument determines the number of times Unique will call the associated maker in an attempt to generate a unique value before giving up and raising an error.

unique.reset()

Reset the set of used values to the initially excluded set (or an empty set if one isn't provided).
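
The retry behaviour described for Unique can be sketched as a plain function. This is an illustration only, not the library's code: make_unique, the RuntimeError and the deterministic stand-in maker are all assumptions made so the example is repeatable.

```python
import itertools

def make_unique(make, used, max_attempts=1000):
    """Call make() until it returns a value not in used, or give up."""
    for _ in range(max_attempts):
        value = make()
        if value not in used:
            used.add(value)
            return value
    raise RuntimeError('No unique value after %d attempts' % max_attempts)

# A deterministic stand-in maker that yields A, A, B, B, C, C
values = itertools.chain.from_iterable((c, c) for c in 'ABC')

# The initial contents of used play the role of the exclude argument
used = {'A'}

first = make_unique(lambda: next(values), used)
second = make_unique(lambda: next(values), used)
```

Here the excluded value 'A' and the repeated 'B' are both skipped, so the two calls return 'B' and then 'C'.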

makers.dates module

DateBetween

maker = DateBetween('today', 'today+7')

...

date(2016, 1, 4)
date(2016, 1, 2)
date(2016, 1, 5)

date_between = DateBetween(min_date, max_date)

Return a date between two dates. Dates can be specified either as datetime.date instances or as strings of the form, {yesterday|today|tomorrow}{+|-}{no_of_days}.

date_between.parse_date(d)

If d is a date then it is returned, if d is a string of the form {yesterday|today|tomorrow}{+|-}{no_of_days} then a date is returned based on the base date (yesterday, today, tomorrow) and the offset (+/- the number of days).
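
As a sketch of how strings of this form could be resolved, the function below parses the base date and optional offset with a regular expression. It is an independent illustration rather than the library's parse_date; the function name and the today parameter (added so the result is repeatable) are assumptions.

```python
import datetime
import re

BASE_OFFSETS = {'yesterday': -1, 'today': 0, 'tomorrow': 1}
DATE_PATTERN = re.compile(r'^(yesterday|today|tomorrow)(?:([+-])(\d+))?$')

def parse_relative_date(d, today=None):
    """Return d if it is a date, otherwise resolve a string like 'today+7'."""
    if isinstance(d, datetime.date):
        return d

    match = DATE_PATTERN.match(d)
    if not match:
        raise ValueError('Unrecognised date string: ' + repr(d))

    # Start from the base date's offset and apply the +/- number of days
    base, sign, days = match.groups()
    offset = BASE_OFFSETS[base]
    if days:
        offset += int(days) if sign == '+' else -int(days)

    today = today or datetime.date.today()
    return today + datetime.timedelta(days=offset)

fixed_today = datetime.date(2016, 1, 1)
week_ahead = parse_relative_date('today+7', fixed_today)
two_days_ago = parse_relative_date('yesterday-1', fixed_today)
```

With the base date fixed at 1 January 2016, 'today+7' resolves to 8 January 2016 and 'yesterday-1' to 30 December 2015.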

makers.images module

ImageURL

maker = ImageURL(600, quotas.Random(400, 800))

...

'//fakeimg.pl/600x430'
'//fakeimg.pl/600x788'
'//fakeimg.pl/600x713'

image_url = ImageURL(
    width,
    height,
    background='CCCCCC',
    foreground='8D8D8D',
    options=None,
    service_url='//fakeimg.pl',
    service_formatter=None
    )

Return a fake image URL (by default we use the fakeimg.pl service).

Depending on the image service provider the image URL can be configured using the background, foreground, and options arguments. The options argument should be a dictionary, by default each item in the dictionary will be converted to a named parameter within the URL. For example the fakeimg.pl service allows a text argument to be passed that will append a label to the returned image.

The service_url and service_formatter arguments allow new provider services to be configured. The service_formatter argument should be a function that returns a valid URL, for example:

from urllib.parse import urlencode

def default_service_formatter(service_url, width, height, background,
        foreground, options):
    """Generate an image URL for a service"""

    # Build the base URL
    image_tmp = '{service_url}/{width}x{height}/{background}/{foreground}/'
    image_url = image_tmp.format(
        service_url=service_url,
        width=width,
        height=height,
        background=background,
        foreground=foreground
        )

    # Add any options
    if options:
        image_url += '?' + urlencode(options)

    return image_url
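
For a service with a different URL scheme, a custom formatter might look like the sketch below. The service (images.example.com) and its query parameter names are entirely hypothetical; only the function signature follows the formatter shown above.

```python
from urllib.parse import urlencode

def query_string_formatter(service_url, width, height, background,
        foreground, options):
    """Build a URL for an imaginary service that takes all settings as a query"""

    # Map the standard arguments to the (hypothetical) parameter names
    params = {
        'w': width,
        'h': height,
        'bg': background,
        'fg': foreground,
    }

    # Add any options as extra named parameters
    params.update(options or {})

    return service_url + '/image?' + urlencode(params)

url = query_string_formatter(
    '//images.example.com', 600, 400, 'CCCCCC', '8D8D8D', {'text': 'Dragon'})
```

A function like this would be passed as the service_formatter argument alongside a matching service_url.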

makers.numbers module

Counter

maker = Counter(5, 10)

...

5
15
25

counter = Counter(start_from=1, step=1)

Generate a sequence of numbers. The optional start_from and step values can be integers or Quotas.

counter.reset()

Reset the counter to start again from the start_from value.

Float

maker = Float(5.0, 10.0)

...

5.52615217015
9.82389432730
8.32649122930

_float = Float(min_value, max_value)

Generate a random float between two values. The min/max values can be a float or Quota.

Int

maker = Int(5, 10)

...

5
9
8

_int = Int(min_value, max_value)

Generate a random integer between two values. The min/max values can be an integer or Quota.

makers.selections module

Cycle

maker = Cycle(['foo', 'bar', Code(4)])

...

'foo'
'bar'
'G9FF'
'foo'
'bar'
'00I3'

cycle = Cycle(items)

Pick the next item from a list of makers and/or values cycling through the list and repeating when we reach the end.

cycle.reset()

Reset the item index (start the cycle over from the first item).

OneOf

maker = OneOf(['foo', 'bar', Code(4), ...])

...

'G9FF'
'bar'
'0HSZ'

one_of = OneOf(items, weights=None)

Pick one item from a list of makers and/or values.

A set of weights for the items can be specified to determine the probability of each item being selected. If no weights are given then each item is assigned an equal probability. Weights can be specified as integers, floats or Quotas.
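
The effect of weights can be sketched with random.choices from the standard library. This illustrates weighted selection in general and is not the OneOf implementation; the pick_one helper and the item values are assumptions.

```python
import random

rng = random.Random(7)  # seeded so the sketch is repeatable
items = ['foo', 'bar', 'zee']

def pick_one(items, weights=None):
    """Pick one item; with no weights every item is equally likely."""
    return rng.choices(items, weights=weights, k=1)[0]

# With weights of 8, 1 and 1, 'foo' should be picked about 8 times in 10
picks = [pick_one(items, weights=[8, 1, 1]) for _ in range(1000)]
foo_share = picks.count('foo') / len(picks)
```

A weight of zero effectively removes an item from selection, so weights of [0, 1, 0] would always return 'bar'.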

RandomReference

maker = RandomReference(Dragons)

...

ObjectId('507f1f77bcf86cd799439011')
ObjectId('507f191e810c19729de860ea')
ObjectId('54759eb3c090d83494e2d804')

random_reference = RandomReference(frame_cls, constraint=None)

Pick a reference document at random from a collection (determined by the given frame_cls) optionally applying a constraint.

SomeOf

maker = SomeOf(['foo', 'bar', Code(4), ...], 2)

...

['foo', 'bar']
['G9FF', 'foo']
['bar', 'foo']

some_of = SomeOf(items, sample_size, weights=None, with_replacement=False)

Pick one or more items from a list of makers and/or values. The sample size can be an integer or a Quota.

A set of weights for the items can be specified to determine the probability of each item being within the sample taken. If no weights are given then each item is assigned an equal probability. Weights can be specified as integers, floats or Quotas.

By default items are drawn for the sample without replacement (i.e. each item selected for the sample is removed from the items after being taken and can't be selected again). However, this behaviour can be changed by setting with_replacement to True.
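
The difference between the two sampling modes can be shown with the standard library. This sketch ignores weights and is not how SomeOf is implemented; it only demonstrates what drawing with and without replacement means.

```python
import random

rng = random.Random(3)  # seeded so the sketch is repeatable
items = ['foo', 'bar', 'zee']

# Without replacement: an item can appear at most once in the sample
without_replacement = rng.sample(items, 2)

# With replacement: the same item may be drawn more than once, which also
# means the sample can be larger than the list of items
with_replacement = rng.choices(items, k=5)
```

Drawing five values from three items with replacement is guaranteed to contain at least one repeat, something that is impossible without replacement.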

SomeOf.p(i, sample_size, weights)

(Staticmethod) Given a weighted set and sample size return the probability that the weight i will be present in the sample.

Created to test the output of the SomeOf maker class. The math was provided by Andy Blackshaw - thank you dad :)

SomeOf.weighted(weights, sample_size, with_replacement=False)

(Staticmethod) Return a set of random integers 0 <= N <= len(weights) - 1, where the weights determine the probability of each possible integer in the set.

makers.text module

Code

maker = Code(4)

...

'G9FF'
'00I3'
'0HSZ'

code = Code(length, charset=None)

Generate a random code of the given length using either the default_charset or optionally a custom charset. The length specified can be an integer or Quota.

Code.default_charset

The default charset (string.ascii_uppercase + string.digits) used if a charset is not given when the maker is initialized.
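
A code of this kind can be produced with a few lines of standard library Python. The sketch below is an assumption about the general approach rather than the Code maker's own implementation, and make_code is a name invented for the example.

```python
import random
import string

# The same characters as the documented default charset
DEFAULT_CHARSET = string.ascii_uppercase + string.digits

def make_code(length, charset=None):
    """Return a random string of the given length drawn from the charset."""
    charset = charset or DEFAULT_CHARSET
    return ''.join(random.choice(charset) for _ in range(length))

code = make_code(4)
binary_code = make_code(8, charset='01')
```

Supplying a custom charset restricts the characters used, so binary_code contains only 0s and 1s.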

Join

maker = Join(['SKU', Code(4)], sep='-')

...

'SKU-G9FF'
'SKU-00I3'
'SKU-0HSZ'

join = Join(items, sep=' ')

Join the output of 2 or more items (makers and/or values) together with a separator string.

Lorem

maker = Lorem('sentence', 6)

...

'Lorem ipsum dolor sit amet, consectetur.'
'Integer quis ultricies risus aliquam id.'
'Nunc ac diam orci dis parturient.'

lorem = Lorem(text_type, quantity)

Generate random amounts of lorem ipsum. To determine the amount of lorem ipsum generated the type of text structure to generate must be specified;

  • body,
  • paragraph,
  • sentence

along with the quantity;

  • paragraphs in a body,
  • sentences in a paragraph,
  • words in a sentence.

The quantity value can be an integer or Quota.

Markov

with open('moby_dick.txt') as f:
    Markov.init_word_db('moby_dick', f.read())

maker = Markov('moby_dick', 'sentence', 6)

...

'The larger whales had been introduced.'
'So important an upper hand of.'
'In my soul does Jonah\'s deep.'

markov = Markov(db, text_type, quantity)

Generate random amounts of text using a Markov chain. To determine the amount of text generated the type of text structure to generate must be specified;

  • body,
  • paragraph,
  • sentence

along with the quantity;

  • paragraphs in a body,
  • sentences in a paragraph,
  • words in a sentence.

The quantity value can be an integer or Quota.

markov.database

(Read-only) Return the selected word database.

Markov.init_word_db(name, text)

Initialize a database of words for the maker with the given name. The specified text must contain at least 3 words.

Before you can create an instance of the class a word database must be initialized.
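
For readers unfamiliar with Markov chain text generation, here is a small self-contained sketch. It assumes a chain keyed on pairs of consecutive words, which would be consistent with the three word minimum but is not confirmed to be how the Markov maker works; build_word_db and make_sentence are hypothetical names.

```python
import random
from collections import defaultdict

def build_word_db(text):
    """Map each pair of consecutive words to the words seen after the pair."""
    words = text.split()
    if len(words) < 3:
        raise ValueError('The text must contain at least 3 words')
    db = defaultdict(list)
    for first, second, third in zip(words, words[1:], words[2:]):
        db[(first, second)].append(third)
    return dict(db)

def make_sentence(db, word_count, rng):
    """Walk the chain, restarting from a random pair at a dead end."""
    sentence = list(rng.choice(sorted(db)))
    while len(sentence) < word_count:
        followers = db.get(tuple(sentence[-2:]))
        if followers:
            sentence.append(rng.choice(followers))
        else:
            sentence.extend(rng.choice(sorted(db)))
    return ' '.join(sentence[:word_count]).capitalize() + '.'

db = build_word_db('call me ishmael some years ago never mind how long')
sentence = make_sentence(db, 6, random.Random(1))
```

With such a short source text the output closely follows the original wording; a larger text gives each pair more possible followers and so more varied sentences.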

Sequence

maker = Sequence('username-{index}', start_from=100)

...

'username-100'
'username-101'
'username-102'

sequence = Sequence(template, start_from=1)

Generate a sequence of values where a number is inserted into a template. The template should specify an index value, for example: 'prefix-{index}'

sequence.reset()

Reset the sequence back to the start_from value.
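
The behaviour of a sequence and its reset can be sketched with a small stand-in class. SketchSequence is not the MongoFrames Sequence maker; it simply mirrors the documented template, start_from and reset behaviour using str.format.

```python
class SketchSequence:
    """A stand-in showing how an index can be inserted into a template."""

    def __init__(self, template, start_from=1):
        self.template = template
        self.start_from = start_from
        self.index = start_from

    def __call__(self):
        # Insert the current index into the template, then move on
        value = self.template.format(index=self.index)
        self.index += 1
        return value

    def reset(self):
        # Start again from the start_from value
        self.index = self.start_from

sequence = SketchSequence('username-{index}', start_from=100)
first_run = [sequence(), sequence()]

sequence.reset()
after_reset = sequence()
```

After the reset the next value generated is 'username-100' again, matching the first value of the initial run.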