Build a blog with Django: Using Factory Boy and Faker to seed posts

The Django admin gave us a nice interface for adding posts to our blog, but it's tedious to use when we need to enter fake content for testing.

Typically you'd be working on a feature, say, building and styling an index page to show all published posts. Let's assume you want a nice empty state, a state with few posts (<= 10 published posts) and no pagination, and finally a state with many posts (> 10 published posts) that causes the pagination to be shown.

You can certainly log in to the admin and manually enter or delete posts until you have the right number for the state you're testing. But, come on. That's labor intensive and would get boring pretty quickly. Not to mention that if you need to see what 2 or 3 pages of posts would look like, you'd need to enter around 20 to 30 posts.
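To make those three states concrete, here's a tiny plain-Python sketch. The page size of 10 is my assumption based on the thresholds above, and index_state and page_count are hypothetical helper names, not part of the blog's code.

```python
import math

PER_PAGE = 10  # assumption: page size implied by the "<= 10" threshold


def index_state(published_count, per_page=PER_PAGE):
    """Classify the index page state for a number of published posts."""
    if published_count == 0:
        return "empty"
    if published_count <= per_page:
        return "few"  # fits on one page, so no pagination
    return "many"  # spills onto a second page, so pagination shows


def page_count(published_count, per_page=PER_PAGE):
    """Number of pages needed (an empty index still has one page)."""
    return max(1, math.ceil(published_count / per_page))


for n in (0, 7, 10, 11, 25):
    print(n, index_state(n), page_count(n))
```

So 11 published posts are enough to trigger pagination, and anything from 21 to 30 gives you three pages.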

I hope you're saying to yourself that there must be something we can do.

And, there is. But to my knowledge it's not built into Django.

In this post, I'll show you how to automate the task with Factory Boy and Faker.

Let's get started.

Factory Boy

Factory Boy is a Python package that allows you to create templates for producing valid objects.

It's based on thoughtbot's factory_girl gem and is designed to work well with the Django ORM.

Let's see what it can do.

(venv) $ pip install factory_boy
(venv) $ cd src && touch posts/factories.py

Then, edit posts/factories.py to contain:

import factory  
import factory.django


from .models import Post


class PostFactory(factory.django.DjangoModelFactory):  
    class Meta:
        model = Post

    is_published = False

    title = 'A title'

    slug = 'a-slug'

    excerpt = 'This is an excerpt.'

    body = 'This is a body.'

Now, let's fire up the shell

(venv) $ python manage.py shell

and play with it for a bit.

>>> from posts.factories import PostFactory

>>> # Return an unsaved Post instance
>>> post = PostFactory.build()
>>> post.title
'A title'  
>>> post.slug
'a-slug'  
>>> post.body
'This is a body.'  
>>> post.id is None
True

>>> # Return a saved Post instance
>>> post = PostFactory.create()
>>> post.id
1

>>> post = PostFactory.create()
...
django.db.utils.IntegrityError: duplicate key value violates unique constraint "posts_post_slug_key"  
DETAIL:  Key (slug)=(a-slug) already exists.

>>> post = PostFactory.create(slug='another-slug')
>>> post.id
2  
>>> post.title
'A title'  
>>> post.slug
'another-slug'  

So everything works as expected except for that django.db.utils.IntegrityError on the second create() call. The issue is that the slug field was declared to be unique, but our template generates the same slug every time. I showed you one way of getting around the problem, but here's a better way.

slug = factory.Sequence(lambda n: 'slug-{}'.format(n))  

And, back in the shell we get:

>>> post = PostFactory.create()
>>> post.slug
'slug-0'

>>> post = PostFactory.create()
>>> post.slug
'slug-1'

>>> post = PostFactory.create()
>>> post.slug
'slug-2'  

N.B.: We can still encounter problems if we exit the shell and return. That's because the counter restarts from 0 while the posts we saved earlier are still in the database. You see, Factory Boy is intended for automated testing scenarios like unit testing and integration testing. Each test case runs against a clean database, so sequencing usually doesn't cause these problems. There's another way to generate the slug that doesn't depend on a counter, and I'll show it to you later on. In the meantime, if you need to start fresh, Post.objects.all().delete() will delete all posts in the database.
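If the restart problem feels abstract, here's a plain-Python sketch of the idea. It doesn't use Factory Boy at all: make_slug_sequence is a hypothetical stand-in for a counter-backed sequence, and the saved_slugs set stands in for the unique slug column in the database.

```python
import itertools


def make_slug_sequence():
    """Return a fresh slug generator, like opening a new shell session."""
    counter = itertools.count()  # always starts at 0 when created
    return lambda: "slug-{}".format(next(counter))


saved_slugs = set()  # stand-in for rows that persist in the database


def save(slug):
    """Mimic a unique constraint on the slug column."""
    if slug in saved_slugs:
        raise ValueError("duplicate slug: {}".format(slug))
    saved_slugs.add(slug)


# First "session": three posts are saved without trouble.
next_slug = make_slug_sequence()
for _ in range(3):
    save(next_slug())

# Second "session": the counter is back at 0, but the rows are still there.
next_slug = make_slug_sequence()
try:
    save(next_slug())
    collided = False
except ValueError as exc:
    collided = True
    print(exc)  # duplicate slug: slug-0
```

In a test run the "database" starts empty every time, which is why the same counter causes no trouble there.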

We have another subtle problem. If we create a published post by doing PostFactory.create(is_published=True) then the published_at field remains unset. Let's fix that now.

from django.utils.timezone import now

# ...

class PostFactory(factory.django.DjangoModelFactory):  
    # ...

    @factory.lazy_attribute
    def published_at(self):
        return now() if self.is_published else None

Giving,

>>> post = PostFactory.create(is_published=True)
>>> post.published_at
datetime.datetime(2017, 1, 19, 10, 20, 29, 653411, tzinfo=<UTC>)  

Fantastic!
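If you're wondering what "lazy" buys us here, this plain-Python sketch mimics the idea without Factory Boy. build_post is a hypothetical mini-factory I made up for illustration: the computed field is only worked out once the other fields, including any overrides, are known.

```python
from datetime import datetime, timezone
from types import SimpleNamespace


def build_post(**overrides):
    """Hypothetical mini-factory: defaults first, computed fields last."""
    fields = {"is_published": False, "title": "A title"}
    fields.update(overrides)
    if "published_at" not in fields:
        # Evaluated late, after is_published is settled for this object.
        fields["published_at"] = (
            datetime.now(timezone.utc) if fields["is_published"] else None
        )
    return SimpleNamespace(**fields)


draft = build_post()
live = build_post(is_published=True)
print(draft.published_at)  # None
print(live.published_at is not None)  # True
```

An explicit published_at passed as an override still wins, which matches how keyword arguments to the factory take precedence over its declarations.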

You can do a lot of other neat stuff with Factory Boy. So be sure to check out their docs to learn more.

Faker

Faker is a Python package that generates fake data for you.

When you install Factory Boy you also get Faker, so you don't need to do anything extra.

N.B.: Faker recently changed its package name from fake-factory to Faker so if you're installing it standalone be sure to do pip install Faker.

Let's play with it.

>>> from faker import Faker
>>> fake = Faker()
>>> fake.name()
'Destiny Robles'  
>>> fake.sentence()
'Dolor beatae consequuntur inventore cum.'  
>>> fake.text(max_nb_chars=300)
'Unde corporis tenetur accusamus velit. Aliquid laborum excepturi cum nostrum.\nReprehenderit at eius fugiat nulla libero dolorum possimus. Necessitatibus dignissimos nulla laborum aut. Rem ipsum laborum aperiam id culpa pariatur. Quidem ullam explicabo architecto architecto.'  

Cool. See how easy it is to use?

Check the docs to learn more.

However, in the context of Factory Boy we use it a bit differently. Edit posts/factories.py to change how the excerpt and body fields are set:

excerpt = factory.Faker('sentence', nb_words=20)

body = factory.Faker('text', max_nb_chars=5000)  

And, that's it. When we create posts from this factory we'll get different excerpts and body contents each time. Try it out to see what I mean.

What about the title?

Well, I couldn't find a provider that had a fake suitable to be used as a title. However, the sentence fake is pretty close to what we need. We just have to get rid of that period it adds to the end.

Faker allows you to write custom providers and fakes to suit your needs.

Just above the PostFactory class add the following:

from faker.providers.lorem.la import Provider as LoremProvider

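class ExtendedLoremProvider(LoremProvider):
    # A plain instance method works whether sentence() is defined as a
    # classmethod or as an instance method on the parent provider.
    def title(self, nb_words=6, variable_nb_words=True):
        # Reuse the sentence fake and drop its trailing period.
        return self.sentence(nb_words, variable_nb_words).rstrip('.')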

factory.Faker.add_provider(ExtendedLoremProvider)  

Now we can fake the title.

title = factory.Faker('title')  

And, while we're at it let's improve upon the slug's generation by making it depend on the title field.

from django.utils.text import slugify

slug = factory.LazyAttribute(lambda p: slugify(p.title))  
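To get a feel for what the title-derived slugs look like, here's a plain-Python sketch. simple_slugify is a hypothetical, simplified stand-in for Django's slugify so the snippet runs without Django, and the titles are made up. It also shows the one case to keep in mind: a repeated title produces a repeated slug.

```python
import re
import unicodedata


def simple_slugify(value):
    """Simplified stand-in for django.utils.text.slugify (ASCII only)."""
    value = (
        unicodedata.normalize("NFKD", value)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    value = re.sub(r"[^\w\s-]", "", value.lower())  # drop punctuation
    return re.sub(r"[-\s]+", "-", value).strip("-_")  # spaces to hyphens


titles = [
    "Dolor beatae consequuntur",
    "Unde corporis tenetur",
    "Dolor beatae consequuntur",  # deliberately repeated
]
slugs = [simple_slugify(t) for t in titles]
print(slugs[0])  # dolor-beatae-consequuntur
print(len(set(slugs)) < len(slugs))  # True: same title, same slug
```

With randomly generated titles of several words a clash should be rare, but it's worth knowing where an IntegrityError could still come from.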

Here's the final contents of the posts/factories.py file in all its glory:

from django.utils.text import slugify  
from django.utils.timezone import now

import factory  
import factory.django

from faker.providers.lorem.la import Provider as LoremProvider

from .models import Post


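class ExtendedLoremProvider(LoremProvider):
    # A plain instance method works whether sentence() is defined as a
    # classmethod or as an instance method on the parent provider.
    def title(self, nb_words=6, variable_nb_words=True):
        # Reuse the sentence fake and drop its trailing period.
        return self.sentence(nb_words, variable_nb_words).rstrip('.')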


factory.Faker.add_provider(ExtendedLoremProvider)


class PostFactory(factory.django.DjangoModelFactory):  
    class Meta:
        model = Post

    is_published = False

    title = factory.Faker('title')

    slug = factory.LazyAttribute(lambda p: slugify(p.title))

    excerpt = factory.Faker('sentence', nb_words=20)

    body = factory.Faker('text', max_nb_chars=5000)

    @factory.lazy_attribute
    def published_at(self):
        return now() if self.is_published else None

Seed posts

Great!

We can generate posts with reasonable fake data.

Let's write a function that we can call with the number of unpublished and published posts we need.

Create a new file posts/seed.py and edit it to contain the following:

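from .factories import PostFactory


def create_posts(unpublished, published):
    """Create the given numbers of unpublished and published posts."""
    for i in range(unpublished + published):
        # The first `unpublished` posts stay drafts; the rest get published.
        is_published = i >= unpublished
        PostFactory.create(is_published=is_published)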

Awesome!

Now we can just jump into the shell and do:

>>> from posts.models import Post
>>> Post.objects.all().delete()

>>> from posts.seed import create_posts
>>> create_posts(1, 5)

And that would put 1 unpublished post and 5 published posts into the database for us.
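If the i >= unpublished check looks a bit cryptic, this plain-Python sketch isolates it. publication_flags is a hypothetical helper, not part of posts/seed.py; it just returns the is_published value each post in the loop would get.

```python
def publication_flags(unpublished, published):
    """Return the is_published flag for each post, in creation order."""
    return [i >= unpublished for i in range(unpublished + published)]


print(publication_flags(1, 5))  # [False, True, True, True, True, True]
print(publication_flags(2, 0))  # [False, False]
```

The drafts always come first and the published posts follow. Factory Boy also has a create_batch method on factories, so two batch calls, one per state, would be another way to get the same split.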

Wrap up

That completes what we wanted to do. Today you learned:

  • About Factory Boy and using it to make templates of valid objects
  • About Faker and using it to generate fake data
  • How to use Factory Boy and Faker together
  • How to create custom providers for Faker
  • And, how to put it all together to seed the database

There are a few loose ends I want to tie up before I'd consider this feature done.

  1. The published_at timestamps that get generated are too close to each other. I'd rather have them spread out over a longer time span. That would be more realistic. I'll show you how to do that in my next post by freezing time.

  2. Even though it's now arguably simpler to populate the database with lots of posts, both unpublished and published, I still find it tedious to jump into the shell and type all that out each time. Wouldn't it be better if we could just type something like python manage.py createposts --unpublished 1 --published 5? Well, we can, and I'll show you how in a future post.
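As a rough preview of the command-line shape in the second item, here's a sketch using only the standard library's argparse. This isn't the implementation from the future post: build_parser is a hypothetical helper and the defaults of 0 are my assumption. In a Django management command, the same add_argument calls would typically go in the command's add_arguments method.

```python
import argparse


def build_parser():
    """Parser for the hypothetical createposts options."""
    parser = argparse.ArgumentParser(prog="createposts")
    parser.add_argument(
        "--unpublished", type=int, default=0,
        help="number of unpublished posts to create (default assumed: 0)",
    )
    parser.add_argument(
        "--published", type=int, default=0,
        help="number of published posts to create (default assumed: 0)",
    )
    return parser


args = build_parser().parse_args(["--unpublished", "1", "--published", "5"])
print(args.unpublished, args.published)  # 1 5
```

From there the parsed values could be handed straight to create_posts(args.unpublished, args.published).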

Go here to see all the changes along with the tests I wrote to ensure it all works as planned.

See you next time, and remember to sign up for my newsletter to be the first to be notified of new posts I publish.

P.S. Have you tried doing this before? How would you do it? Let me know in the comments below.