Why I hate ModelForm & ModelSerializer

I’ve been having a lot of conversations recently about Django ModelForm and Django Rest Framework (DRF) ModelSerializer, and to a lesser extent, DRF ModelViewSet. I think, perhaps maybe a little too zealously, that they are bad patterns and have no place in any reasonably large software project.

Now, I absolutely understand the appeal of them. It makes it much easier to write code quickly, and lots of times, our views are pretty tightly coupled to individual models in our database. If you’re in the know, you see one of these and think “wow, that’s so efficient”. You’re like “Yeah I can crank out this feature in like 30 minutes because it’s just a model and a form, easy peasy.” Long live developer productivity.

But if you’re not in the know, then you see one of these and think “How in the hell does this work?” and then you end up spending a bunch of time reading various docs, inspecting the parent classes to find out where the ✨ magic ✨ happens. Eventually, you’ll probably figure it out long enough to understand how to change the thing you need to change, and then you’ll be on your way. Don’t worry, you won’t forget how any of the magic happens by the next time you need to modify this code, right? RIGHT?

But the Zen of Python is that explicit is better than implicit, and I think these technically-explicit, tightly coupled tools introduce more problems in your code base than they are worth, and it’s easier to not use them at all than it is to make sure everyone is using them safely.

Footgun No. 1: fields vs exclude

In both ModelForm and ModelSerializer, you are given the option to define fields or exclude. If you define fields, it’s an explicit list of the model fields that you want to expose in the form/serializer. This isn’t where the problem lies. On the flip side, if you define exclude, it’s an explicit list of model fields that you don’t want included, everything else gets included by default. But, perhaps most important, ModelForm doesn’t enforce that one of these values be set – the default behavior is to expose the entire model. ModelSerializer, at least, has made a change to make fields or exclude mandatory, they removed the implicit default of all fields on the model.

In the short term, you write a new model and you write a form to submit or edit data on that model. You don’t declare all the fields, because you know the fields on the model and they are exactly what has to be returned to the browser. You get the feature out the door and move on. Or maybe you think you’re getting ahead of the problem by adding exclude = ['id'] because you know you don’t want to ship the internal database ID to the browser.

But the problem happens later, maybe months or years later, when another developer is tasked with modifying the model to include more information. Information that is meant for internal use. They update the model and hook up this new information to whatever they are using. But unless they explicitly know about the patterns in use and the complete context of everything using that model, they aren’t likely to realize that their new fields are automatically, implicitly included. They don’t know they need to make sure to exclude it from every ModelForm or ModelSerializer that might reference this model.

Example

Let’s say we have a User model in our application, it represents a user that signs in to our application. An overly simplified, extremely insecure version of this might look like this:

class User(models.Model):
    email = models.CharField(max_length=255)
    password = models.CharField(max_length=255)

Now you need to make your sign up page, so you reach for the Django-provided mechanism to do so. The ModelForm. You’ll even make a separate form to allow the user to change their email address.

class UserSignupForm(forms.ModelForm):
    class Meta:
        model = User

class UserChangeEmailForm(forms.ModelForm):
	class Meta:
		model = User
		exclude = ["password"] # it's not secure to show people their password when they want to change their email

Two years later, you get harassed by your security-conscious users for not offering two-factor authentication. Your business is booming and you’ve moved on, but your new developer is going to add 2FA support for your customers.

class User(models.Model):
    email = models.CharField(max_length=255)
    password = models.CharField(max_length=255)
    totp_seed = models.CharField(max_length=255)

This simple change to just the Model file has unexpected repercussions elsewhere in the system, because the UserChangeEmailForm will now automatically include the totp_seed value when it is rendered.

This problem is explicitly called out in the Django documentation, “Selecting the fields to use”. It “strongly recommends” explicitly setting all fields using the fields attribute, and it acknowledges there are potential security problems with not doing so. But even then, this same section of documentation goes on to tell you about a special value made specifically to include all fields, just in case you want explicit.

Footgun No. 2: all

In case you’re told that you need to include fields in your ModelForm or ModelSerializer, but you insist that you want the whole model included in the form, Django has you covered. Simply use fields = "__all__". Now your form will automatically include the whole model. Easy to write, easy to ship your feature today.

This footgun doesn’t need much more coverage. It’s bad for the same reason not including a fields value is bad, and bad for the same reason only using exclude is bad.

Footgun No. 3: depth

This footgun is unique to DRF’s ModelSerializer. No good deed goes unpunished, I suppose. They removed the implicit behavior of not defining fields, but they also offer a way to automatically serialize relationships. Again, a very handy feature that enables developers to quickly write functional code. But consider the following scenario:

We’re working on a new book review feature, which allows people to submit reviews for books, and then displays information about the reviews and the book. A simplified example of this follows.

class Book(models.Model):
    author_name = models.CharField(max_length=255)
    title = models.CharField(max_length=255)

class BookReview(models.Model):
    review_body = models.TextField()
    book = models.ForeignKey(Book, on_delete=models.CASCADE)

class BookReviewSerializer(serializers.ModelSerializer):
    class Meta:
        model = BookReview
        fields = ['id', 'review_body', 'book']
        depth = 1

This works great. As our site gains popularity, authors are reaching out to us and they want to get reviews automatically emailed to them in a daily digest. Thinking of the ways we can monetize this in the future, we rush to enable the feature.

class Book(models.Model):
    author_name = models.CharField(max_length=255)
    title = models.CharField(max_length=256)
    author_email = models.CharField(max_length=256) # For internal use only

class BookReview(models.Model):
    review_body = models.TextField()
    book = models.ForeignKey(Book)

class BookReviewSerializer(serializers.ModelSerializer):
    class Meta:
        model = BookReview
        fields = ['id', 'review_body', 'book']
        depth = 1

Sweet, we’ve added the author’s email to the Book field, and each day we will find all the book reviews posted that day for each book and then email the book’s author with that information.

But wait, simply adding the author_email to the Book model has implicitly caused us to start shipping the author’s email address to everyone who views a book review. We didn’t even change the serializer! And why would we? We weren’t making any change to book reviews at all. We even used the fields to specifically make sure we wouldn’t have a problem with our serializer later.

No bueno. Authors are now getting bombarded with angry emails about when the dragons are going to arrive, and we’re getting bombarded with angry emails from authors, asking why we started sharing their email address with the entire internet.

Models should always be internal.

These tools, ModelForm and ModelSerializer are designed to make developer’s lives easier by speeding up our ability to write code. They address a practical reality of a lot of web applications, which is that most problems can be solved with a model and a form. But they create this inherent coupling that treats database models like they are external objects, able to be directly edited and expected to be directly exposed. While that may be true when the model is created, if you’re in it for the long haul, it will inevitably become untrue.

In pursuit of safer guard rails, paved roads, fewer hazardous materials, I think it is crucial to decouple the database model from the serialization layers that ship data to the browser. It should be the default for engineers to be explicit about the values they are serializing, and as security professionals, we should make it hard to be anything less than explicit.

Paving the Road

The best way to approach this problem, in my opinion, is through the introduction of pre-commit linting rules (and enforcing these rules in your CI prior to merge). There exists a project already, flake8-django, that has rules like DJ06 “Do not use exclude with ModelForm, use fields instead”, and DJ07 “Do not use __all__ with ModelForm, use fields instead”. This is a great start. It could reasonably be expanded to add rules for not allowing depth on ModelSerializer, as well as not allowing exclude or __all__ on ModelSerializer.

My zealous approach is to consider banning the use of ModelForm and ModelSerializer altogether. It increases the code that developers have to write up front to get their feature working, but this level of explicit code is our best defense against future accidents. The behavior of each serializer relies on that which is defined in the serializer, and isn’t going to (or is much less likely to) change due to an unrelated change elsewhere in the system. We can make use of flake8-tidy-imports banned-modules if we’d like to ban the use of these classes, and provide a helpful message about why the use of the module is banned.

One thing that most certainly will not solve the problem, however, is banning these classes in your security policy. Address the problem in as tight a feedback loop as possible, and prevent the introduction of new problems.

If you are interested in curtailing these problems in your organization, or want help getting these types of guard rails and paved paths set up, please reach out to discuss a consultation.