Background

This post is inspired by the article "Improve Serialization Performance in Django Rest Framework". The author compared various serializer solutions in the Django Rest Framework, such as simple functions and regular serializers. After reading the article, however, I noticed that the packages it uses are somewhat outdated: Python 3.7, Django 2.1.1, and Django Rest Framework 3.9.4. I therefore decided to rerun the experiments with the latest versions of these packages so that the results better reflect current behavior.

Introduction

Before delving into the detailed experiments, I will list the methods I plan to compare, along with the versions of packages in my development environment:  

Serializer solutions

  • Data Class
  • Regular Serializer
  • Model Serializer
  • Simple function
  • Pydantic

In recent years, Pydantic has become one of the most widely used data validation libraries for Python, which is why I've included it in my comparison list.

Versions in My Local Environment

  • Python 3.10
  • Django 5.0.6
  • Django Rest Framework 3.15.1
  • Pydantic 2.7.3

Environment Setup

I created two models, Product and Order, within a new Django project for these experiments. Each Order references a Product through a foreign key, which makes the experiments closer to day-to-day usage.

models.py
from django.db import models


class Product(models.Model):
    id = models.AutoField(primary_key=True)
    name = models.CharField(max_length=100)
    price = models.DecimalField(max_digits=10, decimal_places=2)
    stock = models.IntegerField()
    remark = models.TextField()
    created_at = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return self.name

class Order(models.Model):
    id = models.AutoField(primary_key=True)
    product = models.ForeignKey(Product, on_delete=models.CASCADE)
    quantity = models.IntegerField()
    order_date = models.DateTimeField(auto_now_add=True)

    def __str__(self):
        return f"Order {self.id} for {self.product.name}"

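I also created a Django management command for populating data. Given the app name and command name used here, it lives at myapp/management/commands/populate_data.py.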

populate_data.py
import random
from django.utils import timezone
from django.core.management.base import BaseCommand
from myapp.models import Product, Order

class Command(BaseCommand):
    help = 'Populate the database with test data'

    def handle(self, *args, **kwargs):
        Product.objects.all().delete()
        Order.objects.all().delete()
        
        # Create products
        products = []
        for i in range(1000):
            product = Product(
                name=f'Product {i}',
                price=random.uniform(10.0, 100.0),
                stock=random.randint(1, 100),
                remark='Remark for product',
                created_at=timezone.now()
            )
            products.append(product)
        
        # Bulk create products with a batch size of 500
        Product.objects.bulk_create(products, batch_size=500)

        # Fetch all products to get their IDs
        all_products = list(Product.objects.all())
        
        # Create orders
        orders = []
        for i in range(10000):
            order = Order(
                product=random.choice(all_products),
                quantity=random.randint(1, 10),
                order_date=timezone.now()
            )
            orders.append(order)
        
        # Bulk create orders with a batch size of 500
        Order.objects.bulk_create(orders, batch_size=500)
        
        self.stdout.write(self.style.SUCCESS('Successfully populated the database with test data'))

Next, we need to run the following commands to complete data preparation.

$ python manage.py makemigrations
$ python manage.py migrate
$ python manage.py populate_data

Experiment

As previously mentioned, we have various methods for comparison. In this section, we will implement the necessary serializer solutions.

Data Class

from dataclasses import dataclass
from datetime import datetime

@dataclass
class ProductData:
    id: int
    name: str
    price: float
    stock: int
    remark: str
    created_at: datetime

@dataclass
class OrderData:
    id: int
    product: ProductData
    quantity: int
    order_date: datetime
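
The dataclasses above only hold the data; turning them into a JSON-ready structure needs one more step. A minimal sketch of that step, assuming the standard library's `dataclasses.asdict` is acceptable for the conversion. The helpers `json_default` and `order_to_json` and the sample values are hypothetical and are not part of the benchmark.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any


@dataclass
class ProductData:
    id: int
    name: str
    price: float
    stock: int
    remark: str
    created_at: datetime


@dataclass
class OrderData:
    id: int
    product: ProductData
    quantity: int
    order_date: datetime


def json_default(value: Any) -> str:
    """Hypothetical helper: encode values json.dumps cannot handle natively."""
    if isinstance(value, datetime):
        return value.isoformat()
    raise TypeError(f"Cannot serialize {type(value).__name__}")


def order_to_json(order: OrderData) -> str:
    # asdict() recurses into the nested ProductData instance.
    return json.dumps(asdict(order), default=json_default)


# Illustrative sample data, not taken from the populated database.
now = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
sample = OrderData(
    id=1,
    product=ProductData(7, "Product 7", 19.99, 3, "Remark for product", now),
    quantity=2,
    order_date=now,
)
print(order_to_json(sample))
```

Note that `asdict` deep-copies the data, so including it in a timing run would likely raise the dataclass figure.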

Regular Serializer

from rest_framework import serializers

class ProductSerializer(serializers.Serializer):
    id = serializers.IntegerField()
    name = serializers.CharField(max_length=100)
    price = serializers.DecimalField(max_digits=10, decimal_places=2)
    stock = serializers.IntegerField()
    remark = serializers.CharField()
    created_at = serializers.DateTimeField()
    
class OrderSerializer(serializers.Serializer):
    id = serializers.IntegerField()
    product = ProductSerializer()
    quantity = serializers.IntegerField()
    order_date = serializers.DateTimeField()

Model Serializer

from rest_framework import serializers
from .models import Product, Order

class ProductModelSerializer(serializers.ModelSerializer):
    class Meta:
        model = Product
        fields = ['id', 'name', 'price', 'stock', 'remark', 'created_at']

class OrderModelSerializer(serializers.ModelSerializer):
    product = ProductModelSerializer()

    class Meta:
        model = Order
        fields = ['id', 'product', 'quantity', 'order_date']

Simple Function

from typing import Any, Dict

from .models import Order, Product

def serialize_product(product: Product) -> Dict[str, Any]:
    return {
        'id': product.id,
        'name': product.name,
        'price': float(product.price),
        'stock': product.stock,
        'remark': product.remark,
        'created_at': product.created_at.isoformat() if product.created_at else None,
    }

def serialize_order(order: Order) -> Dict[str, Any]:
    return {
        'id': order.id,
        'product': serialize_product(order.product),
        'quantity': order.quantity,
        'order_date': order.order_date.isoformat() if order.order_date else None,
    }

Pydantic

from pydantic import BaseModel
from datetime import datetime

class ProductDataPydantic(BaseModel):
    id: int
    name: str
    price: float
    stock: int
    remark: str
    created_at: datetime

class OrderDataPydantic(BaseModel):
    id: int
    product: ProductDataPydantic
    quantity: int
    order_date: datetime
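
Constructing these models validates the input, and a separate call produces the serialized output. A minimal sketch of both steps, assuming Pydantic v2 (the 2.7.3 release listed earlier), where `model_dump(mode="json")` returns a dict of JSON-compatible values. The sample values and the deliberately invalid `price` are illustrative only.

```python
from datetime import datetime, timezone

from pydantic import BaseModel, ValidationError


class ProductDataPydantic(BaseModel):
    id: int
    name: str
    price: float
    stock: int
    remark: str
    created_at: datetime


class OrderDataPydantic(BaseModel):
    id: int
    product: ProductDataPydantic
    quantity: int
    order_date: datetime


created = datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
order = OrderDataPydantic(
    id=1,
    product=ProductDataPydantic(
        id=7, name="Product 7", price=19.99, stock=3,
        remark="Remark for product", created_at=created,
    ),
    quantity=2,
    order_date=created,
)

# mode="json" converts datetimes to strings so the dict is JSON-compatible.
payload = order.model_dump(mode="json")
print(payload)

# Validation happens at construction time: a non-numeric price is rejected.
try:
    ProductDataPydantic(
        id=1, name="Bad", price="not a number", stock=1,
        remark="", created_at=created,
    )
    error_count = 0
except ValidationError as exc:
    error_count = len(exc.errors())
print("validation errors:", error_count)
```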

Experiment Results

For this experiment I prepared a Django management command and used line_profiler to measure the performance of each method. As shown below, the command has five functions, one per method under evaluation. The @profile decorator is injected by kernprof at run time, so the command has to be launched through kernprof rather than directly.

profile_serialization.py
from django.core.management.base import BaseCommand
from django.db import connection
from myapp.models import Product, Order
from myapp.dataclasses import ProductData, OrderData
from myapp.serializers import ProductSerializer, ProductModelSerializer, OrderSerializer, OrderModelSerializer, serialize_product, serialize_order
from myapp.pydantic_models import ProductDataPydantic, OrderDataPydantic

class Command(BaseCommand):
    help = 'Profile serialization performance'

    def handle(self, *args, **kwargs):
        # Read data from the database with select_related
        orders = list(Order.objects.select_related('product').all())
        print("length of orders:", len(orders))

        # Dataclass serialization
        self.profile_dataclass_serialization(orders)
        
        # Django Serializer
        self.profile_django_serializer(orders)
        
        # Django ModelSerializer
        self.profile_django_model_serializer(orders)
        
        # Simple Function-based Serialization
        self.profile_simple_function(orders)

        # Pydantic Serialization
        self.profile_pydantic_serialization(orders)

    @profile
    def profile_dataclass_serialization(self, orders):
        order_data = [
            OrderData(
                id=order.id,
                product=ProductData(
                    id=order.product.id,
                    name=order.product.name,
                    price=float(order.product.price),
                    stock=order.product.stock,
                    remark=order.product.remark,
                    created_at=order.product.created_at
                ),
                quantity=order.quantity,
                order_date=order.order_date
            ) for order in orders
        ]

    @profile
    def profile_django_serializer(self, orders):
        order_serializer = OrderSerializer(orders, many=True)
        order_data = [order for order in order_serializer.data]

    @profile
    def profile_django_model_serializer(self, orders):
        order_model_serializer = OrderModelSerializer(orders, many=True)
        order_data = [order for order in order_model_serializer.data]

    @profile
    def profile_simple_function(self, orders):
        simple_serialized_orders = [serialize_order(order) for order in orders]

    @profile
    def profile_pydantic_serialization(self, orders):
        order_data = [
            OrderDataPydantic(
                id=order.id,
                product=ProductDataPydantic(
                    id=order.product.id,
                    name=order.product.name,
                    price=float(order.product.price),
                    stock=order.product.stock,
                    remark=order.product.remark,
                    created_at=order.product.created_at
                ),
                quantity=order.quantity,
                order_date=order.order_date
            ) for order in orders
        ]
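
Line-by-line profiling adds some overhead of its own, so a coarser wall-clock measurement can serve as a cross-check. A minimal sketch, assuming `time.perf_counter` from the standard library is precise enough for this purpose. The names `best_of` and `build_dicts` are hypothetical, and `build_dicts` is only a stand-in workload for one of the serialization functions.

```python
import time
from typing import Any, Callable


def best_of(func: Callable[..., Any], *args: Any, repeat: int = 5) -> float:
    """Return the fastest wall-clock time (seconds) over `repeat` calls."""
    best = float("inf")
    for _ in range(repeat):
        start = time.perf_counter()
        func(*args)
        best = min(best, time.perf_counter() - start)
    return best


def build_dicts(rows):
    # Stand-in workload; in the command this could be e.g. serialize_order.
    return [{"id": row, "quantity": row % 10} for row in rows]


elapsed = best_of(build_dicts, range(10_000))
print(f"build_dicts: {elapsed:.6f} s")
```

Taking the minimum of several runs reduces noise from other processes on the machine; it measures the whole call rather than individual lines.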

Run the following command to get the final results. The table after it summarizes the total time for each method.

$ kernprof -l -v manage.py profile_serialization

Method                                 Result (seconds)   Data Validation
Dataclass serialization                0.05273            N
Django Serializer                      0.443061           Y
Django ModelSerializer                 0.423694           Y
Simple Function-based Serialization    0.032002           N
Pydantic Serialization                 0.086857           Y

It's not surprising that simple function-based serialization performed the best in this comparison, given its simplicity and lack of data validation. However, I was somewhat surprised by the results for the two DRF serializers: I had the impression that ModelSerializer would perform worse, but the test showed nearly the same performance for both Serializer and ModelSerializer. This is intriguing, and we will explore it in the next section.

Looking at the results table again, the most appealing solution to me is Pydantic: it performs the best among the methods that offer data validation, running roughly five times faster than either DRF serializer. Remarkably, it is also not far behind the methods without data validation, taking about 2.7 times as long as the simple function and about 1.6 times as long as the dataclass approach. This is likely one reason why Pydantic has recently become such a popular serialization and data validation tool in Python.
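
The relative gaps can be read off the results table with a few divisions. The sketch below copies the measured timings from the table and expresses each one as a multiple of the Pydantic time; the dictionary and variable names are just for illustration.

```python
# Timings in seconds, copied from the results table.
timings = {
    "Dataclass serialization": 0.05273,
    "Django Serializer": 0.443061,
    "Django ModelSerializer": 0.423694,
    "Simple Function-based Serialization": 0.032002,
    "Pydantic Serialization": 0.086857,
}

baseline = timings["Pydantic Serialization"]

# A ratio above 1 means the method took longer than Pydantic.
ratios = {name: seconds / baseline for name, seconds in timings.items()}

for name, ratio in sorted(ratios.items(), key=lambda item: item[1]):
    print(f"{name:<38} {ratio:5.2f}x")
```

These ratios come from a single run on one machine, so they indicate the order of magnitude rather than exact factors.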

Further Discussion

In this section, we discuss why Serializer and ModelSerializer perform so similarly. Upon investigation, the two share most of their implementation, which can be split into two parts:

  • Field Handling: Both Serializer and ModelSerializer handle fields in a similar manner once they are defined. For read operations, both serializers iterate over the fields and generate the output dictionary.
  • Field Definitions: While Serializer requires explicit field definitions, ModelSerializer introspects the model and automatically creates the fields.

I believe Django and DRF have addressed performance issues with ModelSerializer, which is reflected in today's experiment results. ModelSerializer has a slight overhead during the initialization phase due to model introspection. However, this overhead is negligible during read operations because it happens only once per serializer instance rather than once per serialized object, resulting in performance that is almost identical to that of Serializer.
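
The idea of a one-time introspection cost followed by a shared per-object loop can be illustrated without DRF. The following is a conceptual sketch only, not DRF's actual implementation: `ExplicitSerializer`, `IntrospectingSerializer`, and the dataclass stand-in for the `Product` model are all hypothetical.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict, List, Optional


@dataclass
class Product:
    # Stand-in for the Django model; not a real models.Model.
    id: int
    name: str
    stock: int


class ExplicitSerializer:
    """Field names are listed by hand, like a regular Serializer."""

    field_names: List[str] = ["id", "name", "stock"]

    def get_field_names(self) -> List[str]:
        return self.field_names

    def to_representation(self, instance: Any) -> Dict[str, Any]:
        # Shared read path: loop over the field names and build a dict.
        return {name: getattr(instance, name) for name in self.get_field_names()}


class IntrospectingSerializer(ExplicitSerializer):
    """Field names are discovered from the model once, then cached."""

    model = Product

    def __init__(self) -> None:
        self.introspections = 0
        self._cached: Optional[List[str]] = None

    def get_field_names(self) -> List[str]:
        if self._cached is None:
            self.introspections += 1
            self._cached = [f.name for f in fields(self.model)]
        return self._cached


products = [Product(i, f"Product {i}", i % 100) for i in range(1000)]
explicit = ExplicitSerializer()
introspecting = IntrospectingSerializer()

explicit_rows = [explicit.to_representation(p) for p in products]
model_rows = [introspecting.to_representation(p) for p in products]

print("introspection runs:", introspecting.introspections)
print("same output:", explicit_rows == model_rows)
```

In this sketch the introspection runs once for 1000 objects, so its cost is spread thin; the per-object work is identical in both classes.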

Conclusion

I took some time to complete this experiment, but I acknowledge there may be some omissions or deficiencies, and I welcome your corrections. To close, here is the key takeaway from this post.

💡
Stop using Django Rest Framework serializers, including both the regular Serializer and ModelSerializer. Pydantic is a better choice for implementing serialization functionality.

Reference