Best Practices: Normalize while designing, denormalize while optimizing
Let's say we have the first data structure like this.
| Name | Origin | Power | Latitude | Longitude | Country | Time |
|---|---|---|---|---|---|---|
| Blitz | Alien | Freeze | 10 | -15 | USA | 2023/04/29 |
| Blitz | Alien | Flight | 10 | -15 | USA | 2021/01/02 |
| Hexa | Scientist | Telekinesis | 35 | 139 | Japan | 2010/09/01 |
| Hexa | Scientist | Telekinesis | 35 | 139 | Japan | 2012/01/08 |
| Traveller | Billionaire | Time Travel | 43 | 1 | France | 2010/11/09 |
Since Name and Origin are distinct, they can be taken out as a seperate table.
Origin Table
| Name | Origin |
|---|---|
| Blitz | Alien |
| Hexa | Scientist |
| Traveller | Billionaire |
Powers Table
| Name | Power | Latitude | Longitude | Country | Time |
|---|---|---|---|---|---|
| Blitz | Freeze | 10 | -15 | USA | 2023/04/29 |
| Blitz | Flight | 10 | -15 | USA | 2021/01/02 |
| Hexa | Telekinesis | 35 | 139 | Japan | 2010/09/01 |
| Hexa | Telekinesis | 35 | 139 | Japan | 2012/01/08 |
| Traveller | Time Travel | 43 | 1 | France | 2010/11/09 |
Also, notice that Latitude and Longitude are dependant on the country.
So we can create a
Locations table
| Location_id | Latitude | Longitude | Country |
|---|---|---|---|
| 1 | 10 | -15 | USA |
| 2 | 35 | 139 | Japan |
| 3 | 43 | 1 | France |
Powers table can be simplified as
| user_id | power | location_id | time |
|---|---|---|---|
| 2 | Freeze | 1 | 2023/04/29 |
| 2 | Flight | 1 | 2021/01/02 |
| 4 | Telekinesis | 2 | 2010/09/01 |
| 4 | Telekinesis | 2 | 2012/01/08 |
| 7 | Time Travel | 3 | 2010/11/09 |
Noramalized tables are easier to see as a user of the database and it becomes easier to cherry-pick the desired data.
However, if I need the first nomal form, it requires a lot of joins, which slows the performance.
In Django, we can define it was unique_together in the Meta class of the model.
class Locations(models.Model):
...
class Meta:
unique_together = ("latitude", "longitude")
When designing a data schema, we often face duplicate fields, violating the DRY principle.
Django provides an Abstract Base Classes.
All we need is an abstract = True statement in class Meta.
A visual exmaple will be easy to understand.
class ContactInfo(models.Model):
name = models.CharField(max_length=30)
email = models.EmailField(max_length=30)
address = models.CharField(max_length=30)
class Meta:
abstract = True
class Customer(ContactInfo):
phone = models.CharField(max_length=30)
class Staff(ContactInfo):
position = models.CharField(max_length=30)
Staff and Customer both inherit from ContactInfo.
ContactInfo is the abstract class, which is not created as a new table in the database, but Customer and Staff tables will have fields such as, name, email and address.
Django models can become very chuncky. It is ideal to refactor methods that do not interact directly with the database.
Let's say we have a model that checks if the user is part of another service.
models.py
class Profile(models.Model):
...
def is_registered(self):
url = "http://api.regcheck.com/?q={0}".format(
self.user.username)
return webclient.get(url)
This can be refactored to
models.py
from .services import RegisterAPI
def is_registered(self):
return RegisterAPI.is_registered(self.user.username)
services.py
API_URL = "http://api.regcheck.com/?q={0}"
class RegisterAPI:
...
def is_registered(self, username):
url =API_URL.format(username)
return webclient.get(url)
Don't store unnecessary data.
This is an obvious advice but during developement, we often find ourselves, making columns for every field we need.
To avoid this, use @property
In the polls tutorial, we've used
class Question(models.Model):
...
def was_published_recently(self):
return self.pub_date >= timezone.now() - datetime.timedelta(days=1)
If this attribute does not have to be stored in db, we can simply add @property
class Question(models.Model):
...
@property
def was_published_recently(self):
return self.pub_date >= timezone.now() - datetime.timedelta(days=1)
In Question table, was_published_recently will not be stored, but accessing Question.was_published_recently() will return the same expected value.
If the property itself is computationally expensive, use @cached_property.
This value will be cached in Python as an instance memory.
The implementation method is identical with @property.
Access the data using only one dot.
Rather than calling Question.pub_date.year, define a property of Question.pub_year.
This adds security benefits of hiding the datastructure of your models.
The post was written based on
Django Design Patterns and Best Practices: Easily Build Maintainable Websites with Powerful and Relevant Django Design Patterns - Auran Ravidran
Recommend checking this book for more detailed explanation.