Normalization is good design

Photo by CHIA-HAO HSU on Unsplash
One of my favorite presentations at the SLC SQL Saturday 2025 event on May 23rd was by McKay Salisbury about database normalization. A big hats-off and thank-you to McKay 🙏. His fantastic presentation jumpstarted my interest in something I haven't given much thought to since college.
Here are some of my notes and takeaways ✍️
Normalization is good design, and good design is beneficial
This excellent article about entropy includes a quote that I think is a really helpful starting point:
The ... ultimate purpose of life, mind, and human striving: to deploy energy and information to fight back the tide of entropy and carve out refuges of beneficial order.
– Steven Pinker
This switched on a lightbulb in my head about the purpose and identity of order: it has a benefit. A sorted list like `['a', 'b', 'c', 'd', 'e']` is meaningfully more useful than the same letters randomized because, if I know which system of order has been applied, I can find any letter in the list instantly (without even needing to look at the list!). Order makes tasks simpler. It's easy to take this order for granted, until you have to perform tasks on a database table with 150+ columns 😰
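The "find any letter instantly" idea can be sketched in a few lines of Python (my own illustration, not from the talk): when the ordering rule is known, an item's position is computed rather than searched for.

```python
letters = ['a', 'b', 'c', 'd', 'e']

def position_of(letter):
    # Alphabetical order means the index is computable
    # from the letter itself, with no scan of the list.
    return ord(letter) - ord('a')

print(position_of('d'))  # 3, and letters[3] == 'd'
```

With the randomized version of the same list, the only option is to examine elements one by one until the letter turns up.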
📸 The Instagram login page is actually really impressive.
Speaking of taking order for granted, have you ever considered how Instagram can log in a single user, out of roughly 2 billion total accounts, in under 10 seconds? I certainly hadn't, until I learned about the mechanics of the b-tree index. Now I can't stop being amazed by it.
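Here's a back-of-the-envelope sketch of why (my own arithmetic, not Instagram's actual implementation): a balanced search tree narrows the candidates by a constant factor at every level, so even the worst case of a two-way split needs only about 31 steps to isolate one account among 2 billion.

```python
import math

# Treating the index as a plain binary search tree (the worst case for
# fan-out), each comparison halves the remaining candidates.
users = 2_000_000_000
comparisons = math.ceil(math.log2(users))
print(comparisons)  # 31

# Real b-trees store hundreds of keys per node, so the tree is only a few
# levels deep and a lookup touches just a handful of pages.
```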
Good design is about creating systems of order to make comprehension and work easier. Examples include:
- Highway design
- Interior design
- Industrial design
- User interface design
These design systems are built on rules, establishing a shared understanding to facilitate tasks.
- Traffic lights effectively prevent collisions between 2-ton vehicles and orchestrate traffic flow using simple symbols and colors.
- Interior design balances concerns such as square footage, walkways, lighting, and more, in order to create a space tailored to a specific purpose.
- User interface design can be messy, but has largely coalesced around rules of its own. In fact, this has happened so much that 💾 and 🔍 carry intrinsic value (even newcomer ✨ with an interesting story).
Database normalization is a formalized approach to good database design. In a normalized database, it's easier to:
- Restructure the database in order to store additional data.
- Understand a database's purpose.
- Query for data.
- Generally expect good performance for most queries.
- Visualize data.
🔥 McKay's Hot Takes:
Everyone should normalize as much as reasonably possible, and denormalization is only rarely worthwhile.
Normalization and performance
It's tempting to view database normalization purely through the lens of performance. But performance problems are just a symptom of bad database design, and they can take years to manifest even if the design has been sub-optimal all along. "If it ain't broke, don't fix it" ultimately means "it ain't fixed until it's broke".
When the system eventually scales enough to encounter slow reads and writes, engineers take a closer look at the database structure in search of performance improvements. But while normalization does bring some performance benefits, it generally shouldn't be considered a performance solution by itself. For instance, database normalization isn't concerned at all with indexes, one of the most important performance tools (see use-the-index-luke.com, second reference so far!).
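To make that separation concrete, here's a small sqlite3 sketch (a hypothetical `users` table, my own example): the same query goes from a full table scan to an index search once an index exists, and no amount of normalization on its own would have changed that.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

query = "SELECT * FROM users WHERE email = 'a@example.com'"

# Ask SQLite how it plans to run the query before any index exists.
before = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]

con.execute("CREATE INDEX idx_users_email ON users(email)")
after = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]

print(before)  # a full-table scan
print(after)   # a search using idx_users_email
```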
Rather than being measured in query execution milliseconds or database CPU compute, the benefits of database normalization are realized in the speed at which software engineers can make changes and the quality of the development experience. This in turn has higher-order effects across virtually all metrics: time-to-market, cycle/lead time, incident resolution, you name it. These benefits are a direct result of building a database using good design principles.
⚡ TL;DR on normalization and performance
Database normalization won't necessarily make your database faster, but it will make it easier to make the changes that will make it faster.
The table is not the data
Another "aha!" moment from McKay's presentation came with this insight:
A table is a visualization of data, not the data itself.
This requires a more theoretical approach to thinking about data, which took a moment for my brain to comprehend. McKay also reminded me that database normalization (and database design in general) has always been about modeling the real world. At best, we are working with a representation which only approximates reality (the actual "data").
Let's look at an example - here's an order spreadsheet for a new computer accessories brand:
order_id | customer_id | customer_name | product_id | product_name | state_id | state_name |
---|---|---|---|---|---|---|
001 | 125 | James | 7293 | Screen Protector | 44 | Utah |
001 | 125 | James | 9844 | Phone Case | 44 | Utah |
002 | 126 | Emily | 1657 | Laptop Charge Cord | 22 | Michigan |
003 | 127 | Alex | 8023 | Noise-Cancelling Headphones | 47 | Washington |
004 | 127 | Alex | 3332 | Mouse | 47 | Washington |
004 | 127 | Alex | 7590 | Mechanical Keyboard | 47 | Washington |
What state does the customer Alex live in? The data in the table has the answer listed in three places, but the data the table is visualizing (representing, modeling) really only has one answer: Washington.
Anybody who looks at this table will probably understand this intrinsically, which is part of the problem. We don't even notice that our brains are seamlessly translating between the table data and the underlying data, and before we know it we're working around duplicate state values without stopping to question whether there's a better way to model the data.
There is a better way! Let's take this original spreadsheet (which, lucky for us, is already in first normal form) and separate it out into second normal form by removing partial dependencies:
The `customers` table:
id | name | state_id | state_name |
---|---|---|---|
125 | James | 44 | Utah |
126 | Emily | 22 | Michigan |
127 | Alex | 47 | Washington |
The `products` table:
id | name |
---|---|
1657 | Laptop Charge Cord |
3332 | Mouse |
7293 | Screen Protector |
7590 | Mechanical Keyboard |
8023 | Noise-Cancelling Headphones |
9844 | Phone Case |
The `orders` table:
id | customer_id |
---|---|
001 | 125 |
002 | 126 |
003 | 127 |
004 | 127 |
The `order_products` table:
order_id | product_id |
---|---|
001 | 7293 |
001 | 9844 |
002 | 1657 |
003 | 8023 |
004 | 3332 |
004 | 7590 |
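As a sanity check, no information was lost in the split: a join reassembles the flat spreadsheet rows on demand. Here's a sqlite3 sketch of the tables above, loaded with just Alex's rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT,
                            state_id INTEGER, state_name TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders    (id TEXT PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_products (order_id TEXT, product_id INTEGER);
    INSERT INTO customers VALUES (127, 'Alex', 47, 'Washington');
    INSERT INTO products  VALUES (3332, 'Mouse'), (7590, 'Mechanical Keyboard');
    INSERT INTO orders    VALUES ('004', 127);
    INSERT INTO order_products VALUES ('004', 3332), ('004', 7590);
""")

# Reassemble the original spreadsheet shape with joins.
rows = con.execute("""
    SELECT o.id, c.name, p.name, c.state_name
    FROM orders o
    JOIN customers c       ON c.id = o.customer_id
    JOIN order_products op ON op.order_id = o.id
    JOIN products p        ON p.id = op.product_id
    ORDER BY p.id
""").fetchall()
print(rows)
# [('004', 'Alex', 'Mouse', 'Washington'),
#  ('004', 'Alex', 'Mechanical Keyboard', 'Washington')]
```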
From here (in this example), it's not too much more work to get to third normal form. We just need to remove the transitive dependencies in the `customers` table by splitting out state data into a separate table:
The `customers` table:
id | name | state_id |
---|---|---|
125 | James | 44 |
126 | Emily | 22 |
127 | Alex | 47 |
The `states` table:
id | name |
---|---|
44 | Utah |
22 | Michigan |
47 | Washington |
These changes affect how the data is modeled, but the underlying real-world data has remained the same. The difference is, we're now representing the data using formal good design principles.
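In the 3NF model, the earlier question "What state does Alex live in?" now has exactly one stored answer. Here's a sqlite3 sketch of just the `customers` and `states` tables above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE states    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT,
                            state_id INTEGER REFERENCES states(id));
    INSERT INTO states VALUES (44, 'Utah'), (22, 'Michigan'), (47, 'Washington');
    INSERT INTO customers VALUES (125, 'James', 44), (126, 'Emily', 22),
                                 (127, 'Alex', 47);
""")

# The fact is stored once; a join surfaces it wherever it's needed.
row = con.execute("""
    SELECT s.name FROM customers c
    JOIN states s ON s.id = c.state_id
    WHERE c.name = 'Alex'
""").fetchone()
print(row[0])  # Washington
```

If Alex moves, exactly one row in `states`-linked data changes (the `state_id` on his `customers` row), and every query picks up the new answer automatically.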
Check out these articles for more information about the first three normal forms:
- First Normal Form (1NF) on GeeksforGeeks
- Second Normal Form (2NF) on GeeksforGeeks
- Third Normal Form (3NF) on GeeksforGeeks
It's really easy to overestimate your own level of skill
In college, one of my good friends was a math major. She would often tease me when I referred to myself as a software engineer: "Okay, but you're not a real engineer." To be honest, she had a point!
Other engineering disciplines (such as mechanical, civil, chemical) have tighter enforced constraints, success metrics, and design principles than software engineering currently does. Physics often determines whether or not a new design will work, which is closely tied to how successful the design is. Software engineering is newer, broader in some ways, and has more separation between the design decisions and the success of those designs. A poorly designed software product can be wildly successful, but one day the technical debt (accrued over years) may come due.
I'm not sure if we can ever expect this dynamic to change. Theoretically, it would be great to live in a world where every startup followed the best design principles from day one. But if that were a required software engineering standard, many tech companies probably would not have survived to the present day. It's part of why tech has been so successful: software is much more forgiving than skyscrapers or airplanes.
What we can do is maintain an attitude of learning about good design and encourage it wherever possible. Philip Greenspun sums up this sentiment in a quote I think about a lot:
That's one reason I don't miss IT, because programmers are very unlikable people... in aviation, for example, people who greatly overestimate their level of skill are all dead.
So don't be unlikable, and don't overestimate your skill! There is always more to learn about best practices and good design, and there are always more engineers who can benefit from sharing what you learn. While this also needs to be balanced by the other forces in play (such as the need to ship features quickly), building with a basic understanding of good design principles is much more effective than unintentionally re-treading the steps of bad design.