Normalization is good design

Photo by CHIA-HAO HSU on Unsplash
One of my favorite presentations at the SLC SQL Saturday 2025 event on May 23rd was by McKay Salisbury about database normalization. A big hats-off and thank-you to McKay 🙏. His fantastic presentation jumpstarted my interest in something I haven't given much thought to since college.
Here are some of my notes and takeaways ✍️
Normalization is good design, and good design is beneficial
This excellent article about entropy includes a quote that I think is a really helpful starting point:
The ... ultimate purpose of life, mind, and human striving: to deploy energy and information to fight back the tide of entropy and carve out refuges of beneficial order.
– Steven Pinker
This switched on a lightbulb in my head about the purpose and identity of order: it has a benefit. A sorted list like `['a', 'b', 'c', 'd', 'e']` is meaningfully more useful than the same letters randomized because, if I know which system of order has been applied, I can find any letter in the list instantly (without even needing to look at the list!). Order makes tasks simpler. It's easy to take this order for granted, until you have to perform tasks on a database table with 150+ columns 😰
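The "find any letter instantly" idea can be sketched in a few lines of Python (my own illustration, not from the talk): when the ordering rule is known, an item's position is computed rather than searched for.

```python
letters = ['a', 'b', 'c', 'd', 'e']

def position_of(letter):
    # Alphabetical order means the index is computable
    # from the letter itself, with no scan of the list.
    return ord(letter) - ord('a')

print(position_of('d'))  # 3, and letters[3] == 'd'
```

With the randomized version of the same list, the only option is to examine elements one by one until the letter turns up.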
📸 The Instagram login page is actually really impressive.
Speaking of taking order for granted, have you ever considered how Instagram can log in a single user, out of roughly 2 billion total accounts, in under 10 seconds? I certainly hadn't, until I learned about the mechanics of the b-tree index. Now I can't stop being amazed by it.
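Here's a back-of-the-envelope sketch of why (my own arithmetic, not Instagram's actual implementation): a balanced search tree narrows the candidates by a constant factor at every level, so even the worst case of a two-way split needs only about 31 steps to isolate one account among 2 billion.

```python
import math

# Treating the index as a plain binary search tree (the worst case for
# fan-out), each comparison halves the remaining candidates.
users = 2_000_000_000
comparisons = math.ceil(math.log2(users))
print(comparisons)  # 31

# Real b-trees store hundreds of keys per node, so the tree is only a few
# levels deep and a lookup touches just a handful of pages.
```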
Good design is about creating systems of order to make comprehension and work easier. Examples include:
- Highway design
- Interior design
- Industrial design
- User interface design
These design systems are built on rules, establishing a shared understanding to facilitate tasks.
- Traffic lights effectively prevent collisions between 2-ton vehicles and orchestrate traffic flow using simple symbols and colors.
- Interior design balances concerns such as square footage, walkways, lighting, and more, in order to create a space tailored to a specific purpose.
- User interface design can be messy, but has largely coalesced around rules of its own. In fact, this has happened so much that 💾 and 🔍 carry intrinsic value (even newcomer ✨ with an interesting story).
Database normalization is a formalized approach to good database design. In a normalized database, it's easier to:
- Restructure the database in order to store additional data.
- Understand a database's purpose.
- Query for data.
- Generally expect good performance for most queries.
- Visualize data.
🔥 McKay's Hot Takes:
Everyone should normalize as much as reasonably possible, and denormalization is only rarely worthwhile.
Normalization and performance
It's tempting to view database normalization purely through the lens of performance. But performance problems are just a symptom of bad database design, and they can take years to manifest even if the design has been sub-optimal all along. "If it ain't broke, don't fix it" ultimately means "it ain't fixed until it's broke".
When the system eventually scales enough to encounter slow reads and writes, engineers take a closer look at the database structure in search of performance improvements. But while normalization does bring some performance benefits, it generally shouldn't be considered a performance solution by itself. For instance, database normalization isn't concerned at all with indexes, one of the most important performance tools (see use-the-index-luke.com, second reference so far!).
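To make that separation concrete, here's a small sqlite3 sketch (a hypothetical `users` table, my own example): the same query goes from a full table scan to an index search once an index exists, and no amount of normalization on its own would have changed that.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")

query = "SELECT * FROM users WHERE email = 'a@example.com'"

# Ask SQLite how it plans to run the query before any index exists.
before = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]

con.execute("CREATE INDEX idx_users_email ON users(email)")
after = con.execute("EXPLAIN QUERY PLAN " + query).fetchone()[-1]

print(before)  # a full-table scan
print(after)   # a search using idx_users_email
```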
Rather than being measured in query execution milliseconds or database CPU compute, the benefits of database normalization are realized in the speed at which software engineers can make changes and the quality of the development experience. This in turn has higher-order effects across virtually all metrics: time-to-market, cycle/lead time, incident resolution, you name it. These benefits are a direct result of building a database using good design principles.
⚡ TL;DR on normalization and performance
Database normalization won't necessarily make your database faster, but it will make it easier to make the changes that will make it faster.
The table is not the data
Another "aha!" moment from McKay's presentation came with this insight:
A table is a visualization of data, not the data itself.
This requires a more theoretical approach to thinking about data, which took a moment for my brain to comprehend. McKay also reminded me that database normalization (and database design in general) has always been about modeling the real world. At best, we are working with a representation which only approximates reality (the actual "data").
Let's look at an example - here's an order spreadsheet for a new computer accessories brand:
order_id | customer_id | customer_name | product_id | product_name | state_id | state_name |
---|---|---|---|---|---|---|
001 | 125 | James | 7293 | Screen Protector | 44 | Utah |
001 | 125 | James | 9844 | Phone Case | 44 | Utah |
002 | 126 | Emily | 1657 | Laptop Charge Cord | 22 | Michigan |
003 | 127 | Alex | 8023 | Noise-Cancelling Headphones | 47 | Washington |
004 | 127 | Alex | 3332 | Mouse | 47 | Washington |
004 | 127 | Alex | 7590 | Mechanical Keyboard | 47 | Washington |
What state does the customer Alex live in? The data in the table has the answer listed in three places, but the data the table is visualizing (representing, modeling) really only has one answer: Washington.
Anybody who looks at this table will probably understand this intrinsically, which is part of the problem. We don't even notice that our brains are seamlessly translating between the table data and the underlying data, and before we know it we're working around duplicate state values without stopping to question whether there's a better way to model the data.
There is a better way! Let's take this original spreadsheet (which, lucky for us, is already in first normal form) and separate it out into second normal form by removing partial dependencies:
The `customers` table:
id | name | state_id | state_name |
---|---|---|---|
125 | James | 44 | Utah |
126 | Emily | 22 | Michigan |
127 | Alex | 47 | Washington |
The `products` table:
id | name |
---|---|
1657 | Laptop Charge Cord |
3332 | Mouse |
7293 | Screen Protector |
7590 | Mechanical Keyboard |
8023 | Noise-Cancelling Headphones |
9844 | Phone Case |
The `orders` table:
id | customer_id |
---|---|
001 | 125 |
002 | 126 |
003 | 127 |
004 | 127 |
The `order_products` table:
order_id | product_id |
---|---|
001 | 7293 |
001 | 9844 |
002 | 1657 |
003 | 8023 |
004 | 3332 |
004 | 7590 |
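As a sanity check, no information was lost in the split: a join reassembles the flat spreadsheet rows on demand. Here's a sqlite3 sketch of the tables above, loaded with just Alex's rows:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT,
                            state_id INTEGER, state_name TEXT);
    CREATE TABLE products  (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders    (id TEXT PRIMARY KEY, customer_id INTEGER);
    CREATE TABLE order_products (order_id TEXT, product_id INTEGER);
    INSERT INTO customers VALUES (127, 'Alex', 47, 'Washington');
    INSERT INTO products  VALUES (3332, 'Mouse'), (7590, 'Mechanical Keyboard');
    INSERT INTO orders    VALUES ('004', 127);
    INSERT INTO order_products VALUES ('004', 3332), ('004', 7590);
""")

# Reassemble the original spreadsheet shape with joins.
rows = con.execute("""
    SELECT o.id, c.name, p.name, c.state_name
    FROM orders o
    JOIN customers c       ON c.id = o.customer_id
    JOIN order_products op ON op.order_id = o.id
    JOIN products p        ON p.id = op.product_id
    ORDER BY p.id
""").fetchall()
print(rows)
# [('004', 'Alex', 'Mouse', 'Washington'),
#  ('004', 'Alex', 'Mechanical Keyboard', 'Washington')]
```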
From here (in this example), it's not too much more work to get to third normal form. We just need to remove the transitive dependencies in the `customers` table by splitting out state data into a separate table:
The `customers` table:
id | name | state_id |
---|---|---|
125 | James | 44 |
126 | Emily | 22 |
127 | Alex | 47 |
The `states` table:
id | name |
---|---|
44 | Utah |
22 | Michigan |
47 | Washington |
These changes affect how the data is modeled, but the underlying real-world data has remained the same. The difference is, we're now representing the data using formal good design principles.
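In the 3NF model, the earlier question "What state does Alex live in?" now has exactly one stored answer. Here's a sqlite3 sketch of just the `customers` and `states` tables above:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE states    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT,
                            state_id INTEGER REFERENCES states(id));
    INSERT INTO states VALUES (44, 'Utah'), (22, 'Michigan'), (47, 'Washington');
    INSERT INTO customers VALUES (125, 'James', 44), (126, 'Emily', 22),
                                 (127, 'Alex', 47);
""")

# The fact is stored once; a join surfaces it wherever it's needed.
row = con.execute("""
    SELECT s.name FROM customers c
    JOIN states s ON s.id = c.state_id
    WHERE c.name = 'Alex'
""").fetchone()
print(row[0])  # Washington
```

If Alex moves, exactly one row in `states`-linked data changes (the `state_id` on his `customers` row), and every query picks up the new answer automatically.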
Check out these articles for more information about the first three normal forms:
- First Normal Form (1NF) on GeeksforGeeks
- Second Normal Form (2NF) on GeeksforGeeks
- Third Normal Form (3NF) on GeeksforGeeks
It's really easy to overestimate your own level of skill
In college, one of my good friends was a math major. She would often tease me when I referred to myself as a software engineer: "Okay, but you're not a real engineer." To be honest, she had a point!
Other engineering disciplines (such as mechanical, civil, chemical) have tighter enforced constraints, success metrics, and design principles than software engineering currently does. Physics often determines whether or not a new design will work, which is closely tied to how successful the design is. Software engineering is newer, broader in some ways, and has more separation between the design decisions and the success of those designs. A poorly designed software product can be wildly successful, but one day the technical debt (accrued over years) may come due.
I'm not sure if we can ever expect this dynamic to change. Theoretically, it would be great to live in a world where every startup followed the best design principles from day one. But if that were a required software engineering standard, many tech companies probably would not have survived to the present day. It's part of why tech has been so successful: software is much more forgiving than skyscrapers or airplanes.
What we can do is maintain an attitude of learning about good design and encourage it wherever possible. Philip Greenspun sums up this sentiment in a quote I think about a lot:
That's one reason I don't miss IT, because programmers are very unlikable people... in aviation, for example, people who greatly overestimate their level of skill are all dead.
So don't be unlikable, and don't overestimate your skill! There is always more to learn about best practices and good design, and there are always more engineers who can benefit from sharing what you learn. While this also needs to be balanced by the other forces in play (such as the need to ship features quickly), building with a basic understanding of good design principles is much more effective than unintentionally re-treading the steps of bad design.