What I Learned About Secondary Indexes in NoSQL
Last month I read Designing Data-Intensive Applications, 2nd Edition, by Martin Kleppmann and Chris Riccomini. I had read the first edition before, but back then I did not have much work experience, so I did not really understand what I was reading. The ideas felt too far away from my daily work.
So this time I tried again with the second edition, and it was a very different feeling. Now I have more experience, and the book finally made sense to me. I also liked that this edition was updated to cover newer technologies, so the examples felt closer to the tools we use today. The people who recommend this book are right: it is very good. It is full of useful ideas, and one small part really caught my attention.
That part was about secondary indexes in NoSQL databases.
At first I did not think much about it. But then something clicked in my head. At my work, we have a container in Cosmos DB. Its name starts with SecondaryIndex. I see this name almost every day. But here is the funny thing: I never really knew what it does. I just used it and moved on.
So the book pushed me to stop and ask myself a simple question. What is a secondary index, really? And why do we have one in my code base?
What is a secondary index?
Let me explain it in an easy way.
Imagine you have a big table of users. Each user has an id. When you want to find a user, you search by this id. This is the primary key. It is fast, because the database is built around it.
But what if you want to find users by their email instead of their id? Without an index on email, the database does not know where to look. It must check every single row, one by one. When you have millions of users, this is very slow.
A secondary index solves this problem. It is like a second “map” for your data. This map is organized by email (often sorted, sometimes hashed), and each entry points back to the matching user. Now, when you search by email, the database can find the user quickly.
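To make this concrete for myself, I wrote a tiny sketch in Python. It is not how any real database is implemented. The user records and the function names are made up for illustration, and a Python dict is a hash map rather than a sorted structure, but the idea of a second map that points back to the primary key is the same.

```python
# Hypothetical sample data: the primary "table", keyed by user id.
users_by_id = {
    1: {"id": 1, "email": "anna@example.com", "name": "Anna"},
    2: {"id": 2, "email": "ben@example.com", "name": "Ben"},
    3: {"id": 3, "email": "cara@example.com", "name": "Cara"},
}


def find_by_email_scan(email):
    """Without an index: check every user until one matches."""
    for user in users_by_id.values():
        if user["email"] == email:
            return user
    return None


# The secondary index: a second map from email to the primary key.
email_index = {user["email"]: user_id for user_id, user in users_by_id.items()}


def find_by_email_indexed(email):
    """With the index: one lookup for the id, one lookup for the user."""
    user_id = email_index.get(email)
    return users_by_id.get(user_id) if user_id is not None else None


print(find_by_email_indexed("ben@example.com")["name"])  # Ben
```

Both functions return the same user. The difference is that the scan touches every row, while the indexed version does two direct lookups.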
A simple example
Let’s say we store orders. The primary key is orderId.
orderId   customer   status
1001      Anna       shipped
1002      Ben        pending
1003      Anna       pending
If I ask: “Give me order 1002”, that is easy and fast.
But if I ask: “Give me all orders from Anna”, the database must scan the whole table. With a secondary index on customer, it builds something like this:
customer   orderId
Anna       1001, 1003
Ben        1002
Now finding all of Anna’s orders is fast. This is the simple idea behind a secondary index.
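Here is the same orders example as a short Python sketch. The data comes from the table above. The names customer_index and orders_for are my own, and a real database builds and stores its index very differently. The point is only that one customer can map to several order ids.

```python
from collections import defaultdict

# The primary data, keyed by orderId.
orders = {
    1001: {"customer": "Anna", "status": "shipped"},
    1002: {"customer": "Ben", "status": "pending"},
    1003: {"customer": "Anna", "status": "pending"},
}

# Build the secondary index: customer -> list of orderIds.
customer_index = defaultdict(list)
for order_id, order in orders.items():
    customer_index[order["customer"]].append(order_id)


def orders_for(customer):
    """Look up the order ids in the index, then fetch each order by its key."""
    return [orders[order_id] for order_id in customer_index.get(customer, [])]


print(dict(customer_index))  # {'Anna': [1001, 1003], 'Ben': [1002]}
print(orders_for("Anna"))
```

Finding Anna's orders now means one lookup in the index plus one lookup per order, instead of reading every order in the table.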
Back to my Cosmos DB container
After reading the book, I went back to look at our SecondaryIndex container. Now the name makes sense. It most likely exists to help us search our data by a different field, not just the main key. Before, this name was just a word to me. Now I understand the reason behind it.
I plan to dig deeper into this container soon. I want to know which fields it indexes and why the team made this choice.
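I have not yet checked how our container is filled, so what follows is only my guess at the general pattern, not our real code. It uses two plain Python dicts as stand-ins for a main container and an index container. Every name in it (main_container, index_container, save_user, get_user_by_email) is hypothetical, and it does not call the Cosmos DB SDK at all.

```python
# In-memory stand-ins for two containers; all names here are hypothetical.
main_container = {}   # primary key (id) -> full document
index_container = {}  # secondary field value (email) -> primary key (id)


def save_user(user):
    """Write the document, then keep the index entry in sync."""
    old = main_container.get(user["id"])
    if old is not None and old["email"] != user["email"]:
        # The indexed field changed, so the stale index entry must go.
        index_container.pop(old["email"], None)
    main_container[user["id"]] = user
    index_container[user["email"]] = user["id"]


def get_user_by_email(email):
    """Resolve the email to an id through the index, then read by id."""
    user_id = index_container.get(email)
    return main_container.get(user_id) if user_id is not None else None


save_user({"id": "u1", "email": "anna@example.com"})
save_user({"id": "u1", "email": "anna@new.example.com"})  # the email changes

print(get_user_by_email("anna@example.com"))      # None
print(get_user_by_email("anna@new.example.com"))  # {'id': 'u1', 'email': 'anna@new.example.com'}
```

What this sketch made clear to me is the price of an index: every write to the main data also needs a write to the index, or the two drift apart.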
Reading books helps
This is a small story, but it taught me something bigger. Reading this book paid off at work. I read one chapter, and a few days later I understood a container I had ignored for months. That is a good trade for a few hours of reading.
Sometimes we use tools and names at work without asking “why”. A good book can wake up that curiosity again.
So my advice is simple: keep reading. You never know which page will help you understand your own work better. And if a book feels too hard now, that is okay. You can come back to it later, when you have more experience. It may make much more sense the second time.