Metadata for secondary indexes in a sharded cluster

I am on chapter 3 of the M103 course, which covers sharding. From my understanding, the metadata for the shard key is maintained on the config server. My question is about secondary indexes on a sharded collection: where is the metadata for those indexes maintained in a sharded cluster? Is it maintained on the config server or on the individual shards?
If it is maintained on the individual shards, does that mean that when I look up data using a secondary index, the query has to look in each shard? How much can that affect a query?

Hi!

What do you mean by secondary index? Compound indexes like {a: 1, b: -1}?

From my understanding though, if the query does not contain the shard key, or contains only a part of a compound shard key that is not a prefix of it, mongos broadcasts the query to all shards, regardless of which other fields it uses. See the MongoDB documentation on broadcast operations in sharded clusters.

If the query does contain the shard key, mongos sends it only to the shards that own the matching key ranges. mongos holds no data about other indexes. Those are kept with the collection on each shard.
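The routing decision described above can be sketched as a toy model in Python. This is not how mongos is implemented: the chunk boundaries, the shard names, the `user_id` shard key, and the `target_shards` helper are all made up for illustration, and only equality matches on a single-field shard key are handled.

```python
from bisect import bisect_right

# Hypothetical chunk map for a ranged shard key: chunk i covers
# [CHUNK_BOUNDS[i], CHUNK_BOUNDS[i + 1]) and lives on CHUNK_SHARDS[i].
CHUNK_BOUNDS = [0, 100, 200]
CHUNK_SHARDS = ["shardA", "shardB", "shardC"]
ALL_SHARDS = sorted(set(CHUNK_SHARDS))


def target_shards(query, shard_key="user_id"):
    """Return the shards a router would contact for an equality query."""
    if shard_key in query:
        # Shard key present: find the one chunk whose range holds the value.
        idx = bisect_right(CHUNK_BOUNDS, query[shard_key]) - 1
        return [CHUNK_SHARDS[max(idx, 0)]]
    # No shard key: the router cannot narrow the search, so it broadcasts.
    # Each shard then uses its own local indexes to answer the query.
    return ALL_SHARDS


print(target_shards({"user_id": 150, "email": "a@example.com"}))  # targeted
print(target_shards({"email": "a@example.com"}))                  # broadcast
```

In this model, a query on `email` alone reaches every shard even if each shard has an index on `email`, because the router has no way to know which shard holds the matching documents.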

Hey Santiago,

Thanks for the clarification. As I finished the course, I got a better understanding. I’d like to test the impact of scatter-gather queries on a large sharded database. Do you have any experience working with larger datasets on MongoDB? If so, can you please share your experience of the performance impact of a scatter-gather query?

Hi,

Not really. For such questions I’d recommend the developers forum.

When the shard key points to a single shard though, queries are pretty fast. This is especially true if the query is a covered query.

If both the shard key and an index are specified, but the query has to search data on two different shards, the results have to be merged. This takes some time.
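As a rough illustration of that merge step, here is a minimal Python sketch that combines result lists which each shard has already sorted. It is only a model of the idea, not MongoDB's actual code; the `merge_sorted_results` helper, the sample documents, and the `score` sort field are all hypothetical.

```python
import heapq


def merge_sorted_results(per_shard_results, sort_field):
    """Merge result lists that each shard has already sorted by sort_field."""
    # heapq.merge walks all inputs in parallel and yields the smallest
    # remaining document each time, so no full re-sort is needed.
    return list(heapq.merge(*per_shard_results, key=lambda doc: doc[sort_field]))


# Hypothetical per-shard results, each already sorted by "score".
shard_a = [{"user_id": 1, "score": 10}, {"user_id": 7, "score": 40}]
shard_b = [{"user_id": 3, "score": 20}, {"user_id": 9, "score": 30}]

merged = merge_sorted_results([shard_a, shard_b], "score")
print([doc["score"] for doc in merged])
```

The extra cost in a real cluster comes mostly from waiting for the slowest shard and moving the results over the network, which this sketch does not model.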


PS: M121 has a good analysis of these situations at the end of the last chapter, I believe.
