Fetching index metadata

TopherGopher · March 3, 2020, 5:11am

I’m trying to aggregate some stats from our MongoDB cluster using the Go driver. Ultimately, I want to find out:

the number of databases with a prefix of BLURG
the number of collections in the databases prefixed with BLURG
the number of indices in the collections in the DBs prefixed with BLURG
the size of the indices in the collections in the DBs prefixed with BLURG
(BLURG is just a sample prefix, and error handling has been removed below. In the real code I’m interested in several more prefixes.)

	// Error handling is omitted for brevity.
	dbs := map[string]interface{}{}
	conn.GetDBWithoutIndexing("admin").RunCommand(context.Background(), bson.M{
		"listDatabases": 1,
	}).Decode(&dbs)

	for _, database := range dbs["databases"].(primitive.A) {
		d := database.(map[string]interface{})
		dbName := d["name"].(string)
		fmt.Println("Working on", dbName)
		// Find out the number of collections
		cols, _ := conn.GetDBWithoutIndexing(dbName).ListCollectionNames(context.Background(), bson.M{})
		for _, col := range cols {
			// List the indexes of each collection
			cur, _ := conn.GetDBWithoutIndexing(dbName).C(col).Indexes().List(context.Background())
			indices := []mongo.IndexModel{}
			cur.All(context.Background(), &indices)
			for _, index := range indices {
				fmt.Printf("%#v\n", index.Keys)
			}
		}
		// Tally databases by prefix
		switch {
		case strings.HasPrefix(dbName, "BLURG_"):
			devDBs++
		default:
			nonEnvDBs++
		}
	}

There are a couple of issues with this code:

The results of Indexes().List() refuse to decode into mongo.IndexModel. There’s no error, but the Keys field is nil.
I couldn’t figure out how to obtain the index size.
Biggest issue: I will be making tens of thousands of calls to our production DB cluster. Admittedly, I can target a secondary, but I’m still hoping someone can suggest a more efficient approach.

Divjot_Arora · March 4, 2020, 3:38am

Hi Christopher,

The mongo.IndexModel type is meant for creating indexes, not for decoding existing ones. I tried running IndexView.List and printing cursor.Current on every iteration, and it seems the server returns documents of this form:

{"v": {"$numberInt":"2"},"key": {"_id": {"$numberInt":"1"}},"name": "_id_"}

Given this, you could define a struct with Key and Name fields and use that for decoding. We have an open GODRIVER ticket to add a type for decoding the results from Database.ListCollections (https://jira.mongodb.org/browse/GODRIVER-903) and could potentially repurpose this ticket to include a type for indexes as well. I’m currently talking to the team about whether or not we think this is a good idea to have in the driver and can get back to you on this tomorrow.

TopherGopher · March 4, 2020, 6:06pm

Alright, with the output you provided I was able to get a code snippet working, but man, it’s not pretty:

type KV struct {
	Key     bson.D `bson:"key"`  // index fields in definition order (a map would lose the order of compound keys)
	Version int32  `bson:"v"`    // index version, not a value
	Name    string `bson:"name"` // e.g. "_id_"
}
	dbs := map[string]interface{}{}
	err := conn.GetDBWithoutIndexing("admin").RunCommand(context.Background(), bson.M{
		"listDatabases": 1,
	}).Decode(&dbs)
	if err != nil {
		panic(fmt.Errorf("could not run listDatabases: %v", err))
	}

	var noNameDBs int
	for _, database := range dbs["databases"].(primitive.A) {
		d := database.(map[string]interface{})
		dbName, found := d["name"].(string)
		if !found {
			noNameDBs++
			continue
		}
		fmt.Println("Working on", dbName)
		// List the collections in this database
		cols, err := conn.GetDBWithoutIndexing(dbName).ListCollectionNames(context.Background(), bson.M{})
		if err != nil {
			panic(fmt.Errorf("could not list collections in %s: %v", dbName, err))
		}
		for _, col := range cols {
			cur, err := conn.GetDBWithoutIndexing(dbName).C(col).Indexes().List(context.Background())
			if err != nil {
				panic(fmt.Errorf("could not get the indices for %s.%s: %v", dbName, col, err))
			}
			indices := []KV{}
			// All drains the cursor into the slice and closes it
			if err = cur.All(context.Background(), &indices); err != nil {
				panic(fmt.Errorf("could not decode the indices for %s.%s: %v", dbName, col, err))
			}
			for _, compoundIndex := range indices {
				// Print each field of the (possibly compound) index
				for _, field := range compoundIndex.Key {
					fmt.Printf("%s\n", field.Key)
				}
			}
		}
	}

So essentially, I call listDatabases as an admin command, parse that raw output, call ListCollectionNames, and then call Indexes().List() for each collection. Whew!
It would be so nice if I could do it all like this:

for _, db := range conn.ListDatabases() {
  for _, coll := range db.ListCollections() {
    for _, index := range coll.ListIndexes() {
        fmt.Println(index.GetName(), index.GetKeys())
        index.Drop()
    }
  }
}

For listing database metadata I don’t need access to the raw cursor; the results are small enough that holding them in memory as slices isn’t a concern.

Divjot_Arora · March 4, 2020, 6:24pm

We’re not in a position to change the API that drastically, and we don’t want to maintain two separate APIs (Collection.ListIndexes and IndexView.List) for the same operation. Also, because these are commands that make network round trips, every one of those calls needs to return an error, so they can’t be used directly as the range expression of a for loop.

For listing databases, there is already a Client.ListDatabases method, which returns a mongo.ListDatabasesResult, so you shouldn’t need RunCommand for this.

I do think having access to the raw cursor is useful in some circumstances. For example, if the server adds new fields, a user can upgrade the server without upgrading the driver and still see those new fields in the raw cursor results. However, I do think we can talk about adding something like IndexView.ListObjects, which could return a slice of index specifications. From there, you could easily get the name and keys of each index and call IndexView.DropOne(index.Name).

TopherGopher · March 5, 2020, 3:21am

Ahhhh, that’s how you use ListDatabases! I couldn’t for the life of me figure out what type to unpack it into. Thank you.
How would you feel about adding Client.ListDatabaseObjects()? It would be an ease-of-use wrapper around ListDatabases() that just returns []mongo.ListDatabaseResult{}, for when we don’t need the fine-grained control that a cursor affords.
I 100% agree that being able to get at the raw cursor can be super useful, but I rarely need it. When I have to choose between short, readable code and something super clever that only saves a fraction of a millisecond, I usually go the short route to keep the code maintainable. Personal preference, though.

I could see us spawning goroutines as cursor results arrive, or using a REST stream or something nifty like that, for a really large result. Usually, though, we’re unpacking the full result into a slice or map that gets returned to another function, using the default connection settings.
I would LOVE IndexView.ListObjects(); that would be amazing.

Divjot_Arora · March 5, 2020, 2:47pm

I’m a little confused about your request for ListDatabaseObjects. ListDatabasesResult is defined in the go.mongodb.org/mongo-driver/mongo package and contains a slice of database specifications. You should be able to iterate over result.Databases and get the information you need for each database. Am I missing something?

TopherGopher · March 5, 2020, 5:21pm

Oh. Darn it. I was looking for it on the connection and didn’t realize it was on the client, but that makes sense. I’ll switch to using that for the top-level loop. I still think a ListCollectionObjects() would be nice, though.