Finding and removing unused indexes is a pretty common technique to improve overall performance of relational databases. Less indexes means faster insert and updates but also less disk space used. The usual way to do it is to log all queries’ execution plans and then get a list of those indexes that are not used. Same theory applies to MongoDB and TokuMX so in this blog post I’m going to explain how to find those.
Profiling in MongoDB
To understand what profiling is you only need to think about MySQL’s slow query log, it is basically the same idea. It can be enabled with the following command:
db.setProfilingLevel(level, slowms)
There are three different levels:
0: No profiling enabled.
1: Only those queries slower than “slowms” are profiled.
2: All queries are profiled, similar to query_long_time=0.
Once it is enabled you can use db.system.profile.find().pretty() to read it. You would need to scan through all profiles and find those indexes that are never used. To make things easier there is a javascript program that will find the unused indexes after reading all the profile information. Unfortunately, it only works with mongodb 2.x.
The javascript is hosted in this github project https://github.com/wfreeman/indexalizer You just need to start mongo shell with indexStats.js and run db.indexStats() command. This is an sample output:
scanning profile {ns:"test.col"} with 2 records... this could take a while. { "query" : { "b" : 1 }, "count" : 1, "index" : "", "cursor" : "BtreeCursor b_1", "millis" : 0, "nscanned" : 1, "n" : 1, "scanAndOrder" : false } { "query" : { "b" : 2 }, "count" : 1, "index" : "", "cursor" : "BtreeCursor b_1", "millis" : 0, "nscanned" : 1, "n" : 1, "scanAndOrder" : false } checking for unused indexes in: col this index is not being used: "_id_" this index is not being used: "a_1"
So “a_1” is not used and could be dropped. We can ignore “_id_” because that one is needed
There is a problem with profiling. It will affect performance so you need to run it only for some hours and usually during low peak. That means that there is a possibility that not all possible queries from your application are going to be executed during that maintenance window. What alternative TokuMX provides?
Finding unused indexes in TokuMX
Good news for all of us. TokuMX doesn’t require you to enable profiling. Index usage statistics are stored as part of every query execution and you can access them with a simple db.collection.stats() command. Let me show you an example:
> db.col.stats() [...] { "name" : "a_1", "count" : 5, "size" : 140, "avgObjSize" : 28, "storageSize" : 16896, "pageSize" : 4194304, "readPageSize" : 65536, "fanout" : 16, "compression" : "zlib", "queries" : 0, "nscanned" : 0, "nscannedObjects" : 0, "inserts" : 0, "deletes" : 0 }, { "name" : "b_1", "count" : 5, "size" : 140, "avgObjSize" : 28, "storageSize" : 16896, "pageSize" : 4194304, "readPageSize" : 65536, "fanout" : 16, "compression" : "zlib", "queries" : 2, "nscanned" : 2, "nscannedObjects" : 2, "inserts" : 0, "deletes" : 0 } ], "ok" : 1 }
There are our statistics without profiling enabled. queries means the number of times that index has been used on a query execution. b_1 has been used twice and a_1 has never been used. You can use this small javascript code I’ve written to scan all collections inside the current database:
db.forEachCollectionName(function (cname) { output = db.runCommand({collstats : cname }); print("Checking " + output.ns + "...") output.indexDetails.forEach(function(findUnused) { if (findUnused.queries == 0) { print( "Unused index: " + findUnused.name ); }}) });
An example using the same data:
> db.forEachCollectionName(function (cname) { ... output = db.runCommand({collstats : cname }); ... print("Checking " + output.ns + "...") ... output.indexDetails.forEach(function(findUnused) { if (findUnused.queries == 0) { print( "Unused index: " + findUnused.name ); }}) ... ... }); Checking test.system.indexes... Checking test.col... Unused index: a_1
Conclusion
Finding unused indexes is a regular task that every DBA should do. In MongoDB you have to use profiling while in TokuMX nothing needs to be enabled because it will gather information by default without impacting service performance.
The post Find unused indexes on MongoDB and TokuMX appeared first on MySQL Performance Blog.