Find unused indexes on MongoDB and TokuMX

Finding and removing unused indexes is a pretty common technique to improve overall performance of relational databases. Less indexes means faster insert and updates but also less disk space used. The usual way to do it is to log all queries’ execution plans and then get a list of those indexes that are not used. Same theory applies to MongoDB and TokuMX so in this blog post I’m going to explain how to find those.

Profiling in MongoDB

To understand what profiling is you only need to think about MySQL’s slow query log, it is basically the same idea. It can be enabled with the following command:

db.setProfilingLevel(level, slowms)

There are three different levels:

0: No profiling enabled.
1: Only those queries slower than “slowms” are profiled.
2: All queries are profiled, similar to query_long_time=0.

Once it is enabled you can use db.system.profile.find().pretty() to read it. You would need to scan through all profiles and find those indexes that are never used. To make things easier there is a javascript program that will find the unused indexes after reading all the profile information. Unfortunately, it only works with mongodb 2.x.

The javascript is hosted in this github project https://github.com/wfreeman/indexalizer You just need to start mongo shell with indexStats.js and run db.indexStats() command. This is an sample output:

scanning profile {ns:"test.col"} with 2 records... this could take a while.
{
	"query" : {
		"b" : 1
	},
	"count" : 1,
	"index" : "",
	"cursor" : "BtreeCursor b_1",
	"millis" : 0,
	"nscanned" : 1,
	"n" : 1,
	"scanAndOrder" : false
}
{
	"query" : {
		"b" : 2
	},
	"count" : 1,
	"index" : "",
	"cursor" : "BtreeCursor b_1",
	"millis" : 0,
	"nscanned" : 1,
	"n" : 1,
	"scanAndOrder" : false
}
checking for unused indexes in: col
this index is not being used:
"_id_"
this index is not being used:
"a_1"

 

So “a_1” is not used and could be dropped. We can ignore “_id_” because that one is needed :)

There is a problem with profiling. It will affect performance so you need to run it only for some hours and usually during low peak. That means that there is a possibility that not all possible queries from your application are going to be executed during that maintenance window. What alternative TokuMX provides?

Finding unused indexes in TokuMX

Good news for all of us. TokuMX doesn’t require you to enable profiling. Index usage statistics are stored as part of every query execution and you can access them with a simple db.collection.stats() command. Let me show you an example:

> db.col.stats()
[...]
{
"name" : "a_1",
"count" : 5,
"size" : 140,
"avgObjSize" : 28,
"storageSize" : 16896,
"pageSize" : 4194304,
"readPageSize" : 65536,
"fanout" : 16,
"compression" : "zlib",
"queries" : 0,
"nscanned" : 0,
"nscannedObjects" : 0,
"inserts" : 0,
"deletes" : 0
},
{
"name" : "b_1",
"count" : 5,
"size" : 140,
"avgObjSize" : 28,
"storageSize" : 16896,
"pageSize" : 4194304,
"readPageSize" : 65536,
"fanout" : 16,
"compression" : "zlib",
"queries" : 2,
"nscanned" : 2,
"nscannedObjects" : 2,
"inserts" : 0,
"deletes" : 0
}
],
"ok" : 1
}

 

There are our statistics without profiling enabled. queries means the number of times that index has been used on a query execution. b_1 has been used twice and a_1 has never been used. You can use this small javascript code I’ve written to scan all collections inside the current database:

db.forEachCollectionName(function (cname) {
	output = db.runCommand({collstats : cname });
	print("Checking " + output.ns + "...")
	output.indexDetails.forEach(function(findUnused) { if (findUnused.queries == 0) { print( "Unused index: " + findUnused.name ); }})
});

 

An example using the same data:

> db.forEachCollectionName(function (cname) {
... output = db.runCommand({collstats : cname });
... print("Checking " + output.ns + "...")
... output.indexDetails.forEach(function(findUnused) { if (findUnused.queries == 0) { print( "Unused index: " + findUnused.name ); }})
...
... });
Checking test.system.indexes...
Checking test.col...
Unused index: a_1

 

Conclusion

Finding unused indexes is a regular task that every DBA should do. In MongoDB you have to use profiling while in TokuMX nothing needs to be enabled because it will gather information by default without impacting service performance.

The post Find unused indexes on MongoDB and TokuMX appeared first on MySQL Performance Blog.

Read more at: http://www.mysqlperformanceblog.com/

Getting EXPLAIN information from already running queries in MySQL 5.7

When a new version of MySQL is about to be released we read a lot of blog posts about the performance and scalability improvements. That’s good but sometimes we miss some small features that can help us a lot in our day-to-day tasks. One good example is the blog post that Aurimas wrote about a new small feature in MySQL 5.6 that I didn’t know about until I read it: the Automatic InnoDB transaction log file size change. How cool is that?

I plan to write a series of blog posts that will show some of those small new features in MySQL 5.7 that are going to be really useful. I’m going to start with EXPLAIN FOR CONNECTION.

This feature allows us to run an EXPLAIN for an already running statement. Let’s say that you find a query that has been running for a long time and you want to check why that could be happening. In 5.7 you can just ask MySQL to EXPLAIN the query that a particular connection is running and get the execution path. You can use it if the query is a SELECT, DELETE, INSERT, REPLACE or UPDATE. Won’t work if the query is a prepared statement though.

Let me show you an example of how it works.

We have a long running join.

mysql [localhost] {msandbox} ((none)) > show processlist G
*************************** 1. row ***************************
     Id: 9
   User: msandbox
   Host: localhost
     db: employees
Command: Query
   Time: 49
  State: Sending data
   Info: select count(*) from employees, salaries where employees.emp_no = salaries.emp_no

Let’s see the execution plan for the query:

mysql [localhost] {msandbox} ((none)) > explain for connection 9 G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299540
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2803840
     filtered: 100.00
        Extra: Using where; Using join buffer (Block Nested Loop)

The join between those tables is not using any index at all so there is some room for improvement here :)

Conclusion

You can use this feature to see why a query is running for too long and based on the info decide how to fix it and how to proceed. This is going to be a very useful feature for DBAs who want to diagnose performance problems and slow queries.

The post Getting EXPLAIN information from already running queries in MySQL 5.7 appeared first on MySQL Performance Blog.

Read more at: http://www.mysqlperformanceblog.com/

How to add an existing Percona XtraDB Cluster to Percona ClusterControl

In my last blog post I explained how to use Percona ClusterControl to create a new Percona XtraDB Cluster from scratch. That’s a good option when you want to create a testing environment in just some mouse clicks. In this case I’m going to show you how to add your existing cluster to Percona ClusterControl so you can manage and monitor it on the web interface.

The environment will be pretty similar, we will have UI, CMON and 3 XtraDB Cluster nodes. The cluster should be already running and Percona ClusterControl also installed.

Adding an existing Cluster

The ClusterControl web interface is empty, there are no clusters on it. To add an existing one we need to click on “Add existing Galera Cluster.” (Click on the image for an enlarged view).

 

 

Add Existing ClusterA new form will be shown pretty similar to the one we saw last time when we were creating a new cluster. We can divide the form in two parts. First we need to give information about our Cluster. The info requires is the Linux distribution and version, IP of PXC nodes and MySQL root passwords. Pretty easy:

 

 

 

 

FromIn the second part we have the SSH configuration. There is one pre-requisite, the UI server should be able to connect to all servers using a SSH key. Therefore, our first step is to create a SSH key pair in our UI server and copy the public one to all other servers.

 

 

It is also necessary to add the private key in the web interface. You can do it using the form shown after clicking on “Add Key Pair”:

Add key pair

Once the key is added, we can verify the access:

Access Check

As we can see here, everything works as expected and all servers are reachable by SSH. The parameter “Create shared SSH key” also needs to be enabled. That option will make ClusterControl to create a new SSH key pair on CMON node so this one can also connect to PXC nodes with passwordless SSH.

Now everything is prepared. We can proceed with the deployment. Just click on “Add cluster” and the installation process will start. While the installation is in progress you will see this notification:

 

 

Deployment notification 

 

Clicking on it we can see the progress of the deployment:

Progress

After some minutes our PXC is shown in the Percona ClusterControl UI:

Cluster

Now we can monitor it, get alerts, clone, run backups and everything from the web interface. You can also add multiple clusters and create new ones.

The post How to add an existing Percona XtraDB Cluster to Percona ClusterControl appeared first on MySQL Performance Blog.

Read more at: http://www.mysqlperformanceblog.com/