Prometheus as an Engine for MySQL Monitoring

When I first discovered Graphite years ago, I was very impressed with its monitoring capabilities. Compared to many RRD-based tools that were popular at the time (like Cacti), Graphite separated the captured data from the graphs, allowing you to do all kinds of math and transformations while visualizing the data. For example, I could plot the relationship between system queries and disk IO, and capture how the system was getting more IO bound over time. It also had reasonably high performance, allowing me to capture high-resolution data for medium-sized systems.

Just last year I discovered Prometheus, and it also impressed me. I think it has the potential to take Graphite’s flexibility to the next level. Though I am in no way a Prometheus expert, I want to share my understanding and thoughts on it so far.

Data Model

The data model is perhaps what attracted me to Prometheus the most. While its power is not obvious at first, once you figure it out it offers fantastic flexibility.

In the data model used by Whisper and Carbon in Graphite, you would use something like this to store MySQL data:

myapp.store.mysql.db01.status.questions = 5000

You can set up any hierarchy you like, but it has to have a hierarchy.

What Prometheus does instead is allow you to use a set of key-value pairs. The same data shown above could be presented like this:

questions_total{app="myapp",subsystem="store",engine="mysql",host="db01",source="status"} = 5000

(You most likely wouldn’t use this exact structure, but it’s good for illustration.)

The difference between these approaches is that Prometheus gives you multiple dimensions on which you can filter and aggregate, plus you can add those dimensions later as you need them (without having to redesign your tree hierarchy).

These labels are very dynamic, and I can change them in a second. For example, a MySQL server reporting as a “Master” might start reporting as a “Slave” in the very next second, and its data will be aggregated differently.

This is especially important in the modern, often cloud-based and virtualized world. For example, using Prometheus it is very easy to tag servers by their region or availability zone. I can also do things like compute MySQL space usage by both database and storage engine. The possibilities are endless.
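For illustration, here is roughly what filtering and aggregation over those labels looks like in the Prometheus query language (using the made-up label set from the example above):

# query rate for a single server
rate(questions_total{host="db01"}[5m])

# query rate aggregated per storage engine across all hosts
sum(rate(questions_total[5m])) by (engine)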

Data Capture

Unlike Graphite – where the main model is push, and the hosts themselves choose what kind of information they want to push to the monitoring system and at which intervals – with Prometheus you set up “exporters” that expose the data. It is up to the Prometheus server configuration to choose what data to sample and how frequently.

The clear benefit of Prometheus’ approach is that you can have as many servers as you like pulling the data, so it is very easy to create a development version of your system and play around with it – without affecting production. It also provides a simple pathway to high availability.

(Both the push and pull approaches have their benefits and drawbacks. Brian Brazil wrote an excellent article advocating the pull model of monitoring.)

Prometheus does create a few challenges for me. Unless I want to set up Service Discovery, it is a hassle to monitor any development/test VMs I might spin up (that would otherwise not be open to external access at all). While this isn’t the main use case for Prometheus, it is helpful for me to test the dashboard’s behavior with different operating systems, workloads, etc.

A more significant issue I discovered is dealing with data that can’t be captured from multiple locations, because the act of capturing it causes the data to change.

Here is a specific example: if I look at the events_statements_summary_by_digest table in PERFORMANCE_SCHEMA, there is a MAX_TIMER_WAIT field that shows me the maximum query execution time for each query pattern. If I want to get the maximum query execution time for every minute, for example, I need to “truncate” the table to reset the statistics and let the maximum value be computed again. If I don’t perform that operation, the data becomes meaningless. If I make the exporter reset the statistics during the poll, however, I can’t pull the data from two Prometheus servers.

This is one instance where the PERFORMANCE_SCHEMA design could be friendlier to monitoring tools. I could set up a cron job or an Event to clear out the statistics regularly and get a proper maximum value for every five minutes, but that isn’t an overly convenient solution.
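As a minimal sketch of that workaround (assuming the event scheduler is enabled; the event name is made up for illustration):

mysql> -- enable the event scheduler if it is not already running
mysql> SET GLOBAL event_scheduler = ON;
mysql> -- reset statement digest statistics every five minutes
mysql> CREATE EVENT reset_digest_stats
       ON SCHEDULE EVERY 5 MINUTE
       DO TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;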

Another issue I discovered is that Prometheus doesn’t have any protection from bad (long) samples, or a very good method of detecting them. Let’s imagine that I have a MySQL server and I’m sampling status data every second. For some reason the call to SHOW GLOBAL STATUS took five seconds to execute. The truth is we don’t really know which point in those five seconds the SHOW GLOBAL STATUS output corresponds to: it might be the very start, it might be the very end. As such, you don’t really know how to process the counters. Whatever you do, you’re likely to be wrong. My preference in this case is to simply discard such samples, because even missing one percent of the samples is unlikely to change the whole picture. Constantly questioning whether you really had a couple of seconds where the QPS spiked to ten times the normal rate, or whether it was an invalid sample, is not something on which I want to waste a lot of time!

My preferred approach is to configure the SHOW GLOBAL STATUS capture so that if it takes more than ten percent of the capture interval, it is discarded. For example, with a one-second capture interval I would allow 100ms for the capture. If the system cannot keep up with that, it would be better not to fool myself and to reduce the capture resolution to around five seconds.

The only protection Prometheus offers is the scrape_timeout setting, but unfortunately it is limited to one-second resolution at this point. This is too coarse for any high-resolution capture.

Finally, it is also inconvenient to specify different resolutions for different data. In MySQL there is often a lot of data that I want to capture, but the resolution needed for each capture is different. For example, SHOW GLOBAL STATUS at one-second resolution is a must. At the same time, capturing table size information from INFORMATION_SCHEMA at one-second resolution would put too much load on MySQL, especially if there are a lot of tables, and that level of resolution isn’t really needed in this case.

An attractive thing about Prometheus is that the Prometheus development team uses it a lot for MySQL monitoring, so the MySQL exporter is really good. Most MySQL monitoring plugins I find resort to reporting just a few basic statistics, which is not nearly enough for advanced diagnostics. The Prometheus MySQL exporter captures tons of data and has been adding more in every version.

I also very much like that the Prometheus exporters are built on the HTTP protocol. This makes it very easy to debug them or see what kind of data they capture: they present it plainly to any web browser:

[Screenshot: MySQL exporter metrics viewed in a web browser]
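You can fetch the same data from the command line. A minimal sketch, assuming the MySQL exporter is running on its default port (9104) on host db01 and exposes the usual global status counters:

$ # dump all metrics exposed by the exporter and filter for one status counter
$ curl -s http://db01:9104/metrics | grep mysql_global_status_questions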

Computational Model

I think the basic operations in Prometheus are pretty intuitive, but if you look at some of the advanced behaviors you’re going to find some inconveniences and some things that are likely to surprise you.

One inconvenience is that Prometheus is mainly designed for working with high-resolution data. If there are holes of more than five minutes (by default) in a time series, it can disappear from the graphs. As I mentioned, for MySQL there is quite a lot of information that makes sense to capture less frequently than every five minutes.

Prometheus functions look into the “past,” and are designed so that the value of a function at any time T, once it can be computed, is not going to change. It all looks clean and logical, but it causes issues when there are holes in the data capture.

As an example, let’s imagine the following five seconds, where the counter of total questions since startup was scraped successfully in some seconds but not others (due to a network issue, overload, etc.):

1  –  10
2  –  20
3  –  X
4  –  X
5  –  200

When we capture data of the “counter” type, the most important value is not the actual counter value at a given time but the rate of change of the counter over different time intervals. In this case, for example, the query rate was ten QPS for the interval from second one to second two, and that can be computed directly. But what was the query rate in the interval from second three to second four? We don’t have exact data, but that is fine: we know there were 180 queries during the interval from second two to second five, giving us 60 QPS, which we can use as the estimate for the three-to-four-second interval.

This is NOT, however, how Prometheus will compute it if you use the irate() function (which is supposed to give you the highest resolution possible). When you evaluate irate() at T=4, it doesn’t have access to the T=5 value, even if it is in the database. Instead, it looks back, finds the most recent completed interval (seconds one through two), and uses the corresponding value of ten QPS.

I find this pretty puzzling and inconvenient.

There is also the rate() function, which can be used to get the average rate over a period. Unfortunately, it can’t estimate the rate for a smaller period based on the data available for a longer period. So, for example, if I ask the rate() function to compute the query rate at T=4, looking one second back, it will return no data. This isn’t a big deal when you’re working with data manually, but if you’re building zoomable dashboards it means you can zoom in to the point where the data disappears (rather than stopping at the best value available).
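For reference, the two expressions look like this (again using the made-up questions_total metric from the data model example):

# average rate of change over the last five minutes, ending at the evaluation time
rate(questions_total{host="db01"}[5m])

# "instant" rate computed from the last two samples within the five-minute window
irate(questions_total{host="db01"}[5m])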

Storage

Prometheus has its own high-performance storage system, based in part on LevelDB. It is highly optimized for time series and can achieve a very high level of compression. Be ready, though: every label combination creates a separate time series at the low level, which means a lot of files. This isn’t really a problem with SSD drives and modern file systems, but it is something to look out for.

The capture engine and storage system are rather efficient. Even though Prometheus does not have built-in clustering for “scaling out,” you can reportedly capture more than 300K metrics per second on a single node. You can also run multiple Prometheus servers as needed.

The problem I found with Prometheus’ storage is that it is very self-contained: you can only use it from Prometheus or access it through the HTTP API. There are no tools at this point to export it for advanced analysis with R, or to dump the whole database into something like JSON so it can be loaded into a different database engine. Some of these features might already be on the roadmap.

Purging and Aggregation

Retention configuration in Prometheus is pretty spartan. You can set storage.local.retention to how long you want to keep the data, but that’s it. You can’t configure it to purge different data at different times. You can run multiple Prometheus instances to achieve this, but it’s quite a hassle. It’s also not possible to instruct Prometheus to automatically build summaries in order to execute low-resolution queries faster.
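Retention is set as a server flag at startup. A minimal sketch, using the single-dash flag style of the 0.16-era releases (newer Prometheus versions use different flag names, such as --storage.tsdb.retention.time):

# keep 30 days (720h) of history instead of the default retention period
$ prometheus -config.file=prometheus.yml -storage.local.retention=720h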

For example, if I have MySQL’s query rate captured every second but I want to view the data over a long time period (e.g., how it changed over the last three months to estimate growth trends), data aggregated at one-hour intervals would be enough for that purpose.

There is support for recording rules to help achieve some of this, but it is not explicit or convenient in my opinion.
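For illustration, a recording rule that pre-aggregates the made-up questions_total counter into an hourly-rate series might look roughly like this in the pre-2.0 rules file syntax (the rule name is made up):

# pre-computed average query rate over one hour, stored as its own time series
host:questions:rate1h = rate(questions_total[1h])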

Looking at the Prometheus roadmap, some of these issues might not be fixed in Prometheus but achieved through integrating other systems such as InfluxDB (where experimental support already exists).

Purpose

A lot of these limitations make sense if you look at the purpose for which Prometheus was created: getting high-resolution data and providing as much troubleshooting information as possible to its alerting engine. It is not really designed for storing extensive history. Too bad! I would very much like to get both of those properties in a single system!

Visualization

Prometheus ships with a built-in Expression Browser, which is great for debugging and interactive analysis. It also lets you see what data you actually have in the database. It will disappoint you, however, if you’re looking for beautiful graphs!

[Screenshot: the Prometheus Expression Browser]

This shows that I have information about the MySQL query rate from two servers, as well as the available and configured labels.

If I want to pick one server and look at the average rate of queries per five minutes, I can do this:

[Screenshot: the query rate graphed with rate() in the Expression Browser]

There are some tools available in the graph view to choose the time range and resolution.

You should be aware that visualizing data with the rate() function often shows you things that do not exist. In this case, it looks like the number of queries was gradually creeping up. In reality, I had just started the benchmark, so the number of queries jumped almost immediately. This is what the real situation looks like (using irate()):

[Screenshot: the same query rate graphed with irate()]

As I explained before, irate() does not handle missing data points very well, plus it behaves somewhat bizarrely when you “zoom out”: it provides instant rate information at sparse intervals (e.g., the instant rate computed over one second, evaluated every 60 seconds) rather than smoothing things out to averages.

There is also the PromDash tool available for Prometheus, which gives you nicer-looking dashboards and supports a lot of Prometheus’ features. Now that Grafana has official support for Prometheus, though, Grafana is my preferred tool for building dashboards – especially since it supports multiple data sources besides Prometheus.

Summary

I’m very excited about Prometheus. It allows me to get a lot of information easily and use it for performance analysis when benchmarking or troubleshooting. It would be great if it also had a simple integrated solution for long-term storage and trending. I am also looking forward to better integration with Grafana and better documentation on how to create Prometheus-based dashboards – especially with some Prometheus-based examples!

Note: all of the above was written about Prometheus 0.16.1. Prometheus is rapidly evolving and details may change in newer versions.


Tired of MySQL Making You Wait? Webinar: Questions and Answers

We’d like to thank everybody for joining us on January 7th for our “Tired of MySQL Making You Wait?” webinar with Percona’s Alexander Rubin, Principal Consultant, and SolarWinds’ Janis Griffin, Database Evangelist.

Too often, developers and DBAs struggle to pinpoint the root cause of performance issues and then spend too much time trying to fix them. In the webinar, we discussed how you can significantly increase the performance of your applications while also reducing database response time.

You can find an archived version of the webinar here.

Below are the questions that were asked during the webinar, with responses from Alex and Janis. If you need further clarification, please respond in the comments.

Thanks again, and we look forward to you joining us at our next webinar (with Percona CEO Peter Zaitsev), Compression In Open Source Databases!

 

Q: Are there special tuning tips for Galera Cluster?

A: Since Galera Cluster (Percona XtraDB Cluster) is based on MySQL, all query tuning tips will apply as well. There are a number of Galera Cluster configuration tips available: for example the blog post at this link talks about tuning the PXC for maximum write throughput: https://www.percona.com/blog/2015/06/03/optimizing-percona-xtradb-cluster-write-hotspots/

 

Q: Does DPA support Galera Cluster ?

A: Yes, DPA has the ability to group the cluster together to see load balancing, top 15 SQLs across the cluster, plus the top wait states.

 

Q: Can I create a covered index when I have “group by” and “order by” instructions together?

A: Yes, you can create a covered index and MySQL will use it to satisfy the query (you will see “using index” in the EXPLAIN output). If you have “group by” and “order by” on different columns, however, MySQL will still have to perform a filesort and create a temporary table. To create this index, include the following fields from your query in the index:

  1. All fields in the “where” condition
  2. The “group by” fields
  3. The “order by” fields
  4. The fields that the query is selecting.

Please note the limitations of this approach described here: http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html
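As a quick illustration (the table and column names are made up), for a query that filters, groups and orders on the same column, the index lists the WHERE column first, then the GROUP BY/ORDER BY column, then the selected column:

mysql> -- query: SELECT customer_id, SUM(amount) FROM orders
mysql> --        WHERE status = 'paid' GROUP BY customer_id ORDER BY customer_id;
mysql> ALTER TABLE orders ADD INDEX idx_covered (status, customer_id, amount);
mysql> -- EXPLAIN should now show "Using index" with no filesort or temporary table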

 

Q: Can we use DPA with Azure MySQL?

A: Yes, DPA will monitor, tune and analyze the SQL server performance running on Microsoft Azure.

 

Q: Do you know if MariaDB has or is planning to follow with these virtual fields and/or SYS schema enhancements from MySQL 5.7?

A: MariaDB has had virtual or computed columns since version 5.2. I don’t believe MariaDB comes with the sys schema already installed, but you can download and install it.

 

Q: Does DPA support PostgreSQL? If not, is it in the roadmap?

A: Currently, DPA does not support PostgreSQL. However, we continually re-evaluate it with each new release.

 

Q: Does DPA support RDS instances?

A: Yes, DPA supports the monitoring of RDS instances.

 

Q: Does the performance schema show any information about how the load data is performing?

A: The performance_schema became available in MySQL 5.5.3 and had only 11 tables in that release. Most of the tables deal with wait events and file information. In addition, you would need to turn on the consumers and enable the instrumentation of the wait events. Once you’ve done that, you will be able to see the threads and what they are waiting on.

 

Q: I didn’t understand the reasoning that leads to the index on ORDER BY. I can’t link it to the previous slide query.

A: I assume this question is about the ORDER BY + LIMIT optimization. When you create an index on the ORDER BY field only, MySQL can read the whole table in the order of the index. As the index is sorted, it can fetch rows in that order and filter out the rows that don’t match the WHERE condition. Because there is a LIMIT N on the query, MySQL will stop after finding N matching rows.
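A rough sketch using the ontime_2012 table from the later example (the index name is made up; whether the optimizer actually chooses this plan depends on the data):

mysql> ALTER TABLE ontime_2012 ADD KEY idx_delay (DepDelayMinutes);
mysql> -- MySQL can walk idx_delay in order, check Carrier for each row,
mysql> -- and stop as soon as it has found 10 matching rows
mysql> EXPLAIN SELECT Carrier, DepDelayMinutes FROM ontime_2012
       WHERE Carrier = 'AA' ORDER BY DepDelayMinutes LIMIT 10;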

 

Q: How can I analyze parts of a stored procedure that runs nightly to see where my bottlenecks are? It has 100+ update queries that it performs every night to build a table with one million plus rows.

A: You can do it using the slow query log in Percona Server (5.5/5.6) and/or Performance Schema in MySQL 5.7. If you are running Percona Server, you can enable extended stored procedures logging as described here: https://www.percona.com/doc/percona-server/5.6/diagnostics/slow_extended.html. Another way is to use the deprecated “show profile” method as described here: https://www.percona.com/blog/2009/01/19/profiling-mysql-stored-routines/
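A minimal sketch of the “show profile” approach (the procedure name is made up; note that profiling is deprecated and per-session):

mysql> SET profiling = 1;
mysql> CALL nightly_rebuild();
mysql> -- list the profiled statements with their execution times
mysql> SHOW PROFILES;
mysql> -- drill into one statement by its Query_ID
mysql> SHOW PROFILE FOR QUERY 2;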

 

Q: How will DPA use the index when there are more than five columns in the “where” conditions? How would you create indexes?

A: I would suggest checking the “cardinality” of the fields (i.e., the number of unique values). Usually (unless you are creating a covered index or optimizing a group by) it makes much more sense to limit the number of fields in an index, and only include the fields with high cardinality. For example, a PRIMARY KEY or UNIQUE INDEX works best, whereas the “gender” field (with only two unique values, “male” and “female”) would not be very useful.
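Two quick ways to check cardinality (hypothetical table and column names):

mysql> -- exact distinct counts for candidate columns
mysql> SELECT COUNT(DISTINCT gender), COUNT(DISTINCT email) FROM users;
mysql> -- the optimizer's own cardinality estimates for existing indexes
mysql> SHOW INDEX FROM users;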

 

Q: How would the analytics tool work in an open stack VM environment, where we have 100 database servers?

A: One installation of DPA can monitor hundreds of database servers. In fact, we have several very large companies that monitor 1000s of servers worldwide.

 

Q: If you have a small table with only 100 records, is it worth creating indexes on specific fields or just do a table scan?

A: If the table is only 100 records and you are not joining it with other tables, it usually does not make sense to add indexes. But because the table is so small it doesn’t really matter either way.

 

Q: Is the SolarWinds tool better than MONyog, and how expensive is the license cost for this?

A: MONyog is also a monitoring tool, but it doesn’t have the advisors, alarms, granularity, history, or customizations that DPA gives you. The retail cost per server is currently $1,995 per monitored server, but is heavily discounted the more you purchase.

 

Q: In many cases, due to the randomness and complexity of queries thrown at various tables, I end up creating a lot of indexes. At what point would there be too many indexes? Should I then create MySQL views instead of indexes? Should one use MySQL views at all to optimize searches?

A: First of all, there are no “materialized views” in MySQL, so views are not a useful replacement for indexes. You can create “summary” tables manually, which usually helps a lot. Although it is hard to say when you have too many indexes, lots of indexes can decrease the performance of your insert/update/delete operations, as well as confuse MySQL, so a great many indexes might cause MySQL to start choosing the wrong index when doing selects.
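A minimal sketch of a manually maintained summary table (table and column names are made up), which you refresh on a schedule instead of relying on a view:

mysql> CREATE TABLE sales_daily_summary AS
       SELECT DATE(created_at) AS day, product_id, SUM(amount) AS total_amount
       FROM sales
       GROUP BY DATE(created_at), product_id;
mysql> -- rebuild (or incrementally update) the summary periodically, e.g. from cron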

 

Q: Sometimes we need to add indices for different queries on the same table. Eventually, the table has too many indices. Any suggestion for such cases?

A: See the response to the previous question.

 

Q: Is there a way in DPA to see what queries are currently running? In other words, to know about slow queries as they run rather than only knowing about them historically?

A: Yes. In the “Current” dashboard, click the “Currently Active Sessions” box. With this option, you can sort by longest running, etc.

 

Q: Why is delay indexed in the composite key? It only covers the query, but the temp table can be avoided by the first two fields?

A: You are referring to this example:

mysql> alter table ontime_2012
       add key covered(dayofweek, Carrier, DepDelayMinutes);
mysql> explain select max(DepDelayMinutes), Carrier, dayofweek
       from ontime_2012 where dayofweek = 7 group by Carrier\G
...
possible_keys: DayOfWeek,covered
          key: covered
      key_len: 2
          ref: const
         rows: 905138
        Extra: Using where; Using index

The reason we add DepDelayMinutes is to make the index covered, so MySQL will be able to satisfy the query with an index only.

 


Tools and tips for analysis of MySQL’s Slow Query Log

MySQL has a nice feature, the slow query log, which allows you to log all queries that exceed a predefined amount of time to execute. Peter Zaitsev first wrote about this back in 2006 – there have been a few other posts here on the MySQL Performance Blog since then (check this and this, too) – but I wanted to revisit his original subject in today’s post.

Query optimization is essential for good database server performance, and usually DBAs need to ensure the best possible performance for all queries. In MySQL, the usual approach is to generate a query log for all queries running within a specific time period and then run a query analysis tool to identify the bad queries. Percona Toolkit’s pt-query-digest is one of the most powerful tools for SQL analysis, because it can generate a very comprehensive report that spots problematic queries very efficiently. It works equally well with Oracle MySQL Server. This post will focus mainly on pt-query-digest.

The slow query log is great at spotting really slow queries that are good candidates for optimization. Beginning with MySQL 5.1.21, the minimum value for long_query_time is 0, and the value can be specified to a resolution of microseconds. In Percona Server, additional statistics can be written to the slow query log; you can find the full details here. For our clients, we often need to identify the queries that impact an application the most. These are not always the slowest queries: queries that run more frequently with a lower execution time per call can put more load on a server than queries that run rarely. We of course want to get rid of really slow queries, but to really optimize application throughput we also need to investigate the queries that generate most of the load. Further, if you enable the log_queries_not_using_indexes option, MySQL will also log queries doing full table scans. That doesn’t always mean a query is slow, because in some situations the query optimizer chooses a full table scan rather than an available index, for example when returning all records from a small table.

Our usual recommendation is to generate the slow log with long_query_time=0. This records all the traffic, but it is I/O intensive and can eat up disk space very quickly depending on your workload. So run with long_query_time=0 only for a specific period of time, then revert to logging only very slow queries. In Percona Server there is a nice option, log_slow_rate_limit, that lets you limit the rate of logging. Filtering the slow query log is very helpful too in some cases: if we know the main performance issue is table scans, we can log only queries doing full table scans; if we see that I/O is the bottleneck, we can collect queries doing full scans and queries creating on-disk temporary tables. Again, this is only possible in Percona Server, with the log_slow_filter option. Alternatively, you may want to collect everything in the slow query log and then filter with pt-query-digest. Depending on your I/O capacity you might prefer one way or the other, as collecting everything in the slow query log lets us investigate other queries too if needed. Finally, use pt-query-digest to generate an aggregate report over the slow query log, which highlights the problematic part very efficiently. pt-query-digest can drive server load up, so our usual recommendation is to move the slow query log to a staging/dev server and run pt-query-digest there to generate the report.
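As a minimal sketch of the Percona Server rate-limiting and filtering options mentioned above (the values are chosen purely for illustration):

mysql> -- log roughly one of every 100 sessions (or queries, depending on log_slow_rate_type)
mysql> SET GLOBAL log_slow_rate_limit = 100;
mysql> -- log only queries doing full table scans or creating on-disk temporary tables
mysql> SET GLOBAL log_slow_filter = 'full_scan,tmp_table_on_disk';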

Note: changing the long_query_time parameter only affects newly created connections; existing connections keep using the previous threshold. In Percona Server there is a feature that changes the variable scope to global instead of session: enabling slow_query_log_use_global_control makes already-connected sessions honor a changed long_query_time threshold too. You can read more about this patch here.

I am not going to show you a detailed pt-query-digest report and explain each part of it here, because that has already been done well by my colleague Ovais Tariq in this post. However, I will show you some other aspects of the pt-query-digest tool.

Let me show you code snippets that enable the slow query log for a specific time period only, with long_query_time=0 and log_slow_verbosity set to ‘full’. log_slow_verbosity is a Percona Server variable that logs extra statistics such as information on the query cache, filesort, temporary tables, InnoDB statistics, etc. Once you are done collecting logs, revert long_query_time to its previous value, and finally run pt-query-digest on the log to generate the report. Note: run the code below in the same MySQL session.

-- Save previous settings
mysql> SELECT @@global.log_slow_verbosity INTO @__log_slow_verbosity;
mysql> SELECT @@global.long_query_time INTO @__long_query_time;
mysql> SELECT @@global.slow_query_log INTO @__slow_query_log;
mysql> SELECT @@global.log_slow_slave_statements INTO @__log_slow_slave_statements;
-- Keep this in safe place, we'll need to run pt-query-digest
mysql> SELECT NOW() AS "Time Since";
-- Set values to enable query collection
mysql> SET GLOBAL slow_query_log_use_global_control='log_slow_verbosity,long_query_time';
mysql> SET GLOBAL log_slow_verbosity='full';
mysql> SET GLOBAL slow_query_log=1;
mysql> SET GLOBAL long_query_time=0;
mysql> SET GLOBAL log_slow_slave_statements=1;
-- Verify settings are OK
mysql> SELECT @@global.long_query_time, @@global.slow_query_log, @@global.log_slow_verbosity;
-- wait for 30 - 60 minutes
-- Keep this one too, also for pt-query-digest
mysql> SELECT NOW() AS "Time Until";
-- Revert to previous values
mysql> SET GLOBAL slow_query_log=@__slow_query_log;
mysql> SET GLOBAL long_query_time=@__long_query_time;
mysql> SET GLOBAL log_slow_verbosity=@__log_slow_verbosity; -- if percona server
mysql> SET GLOBAL log_slow_slave_statements=@__log_slow_slave_statements;
-- Verify settings are back to previous values
mysql> SELECT @@global.long_query_time, @@global.slow_query_log, @@global.log_slow_verbosity, @@global.slow_query_log_file;
-- Then with pt-query-digest run like (replace values for time-since, time-until and log name)
$ pt-query-digest --since='<time-since>' --until='<time-until>' --limit=100% /path/to/slow_query_log_file.log > /path/to/report.out
-- If you're not using Percona Server then you need to remove all references to log_slow_verbosity, slow_query_log_use_global_control and log_slow_slave_statements (priot MySQL 5.6).

My colleague Bill Karwin wrote a bash script that does almost the same as the above code. You can find the script to collect slow logs here. This script doesn’t hold a database connection open while you wait for logs to accumulate, and it sets all the variables back to the state they were in before. For full documentation, view this.

Further, you can also get EXPLAIN output into the report from the pt-query-digest tool. For that you need to use the --explain parameter, as follows:

$ pt-query-digest --explain u=<user>,p=<password>,h=<hostname> /path/to/slow.log > /path/to/report.out

The EXPLAIN output in the query report gives you the query execution plan, i.e., an indication of how that particular query is going to be executed. Note that if you run pt-query-digest over a slow query log on a server other than the one where the log originated (e.g., staging/dev, as I mentioned above), you may get a different execution path for the query in the report, or a lower number of rows examined, etc., because staging/dev servers usually have different data distribution, different MySQL versions, or different indexes. Also, EXPLAIN adds overhead, as queries need to be prepared on the server to generate the intended query execution plan. For this reason, you may want to run pt-query-digest with --explain on a production replica.

It’s worth mentioning that logging queries with log_slow_verbosity in Percona Server is really handy, as it shows lots of additional statistics and is more helpful in situations when the EXPLAIN plan reports a different execution path than the one used when the query was actually executed. On that particular topic, you may want to check this nice post.

pt-query-digest also supports filters. You can read more about them here. Let me show you an example: the following command will discard everything apart from insert/update/delete queries in the pt-query-digest output report.

$ pt-query-digest --filter '$event->{arg} =~ m/^(insert|update|delete)/i' --since='<time-since>' --until='<time-until>' --limit=100% /path/to/slow_query_log_file.log > /path/to/report.out

If you’re looking for some GUI tools for pt-query-digest then I would recommend reading this nice blogpost from my colleague Roman. Further, our CEO Peter Zaitsev also wrote a post recently where he shows the comparison between performance_schema and slow query log. Check here for details.

In related news, Percona recently announced Percona Cloud Tools, the next generation of tools for MySQL. It runs a client-side agent (pt-agent) that runs pt-query-digest on the server at regular intervals and uploads the aggregated data to the Percona Cloud Tools API, which processes it further. Query Analytics is one of the Percona Cloud Tools; it provides advanced query metrics and is a nice visualization tool. You may be interested to learn more about it here, and it’s also worth viewing this related webinar about Percona Cloud Tools from our CTO Vadim Tkachenko.

Conclusion:
pt-query-digest from Percona Toolkit is a versatile (and free) tool for slow query log analysis. It provides good insight into every individual query, especially in Percona Server with log_slow_verbosity enabled (e.g., queries logged with microsecond precision and extra information about each query’s execution). On top of that, Percona Cloud Tools includes Query Analytics, which gives you good visuals about query performance and also provides a view of historical data.

