Dealing with corrupted InnoDB data


Data corruption! It can happen, maybe because of a bug or a storage problem you didn’t expect, and it often shows up as a MySQL crash when a page checksum is different from what InnoDB expected. Either way, corrupted data can and does occur. What do you do then?

Let’s look at the following example and see what can be done when you face this situation.

We have some valuable data:

> select * from t limit 4;
+---+--------+
| i | c      |
+---+--------+
| 1 | Miguel |
| 2 | Angel  |
| 3 | Miguel |
| 4 | Angel  |
+---+--------+
> select count(*) from t;
+----------+
| count(*) |
+----------+
|  2097152 |
+----------+

One day the query you usually run fails and your application stops working. Even worse, it causes the crash already mentioned:

> select * from t where i=2097151;
ERROR 2006 (HY000): MySQL server has gone away

Usually this is the point when panic starts. The error log shows:

2016-01-13 08:01:48 7fbc00133700 InnoDB: uncompressed page, stored checksum in field1 2912050650, calculated checksums for field1: crc32 1490770609, innodb 1549747911, none 3735928559, stored checksum in field2 1670385167, calculated checksums for field2: crc32 1490770609, innodb 2416840536, none 3735928559, page LSN 0 130051648, low 4 bytes of LSN at page end 1476903022, page number (if stored to page already) 4651, space id (if created with >= MySQL-4.1.1 and stored already) 7
InnoDB: Page may be an index page where index id is 22
InnoDB: (index "PRIMARY" of table "test"."t")
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 4651.
InnoDB: You may have to recover from a backup.
InnoDB: It is also possible that your operating
InnoDB: system has corrupted its own file cache
InnoDB: and rebooting your computer removes the
InnoDB: error.
InnoDB: If the corrupt page is an index page
InnoDB: you can also try to fix the corruption
InnoDB: by dumping, dropping, and reimporting
InnoDB: the corrupt table. You can use CHECK
InnoDB: TABLE to scan your table for corruption.
InnoDB: See also http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html
InnoDB: about forcing recovery.
InnoDB: Database page corruption on disk or a failed
InnoDB: file read of page 4651.
InnoDB: You may have to recover from a backup.
2016-01-13 08:01:48 7fbc00133700 InnoDB: Page dump in ascii and hex (16384 bytes):
 len 16384; hex ad925dda0000122b0000122affffffff0000000007c06e4045bf00000000000000000
[...]
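The hex dump at the end of that message is the raw 16 KB page, and its first bytes form the page’s FIL header. Decoding them is a quick way to cross-check the error message. Below is a minimal Python sketch. It assumes the standard big-endian InnoDB FIL header layout (checksum, page number, previous page, next page, page LSN, page type), and the class and function names are illustrative:

```python
import struct
from dataclasses import dataclass

# Assumed FIL header layout (big-endian), first 26 of its 38 bytes:
#   0 checksum (4) | 4 page number (4) | 8 prev page (4) | 12 next page (4)
#   16 page LSN (8) | 24 page type (2)
FIL_HEADER_PREFIX = struct.Struct(">IIIIQH")  # 26 bytes, no padding with ">"
FIL_NULL = 0xFFFFFFFF        # "no page" marker used for prev/next links
FIL_PAGE_INDEX = 0x45BF      # 17855: B-tree index page


@dataclass
class FilHeader:
    checksum: int
    page_no: int
    prev_page: int
    next_page: int
    lsn: int
    page_type: int


def parse_fil_header(hex_dump: str) -> FilHeader:
    """Decode checksum, page number, links, LSN and type from a page hex dump."""
    digits = "".join(hex_dump.split())      # drop any whitespace/newlines
    needed = FIL_HEADER_PREFIX.size * 2     # two hex digits per byte
    if len(digits) < needed:
        raise ValueError(f"need at least {needed} hex digits, got {len(digits)}")
    raw = bytes.fromhex(digits[:needed])
    return FilHeader(*FIL_HEADER_PREFIX.unpack(raw))


if __name__ == "__main__":
    # First bytes of the "hex" field from the error log
    hdr = parse_fil_header("ad925dda0000122b0000122affffffff0000000007c06e4045bf")
    print(hdr.checksum, hdr.page_no, hdr.lsn, hdr.page_type == FIL_PAGE_INDEX)
```

Run on the start of this dump, it yields the stored field1 checksum 2912050650, page number 4651 and LSN 130051648, the same values InnoDB reported. The page type is 0x45BF (17855), an index page, which is consistent with the “index PRIMARY” hint in the log.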

OK, our database is corrupted, and the server is printing the page dump in ASCII and hex. Usually, the recommendation is to recover from a backup. If you don’t have one, the recommendation is the same as the one given in the error log: when we hit corruption, the first thing we should try is dumping the data and then re-importing it into another server (if possible). So, how can we read a corrupted table and avoid the crash? In most cases, the innodb_force_recovery option will help us. It accepts values from 1 to 6, which are documented here:

http://dev.mysql.com/doc/refman/5.6/en/forcing-innodb-recovery.html

The idea is to start with 1. If that doesn’t work, proceed to 2. If it fails again, go to 3, and so on, until you find a value that allows you to dump the data. In this case I know that the problem is a corrupted InnoDB page, so a value of 1 should be enough:

“Lets the server run even if it detects a corrupt page. Tries to make SELECT * FROM tbl_name jump over corrupt index records and pages, which helps in dumping tables.”

We add innodb_force_recovery=1 to the [mysqld] section of the configuration file and restart the service. Now it’s time to try to dump our data with mysqldump. If the corruption is worse, you need to keep trying different modes. For example, here I get this error:

> create table t2 like t;
> insert into t2 select * from t;
ERROR 1034 (HY000): Incorrect key file for table 't'; try to repair it
> insert into t2 select * from t;
ERROR 1712 (HY000): Index t is corrupted

innodb_force_recovery=1 doesn’t work here; it doesn’t allow me to dump the data:

# mysqldump -uroot -pmsandbox --port 5623 -h 127.0.0.1 --all-databases > dump.sql
Error: Couldn't read status information for table t ()

However, on my test server, innodb_force_recovery=3 seems to help.
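The escalation described above (start at 1 and raise the value until a dump succeeds) can be sketched in Python. This is a minimal sketch, not a tool from the original post. `try_dump` is a hypothetical callback you would implement to edit the configuration, restart mysqld and run mysqldump. The loop stops at 3 unless destructive modes are explicitly allowed, since modes 4 to 6 can corrupt data even more.

```python
from typing import Callable, Optional

RELATIVELY_SAFE_MAX = 3  # modes 1-3
ABSOLUTE_MAX = 6         # modes 4-6 can corrupt data even more


def find_recovery_level(try_dump: Callable[[int], bool],
                        allow_destructive: bool = False) -> Optional[int]:
    """Return the lowest innodb_force_recovery value at which try_dump succeeds.

    try_dump(level) is a hypothetical callback: set innodb_force_recovery=level
    in the [mysqld] section, restart mysqld, run mysqldump, return True on success.
    Returns None if no permitted level works.
    """
    highest = ABSOLUTE_MAX if allow_destructive else RELATIVELY_SAFE_MAX
    for level in range(1, highest + 1):
        if try_dump(level):
            return level
    return None


if __name__ == "__main__":
    # Stub that mimics the example: 1 fails, 3 succeeds
    print(find_recovery_level(lambda level: level >= 3))
```

With that stub the function returns 3, mirroring the case where 1 failed and 3 helped. Keeping destructive modes behind an explicit flag makes it harder to reach them by accident before a copy of the data has been taken.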

This procedure sounds good and usually works. The problem is that the feature is mostly broken after MySQL 5.6.15: innodb_force_recovery values greater than or equal to 4 won’t allow the database to start:

2015-07-08 10:25:25 315 [ERROR] Unknown/unsupported storage engine: InnoDB
2015-07-08 10:25:25 315 [ERROR] Aborting

The bug is reported and verified here: https://bugs.mysql.com/bug.php?id=77654

That means that if you have Insert Buffer, Undo Log or Redo Log corruption (values 4, 5 and 6), you can’t continue. What can you do?

  • You can install an older version of MySQL (prior to 5.6.15) to use higher values of innodb_force_recovery. Modes 4, 5 and 6 can corrupt your data (even more), so they are dangerous. If there are no backups, this is our only option, so my recommendation would be to make a copy of the data we have now and then proceed with higher values of innodb_force_recovery.

or

  • If you are using Percona Server, innodb_corrupt_table_action can be used to dump the data. Set it to “salvage”: with that value, XtraDB allows read access to a corrupted tablespace but ignores corrupted pages.

https://www.percona.com/doc/percona-server/5.6/reliability/innodb_corrupt_table_action.html

If you still can’t dump your data, you should try more advanced solutions like Undrop for InnoDB. Also, it would be a good idea to start planning regular database backups. :)

Tired of MySQL Making You Wait? Webinar: Questions and Answers

We’d like to thank everybody for joining us on January 7th for our “Tired of MySQL Making You Wait?” webinar with Percona’s Alexander Rubin, Principal Consultant, and SolarWinds’ Janis Griffin, Database Evangelist.

Too often, developers and DBAs struggle to pinpoint the root cause of performance issues and then spend too much time trying to fix them. In the webinar, we discussed how you can significantly increase the performance of your applications while also reducing database response time.

You can find an archived version of the webinar here.

Below are the questions that were asked during the webinar, with responses from Alex and Janis. If you need further clarification, please respond in the comments.

Thanks again, and we look forward to you joining us at our next webinar (with Percona CEO Peter Zaitsev), Compression In Open Source Databases!

 

Q: Are there special tuning tips for Galera Cluster?

A: Since Galera Cluster (Percona XtraDB Cluster) is based on MySQL, all query tuning tips apply as well. There are also a number of Galera Cluster configuration tips available; for example, this blog post discusses tuning PXC for maximum write throughput: https://www.percona.com/blog/2015/06/03/optimizing-percona-xtradb-cluster-write-hotspots/

 

Q: Does DPA support Galera Cluster?

A: Yes, DPA has the ability to group the cluster together to see load balancing, top 15 SQLs across the cluster, plus the top wait states.

 

Q: Can I create a covered index when I have “group by” and “order by” instructions together?

A: Yes, you can create a covered index, and MySQL will use it to satisfy the query (you will see “Using index”). However, if you have “group by” and “order by” on different columns, MySQL will still have to perform a filesort and create a temp table. To create this index, include all of the following fields from your query in the index:

  1. All fields in the “where” condition
  2. The “group by” fields
  3. The “order by” fields
  4. The fields that the query is selecting.

Please note the limitations of this approach, described here: http://dev.mysql.com/doc/refman/5.7/en/order-by-optimization.html
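The column list above can be assembled mechanically. Below is a minimal Python sketch. It assumes the columns go into the index in the order the list gives them (WHERE, GROUP BY, ORDER BY, then selected fields), and that a column used in several clauses is kept only at its first position. The helper names are hypothetical.

```python
from typing import Iterable, List


def covered_index_columns(where: Iterable[str], group_by: Iterable[str],
                          order_by: Iterable[str], select: Iterable[str]) -> List[str]:
    """Order columns as in the list above: WHERE, GROUP BY, ORDER BY, SELECT.

    A column that appears in several clauses is kept at its first position.
    """
    columns: List[str] = []
    for clause in (where, group_by, order_by, select):
        for col in clause:
            if col not in columns:
                columns.append(col)
    return columns


def add_key_ddl(table: str, key_name: str, columns: List[str]) -> str:
    """Render an ALTER TABLE ... ADD KEY statement for the column list."""
    return f"ALTER TABLE {table} ADD KEY {key_name}({', '.join(columns)});"


if __name__ == "__main__":
    cols = covered_index_columns(where=["dayofweek"], group_by=["Carrier"], order_by=[],
                                 select=["DepDelayMinutes", "Carrier", "dayofweek"])
    print(add_key_ddl("ontime_2012", "covered", cols))
```

For the ontime_2012 query discussed in this Q&A, it reproduces the covered(dayofweek, Carrier, DepDelayMinutes) key. It does not check the limitations linked above, so EXPLAIN remains the final judge.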

 

Q: Can we use DPA with Azure MySQL?

A: Yes, DPA will monitor, tune and analyze the performance of database servers running on Microsoft Azure.

 

Q: Do you know if MariaDB has or is planning to follow with these virtual fields and/or SYS schema enhancements from MySQL 5.7?

A: MariaDB has had virtual or computed columns since version 5.2. I don’t believe MariaDB comes with the sys schema already installed, but you can download and install it.

 

Q: Does DPA support PostgreSQL? If not, is it in the roadmap?

A: Currently, DPA does not support PostgreSQL. However, we continually re-evaluate it with each new release.

 

Q: Does DPA support RDS instances?

A: Yes, DPA supports the monitoring of RDS instances.

 

Q: Does the performance schema show any information about how the load data is performing?

A: In MySQL 5.5, performance_schema became available in 5.5.3 and has only 11 tables. Most of the tables deal with wait events and file information. In addition, you would need to turn on the consumers and enable instrumentation of the wait events. Once you’ve done that, you will be able to see the threads and what they are waiting on.

 

Q: I didn’t understand the reasoning that leads to the index on ORDER BY. I can’t link it to the previous slide query.

A: I assume this question is about the ORDER BY + LIMIT optimization. When you create an index on the ORDER BY field only, MySQL can start reading the whole table in the order of the index. As the index is sorted, it can start fetching the rows and filter out the rows that don’t match the WHERE condition. As there is a LIMIT N on the query, MySQL will stop after fetching N rows.
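Below is a toy Python model of that behaviour. It assumes an in-memory list stands in for the table and a sorted list of row positions stands in for the index on the ORDER BY column. It counts how many rows are examined before LIMIT is satisfied. This illustrates the idea only and is not how InnoDB executes the plan.

```python
from typing import Callable, Dict, List, Tuple

Row = Dict[str, int]


def order_by_limit_via_index(rows: List[Row], order_col: str,
                             where: Callable[[Row], bool],
                             limit: int) -> Tuple[List[Row], int]:
    """Toy model: walk rows in index order, filter by WHERE, stop at LIMIT.

    Returns the matching rows and how many rows were examined.
    """
    if limit <= 0:
        return [], 0
    # Stand-in for the index on the ORDER BY column: row positions, sorted
    index = sorted(range(len(rows)), key=lambda pos: rows[pos][order_col])
    result: List[Row] = []
    examined = 0
    for pos in index:
        examined += 1
        if where(rows[pos]):
            result.append(rows[pos])
            if len(result) == limit:
                break  # LIMIT satisfied: the rest of the index is never read
    return result, examined


if __name__ == "__main__":
    table = [{"id": i, "ts": 100 - i, "status": i % 2} for i in range(100)]
    found, examined = order_by_limit_via_index(
        table, "ts", where=lambda r: r["status"] == 1, limit=3)
    print([r["id"] for r in found], examined)  # [99, 97, 95] 5
```

In this model only 5 of the 100 rows are examined. Without the index, every row would have to be read, filtered and sorted before the first 3 could be returned. The benefit shrinks when the WHERE condition matches few rows early in index order.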

 

Q: How can I analyze parts of a stored procedure that runs nightly to see where my bottlenecks are? It has 100+ update queries that it performs every night to build a table with more than one million rows.

A: You can do it using the slow query log in Percona Server (5.5/5.6) and/or Performance Schema in MySQL 5.7. If you are running Percona Server, you can enable extended stored procedure logging as described here: https://www.percona.com/doc/percona-server/5.6/diagnostics/slow_extended.html. Another way is to use the deprecated SHOW PROFILE method, as described here: https://www.percona.com/blog/2009/01/19/profiling-mysql-stored-routines/

 

Q: How will DPA use the index when there are more than five columns in the “where” conditions? How would you create indexes?

A: I would suggest checking the “cardinality” of the fields (the number of unique values). Usually (unless you create a covered index or are optimizing the group by), it makes much more sense to limit the number of fields in an index and include only the fields with high cardinality. For example, a PRIMARY KEY or UNIQUE INDEX works best, whereas a “gender” field (with only two unique values, “male” and “female”) would not be very useful.
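Checking cardinality amounts to counting distinct values per column, which is what COUNT(DISTINCT col) reports in SQL. Below is a minimal Python sketch over rows held in memory. The sample data and the function name are hypothetical, and NULLs (None) are skipped to match COUNT(DISTINCT).

```python
from typing import Dict, List, Sequence, Tuple

Row = Dict[str, object]


def rank_by_cardinality(rows: Sequence[Row],
                        columns: Sequence[str]) -> List[Tuple[str, int, float]]:
    """Return (column, distinct values, distinct/total), highest cardinality first.

    NULLs (None) are ignored, as COUNT(DISTINCT col) does in SQL.
    """
    total = len(rows)
    stats = []
    for col in columns:
        distinct = len({row[col] for row in rows if row[col] is not None})
        stats.append((col, distinct, distinct / total if total else 0.0))
    return sorted(stats, key=lambda s: s[1], reverse=True)


if __name__ == "__main__":
    people = [{"id": i, "city": f"city{i % 10}", "gender": "male" if i % 2 else "female"}
              for i in range(100)]
    for col, distinct, ratio in rank_by_cardinality(people, ["gender", "city", "id"]):
        print(f"{col:7} {distinct:4} {ratio:.2f}")
```

On this sample, id ranks first with 100 distinct values, and gender ranks last with only 2, which makes it the weakest index candidate on its own.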

 

Q: How would the analytics tool work in an OpenStack VM environment, where we have 100 database servers?

A: One installation of DPA can monitor hundreds of database servers. In fact, we have several very large companies that monitor 1000s of servers worldwide.

 

Q: If you have a small table with only 100 records, is it worth creating indexes on specific fields, or should I just do a table scan?

A: If the table is only 100 records and you are not joining it with other tables, it usually does not make sense to add indexes. But because the table is so small it doesn’t really matter either way.

 

Q: Is the SolarWinds tool better than MONyog, and how expensive is the license cost for this?

A: MONyog is also a monitoring tool, but it doesn’t have the advisors, alarms, granularity, history, or customizations that DPA gives you. The retail cost per server is currently $1,995 per monitored server, but is heavily discounted the more you purchase.

 

Q: In many cases, due to the randomness and complexity of queries thrown at various tables, I end up creating a lot of indexes. At what point would there be too many indexes? Should I then create MySQL views instead of indexes? Should one use MySQL views at all to optimize searches?

A: First of all, there are no materialized views in MySQL, so views are not a useful replacement for indexes. You can create “summary” tables manually, which will usually help a lot. Although it is hard to say exactly when you have too many indexes, lots of indexes can decrease the performance of your insert/update/delete operations, as well as confuse the optimizer: with a great many indexes, MySQL might start choosing the wrong index when doing selects.

 

Q: Sometimes we need to add indices for different queries on the same table. Eventually, the table has too many indices. Any suggestions for such cases?

A: See the response to the previous question.

 

Q: Is there a way in DPA to see what queries are currently running? In other words, to know about slow queries as they run rather than only knowing about them historically?

A: Yes. In the “Current” dashboard, click the “Currently Active Sessions” box. With this option, you can sort by longest running, etc.

 

Q: Why is delay indexed in the composite key? It only covers the query, but the temp table can be avoided by the first two fields?

A: You are referring to this example:

mysql> alter table ontime_2012
    -> add key covered(dayofweek, Carrier, DepDelayMinutes);
mysql> explain select max(DepDelayMinutes), Carrier, dayofweek from ontime_2012 where dayofweek = 7 group by Carrier\G
...
possible_keys: DayOfWeek,covered
          key: covered
      key_len: 2
          ref: const
         rows: 905138
        Extra: Using where; Using index

The reason we add DepDelayMinutes is to make the index covered, so MySQL will be able to satisfy the query with an index only.

 

The language of compression

Leif Walsh & friends

Leif Walsh will talk about the language of compression at Percona Live Amsterdam

Storage. Everyone needs it. Whether your data is in MySQL, a NoSQL, or somewhere in the cloud, with ever-growing data volumes – along with the need for SSDs to cut latency and replication to provide insurance – an organization’s storage footprint is an important place to look for savings. That’s where compression comes in (squeeze!) to save disk space.

Two Sigma software engineer Leif Walsh speaks the language of compression. Fluently. In fact, he’ll be speaking on that exact subject on September 22 during the Percona Live conference in Amsterdam.

I asked him about his talk, and about Amsterdam, the other day. Here’s what he had to say.

* * *

Tom: Hi Leif, how will your talk help IT decision-makers cut through the marketing mumbo-jumbo on what’s important to focus on and what is not?
Leif: My talk will have three lessons aimed at those making storage decisions for their company:

  1. What are the key factors to consider when evaluating storage options, and how can they affect your bottom line?  This is not only how storage tech influences your hardware, operations, and management costs, but also how it can facilitate new development initiatives and cut time-to-market for your products.
  2. How should you read benchmarks and marketing materials about storage technology?  You’ll learn what to look for in promotional material, and how to think critically about whether that material is applicable to your business needs.
  3. What’s the most effective way to communicate with storage vendors about your application’s requirements?  A lot of time can be spent in the early stages of a relationship in finding a common language for users and vendors to have meaningful discussions about users’ needs and vendors’ capacity to meet those needs.  With the tools you’ll learn in my talk, you’ll be able to accelerate quickly to the high-bandwidth conversations you need to have in order to make the right decision, and consequently, you’ll be empowered to evaluate more choices to find the best one faster.

Tom: In addition to IT decision-makers, who else should attend your session and what will they take away afterward?
Leif: My talk is primarily about the language that everyone in the storage community should be using to communicate. Therefore, storage vendors should attend to get ideas for how to express their benchmarks and their systems’ properties more effectively, and application developers and operations people will learn strategies for getting better support and for making a convincing case to the decision-makers in their own company.

Tom: Which session(s) are you most looking forward to besides your own?
Leif: Sam Kottler is a good friend and an intensely experienced systems engineer with a dynamic and boisterous personality, so I can’t wait to hear more about his experiences with Linux tuning.

As one of the original developers of TokuMX, I’ll absolutely have to check out Stephane’s talk about it, but I promise not to heckle. Charity Majors is always hilarious and has great experiences and insights to share, so I’ll definitely check out her talk too.

* * *

Catch Leif’s talk at Percona Live in Amsterdam September 21-23. Enter the promo code “BlogInterview” at registration and save €20! Register now!

The post The language of compression appeared first on MySQL Performance Blog.

MySQL is crashing: a support engineer’s point of view

In MySQL QA Episode #12, “MySQL is Crashing, now what?,” Roel demonstrated how to collect crash-related information that will help Percona discover which issue you are experiencing, and fix it.

As a Support Engineer, I (Sveta) am very happy to see this post – but as a person who understands writing better than recording – I’d like to have the same information in textual form. We discussed it and decided to do a joint blog post. Hence, this post :)

If you haven’t seen the video yet, or you do not have any experience with gdb, core files and crashes, I highly recommend watching it first.

Once you have an idea of why crashes happen, what to do after one happens in your environment, and how to open a Support issue and/or a bug report, you’re ready for the next step: which information do you need to provide? Note that the more complete and comprehensive the information you provide, the quicker the evaluation and potential fix process will go – it’s a win-win situation!

First, we need the MySQL error log file. If possible, please send us the full error log file. Users often like to send only the part they think is relevant, but the error log file can contain other information recorded before the crash happened: for example, records about table corruption, lack of disk space, issues with the InnoDB dictionary, etc.

If your error log is quite large, please note that it compresses very well with a standard compression tool like gzip. If for some reason you cannot send the full error log file, please send all lines written after the words “mysqld: ready for connections” (as seen the last time before the actual crash) until the end of the error log file (alternatively, you can search for rows starting with the word “Version:”). If you use scripts (or mysqld_safe) which automatically restart MySQL Server in case of disaster, the last “ready for connections” line will belong to the restart, so please search for the server start before that one.

An example which includes an automatic restart as mentioned above:

2015-08-03 14:24:03 9911 [Note] /home/sveta/SharedData/Downloads/5.6.25/bin/mysqld: ready for connections.
Version: '5.6.25-73.1-log'  socket: '/tmp/mysql_sandbox21690.sock'  port: 21690  Percona Server (GPL), Release 73.1, Revision 07b797f
2015-08-03 14:24:25 7f5b193f9700 InnoDB: Buffer pool(s) load completed at 150803 14:24:25
11:25:12 UTC - mysqld got signal 4 ;
This could be because you hit a bug. It is also possible that this binary
or one of the libraries it was linked against is corrupt, improperly built,
or misconfigured. This error can also be caused by malfunctioning hardware.
We will try our best to scrape up some info that will hopefully help
diagnose the problem, but since we have already crashed,
something is definitely wrong and this may fail.
Please help us make Percona Server better by reporting any
bugs at http://bugs.percona.com/
key_buffer_size=268435456
read_buffer_size=131072
max_used_connections=1
max_threads=216
thread_count=1
connection_count=1
It is possible that mysqld could use up to
key_buffer_size + (read_buffer_size + sort_buffer_size)*max_threads = 348059 K  bytes of memory
Hope that's ok; if not, decrease some variables in the equation.
Thread pointer: 0x0
Attempting backtrace. You can use the following information to find out
where mysqld died. If you see no messages after this, something went
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld(my_print_stacktrace+0x2e)[0x8dd38e]
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld(handle_fatal_signal+0x491)[0x6a5dc1]
/lib64/libpthread.so.0(+0xf890)[0x7f5c58ac8890]
/lib64/libc.so.6(__poll+0x2d)[0x7f5c570fbc5d]
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld(_Z26handle_connections_socketsv+0x1c2)[0x5f64c2]
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld(_Z11mysqld_mainiPPc+0x1b5d)[0x5fd87d]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f5c57040b05]
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld[0x5f10fd]
You may download the Percona Server operations manual by visiting
http://www.percona.com/software/percona-server/. You may find information
in the manual which will help you identify the cause of the crash.
150803 14:25:12 mysqld_safe Number of processes running now: 0
150803 14:25:12 mysqld_safe mysqld restarted
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld: /lib64/libssl.so.1.0.0: no version information available (required by /home/sveta/SharedData/Downloads/5.6.25/bin/mysqld)
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld: /lib64/libcrypto.so.1.0.0: no version information available (required by /home/sveta/SharedData/Downloads/5.6.25/bin/mysqld)
2015-08-03 14:25:12 0 [Warning] TIMESTAMP with implicit DEFAULT value is deprecated. Please use --explicit_defaults_for_timestamp server option (see documentation for more details).
2015-08-03 14:25:12 0 [Note] /home/sveta/SharedData/Downloads/5.6.25/bin/mysqld (mysqld 5.6.25-73.1-log) starting as process 10038 ...
2015-08-03 14:25:12 10038 [Warning] Buffered warning: Changed limits: max_open_files: 1024 (requested 50005)
2015-08-03 14:25:12 10038 [Warning] Buffered warning: Changed limits: max_connections: 214 (requested 10000)
2015-08-03 14:25:12 10038 [Warning] Buffered warning: Changed limits: table_open_cache: 400 (requested 4096)
2015-08-03 14:25:12 10038 [Note] Plugin 'FEDERATED' is disabled.
2015-08-03 14:25:12 10038 [Note] InnoDB: Using atomics to ref count buffer pool pages
2015-08-03 14:25:12 10038 [Note] InnoDB: The InnoDB memory heap is disabled
2015-08-03 14:25:12 10038 [Note] InnoDB: Mutexes and rw_locks use GCC atomic builtins
2015-08-03 14:25:12 10038 [Note] InnoDB: Memory barrier is not used
2015-08-03 14:25:12 10038 [Note] InnoDB: Compressed tables use zlib 1.2.3
2015-08-03 14:25:12 10038 [Note] InnoDB: Using Linux native AIO
2015-08-03 14:25:12 10038 [Note] InnoDB: Using CPU crc32 instructions
2015-08-03 14:25:12 10038 [Note] InnoDB: Initializing buffer pool, size = 4.0G
2015-08-03 14:25:13 10038 [Note] InnoDB: Completed initialization of buffer pool
2015-08-03 14:25:13 10038 [Note] InnoDB: Highest supported file format is Barracuda.
2015-08-03 14:25:13 10038 [Note] InnoDB: The log sequence numbers 514865622 and 514865622 in ibdata files do not match the log sequence number 514865742 in the ib_logfiles!
2015-08-03 14:25:13 10038 [Note] InnoDB: Database was not shutdown normally!
2015-08-03 14:25:13 10038 [Note] InnoDB: Starting crash recovery.
2015-08-03 14:25:13 10038 [Note] InnoDB: Reading tablespace information from the .ibd files...
2015-08-03 14:25:14 10038 [Note] InnoDB: Restoring possible half-written data pages
2015-08-03 14:25:14 10038 [Note] InnoDB: from the doublewrite buffer...
InnoDB: Last MySQL binlog file position 0 150866, file name mysql-bin.000006
2015-08-03 14:25:16 10038 [Note] InnoDB: 128 rollback segment(s) are active.
2015-08-03 14:25:16 10038 [Note] InnoDB: Waiting for purge to start
2015-08-03 14:25:16 10038 [Note] InnoDB:  Percona XtraDB (http://www.percona.com) 5.6.25-rel73.1 started; log sequence number 514865742
2015-08-03 14:25:16 7f67ceff9700 InnoDB: Loading buffer pool(s) from .//ib_buffer_pool
2015-08-03 14:25:16 10038 [Note] Recovering after a crash using mysql-bin
2015-08-03 14:25:16 10038 [Note] Starting crash recovery...
2015-08-03 14:25:16 10038 [Note] Crash recovery finished.
2015-08-03 14:25:17 10038 [Note] RSA private key file not found: /home/sveta/sandboxes/rsandbox_Percona-Server-5_6_25/master/data//private_key.pem. Some authentication plugins will not work.
2015-08-03 14:25:17 10038 [Note] RSA public key file not found: /home/sveta/sandboxes/rsandbox_Percona-Server-5_6_25/master/data//public_key.pem. Some authentication plugins will not work.
2015-08-03 14:25:17 10038 [Note] Server hostname (bind-address): '127.0.0.1'; port: 21690
2015-08-03 14:25:17 10038 [Note]   - '127.0.0.1' resolves to '127.0.0.1';
2015-08-03 14:25:17 10038 [Note] Server socket created on IP: '127.0.0.1'.
2015-08-03 14:25:17 10038 [Warning] 'proxies_priv' entry '@ root@thinkie' ignored in --skip-name-resolve mode.
2015-08-03 14:25:17 10038 [Note] Event Scheduler: Loaded 0 events
2015-08-03 14:25:17 10038 [Note] /home/sveta/SharedData/Downloads/5.6.25/bin/mysqld: ready for connections.
Version: '5.6.25-73.1-log'  socket: '/tmp/mysql_sandbox21690.sock'  port: 21690  Percona Server (GPL), Release 73.1, Revision 07b797f
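Extracting that part of the error log can be scripted. Below is a minimal Python sketch. It assumes, as in the example above, that a crash is marked by a line containing “got signal” and that each server start logs “ready for connections”. It keeps everything from the last start before the most recent crash to the end of the file, which also covers the automatic-restart case. The function name is hypothetical.

```python
from typing import List

READY_MARKER = "ready for connections"
CRASH_MARKER = "got signal"  # as in "mysqld got signal 4 ;" in the example log


def crash_excerpt(lines: List[str]) -> List[str]:
    """Lines from the last server start before the most recent crash to the end.

    If mysqld_safe restarted the server, the final 'ready for connections'
    belongs to the restart, so we look for the one preceding the crash.
    """
    crashes = [i for i, line in enumerate(lines) if CRASH_MARKER in line]
    if not crashes:
        return list(lines)  # no crash marker found: keep everything
    last_crash = crashes[-1]
    starts = [i for i in range(last_crash) if READY_MARKER in lines[i]]
    first = starts[-1] if starts else 0
    return lines[first:]


if __name__ == "__main__":
    import sys
    with open(sys.argv[1], errors="replace") as log:
        sys.stdout.writelines(crash_excerpt(log.readlines()))
```

If saved as, say, crash_excerpt.py (a hypothetical name), `python crash_excerpt.py mysqld.err > excerpt.err` writes the excerpt. When in doubt, sending the whole gzipped log is still preferable.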

Usually the error log file contains the actual query which caused the crash. If it does not, and you know the query (for example, if your application logs errors or query problems), please send us this query too. Additionally, if possible, include the CREATE TABLE statements for any tables mentioned in the query. Working with the query is actually the first thing you can do to resolve the issue: try to run it (on a non-production/test server which is as close a copy of your production server as possible) to check whether it crashes MySQL Server consistently. If so, you can create a temporary workaround by avoiding this kind of query in your application.

If you have doubts as to which query caused the crash, but have the general query log turned ON, you can use the parse_general_log.pl utility from percona-qa to create a potential test case. Simply execute:

$ sudo yum install bzr
$ cd ~
$ bzr branch lp:percona-qa
$ cp /path_that_contains_your_general_log/your_log_file.sql ~
$ ~/percona-qa/parse_general_log.pl -i./your_log_file.sql -o./output.sql

Then execute output.sql against mysqld on a non-production test server to see if a crash is produced. Alternatively, you may mail us the output.sql file (provided your company’s privacy and other policies allow this). If you want to try to reduce the testcase further, please see QA Episode #7 on reducing testcases.

The next thing we need is a backtrace. You usually have a simple backtrace in the error log directly after the crash. An example (extracted from an error log) of what this looks like:

stack_bottom = 0 thread_stack 0x40000
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld(my_print_stacktrace+0x2e)[0x8dd38e]
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld(handle_fatal_signal+0x491)[0x6a5dc1]
/lib64/libpthread.so.0(+0xf890)[0x7f5c58ac8890]
/lib64/libc.so.6(__poll+0x2d)[0x7f5c570fbc5d]
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld(_Z26handle_connections_socketsv+0x1c2)[0x5f64c2]
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld(_Z11mysqld_mainiPPc+0x1b5d)[0x5fd87d]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f5c57040b05]
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld[0x5f10fd]

Note that the above backtrace is mangled. You can send it to us like this (we can demangle it). However, if you want to work with it more comfortably yourself, you can demangle it with the help of the c++filt utility:

sveta@linux-85fm:~/sandboxes/rsandbox_Percona-Server-5_6_25> cat master/data/msandbox.err | c++filt
...
terribly wrong...
stack_bottom = 0 thread_stack 0x40000
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld(my_print_stacktrace+0x2e)[0x8dd38e]
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld(handle_fatal_signal+0x491)[0x6a5dc1]
/lib64/libpthread.so.0(+0xf890)[0x7f5c58ac8890]
/lib64/libc.so.6(__poll+0x2d)[0x7f5c570fbc5d]
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld(handle_connections_sockets()+0x1c2)[0x5f64c2]
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld(mysqld_main(int, char**)+0x1b5d)[0x5fd87d]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f5c57040b05]
/home/sveta/SharedData/Downloads/5.6.25/bin/mysqld[0x5f10fd]
...

Now the backtrace looks much nicer. When sending error log reports to us, please do not run c++filt on them first. We have a list of known bugs, and to scan the known bugs list we need to receive the error log unaltered.
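To get readable frames locally while still sending the log unaltered, you can work on a parsed copy. Below is a minimal Python sketch. It parses frames of the form shown above, `binary(symbol+offset)[address]`, and collects the mangled `_Z...` symbols. It assumes the mangled format printed by mysqld and does not handle lines that have already been demangled. The optional `demangle` helper shells out to c++filt only if it is on PATH, and all function names are illustrative.

```python
import re
import shutil
import subprocess
from typing import List, NamedTuple, Optional

# Frame format printed by mysqld: binary(symbol+offset)[address]; symbol and
# offset are optional, e.g. "/lib64/libpthread.so.0(+0xf890)[0x7f5c58ac8890]".
FRAME_RE = re.compile(
    r"^(?P<binary>[^(\[\s]+)"
    r"(?:\((?P<symbol>[^+)]*)(?:\+(?P<offset>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<address>0x[0-9a-fA-F]+)\]$"
)


class Frame(NamedTuple):
    binary: str
    symbol: Optional[str]
    offset: Optional[str]
    address: str


def parse_frames(lines: List[str]) -> List[Frame]:
    """Parse backtrace frames; lines that are not frames are skipped."""
    frames = []
    for line in lines:
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append(Frame(m["binary"], m["symbol"] or None, m["offset"], m["address"]))
    return frames


def mangled_symbols(frames: List[Frame]) -> List[str]:
    """Unique Itanium-ABI mangled names (prefix _Z), in backtrace order."""
    seen: List[str] = []
    for f in frames:
        if f.symbol and f.symbol.startswith("_Z") and f.symbol not in seen:
            seen.append(f.symbol)
    return seen


def demangle(symbols: List[str]) -> Optional[List[str]]:
    """Demangle with c++filt if available; the original log file is not touched."""
    if not symbols:
        return []
    if shutil.which("c++filt") is None:
        return None
    out = subprocess.run(["c++filt", *symbols], capture_output=True, text=True, check=True)
    return out.stdout.splitlines()
```

For the example backtrace, `mangled_symbols` returns `_Z26handle_connections_socketsv` and `_Z11mysqld_mainiPPc`, the two frames that c++filt rewrites in the demangled output above.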

You can also turn core files ON. Core files are memory dump files, created when a process crashes. They are very helpful for debugging because they contain not only the backtrace of the crashing thread, but backtraces of all threads, and much of what was in memory at the time the crash occurred.

Sidenote: It is always a good idea to have the debuginfo package installed (for example, Percona-Server-56-debuginfo.x86_64 from the Percona repository). This package provides the debugging symbols for Percona Server (there are similar packages for other distributions) and ensures that stack traces are more readable and contain more information. It is important to ensure that you have the right package version, as symbols are different for each version of mysqld. If you have installed Percona Server from our repository, you can simply install the debuginfo package: the version will be correct, and the package will be auto-updated when Percona Server is updated.

By default, the MySQL server does not create core files. To let it do so, you can follow the instructions from the “GDB Cheat sheet” (page 2, under the header ‘Core Files Cheat Sheet’). In short:

  • Add the option core-file under the [mysqld] section of your configuration file
  • Tune your OS options, so it allows mysqld to create core files as described in the cheat sheet
    echo "core.%p.%e.%s" > /proc/sys/kernel/core_pattern
    ulimit -c unlimited
    sudo sysctl -w fs.suid_dumpable=2
    Note: some systems use ‘kernel’ instead of ‘fs’: use kernel.suid_dumpable=2 instead if you get key or file warnings/errors.
  • Restart the MySQL server
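Before waiting for the next crash, it can help to confirm that these settings actually took effect. Below is a minimal Python sketch for Linux. It reads `/proc/sys/kernel/core_pattern`, `/proc/sys/fs/suid_dumpable` and the current process’s core size limit, and compares them with the checklist above. The core limit it reads belongs to the process running the check, not to mysqld (for mysqld, look at “Max core file size” in `/proc/<pid>/limits`). The function names are hypothetical.

```python
import resource
from pathlib import Path
from typing import List, Tuple


def core_dump_problems(core_pattern: str, core_soft_limit: int,
                       suid_dumpable: int) -> List[str]:
    """Compare settings against the checklist above; return human-readable issues."""
    problems = []
    if core_soft_limit != resource.RLIM_INFINITY:
        problems.append(f"core size limit is {core_soft_limit}, not unlimited "
                        "(ulimit -c unlimited)")
    if core_pattern.startswith("|"):
        problems.append(f"core_pattern '{core_pattern}' pipes cores to a program, "
                        "not to a file")
    if suid_dumpable != 2:
        problems.append(f"fs.suid_dumpable is {suid_dumpable}, expected 2")
    return problems


def read_local_settings() -> Tuple[str, int, int]:
    """Read current values on Linux (the rlimit is this process's own)."""
    pattern = Path("/proc/sys/kernel/core_pattern").read_text().strip()
    soft, _hard = resource.getrlimit(resource.RLIMIT_CORE)
    dumpable = int(Path("/proc/sys/fs/suid_dumpable").read_text().strip())
    return pattern, soft, dumpable


if __name__ == "__main__":
    for problem in core_dump_problems(*read_local_settings()) or ["looks OK"]:
        print(problem)
```

The checks only cover the OS side. The core-file option under [mysqld] still has to be present in the configuration file.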

Besides the core file which is generated by the MySQL server, you can also set up the operating system to dump a core file. These are two different core files (for a single crash of the mysqld binary), and the amount of information contained within may differ. The procedure above shows how to set up the one for the MySQL server alone.

If you would like your operating system to dump a core file as well, please see the MySQL QA Episode #12 video. Also, please note that changing the ulimit and fs.suid_dumpable settings may affect the security of your system. Please read more about these options online before using them or leaving them permanently on a production system.

Once a core file is generated, you can use the GDB utility to debug the core file (also called a ‘coredump’). GDB allows you to better resolve backtraces (also called ‘stack traces’ or ‘stacks’), for example by taking a backtrace of all threads instead of only the crashing thread. This is of course better than the single backtrace available in the error log file. To use GDB, you first need to start it:

gdb /path_to_mysqld /path_to_core

/path_to_core is usually in your data directory (for coredumps produced by mysqld as a result of using the core-file option in your my.cnf file), or sometimes in the same directory as the crashing binary (for coredumps produced by the OS), though you can specify an alternate fixed location for OS coredumps as described in the cheat sheet. Note that OS-generated dumps are sometimes written with very restrictive permissions, so you may have to use chown/chmod/sudo to access them.

Once you’re in GDB, and all looks fine, run the commands

bt

(backtrace) and

thread apply all bt

(get backtrace for all threads) to get the stack traces. bt should more or less match the backtrace seen in the error log, but sometimes it may differ.

For us, ideally you would run the following commands in GDB (as seen in the cheat sheet):

set trace-commands on
set pagination off
set print pretty on
set print array on
set print array-indexes on
set print elements 4096
set logging file gdb_standard.txt
set logging on
thread apply all bt
set logging off
set logging file gdb_full.txt
set logging on
thread apply all bt full

After you have run these commands and exited (quit) GDB, please send us the ./gdb_standard.txt and ./gdb_full.txt files.
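If you prefer not to type the command list interactively, GDB can run it in batch mode. Below is a minimal sketch, assuming gdb is on your PATH; the mysqld and core paths are whatever you pass on the command line. It writes the commands above to a script file and runs `gdb -batch -x <script> <mysqld> <core>`. Newer GDB releases may print a deprecation warning for `set logging on`, but still honor it.

```python
import shutil
import subprocess
import sys
from pathlib import Path

# The same commands as above, plus a final "set logging off" to close gdb_full.txt.
GDB_COMMANDS = [
    "set trace-commands on",
    "set pagination off",
    "set print pretty on",
    "set print array on",
    "set print array-indexes on",
    "set print elements 4096",
    "set logging file gdb_standard.txt",
    "set logging on",
    "thread apply all bt",
    "set logging off",
    "set logging file gdb_full.txt",
    "set logging on",
    "thread apply all bt full",
    "set logging off",
]


def build_gdb_argv(mysqld: str, core: str, script: str) -> list[str]:
    # -batch: exit after the script finishes; -x: read commands from a file.
    return ["gdb", "-batch", "-x", script, mysqld, core]


def main(mysqld: str, core: str) -> int:
    script = Path("gdb_commands.txt")
    script.write_text("\n".join(GDB_COMMANDS) + "\n")
    if shutil.which("gdb") is None:
        print("gdb not found on PATH", file=sys.stderr)
        return 1
    return subprocess.run(build_gdb_argv(mysqld, core, str(script))).returncode


if __name__ == "__main__" and len(sys.argv) == 3:
    sys.exit(main(sys.argv[1], sys.argv[2]))
```

Usage: `python run_gdb.py /path_to_mysqld /path_to_core` (the script name is arbitrary). The two text files then appear in the current directory.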

Finally, we would be happy to receive the actual core file from you. In terms of security and privacy, please note that a core file often contains fragments, or sections, or even the full memory of your server.

However, a core file without mysqld is useless, so please add the mysqld binary together with the core file. If you use our compiled binaries, you can also specify the exact package and file name that you downloaded, but if you use a self-compiled version of the server, the mysqld binary is required for us to resolve backtraces and other necessary information (like variables) from your core file. Generally speaking, it’s easier just to send mysqld along.

Also, it would be really nice if you sent us the library files that are dynamically linked with the mysqld you use. You can get them by using a tool called ldd_files.sh from the percona-qa suite. Just create a temporary directory, copy your mysqld binary to it and run the tool on it:

sveta@thinkie:~/tmp> wget http://bazaar.launchpad.net/~percona-core/percona-qa/trunk/download/head:/ldd_files.sh-20150713030145-8xdk0llrd3skfsan-1/ldd_files.sh
sveta@thinkie:~/tmp> mkdir tmp
sveta@thinkie:~/tmp> cd tmp/
sveta@thinkie:~/tmp/tmp> cp /home/sveta/SharedData/Downloads/5.6.25/bin/mysqld . # Copy of your mysqld
sveta@thinkie:~/tmp/tmp> ../ldd_files.sh mysqld # Run the tool on it
cp: cannot stat ‘./mysqld: /lib64/libssl.so.1.0.0: no version information available’: No such file or directory # Ignore
cp: cannot stat ‘./mysqld: /lib64/libcrypto.so.1.0.0: no version information available’: No such file or directory # Ignore
sveta@thinkie:~/tmp/tmp> ls
ld-linux-x86-64.so.2 libaio.so.1 libcrypto.so.1.0.0 libcrypt.so.1 libc.so.6 libdl.so.2 libgcc_s.so.1 libm.so.6 libpthread.so.0 librt.so.1 libssl.so.1.0.0 libstdc++.so.6 libz.so.1 mysqld # Files to supply in combination with mysqld

These library files are needed in case some of the frames from the stack trace are system calls, so that our developers can resolve/check those frames too.
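For readers who cannot fetch the percona-qa script, the same idea can be sketched in a few lines. This is a hypothetical Python alternative to ldd_files.sh, not the tool itself. It assumes the usual glibc `ldd` output format, skips virtual entries such as linux-vdso and any "not found" libraries, and copies each resolved library next to the mysqld copy.

```python
import re
import shutil
import subprocess
import sys
from pathlib import Path

# Matches "libz.so.1 => /lib64/libz.so.1 (0x...)" and "/lib64/ld-linux-x86-64.so.2 (0x...)".
LDD_LINE = re.compile(r"^\s*(?:\S+\s+=>\s+)?(/\S+)\s+\(0x[0-9a-fA-F]+\)")


def parse_ldd(output: str) -> list[str]:
    """Return absolute library paths; vdso entries and 'not found' lines are skipped."""
    paths = []
    for line in output.splitlines():
        match = LDD_LINE.match(line)
        if match:
            paths.append(match.group(1))
    return paths


def main(mysqld: str) -> None:
    result = subprocess.run(["ldd", mysqld], capture_output=True, text=True, check=True)
    dest = Path(mysqld).resolve().parent
    for lib in parse_ldd(result.stdout):
        # copy2 follows symlinks, so the real file is copied under the soname ldd reported.
        shutil.copy2(lib, dest / Path(lib).name)
        print("copied", lib)


if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```

As with the original tool, run it against a copy of mysqld in a scratch directory, so the libraries land beside that copy.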

Summary

If you hit a crash, please send us (in order of preference, but even better ‘all of these’):

  • The error log file (please send it unaltered, i.e. before c++filt was executed, which allows us to scan for known bugs)
  • The crashing query (from your application logs and/or extracted from the core file; see the query extraction blog post)
    • Please include the matching CREATE TABLE statements
  • A resolved backtrace (and/or preferably the ./gdb_standard.txt and ./gdb_full.txt files)
  • The core file together with the mysqld binary and preferably the ldd files

Thank you!

 

The post MySQL is crashing: a support engineer’s point of view appeared first on MySQL Performance Blog.

How much could you benefit from MySQL 5.6 parallel replication?

I have heard this question quite often: “At busy times, our replicas start lagging quite frequently. We are using N schemas, so what performance boost could we expect from MySQL 5.6 parallel replication?” Here is a quick way to give you a rough estimate of the potential benefit.

General idea

In MySQL 5.6, parallelism is added at the schema level. So in theory, if you have N schemas and if you use N parallel threads, replication could be up to N times faster. This assumes at least 2 things:

  • Replication throughput scales linearly with the number of parallel threads.
  • Writes are evenly distributed across schemas.

Both assumptions are of course not realistic. But it is easy to know the distribution of writes, and that can already give you an idea about how much you could benefit from parallel replication.

Writes are stored in binary logs but it is much easier to work with the slow query log, so we can enable full slow query logging for some time with long_query_time = 0 and then use pt-query-digest to analyze the resulting log file.

An example

I have a test server with 3 schemas, and I’ve run some sysbench load on it to get a decent slow query log file. Once done, I can run this command:

pt-query-digest --filter '$event->{arg} !~ m/^select|^set|^commit|^show|^admin|^rollback|^begin/i' --group-by db --report-format profile slow_query.log > digest.out

and here is the result I get:

# Profile
# Rank Query ID Response time  Calls  R/Call V/M   Item
# ==== ======== ============== ====== ====== ===== ====
#    1 0x       791.6195 52.1% 100028 0.0079  0.70 db3
#    2 0x       525.1231 34.5% 100022 0.0053  0.68 db1
#    3 0x       203.4649 13.4% 100000 0.0020  0.64 db2

In a perfect world, with 3 parallel threads and with each schema handling 33% of the total write workload, I could expect a 3x performance improvement.

However here we can see in the report that the 3 replication threads will only work simultaneously 25% of the time in the best case (13.4/52.1 = 0.25). We can also expect 2 replication threads to work simultaneously for some part of the workload, but let’s ignore that for clarity.

It means that instead of the theoretical 200% performance improvement (3 parallel threads 100% of the time), we can hardly expect more than a 50% performance improvement (3 parallel threads 25% of the time). And the reality is that the benefit will be much lower than that.

Conclusion

Parallel replication in MySQL 5.6 is a great step forward, however don’t expect too much if your writes are not evenly distributed across all your schemas. The pt-query-digest trick I shared can give you a rough idea whether your workload is a good fit for multi-threaded slaves in 5.6.

I’m expecting much better results for 5.7, partly because parallelism is handled differently, but also because you can tune how efficient parallel replication will be by adjusting the binlog group commit settings.

The post How much could you benefit from MySQL 5.6 parallel replication? appeared first on MySQL Performance Blog.

Optimizing PXC Xtrabackup State Snapshot Transfer

State Snapshot Transfer (SST) at a glance

PXC uses a protocol called State Snapshot Transfer to provision a node joining an existing cluster with all the data it needs to synchronize.  This is analogous to cloning a slave in asynchronous replication:  you take a full backup of one node and copy it to the new one, while tracking the replication position of the backup.

PXC automates this process using scriptable SST methods.  The most common of these methods is the xtrabackup-v2 method, which is the default in PXC 5.6.  Xtrabackup is generally favored over other SST methods because it is non-blocking on the Donor node (the node contributing the backup).

The basic flow of this method is:

  • The Joiner:
    • joins the cluster
    • learns it needs a full SST and clobbers its local datadir (the SST will replace it)
    • prepares for a state transfer by opening a socat listener on port 4444 (by default)
    • the socat process pipes the incoming files into the datadir/.sst directory
  • The Donor:
    • is picked by the cluster (could be configured or be based on WAN segments)
    • starts a streaming Xtrabackup and pipes its output via socat to the Joiner on port 4444
    • upon finishing its backup, signals completion and sends the final Galera GTID of the backup to the Joiner
  • The Joiner:
    • records all changes from the Donor’s backup GTID forward in its gcache (and overflow pages; this is limited by available disk space)
    • runs the --apply-log phase of Xtrabackup locally, on the received backup
    • moves the datadir/.sst directory contents into the datadir
    • starts mysqld
    • applies all the transactions it needs (Joining and Joined states, just like IST does it)
    • moves to the ‘Synced’ state and is done.

There are a lot of moving pieces here, and nothing is really tuned by default.  On larger clusters, SST can be quite scary because it may take hours or even days.  Any failure can mean starting over from the beginning.

This post will concentrate on some ways to make a good dent in the time SST can take.  Many of these methods are trade-offs and may not apply to your situation.  Further, there may be other ways to speed things up that I haven’t thought of; please share what you’ve found that works!

The Environment

I am testing SST on a PXC 5.6.24 cluster in AWS.  The nodes are c3.4xlarge and the datadirs are RAID-0 over the two ephemeral SSD drives in that instance type.  These instances are all in the same region.

My simulated application is using only node1 in the cluster and is sysbench OLTP with 200 tables with 1M rows each.  This comes out to just under 50G of data.  The test application runs on a separate server with 32 threads.

The PXC cluster itself is tuned to best practices for InnoDB and Galera performance.

Baseline

In my first test the cluster is a single member (receiving workload) and I am  joining node2.  This configuration is untuned for SST.  I measured the time from when mysqld started on node2 until it entered the Synced state (i.e., fully caught up).  In the log, it looked like this:

150724 15:59:24 mysqld_safe Starting mysqld daemon with databases from /var/lib/mysql
... lots of other output ...
2015-07-24 16:48:39 31084 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 4647341)

Doing some math on the above, we find that the SST took 51 minutes to complete.
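To follow a joining node through the flow described earlier, the WSREP state shifts in the error log are a convenient signal. One example is the JOINED -> SYNCED line in the excerpt above. This is a small sketch, assuming log lines of the form "WSREP: Shifting <from> -> <to> (TO: <seqno>)". State names such as DONOR/DESYNCED contain a slash but no spaces, so they still parse.

```python
import re
import sys

# e.g. "2015-07-24 16:48:39 31084 [Note] WSREP: Shifting JOINED -> SYNCED (TO: 4647341)"
SHIFT = re.compile(r"WSREP: Shifting (\S+) -> (\S+) \(TO: (\d+)\)")


def transitions(log_text: str) -> list[tuple[str, str, int]]:
    """Return (from_state, to_state, seqno) for every state shift, in log order."""
    return [(src, dst, int(seqno)) for src, dst, seqno in SHIFT.findall(log_text)]


def reached_synced(log_text: str) -> bool:
    """True if the most recent shift in the log put the node into SYNCED."""
    shifts = transitions(log_text)
    return bool(shifts) and shifts[-1][1] == "SYNCED"


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1], errors="replace") as f:
        text = f.read()
    for src, dst, seqno in transitions(text):
        print(f"{src:>15} -> {dst:<15} seqno {seqno}")
    print("synced" if reached_synced(text) else "not synced yet")
```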

--use-memory

One of the first things I noticed was that the --apply-log step on the Joiner was very slow.  Anyone who uses Xtrabackup a lot will know that --apply-log is a lot faster if you give it some extra RAM, via the --use-memory option, to use while making the backup consistent.  We can set this in our my.cnf like this:

[sst]
inno-apply-opts="--use-memory=20G"

The [sst] section is a special one understood only by the xtrabackup-v2 script.  inno-apply-opts allows me to specify arguments to innobackupex when it runs.

Note that this change only applies to the Joiner (i.e., you don’t have to put it on all your nodes and restart them to take advantage of it).

This change immediately makes a huge improvement to our above scenario (node2 joining node1 under load) and the SST now takes just over 30 minutes.

wsrep_slave_threads

Another slow part of getting to Synced is how long it takes to apply transactions up to real time after the backup is restored and in place on the Joiner.  We can improve this throughput by increasing the number of apply threads on the Joiner to make better use of the CPU.  Previously wsrep_slave_threads was set to 1; if I increase this to 32 (there are 16 cores on this instance type), my SST now takes 25m 32s.

Compression

xtrabackup-v2 supports adding a compression process into the datastream.  On the Donor it compresses and on the Joiner it decompresses.  This allows you to trade CPU for transfer speed.  If your bottleneck turns out to be network transport and you have spare CPU, this can help a lot.

Further, I can use pigz instead of gzip to get parallel compression, but in theory any compression utility can work as long as it can compress and decompress standard input to standard output.  I install the ‘pigz’ package on all my nodes and change my my.cnf like this:

[sst]
inno-apply-opts="--use-memory=20G"
compressor="pigz"
decompressor="pigz -d"

Both the Joiner and the Donor must have the respective decompressor and compressor settings or the SST will fail with a vague error message (not actually having pigz installed will do the same thing).
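Because a mismatched or missing compressor only surfaces as a vague SST error, a pre-flight check on each node can save a failed transfer. This is a minimal sketch, assuming a my.cnf that Python's configparser can read. MySQL-specific syntax such as !include directives placed before the first section may need extra handling. The check follows the rule above: set both compressor and decompressor or neither, and make sure their programs are installed.

```python
import configparser
import shutil
import sys


def check_sst_section(cnf_text: str) -> list[str]:
    """Return problems with the [sst] compressor/decompressor pair."""
    # allow_no_value/strict=False tolerate bare options and repeated keys;
    # interpolation=None keeps '%' characters literal.
    parser = configparser.ConfigParser(allow_no_value=True, strict=False, interpolation=None)
    parser.read_string(cnf_text)
    if not parser.has_section("sst"):
        return []  # no [sst] section: compression is not configured
    settings = {key: parser["sst"].get(key) for key in ("compressor", "decompressor")}
    if not any(settings.values()):
        return []  # neither option set: nothing to check
    problems = []
    for key, value in settings.items():
        command = (value or "").strip().strip("\"'")  # '"pigz -d"' -> 'pigz -d'
        if not command:
            problems.append(f"{key} is missing: set both compressor and decompressor")
            continue
        program = command.split()[0]
        if shutil.which(program) is None:
            problems.append(f"{key} program '{program}' is not installed")
    return problems


if __name__ == "__main__" and len(sys.argv) == 2:
    with open(sys.argv[1]) as f:
        for problem in check_sst_section(f.read()) or ["[sst] compression settings look OK"]:
            print(problem)
```

Since any node may end up as either Donor or Joiner, it makes sense to run the same check on every node.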

By adding compression, my SST is down to 21 minutes, but there’s a catch.  My application performance starts to take a serious nose-dive during this test.  Pigz is consuming most of the CPU on my Donor, which is also my primary application node.  This may or may not hurt your application workload in the same way, but it emphasizes the importance of understanding (and measuring) the performance impact SST has on your Donor nodes.

Dedicated donor

To alleviate the problem with the application, I now leave node2 up and spin up node3.  Since I expect node2 not to receive application traffic directly, I can configure node3 to prefer node2 as its donor like this:

[mysqld]
...
wsrep_sst_donor = node2,

When node3 starts, this setting tells the cluster that node2 is node3’s preferred donor, but if node2 is not available, to pick something else (that’s what the trailing comma means).
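The trailing-comma rule can be modelled in a few lines. This is an illustrative model of the behavior described above, not Galera's donor-selection code. It assumes that a node can donate only while it is SYNCED. The real selection also weighs factors such as network segments, whereas this sketch simply takes the first fallback by name.

```python
from typing import Dict, Optional


def pick_donor(setting: str, node_states: Dict[str, str], joiner: str) -> Optional[str]:
    """Pick a donor for `joiner` from a wsrep_sst_donor-style list (illustrative model)."""
    names = [name.strip() for name in setting.split(",")]
    allow_any = names[-1] == ""  # trailing comma: fall back to any suitable node
    for name in (n for n in names if n):
        if name != joiner and node_states.get(name) == "SYNCED":
            return name
    if allow_any:
        for name, state in sorted(node_states.items()):
            if name != joiner and state == "SYNCED":
                return name
    return None  # no acceptable donor at the moment


states = {"node1": "SYNCED", "node2": "SYNCED", "node3": "JOINER"}
print(pick_donor("node2,", states, "node3"))  # the preferred donor is available
```

Without the trailing comma, a busy node2 would leave node3 with no acceptable donor in this model. With the comma, node1 would be used instead.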

Donor nodes are permitted to fall behind in replication apply as needed without sending flow control.  Sending application traffic to such a node may see an increase in the amount of stale data as well as certification failures for writes (not to mention the performance issues we saw above with node1).  Since node2 is not getting application traffic, moving into the Donor state and doing an expensive SST with pigz compression should be relatively safe for the rest of the cluster (in this case, node1).

Even if you don’t have a dedicated donor, if you use a load balancer of some kind in front of your cluster, you may choose to treat Donor nodes as failing their health checks, so that application traffic is diverted during any state transfer.
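A health check along those lines can be very small. This is a sketch, assuming the checker has already read wsrep_local_state_comment (for example via SHOW GLOBAL STATUS). The optional flag mirrors, in spirit, the AVAILABLE_WHEN_DONOR switch of PXC's clustercheck script. The 200/503 convention is the usual one for HTTP-based load balancer checks.

```python
def http_status(state_comment: str, available_when_donor: bool = False) -> int:
    """Return 200 if the node should receive application traffic, 503 otherwise."""
    if state_comment == "Synced":
        return 200
    if available_when_donor and state_comment == "Donor/Desynced":
        return 200
    # Joining, Joined, Donor/Desynced (by default) and anything unexpected: out of rotation.
    return 503


for state in ("Synced", "Donor/Desynced", "Joined"):
    print(state, http_status(state))
```

Defaulting to 503 for any unrecognised state keeps the check conservative, so a node is only sent traffic once it explicitly reports Synced.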

When I brought up node3, with node2 as the donor, the SST time dropped to 18m 33s.

Conclusion

Each of these tunings helped the SST speed, though the later adjustments perhaps had less of a direct impact.  Depending on your workload, database size, network and CPU available, your mileage may of course vary.  Your tunings should vary accordingly, but also realize that in some cases you may actually want to limit (rather than increase) the speed of state transfers to avoid other problems. For example, I’ve seen several clusters become unstable during SST, and the only explanation was that the network bandwidth consumed by the state transfer was starving the actual Galera communication between the nodes. Be sure to consider the overall state of production when tuning your SSTs.

The post Optimizing PXC Xtrabackup State Snapshot Transfer appeared first on MySQL Performance Blog.

Advanced Query Tuning in MySQL 5.6 and MySQL 5.7 Webinar: Q&A

Thank you for attending my July 22 webinar titled “Advanced Query Tuning in MySQL 5.6 and 5.7” (my slides and a replay are available here). As promised, here is the list of questions and my answers (thank you for your great questions).

Q: Here is the explain example:

mysql> explain extended select id, site_id from test_index_id where site_id=1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_index_id
         type: ref
possible_keys: key_site_id
          key: key_site_id
      key_len: 5
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using where; Using index

Why is key_site_id a covering index for the query, given that a) we are selecting “id”, and b) key_site_id only contains site_id?

As the table is InnoDB, every secondary index entry also contains the primary key (“id”). In this case the secondary index holds all the information needed to satisfy the above query, so key_site_id is a “covering index” (hence “Using index” in the Extra column).
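A toy model makes this concrete. This is illustrative only: it assumes a table shaped like test_index_id(id PRIMARY KEY, site_id, payload), where `payload` is a hypothetical extra column, and a dict and a sorted list stand in for InnoDB's B-trees. Because each secondary entry is (site_id, id), the query is answered without touching the clustered index.

```python
from bisect import bisect_left, bisect_right

# Clustered index: primary key -> full row (id, site_id, payload).
clustered = {1: (1, 1, "a"), 2: (2, 2, "b"), 3: (3, 1, "c")}
# Secondary index key_site_id: InnoDB stores (site_id, primary key) in every entry.
key_site_id = sorted((row[1], pk) for pk, row in clustered.items())


def select_id_site_id(site_id: int) -> list[tuple[int, int]]:
    """Answer SELECT id, site_id ... WHERE site_id = ? from the secondary index alone."""
    lo = bisect_left(key_site_id, (site_id,))
    hi = bisect_right(key_site_id, (site_id, float("inf")))
    # Both requested columns are in the index entry: no lookup into `clustered` needed.
    return [(pk, sid) for sid, pk in key_site_id[lo:hi]]


print(select_id_site_id(1))
```

Selecting `payload` as well would force a second lookup into the clustered index for each match, and the index would no longer be covering.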

Q: Applications change over time. Do you suggest doing a periodic analysis of indexes that are being used and drop the ones that are not? If yes, any suggestions as to tackle that?

Yes, that is a good idea. Usually it can be done easily with Percona Toolkit or with Performance Schema in MySQL 5.6:

  1. Enable the slow query log and log every query, then use the pt-index-usage tool
  2. Or use the following query (as suggested in a FromDual blog post):
SELECT object_schema, object_name, index_name
  FROM performance_schema.table_io_waits_summary_by_index_usage
 WHERE index_name IS NOT NULL
   AND count_star = 0
 ORDER BY object_schema, object_name;

Q: If duplicate indexes are found on 5.6/5.7, will they cause a performance impact on the DB while querying?

Duplicate keys can have a negative impact on selects:

  1. MySQL can get confused and choose a wrong index
  2. Total index size can grow, which can cause MySQL to run out of RAM
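The simplest kind of duplicate is easy to detect mechanically. This is a simplified sketch of the idea behind pt-duplicate-key-checker, not the tool's actual logic: an index whose columns are a leftmost prefix of another index's columns is usually redundant. The index names and columns below are hypothetical. The real tool also accounts for unique constraints, foreign keys, and the primary key that InnoDB appends to secondary indexes.

```python
def redundant_indexes(indexes: dict[str, tuple[str, ...]]) -> list[tuple[str, str]]:
    """Return (redundant_index, index_that_makes_it_redundant) pairs."""
    found = []
    for name, cols in indexes.items():
        if name == "PRIMARY":
            continue  # never suggest dropping the primary key
        for other, other_cols in indexes.items():
            if other == name or other_cols[: len(cols)] != cols:
                continue
            # A strict left prefix is redundant; for exact duplicates report only one of the pair.
            if len(cols) < len(other_cols) or name > other:
                found.append((name, other))
                break
    return found


table = {"PRIMARY": ("id",), "idx_a": ("a",), "idx_a_b": ("a", "b"), "idx_b": ("b",)}
print(redundant_indexes(table))
```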

Q: What is the suggested method to measure performance on queries (other than the slow query log) so as to know where to create indexes?

The slow query log is the most common method. In MySQL 5.6 you can also use Performance Schema and its events_statements_summary_by_digest table.

Q: I’m not sure if this was covered in the webinar but… are there any best-practices for fulltext indexes?

That was not covered in this webinar, however, I’ve done a number of presentations regarding Full Text Indexes. For example: Creating Geo Enabled Applications with MySQL 5.6

Q: What is the limit on index size or on the number of indexes you can define per table?

There is no limit on index size on disk; however, it is good (performance-wise) to have active indexes fit in RAM.

In InnoDB there are a number of index limitations, e.g. a table can contain a maximum of 64 secondary indexes.

Q:  If a table has two columns you would like to sum, can you have that sum indexed as a calculated index? To add to that, can that calculated index have “case when”?

Just to clarify, this is only a feature of MySQL 5.7 (not released yet).

Yes, it is documented now:

CREATE TABLE triangle (
  sidea DOUBLE,
  sideb DOUBLE,
  sidec DOUBLE AS (SQRT(sidea * sidea + sideb * sideb))
);

Q: I have noticed that you created indexes on columns like DayOfTheWeek with very low cardinality. Shouldn’t that be a bad practice normally?

Yes, you are right! Unless you are doing queries like “select count(*) from … where DayOfTheWeek = 7”, those indexes may not be very useful.

Q: I saw an article that if you don’t specify a primary key upfront mysql / innodb creates one in the background (hidden). Is it different from a primary key itself, if most of the where fields that are used not in the primary / semi primary key? And is there a way to identify the tables with the hidden primary key indexes?

The “hidden” primary key will be 6 bytes, which will also be appended (duplicated) to all secondary keys. You can create an INT UNSIGNED AUTO_INCREMENT primary key instead, which will be smaller (4 bytes, if you do not plan to store more than about 4 billion rows). In addition, you will not be able to use the hidden primary key in your queries.
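A quick back-of-the-envelope calculation shows what the 2-byte difference means at scale. This sketch counts only the primary-key copy stored in each secondary index entry, as described above, and ignores page overhead and fill factor. The row and index counts in the usage example are hypothetical.

```python
HIDDEN_ROW_ID_BYTES = 6        # DB_ROW_ID used by GEN_CLUST_INDEX
INT_PK_BYTES = 4               # INT UNSIGNED AUTO_INCREMENT
INT_UNSIGNED_MAX = 2**32 - 1   # 4,294,967,295


def secondary_index_savings(rows: int, secondary_indexes: int) -> int:
    """Bytes saved across secondary indexes by using an INT UNSIGNED primary key."""
    if rows > INT_UNSIGNED_MAX:
        raise ValueError("too many rows for an INT UNSIGNED primary key")
    return (HIDDEN_ROW_ID_BYTES - INT_PK_BYTES) * rows * secondary_indexes


# Hypothetical table: 100 million rows with 3 secondary indexes.
print(secondary_index_savings(100_000_000, 3))
```

For that hypothetical table the saving is 600,000,000 bytes (about 572 MiB) of secondary index data.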

The following query (against information_schema) can be used to find all tables without declared primary key (with “hidden” primary key):

SELECT tables.table_schema, tables.table_name, tables.table_rows
FROM information_schema.tables
LEFT JOIN (
  SELECT table_schema, table_name
  FROM information_schema.statistics
  GROUP BY table_schema, table_name, index_name
  HAVING
    SUM(
      CASE WHEN non_unique = 0 AND nullable != 'YES' THEN 1 ELSE 0 END
    ) = COUNT(*)
) puks
ON tables.table_schema = puks.table_schema AND tables.table_name = puks.table_name
WHERE puks.table_name IS NULL
AND tables.table_type = 'BASE TABLE' AND engine='InnoDB'
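The query's logic can be hard to read at first, so here it is replayed on in-memory rows. This is illustrative only: each tuple mimics a row of information_schema.statistics as (table_schema, table_name, index_name, non_unique, nullable). The BASE TABLE and engine filters are left out. The rule being checked is that a table avoids the hidden GEN_CLUST_INDEX only if it has some index that is unique with all columns NOT NULL (InnoDB uses the first such index as the clustered index when no PRIMARY KEY is declared).

```python
from collections import defaultdict

Stat = tuple[str, str, str, int, str]  # schema, table, index, non_unique, nullable


def tables_without_pk(tables: list[tuple[str, str]], statistics: list[Stat]) -> list[tuple[str, str]]:
    by_index = defaultdict(list)
    for schema, table, index, non_unique, nullable in statistics:
        by_index[(schema, table, index)].append((non_unique, nullable))
    qualifying = set()
    for (schema, table, _index), cols in by_index.items():
        # HAVING SUM(CASE WHEN non_unique = 0 AND nullable != 'YES' ...) = COUNT(*)
        if all(non_unique == 0 and nullable != "YES" for non_unique, nullable in cols):
            qualifying.add((schema, table))
    # LEFT JOIN ... WHERE puks.table_name IS NULL: tables with no qualifying index.
    return [t for t in tables if t not in qualifying]


tables = [("test", "t1"), ("test", "t2"), ("test", "t3")]
stats = [("test", "t2", "PRIMARY", 0, ""), ("test", "t3", "uk", 0, "YES"), ("test", "t1", "idx", 1, "")]
print(tables_without_pk(tables, stats))
```

Here t1 has only a non-unique index and t3's unique index allows NULLs, so both would get the hidden primary key.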

You may also use the mysql.innodb_index_stats table to find tables with the hidden primary key (their clustered index is named GEN_CLUST_INDEX):

Example:

mysql> select * from mysql.innodb_index_stats;
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| database_name | table_name | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+
| test          | t1         | GEN_CLUST_INDEX | 2015-08-08 20:48:23 | n_diff_pfx01 | 96         | 1           | DB_ROW_ID                         |
| test          | t1         | GEN_CLUST_INDEX | 2015-08-08 20:48:23 | n_leaf_pages | 1          | NULL        | Number of leaf pages in the index |
| test          | t1         | GEN_CLUST_INDEX | 2015-08-08 20:48:23 | size         | 1          | NULL        | Number of pages in the index      |
+---------------+------------+-----------------+---------------------+--------------+------------+-------------+-----------------------------------+

Q: You are using ALTER TABLE to create an index, but how does MySQL sort the data to create the index? Doesn’t it use a temp table for that?

That is a very good question: the behavior of the “alter table … add index” has changed over time. As documented in Overview of Online DDL:

Historically, many DDL operations on InnoDB tables were expensive. Many ALTER TABLE operations worked by creating a new, empty table defined with the requested table options and indexes, then copying the existing rows to the new table one-by-one, updating the indexes as the rows were inserted. After all rows from the original table were copied, the old table was dropped and the copy was renamed with the name of the original table.

MySQL 5.5, and MySQL 5.1 with the InnoDB Plugin, optimized CREATE INDEX and DROP INDEX to avoid the table-copying behavior. That feature was known as Fast Index Creation.

When MySQL uses “Fast Index Creation” operation it will create a set of temporary files in MySQL’s tmpdir:

To add a secondary index to an existing table, InnoDB scans the table, and sorts the rows using memory buffers and temporary files in order by the values of the secondary index key columns. The B-tree is then built in key-value order, which is more efficient than inserting rows into an index in random order.
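The sort-then-build approach in the quote can be sketched as follows. This is a simplified model, not InnoDB's code. The table is scanned in primary-key order, and (key, pk) pairs are sorted in bounded in-memory runs (the point where InnoDB may spill to temporary files in tmpdir). The runs are then merged and packed into leaf pages in key order. RUN_SIZE and PAGE_CAPACITY are made-up parameters, and the upper B-tree levels, built from the first key of each page, are omitted.

```python
import heapq
from typing import Callable, Iterable

RUN_SIZE = 4        # hypothetical: entries that fit in one sort buffer
PAGE_CAPACITY = 3   # hypothetical: entries that fit in one leaf page


def build_leaf_pages(rows: Iterable[tuple[int, dict]],
                     key_of: Callable[[dict], str]) -> list[list[tuple[str, int]]]:
    """Scan rows, sort (key, pk) pairs in bounded runs, merge them, fill leaf pages."""
    runs, buffer = [], []
    for pk, row in rows:  # clustered-index scan, in primary-key order
        buffer.append((key_of(row), pk))
        if len(buffer) == RUN_SIZE:  # buffer full: sort it and start a new run
            runs.append(sorted(buffer))
            buffer = []
    if buffer:
        runs.append(sorted(buffer))
    pages, page = [], []
    for entry in heapq.merge(*runs):  # k-way merge yields entries in key order
        page.append(entry)
        if len(page) == PAGE_CAPACITY:
            pages.append(page)
            page = []
    if page:
        pages.append(page)
    return pages


rows = [(pk, {"c": c}) for pk, c in enumerate("dbeacfg", start=1)]
print(build_leaf_pages(rows, lambda r: r["c"]))
```

Each leaf page is filled sequentially instead of receiving random inserts. This is why building in key order is more efficient than inserting rows into the index one at a time.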

Q: How does InnoDB deadlock handling in 5.7 compare to the 5.6 version? Does it depend on the parameter setup?

InnoDB deadlocks discussion is outside of the scope of this presentation. Valerii Kravchuk and Nilnandan Joshi did an excellent talk at Percona Live 2015 (slides available): Understanding Innodb Locks and Deadlocks

Q: What is the performance impact of generating a virtual column for a table having 66 Million records and generating the index. And how would you go about it? Do you have any suggestions on how to re organize indexes on the physical disk?

As MySQL 5.7 is not released yet, the behavior of virtual columns may change.  The main question here is whether these will be online operations: a) adding a virtual column (as this is only a metadata change, it should be a very light operation anyway), and b) adding an index on that virtual column. In the labs release it was not online; however, this can change.

Thank you again for attending.

The post Advanced Query Tuning in MySQL 5.6 and MySQL 5.7 Webinar: Q&A appeared first on MySQL Performance Blog.

Percona Toolkit 2.2.15 is now available

Percona is pleased to announce the availability of Percona Toolkit 2.2.15, released August 28, 2015. Percona Toolkit is a collection of advanced command-line tools to perform a variety of MySQL server and system tasks that are too difficult or complex for DBAs to perform manually. Percona Toolkit, like all Percona software, is free and open source.

This release is the current GA (Generally Available) stable release in the 2.2 series. It includes multiple bug fixes as well as continued preparation for MySQL 5.7 compatibility. Full details are below. Downloads are available here and from the Percona Software Repositories.

New Features:

  • Added --max-flow-ctl option to pt-online-schema-change and pt-archiver with a value set in percent. When a Percona XtraDB Cluster node is very loaded, it sends flow control signals to the other nodes to stop sending transactions in order to catch up. When the average value of time spent in this state (in percent) exceeds the maximum provided in the option, the tool pauses until it falls below again. Default is no flow control checking.
  • Added the --sleep option for pt-online-schema-change to avoid performance problems. The option accepts float values in seconds.
  • Implemented ability to specify --check-slave-lag multiple times for pt-archiver. The following example enables lag checks for two slaves:
    pt-archiver --no-delete --where '1=1' --source h=oltp_server,D=test,t=tbl --dest h=olap_server --check-slave-lag h=slave1 --check-slave-lag h=slave2 --limit 1000 --commit-each
  • Added the --rds option to pt-kill, which makes the tool use Amazon RDS procedure calls instead of the standard MySQL kill command.

Bugs Fixed:

  • Fixed bug 1042727: pt-table-checksum doesn’t reconnect the slave $dbh
    Before, the tool would die if any slave connection was lost. Now the tool waits forever for slaves.
  • Fixed bug 1056507: pt-archiver --check-slave-lag agressiveness
    The tool now checks replication lag every 100 rows instead of every row, which significantly improves efficiency.
  • Fixed bug 1215587: Adding underscores to constraints when using pt-online-schema-change can create issues with constraint name length
    Before, multiple schema changes led to underscores stacking up on the name of the constraint until it reached the 64 character limit. Now there is a limit of two underscores in the prefix, then the tool alternately removes or adds one underscore, attempting to make the name unique.
  • Fixed bug 1277049: pt-online-schema-change can’t connect with comma in password
    For all tools, documented that commas in passwords provided on the command line must be escaped.
  • Fixed bug 1441928: Unlimited chunk size when using pt-online-schema-change with --chunk-size-limit=0 inhibits checksumming of single-nibble tables
    When comparing table size with the slave table, the tool now ignores --chunk-size-limit if it is set to zero to avoid multiplying by zero.
  • Fixed bug 1443763: Update documentation and/or implentation of pt-archiver --check-interval
    Fixed the documentation for --check-interval to reflect its correct behavior.
  • Fixed bug 1449226: pt-archiver dies with “MySQL server has gone away” when --innodb_kill_idle_transaction is set to a low value and --check-slave-lag is enabled
    The tool now sends a dummy SQL query to avoid timing out.
  • Fixed bug 1446928: pt-online-schema-change not reporting meaningful errors
    The tool now produces meaningful errors based on text from MySQL errors.
  • Fixed bug 1450499: ReadKeyMini causes pt-online-schema-change session to lock under some circumstances
    Removed ReadKeyMini, because it is no longer necessary.
  • Fixed bug 1452914: --purge and --no-delete are mutually exclusive, but still allowed to be specified together by pt-archiver
    The tool now issues an error when --purge and --no-delete are specified together.
  • Fixed bug 1455486: pt-mysql-summary is missing the --ask-pass option
    Added the --ask-pass option to the tool.
  • Fixed bug 1457573: pt-mysql-summary fails to download pt-diskstats pt-pmp pt-mext pt-align
    Added the -L option to curl and changed download address to use HTTPS.
  • Fixed bug 1462904: pt-duplicate-key-checker doesn’t support triple quote in column name
    Updated TableParser module to handle literal backticks.
  • Fixed bug 1488600: pt-stalk doesn’t check TokuDB status
    Implemented status collection similar to how it is performed for InnoDB.
  • Fixed bug 1488611: various testing bugs related to newer Perl versions

Details of the release can be found in the release notes and the 2.2.15 milestone on Launchpad. Bugs can be reported on the Percona Toolkit launchpad bug tracker.

The post Percona Toolkit 2.2.15 is now available appeared first on MySQL Performance Blog.

High-load clusters and desynchronized nodes on Percona XtraDB Cluster

There can be a lot of confusion and lack of planning in Percona XtraDB Clusters in regards to nodes becoming desynchronized for various reasons.  This can happen a few ways:

When I say “desynchronized” I mean a node that is permitted to build up a potentially large wsrep_local_recv_queue while some operation is happening.  For example, a node taking a backup would set wsrep_desync=ON during the backup and potentially fall behind replication by some amount.

Some of these operations may completely block Galera from applying transactions, while others may simply increase load on the server enough that it falls behind and applies at a reduced rate.

In all the cases above, flow control is NOT used while the node cannot apply transactions, but it MAY be used while the node is recovering from the operation.  For an example of this, see my last blog about IST.

If a cluster is fairly busy, then the flow control that CAN happen when the above operations catch up MAY be detrimental to performance.

Example setup

Let us take my typical 3 node cluster with workload on node1.  We are taking a blocking backup of some kind on node3 so we are executing the following steps:

  1. node3> set global wsrep_desync=ON;
  2. Node3’s “backup” starts, this starts with FLUSH TABLES WITH READ LOCK;
  3. Galera is paused on node3 and the wsrep_local_recv_queue grows some amount
  4. Node3’s “backup” finishes, finishing with UNLOCK TABLES;
  5. node3> set global wsrep_desync=OFF;

During the backup

This covers everything up through step 3 above.  My node1 is unaffected by the backup on node3; I can see it averaging 5-6k writesets (transactions) per second, as it did before we began:


 

node2 is also unaffected:


but node3 is not applying and its queue is building up:


Unlock tables, still wsrep_desync=ON

Let’s examine briefly what happens when node3 is permitted to start applying, but wsrep_desync stays enabled:


node1’s performance is pretty much the same, node3 is not using flow control yet. However, there is a problem:


It’s hard to notice, but node3 is NOT catching up, instead it is falling further behind!  We have potentially created a situation where node3 may never catch up.

The PXC nodes were close enough to the red line of performance that node3 can only apply about as fast as new transactions are coming into node1 (and somewhat slower until it warms up a bit).

This represents a serious concern in PXC capacity planning:

Nodes do not only need to be fast enough to handle normal workload, but also to catch up after maintenance operations or failures cause them to fall behind.
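The capacity-planning point reduces to simple arithmetic: a backlog drains only at the difference between the apply rate and the incoming rate. This is a back-of-the-envelope sketch, assuming constant rates in writesets per second. The numbers in the usage line are hypothetical, not measurements from this cluster.

```python
from typing import Optional


def catch_up_seconds(backlog: int, apply_rate: float, incoming_rate: float) -> Optional[float]:
    """Time to drain the receive queue, or None if it never drains."""
    headroom = apply_rate - incoming_rate
    if headroom <= 0:
        return None  # queue stays flat or grows: the node never catches up
    return backlog / headroom


# Hypothetical: 1,000,000 queued writesets, 6,000/s apply rate, 5,500/s incoming.
print(catch_up_seconds(1_000_000, 6000, 5500))
```

With only 500 writesets per second of headroom, a million-writeset backlog takes 2,000 seconds to drain. When the apply rate merely equals the incoming rate, as with node3 here, the queue never drains. Flow control, described next, effectively lowers the incoming rate to create that headroom.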

Experienced MySQL DBAs will realize this isn’t all that different from Master/Slave replication.

Flow Control as a way to recover

So here’s the trick:  if we turn off wsrep_desync on node3 now, node3 will use flow control if and only if the incoming replication exceeds node3’s apply rate.  This gives node3 a good chance of catching up, but the tradeoff is reducing write throughput of the cluster.  Let’s see what this looks like in context with all of our steps.  wsrep_desync is turned off at the peak of the replication queue size on node3, around 12:20PM:


So at the moment node3 starts utilizing flow control to prevent falling further behind, our write throughput (in this specific environment and workload) is reduced by approximately 1/3rd (YMMV).   The cluster will remain in this state until node3 catches up and returns to the ‘Synced’ state.  This catchup is still happening as I write this post, almost 4 hours after it started and will likely take another hour or two to complete.

I can see a more realtime representation of this by using myq_status on node1, summarizing every minute:

[root@node1 ~]# myq_status -i 1m wsrep
mycluster / node1 (idx: 1) / Galera 3.11(ra0189ab)
         Cluster  Node       Outbound      Inbound       FlowC     Conflct Gcache     Appl
    time P cnf  # stat laten msgs data que msgs data que pause snt lcf bfa   ist  idx  %ef
19:58:47 P   5  3 Sync 0.9ms 3128 2.0M   0   27 213b   0 25.4s   0   0   0 3003k  16k  62%
19:59:47 P   5  3 Sync 1.1ms 3200 2.1M   0   31 248b   0 18.8s   0   0   0 3003k  16k  62%
20:00:47 P   5  3 Sync 0.9ms 3378 2.2M  32   27 217b   0 26.0s   0   0   0 3003k  16k  62%
20:01:47 P   5  3 Sync 0.9ms 3662 2.4M  32   33 266b   0 18.9s   0   0   0 3003k  16k  62%
20:02:47 P   5  3 Sync 0.9ms 3340 2.2M  32   27 215b   0 27.2s   0   0   0 3003k  16k  62%
20:03:47 P   5  3 Sync 0.9ms 3193 2.1M   0   27 215b   0 25.6s   0   0   0 3003k  16k  62%
20:04:47 P   5  3 Sync 0.9ms 3009 1.9M  12   28 224b   0 22.8s   0   0   0 3003k  16k  62%
20:05:47 P   5  3 Sync 0.9ms 3437 2.2M   0   27 218b   0 23.9s   0   0   0 3003k  16k  62%
20:06:47 P   5  3 Sync 0.9ms 3319 2.1M   7   28 220b   0 24.2s   0   0   0 3003k  16k  62%
20:07:47 P   5  3 Sync 1.0ms 3388 2.2M  16   31 251b   0 22.6s   0   0   0 3003k  16k  62%
20:08:47 P   5  3 Sync 1.1ms 3695 2.4M  19   39 312b   0 13.9s   0   0   0 3003k  16k  62%
20:09:47 P   5  3 Sync 0.9ms 3293 2.1M   0   26 211b   0 26.2s   0   0   0 3003k  16k  62%

This reports around 20-25 seconds of flow control every minute, which is consistent with the roughly one-third performance reduction we see in the graphs above.
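As a rough cross-check, the `pause` column from the node1 output can be averaged over the twelve one-minute samples. This assumes writes are blocked only while flow control is active and otherwise run at a steady rate, so the paused share of each minute approximates the lost write throughput; that is a simplification, not something myq_status itself reports.

```python
# "pause" column (seconds per 60 s sample) copied from the node1 myq_status output above
pauses = [25.4, 18.8, 26.0, 18.9, 27.2, 25.6, 22.8, 23.9, 24.2, 22.6, 13.9, 26.2]


def paused_fraction(pause_seconds: list[float], interval: float = 60.0) -> float:
    """Average share of each sampling interval the cluster spent paused by flow control."""
    return sum(pause_seconds) / (len(pause_seconds) * interval)


print(f"{paused_fraction(pauses):.0%}")  # 38%
```

Roughly 38% over these samples, in the same ballpark as the one-third drop visible in the graphs.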

Watching node3 the same way shows that it is the node sending the flow control messages (FlowC snt):

mycluster / node3 (idx: 2) / Galera 3.11(ra0189ab)
         Cluster  Node       Outbound      Inbound       FlowC     Conflct Gcache     Appl
    time P cnf  # stat laten msgs data que msgs data que pause snt lcf bfa   ist  idx  %ef
17:38:09 P   5  3 Dono 0.8ms    0   0b   0 4434 2.8M 16m 25.2s  31   0   0 18634  16k  80%
17:39:09 P   5  3 Dono 1.3ms    0   0b   1 5040 3.2M 16m 22.1s  29   0   0 37497  16k  80%
17:40:09 P   5  3 Dono 1.4ms    0   0b   0 4506 2.9M 16m 21.0s  31   0   0 16674  16k  80%
17:41:09 P   5  3 Dono 0.9ms    0   0b   0 5274 3.4M 16m 16.4s  27   0   0 22134  16k  80%
17:42:09 P   5  3 Dono 0.9ms    0   0b   0 4826 3.1M 16m 19.8s  26   0   0 16386  16k  80%
17:43:09 P   5  3 Jned 0.9ms    0   0b   0 4957 3.2M 16m 18.7s  28   0   0 83677  16k  80%
17:44:09 P   5  3 Jned 0.9ms    0   0b   0 3693 2.4M 16m 27.2s  30   0   0  131k  16k  80%
17:45:09 P   5  3 Jned 0.9ms    0   0b   0 4151 2.7M 16m 26.3s  34   0   0  185k  16k  80%
17:46:09 P   5  3 Jned 1.5ms    0   0b   0 4420 2.8M 16m 25.0s  30   0   0  245k  16k  80%
17:47:09 P   5  3 Jned 1.3ms    0   0b   1 4806 3.1M 16m 21.0s  27   0   0  310k  16k  80%

node3 sends a lot of flow control messages: around 30 per minute. These are frequent ON/OFF toggles of flow control, where writes are briefly delayed, rather than a steady “you can’t write” for 20 seconds straight.
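The same node3 output can be used to estimate how long each individual pause lasts. This sketch divides total paused time by the number of flow control messages sent, which assumes each counted message corresponds to one pause episode; that is an assumption about what the counter measures, so treat the result as an order of magnitude only.

```python
# (pause seconds, FlowC snt) per one-minute sample, copied from the node3 output above
samples = [(25.2, 31), (22.1, 29), (21.0, 31), (16.4, 27), (19.8, 26),
           (18.7, 28), (27.2, 30), (26.3, 34), (25.0, 30), (21.0, 27)]

total_pause = sum(pause for pause, _ in samples)
total_sent = sum(sent for _, sent in samples)

print(f"{total_sent / len(samples):.1f} messages/min, "
      f"{total_pause / total_sent:.2f} s paused per message")
# 29.3 messages/min, 0.76 s paused per message
```

Under that assumption, each pause lasts well under a second, which matches the picture of brief, repeated delays rather than one long stall.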

Interestingly, node3 also spends a long time in the Donor/Desynced state (even though wsrep_desync was turned OFF hours before) and then moves to the Joined state (which has the same meaning as during an IST).

Does it matter?

As always, it depends.

If these are web requests and the database suddenly can handle only ~66% of the traffic, that’s likely a problem, though maybe it just slows down the website somewhat. I want to emphasize that WRITES are what is affected here. Reads on any and all nodes should be normal (though you probably don’t want to read from node3 since it is so far behind).
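One way to keep reads away from a lagging node is to route them only to nodes reporting the Synced state. The sketch below maps the numeric `wsrep_local_state` status values (1 Joining, 2 Donor/Desynced, 3 Joined, 4 Synced) to a read-routing decision; `SHOW GLOBAL STATUS` returns the value as a string, so convert it with `int()`. Fetching the value and the "Synced only" policy are assumptions for illustration; some setups deliberately allow reads from a donor.

```python
from enum import IntEnum


class WsrepState(IntEnum):
    """Numeric values of the wsrep_local_state status variable."""
    JOINING = 1
    DONOR_DESYNCED = 2
    JOINED = 3
    SYNCED = 4


def ok_for_reads(local_state: int) -> bool:
    """Hypothetical routing policy: send reads only to Synced nodes.

    A node in Donor/Desynced or Joined (like node3 here) may be far behind.
    """
    try:
        return WsrepState(local_state) is WsrepState.SYNCED
    except ValueError:  # unknown value: be conservative and skip the node
        return False


print(ok_for_reads(int("4")))  # True
print(ok_for_reads(int("3")))  # False
```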

If this were some queue processing with reduced throughput, I’d expect it to possibly catch up later.

This can only be answered for your application, but the takeaways for me are:

  • Don’t underestimate your capacity requirements.
  • Being at the redline normally means you are well past the redline for abnormal events.
  • Plan for maintenance and failure recoveries.
  • Where possible, build queuing into your workflows so diminished throughput in your architecture doesn’t generate failures.
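A minimal sketch of the last point, assuming writes can be accepted now and applied later: a bounded buffer absorbs bursts while the cluster is throttled, and callers see a rejection (back-pressure) only when the buffer is full. `WriteBuffer`, `db_budget`, and the sizes are hypothetical; a real system would usually put a durable queue or message broker here rather than an in-memory deque.

```python
from collections import deque


class WriteBuffer:
    """Accept writes immediately; drain them as fast as the database allows."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity  # bounds memory use
        self.pending: deque[str] = deque()

    def submit(self, write: str) -> bool:
        if len(self.pending) >= self.capacity:
            return False  # back-pressure: caller should retry later
        self.pending.append(write)
        return True

    def drain(self, db_budget: int) -> list[str]:
        """Hand over up to db_budget writes, i.e. what the cluster absorbs this tick."""
        n = min(db_budget, len(self.pending))
        return [self.pending.popleft() for _ in range(n)]


buf = WriteBuffer(capacity=1000)
accepted = sum(buf.submit(f"order-{i}") for i in range(300))  # burst of 300 writes
applied = len(buf.drain(db_budget=100))  # throttled cluster absorbs only 100 this tick
print(accepted, applied, len(buf.pending))  # 300 100 200
```

When flow control cuts write throughput, the backlog grows in the buffer instead of turning into failed requests, and it drains once the cluster recovers.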

Happy clustering!

Graphs in this post courtesy of VividCortex.

The post High-load clusters and desynchronized nodes on Percona XtraDB Cluster appeared first on MySQL Performance Blog.

Percona Live Europe 2015! Call for speakers; registration open

Percona Live is moving from London to Amsterdam this year, and the event is also expanding to three full days. Percona Live Europe 2015, September 21-23, will be at the Mövenpick Hotel Amsterdam City Centre. The call for speakers and Super Saver registration are now open. Hurry, though: the deadline for submitting a speaking proposal is June 21st, and Super Saver registration ends July 5th!

This year’s conference will feature one day of tutorials and two days of keynote talks and breakout sessions related to MySQL, NoSQL and Data in the Cloud. You’ll get briefed on the hottest topics, learn about operating a high-performing deployment, and hear top industry leaders describe the future of the ecosystem, encompassing MySQL, MariaDB, Percona Server, MongoDB (and more). Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

Have something to say? Why not lead a breakout session or a tutorial?

Breakout sessions are 50 minutes including a Q&A. Tutorial sessions focus on an immediate and practical application of in-depth MySQL and NoSQL knowledge. Tutorial speakers should assume that attendees will have laptops to work through detailed and potentially hands-on presentations. Tutorials are typically three hours long including a Q&A; however, if you have content for a full day, submissions for 6-hour-long tutorials are also being accepted. If your tutorial or breakout session is approved, you’ll receive a complimentary full-conference pass.

Huge thanks to our Conference Committee!

There would be no Percona Live without the hard work of our conference committees. Meet this year’s Percona Live Europe 2015 Conference Committee – a dedicated group of experts in MySQL, NoSQL and Data in the Cloud:

  • Erik Beebe – Founder and CTO, ObjectRocket
  • Luis Motta Campos – Database Administrator, eBay Classifieds Group
  • Colin Charles – Chief Evangelist, MariaDB
  • César Trigo Esteban – Development Director, Gigigo
  • Kenny Gorman – Chief Technologist; Data. Office of the CTO, Rackspace
  • Amrith Kumar – Founder and CTO, Tesora
  • Giuseppe Maxia – Quality Assurance Architect, VMware
  • Shlomi Noach – Senior Systems Engineer, Booking.com
  • Konstantin Osipov – Engineering Manager, Mail.Ru
  • Morgan Tocker – MySQL Community Manager, Oracle
  • Art van Scheppingen – Head of Database Engineering, Spil Games
  • Charity Majors – Production Engineering Manager, Facebook
  • Peter Zaitsev – Co-founder and CEO, Percona

Sponsorships

Sponsorship opportunities for Percona Live Europe 2015 are now available. Sponsors become part of a dynamic and fast-growing ecosystem and interact with hundreds of DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event. This year’s conference will feature expanded accommodations and turnkey kiosks.

Planning to attend?

Super Saver registration discounts for Percona Live Europe 2015 are available through July 5th (at 11:30 p.m. CEST). Visit the Percona Live Europe 2015 website for more information about the conference. Interested community members can also register to receive email updates about Percona Live Europe 2015.

Percona has also negotiated a special hotel rate at the Mövenpick Hotel Amsterdam City Centre. If you book your hotel before July 6th, your delicious breakfast is included.

I hope to see you in Amsterdam!

The post Percona Live Europe 2015! Call for speakers; registration open appeared first on MySQL Performance Blog.