Featured Talk: The Future of Replication is Today: New Features in Practice

In the past years, both MySQL 5.6, MySQL 5.7 and MariaDB 10 have been successful implementing new features. For many DBAs, the “old way” of replicating data is comfortable so taking the action to implement these new features seems like a momentous leap rather then a simple step. But perhaps it isn’t that complicated…

Giuseppe Maxia, a Quality Assurance Architect at VMware and loyal member of the Percona Live Confepercona-2015DSC_4112rence Committee will be presenting “The Future of Replication is Today: New Features in Practice” at the Percona Live Data Performance Conference this September in Amsterdam.
Percona’s Community Manager, Tom Diederich had an opportunity to catch up with Giuseppe last week and get an in-depth look at some of the items Giuseppe will be covering in his talk in addition to getting his take on some of the hot sessions to hit while at the conference.  This is how it went:

(Hint: Read to the end to find a special discount code) 


Tom: Your talk is titled, “The Future of Replication is today: new features in practice.” What are the top 3 areas in which replication options have improved in MySQL 5.6, MySQL 5.7, and MariaDB 10?
Giuseppe: Replication has been stagnant for over 10 years. Before MySQL 5.6, the only important change in the technology was the introduction of row-based replication in 2008. After that, we had to wait till 2013 to see global transaction identifiers in MySQL 5.6, followed by the same feature, with different implementation in 2014 with MariaDB 10. GTID has been complemented, in both flavors, with crash-safe replication tables, which is a feature that guarantees a reliable resume of replication after a server failure. There is also the parallel applier, a minor feature that has been implemented in both MySQL 5.6 and MariaDB, and improved in latest versions, although it seems to lack proper support for monitoring. The last feature that was introduced in MySQL 5.6 and MariaDB 10 is multi-source replication, i.e. the ability of replicating from multiple masters to a single slave. In both editions, the implementation is quite simple, and not so different from what DBAs are used to do for regular replication.
Tom: For DBAs, how difficult will it be to make the change from the “old way” of replicating data — to stop using the same comfortable features that have been around for several years — and put into practice some of the latest features?
Giuseppe: The adoption of new features can be deceptively simple. For example, GTID in MariaDB comes out of the box and its adoption could be as easy as running a backup followed by a restore, but it can produce unpleasant results if you try to combine this feature with multi-source replication without planning ahead. That said, the transition could be simpler than its counterpart in MySQL.
MySQL 5.6 and 5.7 require some reconfiguration to run GTID, and users can face unpleasant failures due to the complexity of the rules applying to this feature. They will need to read the manual thoroughly and test the deployment extensively before trusting an upgrade in production.
For multi-source replication, the difficulties are, in my experience, hidden in the users expectations. When speaking about multi-source (or multi-masters, as it is commonly referred to), many users have the mistaken expectation that they can easily insert anything in multiple masters as if they were doing it in a single server. However, the nature of asynchronous replication and the current implementation of multi-source topologies do not handle conflicts, and this fact will probably surprise and anger the early adopters.
Tom: What is still missing in replication technology? How can MySQL improve?
Giuseppe: There are two areas where the current implementation is lacking. The first one is monitoring data: while new features have been adding up to replication, there is not enough effort made to cover the monitoring needs. The current way of monitoring replication is hard-wired around the original replication feature, and little has been done to give the users a deeper view of what is going on. With the latest releases at our disposal, we can run parallel replication using multiple masters, and yet we have very little visibility on what goes on inside the dozen of threads that the new features can unchain inside a single slave. It’s like driving a F1 racing car with the dashboard of a Ford model-T. MySQL 5.7 has moved a few steps in that direction, with the new replication tables in performance_schema, but it is still a drop in the ocean compared to what we need.
The second area where replication is still too much tied with its past is in heterogeneous replication. While relational databases are still dominating the front-end of the web economy, its back-end is largely being run by different structures, such as Hadoop, MongoDB, Cassandra. Moving data back and forth between the relational storage and its growing siblings has become an urgent need. There have been a few sparks of change in this direction, but nothing that can qualify as promising changes.
Tom: Which other session(s) are you most looking forward to besides your own?
Giuseppe: I am always interested in the sessions that explain and discuss new features. I am most interested in the talks by Oracle engineers, who have been piling up many features in the latest years, and I am sure they have something more up their sleeve that will appear at the conference. I also attend eagerly sessions about complementary tools, which are usually highly educational and often give me more ideas.

Want to read more on the topic? Visit Giuseppe’s blog:

 MySQL Replication Monitoring 101

The Percona Live Data Performance Conference is the premier event for the rich and diverse MySQL, NoSQL and data in the cloud ecosystems in Europe. It is the place to be for the open source community as well as businesses that thrive in the MySQL, NoSQL, cloud, big data and IoT (Internet of Things) marketplaces. Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

This year’s conference will feature one day of tutorials and two days of keynote talks and breakout sessions related to MySQL, NoSQL and Data in the Cloud. Attendees will get briefed on the hottest topics, learn about building and maintaining high-performing deployments and hear from top industry leaders.

The Percona Live Europe Data Performance Conference will be September 21-23 at the Mövenpick Hotel Amsterdam City Centre.

Register using code “FeaturedTalk” and save 20 euros off of registration!

Hope to see you in Amsterdam!

The post Featured Talk: The Future of Replication is Today: New Features in Practice appeared first on MySQL Performance Blog.

How much could you benefit from MySQL 5.6 parallel replication?

I have heard this question quite often: “At busy times, our replicas start lagging quite frequently. We are using N schemas, so which performance boost could we expect from MySQL 5.6 parallel replication?” Here is a quick way to give you a rough estimate of the potential benefit.

General idea

In MySQL 5.6, parallelism is added at the schema level. So in theory, if you have N schemas and if you use N parallel threads, replication could be up to N times faster. This assumes at least 2 things:

  • Replication throughput scales linearly with the number of parallel threads.
  • Writes are evenly distributed across schemas.

Both assumptions are of course not realistic. But it is easy to know the distribution of writes, and that can already give you an idea about how much you could benefit from parallel replication.

Writes are stored in binary logs but it is much easier to work with the slow query log, so we can enable full slow query logging for some time with long_query_time = 0 and then use pt-query-digest to analyze the resulting log file.

An example

I have a test server with 3 schemas, and I’ve run some sysbench load on it to get a decent slow query log file. Once done, I can run this command:

pt-query-digest --filter '$event->{arg} !~ m/^select|^set|^commit|^show|^admin|^rollback|^begin/i' --group-by db --report-format profile slow_query.log > digest.out

and here is the result I get:

# Profile
# Rank Query ID Response time  Calls  R/Call V/M   Item
# ==== ======== ============== ====== ====== ===== ====
#    1 0x       791.6195 52.1% 100028 0.0079  0.70 db3
#    2 0x       525.1231 34.5% 100022 0.0053  0.68 db1
#    3 0x       203.4649 13.4% 100000 0.0020  0.64 db2

In a perfect world, with 3 parallel threads and if each schema would handle 33% of the total write workload, I could expect a 3x performance improvement.

However here we can see in the report that the 3 replication threads will only work simultaneously 25% of the time in the best case (13.4/52.1 = 0.25). We can also expect 2 replication threads to work simultaneously for some part of the workload, but let’s ignore that for clarity.

It means that instead of the theoretical 200% performance improvement (3 parallel threads 100% of the time), we can hardly expect more than a 50% performance improvement (3 parallel threads 25% of the time). And the reality is that the benefit will be much lower than that.


Parallel replication in MySQL 5.6 is a great step forward, however don’t expect too much if your writes are not evenly distributed across all your schemas. The pt-query-digest trick I shared can give you a rough idea whether your workload is a good fit for multi-threaded slaves in 5.6.

I’m expecting much better results for 5.7, partly because parallelism is handled differently, but also because you can tune how efficient parallel replication will be by adjusting the binlog group commit settings.

The post How much could you benefit from MySQL 5.6 parallel replication? appeared first on MySQL Performance Blog.

Advanced Query Tuning in MySQL 5.6 and MySQL 5.7 Webinar: Q&A

Thank you for attending my July 22 webinar titled “Advanced Query Tuning in MySQL 5.6 and 5.7” (my slides and a replay available here). As promised here is the list of questions and my answers (thank you for your great questions).

Q: Here is the explain example:

mysql> explain extended select id, site_id from test_index_id where site_id=1
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: test_index_id
         type: ref
possible_keys: key_site_id
          key: key_site_id
      key_len: 5
          ref: const
         rows: 1
     filtered: 100.00
        Extra: Using where; Using index

why is site_id a covered index for the query, given the fact that a) we are selecting “id”, b) key_site_id only contains site_id?

As the table is InnoDB, all secondary keys will always contain primary key (“id”); in this case the secondary index will contain all needed information to satisfy the above query and key_site_id will be “covered index”

Q: Applications change over time. Do you suggest doing a periodic analysis of indexes that are being used and drop the ones that are not? If yes, any suggestions as to tackle that?

Yes, that is a good idea. Usually it can be done easily with Percona toolkit or Performance_schema in MySQL 5.6

  1. Enable slow query log and log every query, then use Pt-index-usage tool
  2. Or use the following query (as suggested by FromDual blog post):
SELECT object_schema, object_name, index_name
  FROM performance_schema.table_io_waits_summary_by_index_usage
 WHERE index_name IS NOT NULL
   AND count_star = 0
 ORDER BY object_schema, object_name;

Q: Does the duplicate index is found on 5.6/5.7 will that causes an performance impact to the db while querying?

Duplicate keys can have negative impact on selects:

  1. MySQL can get confused and choose a wrong index
  2. Total index size can grow, which can cause MySQL to run out of RAM

Q: What is the suggested method to measure performance on queries (other than the slow query log) so as to know where to create indexes?

Slow query log is most common method. In MySQL 5.6 you can also use Performance Schema and use events_statements_summary_by_digest table.

Q: I’m not sure if this was covered in the webinar but… are there any best-practices for fulltext indexes?

That was not covered in this webinar, however, I’ve done a number of presentations regarding Full Text Indexes. For example: Creating Geo Enabled Applications with MySQL 5.6

Q: What would be the limit on index size or number of indexes you can defined per table?

There are no limits on Index size on disk, however, it will be good (performance wise) to have active indexes fit in RAM.

In InnoDB there are a number of index limitations, i.e. a table can contain a maximum of 64 secondary indexes.

Q:  If a table has two columns you would like to sum, can you have that sum indexed as a calculated index? To add to that, can that calculated index have “case when”?

Just to clarify, this is only a feature of MySQL 5.7 (not released yet).

Yes, it is documented now:

CREATE TABLE triangle (
  sidea DOUBLE,
  sideb DOUBLE,
  sidec DOUBLE AS (SQRT(sidea * sidea + sideb * sideb))

Q: I have noticed that you created indexes on columns like DayOfTheWeek with very low cardinality. Shouldn’t that be a bad practice normally?

Yes, you are right! Unless, you are doing queries like “select count(*) from … where DayOfTheWeek = 7” those indexes may not be very useful.

Q: I saw an article that if you don’t specify a primary key upfront mysql / innodb creates one in the background (hidden). Is it different from a primary key itself, if most of the where fields that are used not in the primary / semi primary key? And is there a way to identify the tables with the hidden primary key indexes?

The “hidden” primary key will be 6 bytes, which will also be appended (duplicated) to all secondary keys. You can create an INT primary key auto_increment, which will be smaller (if you do not plan to store more than 4 billion rows). In addition, you will not be able to use the hidden primary key in your queries.

The following query (against information_schema) can be used to find all tables without declared primary key (with “hidden” primary key):

SELECT tables.table_schema, tables.table_name, tables.table_rows
FROM information_schema.tables
  SELECT table_schema, table_name
  FROM information_schema.statistics
  GROUP BY table_schema, table_name, index_name
      CASE WHEN non_unique = 0 AND nullable != 'YES' THEN 1 ELSE 0 END
    ) = COUNT(*)
) puks
ON tables.table_schema = puks.table_schema AND tables.table_name = puks.table_name
WHERE puks.table_name IS NULL
AND tables.table_type = 'BASE TABLE' AND engine='InnoDB'

You may also use mysql.innodb_index_stats table to find rows with the hidden primary key:


mysql> select * from mysql.innodb_index_stats;
| database_name | table_name | index_name      | last_update         | stat_name    | stat_value | sample_size | stat_description                  |
| test          | t1         | GEN_CLUST_INDEX | 2015-08-08 20:48:23 | n_diff_pfx01 | 96         | 1           | DB_ROW_ID                         |
| test          | t1         | GEN_CLUST_INDEX | 2015-08-08 20:48:23 | n_leaf_pages | 1          | NULL        | Number of leaf pages in the index |
| test          | t1         | GEN_CLUST_INDEX | 2015-08-08 20:48:23 | size         | 1          | NULL        | Number of pages in the index      |

Q: You are using the alter table to create index, but how does mysql sort the data for creating the index? isn’t it uses temp table for that?

That is a very good question: the behavior of the “alter table … add index” has changed over time. As documented in Overview of Online DDL:

Historically, many DDL operations on InnoDB tables were expensive. Many ALTER TABLE operations worked by creating a new, empty table defined with the requested table options and indexes, then copying the existing rows to the new table one-by-one, updating the indexes as the rows were inserted. After all rows from the original table were copied, the old table was dropped and the copy was renamed with the name of the original table.

MySQL 5.5, and MySQL 5.1 with the InnoDB Plugin, optimized CREATE INDEX and DROP INDEX to avoid the table-copying behavior. That feature was known as Fast Index Creation

When MySQL uses “Fast Index Creation” operation it will create a set of temporary files in MySQL’s tmpdir:

To add a secondary index to an existing table, InnoDB scans the table, and sorts the rows using memory buffers and temporary files in order by the values of the secondary index key columns. The B-tree is then built in key-value order, which is more efficient than inserting rows into an index in random order.

Q: How good is InnoDB deadlocks on 5.7 comparing to 5.6 version. Is that based on parameters setup?

InnoDB deadlocks discussion is outside of the scope of this presentation. Valerii Kravchuk and Nilnandan Joshi did an excellent talk at Percona Live 2015 (slides available): Understanding Innodb Locks and Deadlocks

Q: What is the performance impact of generating a virtual column for a table having 66 Million records and generating the index. And how would you go about it? Do you have any suggestions on how to re organize indexes on the physical disk?

As MySQL 5.7 is not released yet, behavior of the virtual columns may change.  The main question here is: will it be online operations to a) add a virtual column (as this is only metadata change it should be very light operation anyway). b) add index on that virtual column. In the labs released it was not online, however this can change.

Thank you again for attending.

The post Advanced Query Tuning in MySQL 5.6 and MySQL 5.7 Webinar: Q&A appeared first on MySQL Performance Blog.

TIMESTAMP Columns, Amazon RDS 5.6, and You

This comes from an issue that I worked on recently, wherein a customer reported that their application was working fine under stock MySQL 5.6 but producing erroneous results when they tried running it on Amazon RDS 5.6. They had a table which, on the working server, contained two TIMESTAMP columns, one which defaulted to CURRENT_TIMESTAMP and the other which defaulted to ’0000-00-00 00:00:00′, like so:

CREATE TABLE mysql56 (
  ts2 TIMESTAMP NOT NULL DEFAULT '0000-00-00 00:00:00',

However, under Amazon RDS, the same table looked like this:


They mentioned that their schema contains TIMESTAMP column definitions without any modifiers for nullability or default values. In other words, they were doing something like this:

    ts1 TIMESTAMP,

It’s a known issue (or change, or difference, whatever we choose to call it) that MySQL is deprecating defaults for TIMESTAMP columns that don’t have any nullability or default-value specifiers; this is covered in the 5.6 documentation. However, the docs also mention that the default value for this setting is OFF – i.e., if you create a table with TIMESTAMP columns without any defaults, it will fill them in for you, similarly to what I’ve described above.

As it turns out, the RDS default for this setting is ON, hence the “NULL DEFAULT NULL” modifiers when creating the table under RDS. We changed the parameter group, restarted the instance (note that this variable is NOT dynamic), and their schema-creation script created the tables in the proper way.

So, what have we learned here?

  • Migrating from standalone MySQL to Amazon RDS sometimes has hidden pitfalls that aren’t always readily apparent. Many times it will “just work” – but sometimes it doesn’t. Percona is, of course, happy to help review your configurations and assist with any Amazon RDS implementation plans you might have.
  • When in doubt, fully-specify your TIMESTAMP columns. If you want them NOT NULL, say so. If you want a default value or an on-updated value, set it. Even the configuration variable explicit_defaults_for_timestamp is deprecated and slated for removal in a future version, so eventually it won’t be possible to get the old pre-5.6 behavior at all.

The post TIMESTAMP Columns, Amazon RDS 5.6, and You appeared first on MySQL Performance Blog.

MySQL 5.6 New Replication Features: Webinar followup Q&A

Percona MySQL webinar Q&A: 5.6 New Replication FeaturesI want to thank all attendees of my webinar, “MySQL 5.6 New Replication Features: Benefits, Challenges and Limitations“. We had questions that I didn’t have the time to answer:

Q: If I run on Amazon’s RDS, do I need to worry about enabling crash-safe slaves, or is that already in place?

A: Crash-safe replication is already configured for read replicas using MySQL 5.6.

Q: How the relay log purge will manage in case of multiple db’s replication running on multiple threads?

A: Same thing as with single-threaded replication: when all event in a relay log file have been executed, the relay log is removed. The only difference is that executing the last event of a file no longer means that all previous events have been executed, because execution gaps can happen with multi-threaded replication. However a checkpoint is performed from time to time (see slave_checkpoint_period setting, default is 300ms) to make sure that no event is left behind. In practice, you should not notice any difference between the purge of relay logs with single-threaded replication and with multi-threaded replication.

Q: How can we make slave only for read purpose similar to AWS RDS MASTER- SLAVE Replication where slave is read only?

A: You can set read_only = 1 on the slave. The slave won’t be strictly read-only because all users with the SUPER privilege will be able to write to the slave anyway. But if you don’t grant the SUPER privilege to your users, the slave will act as a true read-only server. This is what happened on RDS read replicas: you don’t have the SUPER privilege so you can’t write on replicas.

Q: How can binlog position be higher on a slave compared to the master – for position-based replication?

A: Each time you restart MySQL, a new binary log file will be created. So let’s say that we start a master and its slave (the slave being configured with log_slave_updates to write events received by replication in its own binary logs), both binary logs will be mysql-bin.0000001. If we now restart the slave twice, the slave will write in mysql-bin.0000003 while the master will still be writing on mysql-bin.0000001.

Q: Can we change statement-based replication to mixed replication after 1000+ transactions or not? What will be the impact?

A: You can easily switch to mixed replication as the binlog_format setting is a dynamic variable. Mixed replication means using statement-based replication by default and switching to row-based replication only for queries that are not safe for statement-based replication. So it is usually quite safe to switch from statement-based to mixed replication.

Q: server_id and server UUID – what is the difference? Do we need both simultaneously?

A: server_id and server_uuid are both identifiers of each server in a replication topology. server_id is used by replication and server_uuid is used to generate GTIDs (server_uuid has been added in MySQL 5.6). You need both.

Q: How to clarify compatibility of GTID and multi-threaded; is there any plans to fix it by Percona?

A: GTID and multi-threaded replication are compatible. The only concern is that the monitoring tools (SHOW SLAVE STATUS and the mysql.slave_relay_log_info table) are confusing when using both. This could make quite hard to fix replication if you have a replication error. We don’t have any plan to fix that at the moment.

Q: What will happen if GTID is used and a transaction that creates a temporary table is run on the master?

A: When enforce-gtid-consistency is set to ON, temporary tables are only allowed if autocommit = 1. Otherwise this will generate an error.

Q: How could remote binary log be used for backups?

A: It can allow you to mirror the binary logs in real time. However this doesn’t replace backups. You can find an example here.

Q: What is the difference between semi-synchrous replication and gtid replication?

A: By default, replication is asynchronous and doesn’t use GTIDs. Optionally, you can enable semi-sync replication to make sure at least one slave has acknowledged the writes from the master, and you can also enable GTIDs (with or without semi-sync replication) to make replication reconfiguration easier.

The post MySQL 5.6 New Replication Features: Webinar followup Q&A appeared first on MySQL Performance Blog.

InnoDB adaptive flushing in MySQL 5.6: checkpoint age and io capacity

In MySQL 5.6 InnoDB has a dedicated thread (page_cleaner) that’s responsible for performing flushing operations. Page_cleaner performs flushing of the dirty pages from the buffer pool based on two factors:
access pattern  –  the least recently used pages will be flushed by LRU flusher from LRU_list when buffer pool has no free pages anymore;
age – the oldest modified non-flushed pages are part of flush_list structure and will be flushed by flush_list flusher based on several heuristics.

There is a good overview of the page_cleaner and also here you may find some details about flushing in MySQL 5.6. Below I describe several additional aspects of the flush_list flushing that was not really covered yet.

flush_list flushing and checkpoint age

The amount of the aged pages that is possible to keep in the flush_list is limited by the combined size of the innodb log files. So the main purpose of the flush_list flushing is to flush pages from this list with such a rate that will also always allow enough free space in the log files. On the other hand, too aggressive flushing means less write combining, unnecessary load on the I/O subsystem, in the end undoing performance benefits of having larger redo logs.  In MySQL 5.6 the amount of pages to flush is calculated in the InnoDB adaptive routine based on the current checkpoint age with the following formula:

percentage of the IO capacity that should be used for flushing =
        ((srv_max_io_capacity / srv_io_capacity) * (lsn_age_factor * sqrt(lsn_age_factor))) / 7.5;

We modeled that formula in R and found that it’s possible to improve it such a way that the curve becomes more flat and as a result flushing becomes less aggressive. That new formula is enabled in Percona Server 5.6 by default.


flush_list flushing and io_capacity

InnoDB provides two variables that allow the control of the background flushing rate – innodb_io_capacity and innodb_io_capacity_max. There is quite a detailed description for these vars. However there are several things that are not really covered in the documentation:

innodb_io_capacity_max is the most important variable in case of adaptive flushing as only that variable actually limiting the flushing rate. See above formula and charts.

innodb_io_capacity is used for limiting IO operations during merging of the insert buffer and flushing in cases of server inactivity/shutdown.

For practical needs, the above means the following:

– if  the MySQL server is in an active state (serving user requests) you need to adjust innodb_io_capacity_max to increase/decrease flushing rate.
– if the MySQL server is in an idle state or performing shutdown flushing of the pages from flush_list will be limited by innodb_io_capacity value only.

– if change_buffering is ON and server is in active state it will allow to use either 5% of innodb_io_capacity or vary rate from 5% to 55%  if more than 50% of insert buffer size was already used.
– if change_buffering is ON and server is idle it will use 100% of innodb_io_capacity for merge operations

The post InnoDB adaptive flushing in MySQL 5.6: checkpoint age and io capacity appeared first on MySQL Performance Blog.

MySQL 5.6 Configuration Optimization Webinar, Sept. 25

Webinar: MySQL 5.6 Configuration OptimizationThis Wednesday in our next webinar I’ll share how to configure a better-performing MySQL 5.6 server. You’ll lean a practical approach to generating a sensible configuration file that sets what is needed and omits what is not.

Why dedicate an entire webinar to the new configuration settings within MySQL 5.6? Mainly because the default configuration files that come with MySQL 5.6 are not designed for high volume production use, and I’ve seen many MySQL incidents caused by poor configuration. Hopefully my advice will save you the headache of tweaking the variables within MySQL’s configuration files in order to work within your organization’s unique business environment.

And while I’ll be doing most of the talking, I do look forward to your questions during the webinar, titled “MySQL 5.6 Configuration Optimization.” It will begin at 10 a.m. Pacific time and run for about an hour. So please tune in on Wednesday, Sept. 25 at 10 a.m. You can register here to reserve your spot.

I welcome your questions both during the webinar and in the comments section below. See you Wednesday!

The post MySQL 5.6 Configuration Optimization Webinar, Sept. 25 appeared first on MySQL Performance Blog.

MySQL Security: Armoring Your Dolphin

MySQL Security: Armoring Your DolphinMy colleague and teammate Ernie Souhrada will be presenting a webinar on Wednesday, August 21, 2013 at 10 a.m. PDT titled “MySQL Security: Armoring Your Dolphin.”

This is a popular topic with news breaking routinely that yet another Internet company has leaked private data of one form or another. Ernie’s webinar will be a great overview of security MySQL from top to bottom, including changes related to security in the 5.6 release.

Topics to be covered include:

  • Basic security concepts
  • Security above the MySQL layer (network, hardware, OS, etc.)
  • Tips for application design
  • A more secure MySQL configuration
  • Security-related changes in MySQL 5.6

Attendees will leave this presentation knowing where to start when identifying vulnerability in their systems.

Be sure to register for the webinar in advance!

The post MySQL Security: Armoring Your Dolphin appeared first on MySQL Performance Blog.

InnoDB Full-text Search in MySQL 5.6: Part 3, Performance

This is part 3 of a 3 part series covering the new InnoDB full-text search features in MySQL 5.6. To catch up on the previous parts, see part 1 or part 2

Some of you may recall a few months ago that I promised a third part in my InnoDB full-text search (FTS) series, in which I’d actually take a look at the performance of InnoDB FTS in MySQL 5.6 versus traditional MyISAM FTS. I hadn’t planned on quite such a gap between part 2 and part 3, but as they say, better late than never. Recall that we have been working with two data sets, one which I call SEO (8000-keyword-stuffed web pages) and the other which I call DIR (800K directory records), and we are comparing MyISAM FTS in MySQL 5.5.30 versus InnoDB FTS in MySQL 5.6.10.

For reference, although this is not really what I would call a benchmark run, the platform I’m using here is a Core i7-2600 3.4GHz, 32GiB of RAM, and 2 Samsung 256GB 830 SSDs in RAID-0. The OS is CentOS 6.4, and the filesystem is XFS with dm-crypt/LUKS. All MySQL settings are their respective defaults, except for innodb_ft_min_token_size, which is set to 4 (instead of the default of 3) to match MyISAM’s default ft_min_word_len.

Also, recall that the table definition for the DIR data set is:

CREATE TABLE dir_test (
  full_name VARCHAR(100),
  details TEXT

The table definition for the SEO data set is:

CREATE TABLE seo_test (
 title VARCHAR(255),

Table Load / Index Creation

First, let’s try loading data and creating our FT indexes in one pass – i.e., we’ll create the FT indexes as part of the original table definition itself. In particular, this means adding “FULLTEXT KEY (full_name, details)” to our DIR tables and adding “FULLTEXT KEY (title, body)” to the SEO tables. We’ll then drop these tables, drop our file cache, restart MySQL, and try the same process in two passes: first we’ll load the table, and then we’ll do an ALTER to add the FT indexes. All times in seconds.

EngineData Setone-pass (load)two-pass (load, alter)
MyISAMSEO3.913.96 (0.76, 3.20)
InnoDBSEO3.7777.32 (1.53, 5.79)
MyISAMDIR43.15944.93 (6.99, 37.94)
InnoDBDIR330.7656.99 (12.70, 44.29)

Interesting. For MyISAM, we might say that it really doesn’t make too much difference which way you proceed, as the numbers from the one-pass load and the two-pass load are within a few percent of each other, but for InnoDB, we have mixed behavior. With the smaller SEO data set, it makes more sense to do it in a one-pass process, but with the larger DIR data set, the two-pass load is much faster.

Recall that when adding the first FT index to an InnoDB table, the table itself has to be rebuilt to add the FTS_DOC_ID column, so I suspect that the size of the table when it gets rebuilt has a lot to do with the performance difference on the smaller data set. The SEO data set fits completely into the buffer pool, the DIR data set does not. That also suggests that it’s worth comparing the time required to add a second FT index (this time we will just index each table’s TEXT/MEDIUMTEXT field). While we’re at it, let’s look at the time required to drop the second FT index as well. Again, all times in seconds.

EngineData SetFT Index Create TimeFT Index Drop Time

InnoDB wins this second test all around. I’d attribute InnoDB’s win here partially to not having to rebuild the whole table with second (and subsequent) indexes, but also to the fact that at least some the InnoDB data was already in the buffer pool from when the first FT index was created. Also, we know that InnoDB generally drops indexes extremely quickly, whereas MyISAM requires a rebuild of the .MYI file, so InnoDB’s win on the drop test isn’t surprising.

Query Performance

Recall the queries that were used in the previous post from this series:

1. SELECT id, title, MATCH(title, body) AGAINST ('arizona business records'
   IN NATURAL LANGUAGE MODE) AS score FROM seo_test_{myisam,innodb} ORDER BY 3
2. SELECT id, title, MATCH(title, body) AGAINST ('corporation commission forms'
   IN NATURAL LANGUAGE MODE) AS score FROM seo_test_{myisam,innodb} ORDER BY 3 DESC
   LIMIT 5;
3. SELECT id, full_name, MATCH(full_name, details) AGAINST ('+james +peterson +arizona'
   IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 5;
4. SELECT id, full_name, MATCH(full_name, details) AGAINST ('+james +peterson arizona'
   IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 5;
5. SELECT id, full_name, MATCH(full_name, details) AGAINST ('"Thomas B Smith"'
   IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 1;

The queries were run consecutively from top to bottom, a total of 10 times each. Here are the results in tabular format:

Query #EngineMin. Execution TimeAvg. Execution TimeMax. Execution Time

Not a lot of variance in execution times for a given query, so that’s good, but InnoDB is always coming back slower than MyISAM. In general, I’m not that surprised that MyISAM tends to be faster; this is a simple single-threaded, read-only test, so none of the areas where InnoDB shines (e.g., concurrent read/write access) are being exercised here, but I am quite surprised by queries #3 and #5, where InnoDB is just getting smoked.

I ran both versions of query 5 with profiling enabled, and for the most part, the time spent in each query state was identical between the InnoDB and MyISAM versions of the query, with one exception.

InnoDB: | Creating sort index | 0.626529 |
MyISAM: | Creating sort index | 0.014588 |

That’s where the bulk of the execution time is. According to the docs, this thread state means that the thread is processing a SELECT which required an internal temporary table. Ok, sure, that makes sense, but it doesn’t really explain why InnoDB is taking so much longer, and here’s where things get a bit interesting. If you recall part 2 in this series, query 5 actually returned 0 results when run against InnoDB with the default configuration because of the middle initial “B”, and I had to set innodb_ft_min_token_size to 1 in order to get results back. For the sake of completeness, I did that again here, then restarted the server and recreated my FT index. The results? Execution time dropped by 50% and ‘Creating sort index’ didn’t even appear in the query profile:

mysql [localhost] {msandbox} (test): SELECT id, full_name, MATCH(full_name, details) AGAINST
('"Thomas B Smith"' IN BOOLEAN MODE) AS score FROM dir_test_innodb ORDER BY 3 DESC LIMIT 1;
| id    | full_name      | score             |
| 62633 | Thomas B Smith | 32.89915466308594 |
1 row in set (0.31 sec)
mysql [localhost] {msandbox} (test): show profile;
| Status                  | Duration |
| starting                | 0.000090 |
| checking permissions    | 0.000007 |
| Opening tables          | 0.000017 |
| init                    | 0.000034 |
| System lock             | 0.000012 |
| optimizing              | 0.000008 |
| statistics              | 0.000027 |
| preparing               | 0.000012 |
| FULLTEXT initialization | 0.304933 |
| executing               | 0.000008 |
| Sending data            | 0.000684 |
| end                     | 0.000006 |
| query end               | 0.000006 |
| closing tables          | 0.000011 |
| freeing items           | 0.000019 |
| cleaning up             | 0.000003 |

Hm. It’s still slower than MyISAM by quite a bit, but much faster than before. The reason it’s faster is because it found an exact match and I only asked for one row, but if I change LIMIT 1 to LIMIT 2 (or limit N>1), then ‘Creating sort index’ returns to the tune of roughly 0.5 to 0.6 seconds, and ‘FULLTEXT initialization’ remains at 0.3 seconds. So this answers another lingering question: there is a significant performance impact to using a lower innodb_ft_min_token_size (ifmts), and it can work for you or against you, depending upon your queries and how many rows you’re searching for. The time spent in “Creating sort index” doesn’t vary too much (maybe 0.05s) between ifmts=1 and ifmts=4, but the time spent in FULLTEXT initialization with ifmts=4 was typically only a few milliseconds, as opposed to the 300ms seen here.

Finally, I tried experimenting with different buffer pool sizes, temporary table sizes, per-thread buffer sizes, and I also tried changing from Antelope (ROW_FORMAT=COMPACT) to Barracuda (ROW_FORMAT=DYNAMIC) and switching character sets from utf8 to latin1, but none of these made any difference. The only thing which seemed to provide a bit of a performance improvement was upgrading to 5.6.12. The execution times for the InnoDB FTS queries under 5.6.12 were about 5-10 percent faster than with 5.6.10, and query #2 actually performed a bit better under InnoDB than MyISAM (average execution time 0.00075 seconds faster), but other than that, MyISAM still wins on raw SELECT performance.

Three blog posts later, then, what’s my overall take on InnoDB FTS in MySQL 5.6? I don’t think it’s great, but it’s serviceable. The performance for BOOLEAN MODE queries definitely leaves something to be desired, but I think InnoDB FTS fills a need for those people who want the features and capabilities of InnoDB but can’t modify their existing applications or who just don’t have enough FTS traffic to justify building out a Sphinx/Solr/Lucene-based solution.

The post InnoDB Full-text Search in MySQL 5.6: Part 3, Performance appeared first on MySQL Performance Blog.

Percona Server 5.6 Webinar follow-up and Q&A

Good news everyone! I recently presented a webinar: Percona Server 5.6: Enterprise Grade MySQL. It was also recorded so you can watch along or view the slide deck. As with all my talks, I am not simply reading the slides so it really is worth to listen to the audio rather than just glance through the slide deck.

There were a number of great questions asked which I’ll answer below:

Q: How does Stewart feel about this version of 5.6 taking into consideration “Stewart’s .20 rule?” (ref 2013 Percona Live Conference).

A: For those who aren’t familiar with it, I have a rule which I call “Stewart’s dot twenty rule” which I’ve posted a few times about on my personal blog. It states: “a piece of software is never really mature until a dot twenty release.” I would say that MySQL 5.6 (and Percona Server 5.6) are both in really good states currently.

I strongly recommend the excellent series of “Fun With Bugs” posts by Valeriy Kravchuk. The latest Fun With Bugs post is: Fun with Bugs #20 – welcome MySQL 5.6.13! and certainly worth a read. I’m rather safe in saying that the first GA release of MySQL 5.6 was by far the best first GA release of any MySQL version ever and subsequent MySQL 5.6 releases have improved upon that. It is quite likely that 5.6 will work perfectly for you today.

If you are really conservative with software upgrades and want as few surprises as possible, then you can of course wait – but I’d certainly recommend kicking the tyres of 5.6 over the next few months and starting to plan a migration.

Q: Any estimate on availability of XtraDB Cluster using 5.6?

A: Since Percona XtraDB Cluster is built upon both Percona Server and Galera it’s only natural to build upon a GA release of Percona Server and a GA release of Galera.

Q: What’s the birds name?

Spike the cockatiel

Spike the cockatiel

A: (Background: at one point during the Webinar you could hear one of our pet birds start to burst into song). I’m glad you asked as it gives me an excellent opportunity to include gratuitous photos of our birds! They’re both Cockatiels. People will often think cockatoo (specifically the Sulphur-Crested Cockatoo)and not cockatiel. A cockatoo is any of the 21 species belonging to the bird family Cacatuidae and the cockatiel is the smallest of the 21 species.

Beaker helping out with our next release

Beaker the cockatiel helping with Percona Server 5.6

We have both a boy (Spike) and a girl (Beaker). Spike is the one who sings (while Beaker, like the muppet, goes meep) and could be heard for a moment during the webinar. Beaker has also been spotted helping with Percona Server 5.6 releases.

Q: The ‘first in Percona Server’ optimizations, did Oracle implement Percona code or write their own?

A: It would be accurate to say that there are changes in MySQL 5.6 that have been inspired by our work, and previously there has been Percona code that has made its way into MySQL (see COPYING.Percona in the MySQL bzr repository). For a multitude of reasons that aren’t worth going into here, it has historically been problematic getting code into MySQL if you didn’t work for the company that owned MySQL. This has been true of MySQL AB, Sun and Oracle and is certainly nothing new or unique to Oracle. What is different now is that things seem to be changing for the better and there is likely to be more cooperation with Oracle going forward.

Q: Has HandlerSocket been cooked into your 5.6 releases yet? Have there been any other improvements on that front?

A: We don’t currently have HandlerSocket in Percona Server 5.6. There has been a very small amount of adoption of HandlerSocket and we’ve taken the approach that we’ll see if the HandlerSocket team ports to 5.6 and if there is adequate demand for HandlerSocket in 5.6. So far, you’re the first person to request it.

Q: What Oracle 5.6 features have not yet been copied or reimplemented in Percona 5.6?

A: Everything in Oracle MySQL 5.6 is in Percona Server 5.6 and has been from the very first Percona Server 5.6 release.

Q: Was innodb fake changes picked up by Oracle?

A: No, at least not yet :)

Q: Has Percona developed or found some solutions for migrating a production Percona server 5.5 to a production Percona server 5.6 without any downtime. Previously I solved this by making a newer version of Percona server as a replica of an older version of either Percona server or mysql db. Then I would point the application servers to the new replica to complete the deployment with a trivial downtime. It seems like this approach is not valid given the new replication design.

A: You can do the old replication trick

Q: Can Xtrabackup 5.6 be used on a system running Percona Server 5.5?

A: Percona XtraBackup 2.1 (the current stable release, which works with MySQL 5.6 and Percona Server 5.6) will also work with MySQL 5.5, Percona Server 5.5, Percona Server 5.1 and MySQL 5.1 running the innodb plugin. There is also support for various MariaDB versions.

Q: question on replication: my database has no partitioned table, multi-thread replication (feature of 5.6) is not going to help. Am I right?

A: Currently the multi-threaded replication slave will partition work up across database schema. It doesn’t matter if your tables are partitioned or not, it matters what database (schema) they’re in. If all your tables are in the same schema, then parallel slave will not currently help.

Q: Is Percona Server 5.6 a drop in replacement for 5.5 or is there an upgrade process? If so, what is involved to roll back to 5.5 if necessary?

A: The upgrade process should be fairly painless and could well be a simple drop-in replacement. It does, of course, depend on what features you may be using along with the type and size of workload. We have a In-Place upgrading from Percona Server 5.5 to Percona Server 5.6 section in our Percona Server 5.6 manual and along with the Changed in Percona Server 5.6 section this should provide a fair amount of insight into what you may expect from the Percona Server side of things. There is also the Upgrading from MySQL 5.5 to 5.6 section of the MySQL manual which is well worth a read.

There is a section in the MySQL manual on downgrading from 5.6 to 5.5 and I don’t think there should be any extra limitations imposed by Percona Server on going from 5.6 back to 5.5. That being said, downgrading is certainly not as well tested as upgrading and I would consider it more of a last resort than something to jump to quickly.

Q: When does production Percona server 5.6 release?

A: Soon. The current Percona Server 5.6 releases are fairly solid and I can certainly recommend trialling them.

Q: Are there any known mysqllib binding issues or deprecations for 5.6?

A: None that I’m aware of.

Q: Is there a white paper or other docs on migratiing from 5.1 Percona server to 5.6 Percona server?

A: Not currently. Generally, the recommended practice is to go through each major version (going through 5.5 before heading to 5.6). There is upgrade documentation for upgrading 5.1 to 5.5 and for 5.5 to 5.6 – and you can certainly run 5.5 for only a few minutes before upgrading to 5.6.

Q: Will you offer training on 5.6?

A: Yes! There is a Moving to MySQL 5.6 training course offered by Percona which covers both MySQL 5.6 and Percona Server 5.6.

Q: I didn’t notice any mention of the improved NUMA support in PS 5.5 (http://www.percona.com/doc/percona-server/5.5/performance/innodb_numa_support.html). Is this carried over to Oracle and/or Percona 5.6?

A: Yes it has made it into Percona Server 5.6. See http://www.percona.com/doc/percona-server/5.6/performance/innodb_numa_support.html for the 5.6 documentation on it. I am not aware of Oracle having implemented it though.

Q: Have you made tests of user_stats overhead compared to performance_schema in 5.6?

A: I’m not aware of any published benchmarks for 5.6 although it would be great to see some.

Q: Does this release support the live table changes?

A: For some types of changes, yes.

Q: Is “Warning: Using a password on the command line interface can be insecure.” error being filtered out in the Percona release?

A: No. It’s not a good idea to provide passwords on the command line.

Q: He also promised a migration blog post ;-)

A: As promised, I am right now going to pester people about writing various posts on migrating from 5.5 to 5.6.

The post Percona Server 5.6 Webinar follow-up and Q&A appeared first on MySQL Performance Blog.