InnoDB Full-text Search in MySQL 5.6: Part 3, Performance

This is part 3 of a 3-part series covering the new InnoDB full-text search features in MySQL 5.6. To catch up on the previous parts, see part 1 or part 2.

Some of you may recall a few months ago that I promised a third part in my InnoDB full-text search (FTS) series, in which I’d actually take a look at the performance of InnoDB FTS in MySQL 5.6 versus traditional MyISAM FTS. I hadn’t planned on quite such a gap between part 2 and part 3, but as they say, better late than never. Recall that we have been working with two data sets, one which I call SEO (8,000 keyword-stuffed web pages) and the other which I call DIR (800K directory records), and we are comparing MyISAM FTS in MySQL 5.5.30 versus InnoDB FTS in MySQL 5.6.10.

For reference, although this is not really what I would call a benchmark run, the platform I’m using here is a Core i7-2600 3.4GHz, 32GiB of RAM, and 2 Samsung 256GB 830 SSDs in RAID-0. The OS is CentOS 6.4, and the filesystem is XFS with dm-crypt/LUKS. All MySQL settings are their respective defaults, except for innodb_ft_min_token_size, which is set to 4 (instead of the default of 3) to match MyISAM’s default ft_min_word_len.

Also, recall that the table definition for the DIR data set is:

CREATE TABLE dir_test (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  full_name VARCHAR(100),
  details TEXT
);

The table definition for the SEO data set is:

CREATE TABLE seo_test (
  id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
  title VARCHAR(255),
  body MEDIUMTEXT
);

Table Load / Index Creation

First, let’s try loading data and creating our FT indexes in one pass – i.e., we’ll create the FT indexes as part of the original table definition itself. In particular, this means adding “FULLTEXT KEY (full_name, details)” to our DIR tables and adding “FULLTEXT KEY (title, body)” to the SEO tables. We’ll then drop these tables, drop our file cache, restart MySQL, and try the same process in two passes: first we’ll load the table, and then we’ll do an ALTER to add the FT indexes.
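
In SQL terms, the two approaches look roughly like this for the DIR table (a sketch; the SEO table follows the same pattern):

-- One pass: the FT index is part of the original table definition
CREATE TABLE dir_test (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  full_name VARCHAR(100),
  details TEXT,
  FULLTEXT KEY (full_name, details)
);
-- ... load the data ...

-- Two passes: load into the bare table first, then add the FT index
CREATE TABLE dir_test (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  full_name VARCHAR(100),
  details TEXT
);
-- ... load the data ...
ALTER TABLE dir_test ADD FULLTEXT KEY (full_name, details);

All times in the table below are in seconds.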

Engine   Data Set   one-pass (load)   two-pass (load, alter)
MyISAM   SEO        3.91              3.96 (0.76, 3.20)
InnoDB   SEO        3.77              7.32 (1.53, 5.79)
MyISAM   DIR        43.159            44.93 (6.99, 37.94)
InnoDB   DIR        330.76            56.99 (12.70, 44.29)

Interesting. For MyISAM, we might say that it really doesn’t make too much difference which way you proceed, as the numbers from the one-pass load and the two-pass load are within a few percent of each other, but for InnoDB, we have mixed behavior. With the smaller SEO data set, it makes more sense to do it in a one-pass process, but with the larger DIR data set, the two-pass load is much faster.

Recall that when adding the first FT index to an InnoDB table, the table itself has to be rebuilt to add the FTS_DOC_ID column, so I suspect that the size of the table when it gets rebuilt has a lot to do with the performance difference on the smaller data set: the SEO data set fits completely into the buffer pool; the DIR data set does not. That also suggests that it’s worth comparing the time required to add a second FT index (this time we will just index each table’s TEXT/MEDIUMTEXT field). While we’re at it, let’s look at the time required to drop the second FT index as well.
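
The statements involved are simple (a sketch for the DIR table; the index name ft_details is arbitrary):

ALTER TABLE dir_test ADD FULLTEXT KEY ft_details (details);
ALTER TABLE dir_test DROP KEY ft_details;

Again, all times in seconds.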

Engine   Data Set   FT Index Create Time   FT Index Drop Time
MyISAM   SEO        6.34                   3.17
InnoDB   SEO        3.26                   0.01
MyISAM   DIR        74.96                  37.82
InnoDB   DIR        24.59                  0.01

InnoDB wins this second test all around. I’d attribute InnoDB’s win here partially to not having to rebuild the whole table with second (and subsequent) indexes, but also to the fact that at least some of the InnoDB data was already in the buffer pool from when the first FT index was created. Also, we know that InnoDB generally drops indexes extremely quickly, whereas MyISAM requires a rebuild of the .MYI file, so InnoDB’s win on the drop test isn’t surprising.

Query Performance

Recall the queries that were used in the previous post from this series:

1. SELECT id, title, MATCH(title, body) AGAINST ('arizona business records'
   IN NATURAL LANGUAGE MODE) AS score FROM seo_test_{myisam,innodb} ORDER BY 3
   DESC LIMIT 5;
2. SELECT id, title, MATCH(title, body) AGAINST ('corporation commission forms'
   IN NATURAL LANGUAGE MODE) AS score FROM seo_test_{myisam,innodb} ORDER BY 3 DESC
   LIMIT 5;
3. SELECT id, full_name, MATCH(full_name, details) AGAINST ('+james +peterson +arizona'
   IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 5;
4. SELECT id, full_name, MATCH(full_name, details) AGAINST ('+james +peterson arizona'
   IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 5;
5. SELECT id, full_name, MATCH(full_name, details) AGAINST ('"Thomas B Smith"'
   IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 1;

The queries were run consecutively from top to bottom, a total of 10 times each. Here are the results in tabular format:

Query #   Engine   Min. Execution Time   Avg. Execution Time   Max. Execution Time
1         MyISAM   0.007953              0.008102              0.008409
1         InnoDB   0.014986              0.015331              0.016243
2         MyISAM   0.001815              0.001893              0.001998
2         InnoDB   0.001987              0.002077              0.002156
3         MyISAM   0.000748              0.000817              0.000871
3         InnoDB   0.670110              0.676540              0.684837
4         MyISAM   0.001199              0.001283              0.001372
4         InnoDB   0.055479              0.056256              0.060985
5         MyISAM   0.008471              0.008597              0.008817
5         InnoDB   0.624305              0.630959              0.641415

Not a lot of variance in execution times for a given query, so that’s good, but InnoDB is always coming back slower than MyISAM. In general, I’m not that surprised that MyISAM tends to be faster; this is a simple single-threaded, read-only test, so none of the areas where InnoDB shines (e.g., concurrent read/write access) are being exercised here. But I am quite surprised by queries #3 and #5, where InnoDB is just getting smoked.

I ran both versions of query 5 with profiling enabled, and for the most part, the time spent in each query state was identical between the InnoDB and MyISAM versions of the query, with one exception.

InnoDB: | Creating sort index | 0.626529 |
MyISAM: | Creating sort index | 0.014588 |

That’s where the bulk of the execution time is. According to the docs, this thread state means that the thread is processing a SELECT which required an internal temporary table. Ok, sure, that makes sense, but it doesn’t really explain why InnoDB is taking so much longer, and here’s where things get a bit interesting. If you recall part 2 in this series, query 5 actually returned 0 results when run against InnoDB with the default configuration because of the middle initial “B”, and I had to set innodb_ft_min_token_size to 1 in order to get results back. For the sake of completeness, I did that again here, then restarted the server and recreated my FT index. The results? Execution time dropped by 50% and ‘Creating sort index’ didn’t even appear in the query profile:

mysql [localhost] {msandbox} (test): SELECT id, full_name, MATCH(full_name, details) AGAINST
('"Thomas B Smith"' IN BOOLEAN MODE) AS score FROM dir_test_innodb ORDER BY 3 DESC LIMIT 1;
+-------+----------------+-------------------+
| id    | full_name      | score             |
+-------+----------------+-------------------+
| 62633 | Thomas B Smith | 32.89915466308594 |
+-------+----------------+-------------------+
1 row in set (0.31 sec)
mysql [localhost] {msandbox} (test): show profile;
+-------------------------+----------+
| Status                  | Duration |
+-------------------------+----------+
| starting                | 0.000090 |
| checking permissions    | 0.000007 |
| Opening tables          | 0.000017 |
| init                    | 0.000034 |
| System lock             | 0.000012 |
| optimizing              | 0.000008 |
| statistics              | 0.000027 |
| preparing               | 0.000012 |
| FULLTEXT initialization | 0.304933 |
| executing               | 0.000008 |
| Sending data            | 0.000684 |
| end                     | 0.000006 |
| query end               | 0.000006 |
| closing tables          | 0.000011 |
| freeing items           | 0.000019 |
| cleaning up             | 0.000003 |
+-------------------------+----------+

Hm. It’s still slower than MyISAM by quite a bit, but much faster than before. The reason it’s faster is because it found an exact match and I only asked for one row, but if I change LIMIT 1 to LIMIT 2 (or limit N>1), then ‘Creating sort index’ returns to the tune of roughly 0.5 to 0.6 seconds, and ‘FULLTEXT initialization’ remains at 0.3 seconds. So this answers another lingering question: there is a significant performance impact to using a lower innodb_ft_min_token_size (ifmts), and it can work for you or against you, depending upon your queries and how many rows you’re searching for. The time spent in “Creating sort index” doesn’t vary too much (maybe 0.05s) between ifmts=1 and ifmts=4, but the time spent in FULLTEXT initialization with ifmts=4 was typically only a few milliseconds, as opposed to the 300ms seen here.
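
For reference, innodb_ft_min_token_size is not a dynamic variable, so changing it means editing the configuration, restarting the server, and then rebuilding the FT index so the data is re-tokenized (a sketch; the index name ft_idx is hypothetical):

# my.cnf
[mysqld]
innodb_ft_min_token_size = 1

-- after the restart:
ALTER TABLE dir_test_innodb DROP KEY ft_idx;
ALTER TABLE dir_test_innodb ADD FULLTEXT KEY ft_idx (full_name, details);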

Finally, I tried experimenting with different buffer pool sizes, temporary table sizes, per-thread buffer sizes, and I also tried changing from Antelope (ROW_FORMAT=COMPACT) to Barracuda (ROW_FORMAT=DYNAMIC) and switching character sets from utf8 to latin1, but none of these made any difference. The only thing which seemed to provide a bit of a performance improvement was upgrading to 5.6.12. The execution times for the InnoDB FTS queries under 5.6.12 were about 5-10 percent faster than with 5.6.10, and query #2 actually performed a bit better under InnoDB than MyISAM (average execution time 0.00075 seconds faster), but other than that, MyISAM still wins on raw SELECT performance.

Three blog posts later, then, what’s my overall take on InnoDB FTS in MySQL 5.6? I don’t think it’s great, but it’s serviceable. The performance for BOOLEAN MODE queries definitely leaves something to be desired, but I think InnoDB FTS fills a need for those people who want the features and capabilities of InnoDB but can’t modify their existing applications or who just don’t have enough FTS traffic to justify building out a Sphinx/Solr/Lucene-based solution.


Schema Design in MongoDB vs Schema Design in MySQL

For people used to relational databases, using NoSQL solutions such as MongoDB brings interesting challenges. One of them is schema design: while in the relational world, normalization is a good way to start, how should we design our collections when creating a new MongoDB application?

Let’s see with a simple example how we would create a data structure for MySQL (or any relational database) and for MongoDB. We will assume in this post that we want to store information about people (their name) and the details from their passport (country and validity date).

Relational Design

In the relational world, the basic idea is to try to stick to the 3rd normal form and create two tables (I’ll omit indexes and foreign keys for clarity – MongoDB supports indexes but not foreign keys):

mysql> select * from people;
+----+------------+
| id | name       |
+----+------------+
|  1 | Stephane   |
|  2 | John       |
|  3 | Michael    |
|  4 | Cinderella |
+----+------------+
mysql> select * from passports;
+----+-----------+---------+-------------+
| id | people_id | country | valid_until |
+----+-----------+---------+-------------+
|  4 |         1 | FR      | 2020-01-01  |
|  5 |         2 | US      | 2020-01-01  |
|  6 |         3 | RU      | 2020-01-01  |
+----+-----------+---------+-------------+
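
For reference, the tables behind this output might be defined as follows (a sketch; the column types are assumptions, and indexes and foreign keys are omitted as noted above):

CREATE TABLE people (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  name VARCHAR(50)
);

CREATE TABLE passports (
  id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
  people_id INT UNSIGNED NOT NULL,
  country CHAR(2),
  valid_until DATE
);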

One of the good things with such a design is that it’s equally easy to run any query (as long as we don’t consider joins as something difficult to use):

  • Do you want the number of people?
    SELECT COUNT(*) FROM people
  • Do you want to know the validity date of Stephane’s passport?
    SELECT valid_until FROM passports ps JOIN people pl ON ps.people_id = pl.id WHERE pl.name = 'Stephane'
  • Do you want to know which people do not have a passport?
    SELECT name FROM people pl LEFT JOIN passports ps ON ps.people_id = pl.id WHERE ps.id IS NULL
  • etc.

MongoDB Design

Now how should we design our collections in MongoDB to make querying easy?

Using the 3rd normal form is of course possible, but it would probably be inefficient, as all the joins would have to be done in the application. So of the 3 queries above, only query #1 could be run easily. So which other designs could we use?

A first option would be to store everything in the same collection:

> db.people_all.find().pretty()
{
	"_id" : ObjectId("51f7be1cd6189a56c399d3bf"),
	"name" : "Stephane",
	"country" : "FR",
	"valid_until" : ISODate("2019-12-31T23:00:00Z")
}
{
	"_id" : ObjectId("51f7be3fd6189a56c399d3c0"),
	"name" : "John",
	"country" : "US",
	"valid_until" : ISODate("2019-12-31T23:00:00Z")
}
{
	"_id" : ObjectId("51f7be4dd6189a56c399d3c1"),
	"name" : "Michael",
	"country" : "RU",
	"valid_until" : ISODate("2019-12-31T23:00:00Z")
}
{ "_id" : ObjectId("51f7be5cd6189a56c399d3c2"), "name" : "Cinderella" }

By the way, we can see here that MongoDB is schemaless: there is no problem in storing documents that do not have the same structure.

The drawback is that it is no longer clear which attributes belong to the passport, so if you want to get all passport information for Michael, you will need to correctly understand the whole data structure.

A second option would be to embed passport information inside people information – MongoDB supports rich documents:

> db.people_embed.find().pretty()
{
	"_id" : ObjectId("51f7c0048ded44d5ebb83774"),
	"name" : "Stephane",
	"passport" : {
		"country" : "FR",
		"valid_until" : ISODate("2019-12-31T23:00:00Z")
	}
}
{
	"_id" : ObjectId("51f7c70e8ded44d5ebb83775"),
	"name" : "John",
	"passport" : {
		"country" : "US",
		"valid_until" : ISODate("2019-12-31T23:00:00Z")
	}
}
{
	"_id" : ObjectId("51f7c71b8ded44d5ebb83776"),
	"name" : "Michael",
	"passport" : {
		"country" : "RU",
		"valid_until" : ISODate("2019-12-31T23:00:00Z")
	}
}
{ "_id" : ObjectId("51f7c7258ded44d5ebb83777"), "name" : "Cinderella" }

Or we could embed the other way around (however, this looks a bit dubious, as some people, like Cinderella in our example, may not have a passport):

> db.passports_embed.find().pretty()
{
	"_id" : ObjectId("51f7c7e58ded44d5ebb8377b"),
	"country" : "FR",
	"valid_until" : ISODate("2019-12-31T23:00:00Z"),
	"person" : {
		"name" : "Stephane"
	}
}
{
	"_id" : ObjectId("51f7c7ec8ded44d5ebb8377c"),
	"country" : "US",
	"valid_until" : ISODate("2019-12-31T23:00:00Z"),
	"person" : {
		"name" : "John"
	}
}
{
	"_id" : ObjectId("51f7c7fa8ded44d5ebb8377d"),
	"country" : "RU",
	"valid_until" : ISODate("2019-12-31T23:00:00Z"),
	"person" : {
		"name" : "Michael"
	}
}
{
	"_id" : ObjectId("51f7c8058ded44d5ebb8377e"),
	"person" : {
		"name" : "Cinderella"
	}
}

That’s a lot of options! How can we choose? Here is where you should be aware of a fundamental difference between MongoDB and relational databases when it comes to schema design:

Collections inside MongoDB should be designed with the most frequent access patterns of the application in mind, while in the relational world, you can forget how data will be accessed if your tables are normalized.

So…

  • If you read people information 99% of the time, having 2 separate collections can be a good solution: it avoids keeping in memory data that is almost never used (passport information), and when you need all the information for a given person, it may be acceptable to do the join in the application.
  • Same thing if you want to display the name of people on one screen and the passport information on another screen.
  • But if you want to display all information for a given person, storing everything in the same collection (with embedding or with a flat structure) is likely to be the best solution (see the sketch below).
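
For instance, with the embedded design, all information for a given person comes back in a single query, and a projection can pull out just the passport part (a sketch against the people_embed collection above):

> db.people_embed.find({ "name" : "Stephane" }).pretty()
> db.people_embed.find({ "name" : "Stephane" }, { "passport.valid_until" : 1 })

The second query is the equivalent of the relational “validity date of Stephane’s passport” lookup, with no join needed.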

Conclusion

We saw in this post one of the fundamental differences between MySQL and MongoDB when it comes to creating the right data structure for an application: with MongoDB, you need to know the data access patterns of the application. This should not be neglected, as a wrong schema design is a recipe for disaster: queries will be difficult to write and to optimize, they will be slow, and they will sometimes need to be replaced by custom code. All of that can lead to low performance and frustration.

The next question is: which way is better? And of course, there is no definite answer: MongoDB fans will say that normalization, by making all access patterns equal, makes them all equally bad, and normalization fans will say that a normalized schema provides good performance for most applications and that you can always denormalize to help a few queries run faster.


Percona celebrates its 7th anniversary by giving to the open source ecosystem

Today we’re celebrating Percona’s 7th anniversary. A lot has changed in these past 7 years – we have grown from a two-person outfit focused exclusively on consulting to a 100-person company with teammates in 22 different countries and 18 different states, now providing Support, Consulting, RemoteDBA, Server Development and Training services.

We also made our mark in open source software development, creating some of the most popular products for the MySQL ecosystem – Percona Toolkit, Percona XtraBackup, Percona XtraDB Cluster, Percona Server and others. Additionally, we’re into our second year of hosting the Percona Live conference series for the MySQL community. We have grown to serve over 2,000 customers, and I’m proud to say we did it all in bootstrap mode, without attracting outside investors, keeping the company owned by its employees.

So how are we celebrating our anniversary? We decided to celebrate by supporting the open source ecosystem, making donations to a number of open source initiatives that have helped us through all these years. We would not be here without you!

As such we’re supporting:

  • MariaDB Foundation for supporting MariaDB, one of the MySQL alternatives that we fully support at Percona.
  • Free Software Foundation as an organization instrumental to the success of the open source movement.
  • Linux Foundation for supporting Linux, by far the most popular platform among our customers.
  • Debian for creating a foundation for some of the most popular Linux distributions out there.
  • Jenkins for the Continuous Integration server we use for our development projects.
  • OpenSSH for software that helps us to access customer systems securely.
  • Drupal for powering our website as well as the websites of many of our customers.

We’re happy to enjoy the growth that’s allowing us to support other projects in our ecosystem. If you have the chance, I encourage you to do the same. There is a tremendous amount of work going into open source software, which is free to use but far from free to create and maintain.


Advanced MySQL Query Tuning: Webinar followup Q&A

Thanks to all who attended my “MySQL Query Tuning” webinar on July 24. If you missed it, you can download the slides and also watch the recorded video. Thank you for the excellent questions after the webinar as well. Query tuning is a big topic and, due to the limited time, I had to skip some material, especially some of the monitoring. I would like, however, to answer all the questions I did not get to during the webinar session.

Q: Did you reset the query cache before doing your benchmark on your query? 0.00 seconds sounds too good 

A: (This is in response to a couple of slides where the time showed as 0.00.) Yes, MySQL was running with the query cache disabled. The 0.00 just means that the query was executed in less than 0.004 sec. MySQL does not show higher precision if you run the query from the mysql monitor. There are a couple of ways to get the exact query times:

  • MySQL 5.0+: Use the “profiling” feature: http://dev.mysql.com/doc/refman/5.5/en/show-profile.html
  • MySQL 5.1+: Enable the slow query log with microsecond precision and log the query. To log all queries in the slow query log, you can temporarily set long_query_time = 0 (see the sketch after this list).
  • MySQL 5.6: Use the new performance_schema counters
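
For example, here is how the first two options look in practice (a sketch; these are standard MySQL commands and variables):

SET profiling = 1;              -- per-session profiling (MySQL 5.0+)
SELECT COUNT(*) FROM City;      -- run the query under test (example table)
SHOW PROFILES;                  -- durations of recent statements
SHOW PROFILE FOR QUERY 1;       -- per-state breakdown for statement #1

SET GLOBAL slow_query_log = 1;  -- or: slow query log with microsecond precision
SET GLOBAL long_query_time = 0; -- temporarily log every query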

Here is the profile for an example query that shows 0.00 seconds:

mysql> show profile;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000064 |
| checking permissions | 0.000003 |
| checking permissions | 0.000006 |
| Opening tables       | 0.000019 |
| System lock          | 0.000011 |
| init                 | 0.000031 |
| optimizing           | 0.000011 |
| statistics           | 0.000014 |
| preparing            | 0.000011 |
| executing            | 0.000002 |
| Sending data         | 0.002161 |
| end                  | 0.000004 |
| query end            | 0.000002 |
| closing tables       | 0.000007 |
| freeing items        | 0.000012 |
| logging slow query   | 0.000001 |
| cleaning up          | 0.000002 |
+----------------------+----------+

As we can see, “Sending data” actually took 0.002 seconds.

Q: Do you ever see doing a seminar that shows how to leverage parallelization (openCL or CUDA) with databases and the performance differences?

A: MySQL does not support it right now. Usually openCL/CUDA does not help with disk-bound applications like databases. However, some projects in the OLAP space can actually utilize openCL/CUDA; for example, Alenka is a massively parallel column store. Scanning, aggregation, sorting, etc. are done in a data-flow manner via CUDA processing.

Q: Is it possible to use this /covered index for order by – A.R/ with a join? For example, if we want to use a WHERE on table A and sort by a column from table B.

A: Unfortunately, MySQL does not support that with a covered index. MySQL will only use the filter on the WHERE condition (to limit the number of rows) + filesort. However, if we have a LIMIT clause, MySQL may be able to use the index for ORDER BY and stop after finding N rows matching the condition. It may not be faster though (as I showed during the webinar), and you may have to use index hints to tell MySQL to use the exact index (which may not be the best approach, as in some cases that index is not the best choice for the query). Example:

mysql> explain select * from City ct join Country cn on (ct.CountryCode = cn.Code) where Continent = 'North America' order by ct.population desc limit 10\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ct
         type: index
possible_keys: NULL
          key: Population
      key_len: 4
          ref: NULL
         rows: 10
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: cn
         type: eq_ref
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 3
          ref: world.ct.CountryCode
         rows: 1
        Extra: Using where

As we can see, MySQL will use the index and avoid sorting for the “order by”.

Q: Why are hash indexes not available for the InnoDB engine? Any plans to bring hash indexes?

A: InnoDB uses hash indexes internally for the so-called “Adaptive Hash Index” feature. InnoDB does not support hash indexes as normal table indexes. We are not aware of any plans by Oracle’s InnoDB team to bring this feature in.

Please note: MySQL will allow you to use the “USING HASH” keyword when creating an index on an InnoDB table. However, it will create a B-tree index instead.
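
To illustrate (a sketch; the table and index names are invented):

CREATE TABLE t_hash_demo (
  id INT NOT NULL PRIMARY KEY,
  val INT,
  KEY idx_val (val) USING HASH  -- accepted, but InnoDB builds a B-tree anyway
) ENGINE=InnoDB;

SHOW INDEX FROM t_hash_demo;    -- the Index_type column reports BTREE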

Q: Will foreign key constraints slow down my queries?

A: It may slow down the queries, as InnoDB will have to

  1. Check the foreign key constraint table
  2. Place a shared lock on the row it will read: 

If a FOREIGN KEY constraint is defined on a table, any insert, update, or delete that requires the constraint condition to be checked sets shared record-level locks on the records that it looks at to check the constraint. InnoDB also sets these locks in the case where the constraint fails. (http://dev.mysql.com/doc/refman/5.5/en/innodb-locks-set.html)

Q: How does use of index vary with the number of columns selected in a select query?

A: If we are talking about a covered index: if we select a column which is not part of the covered index, MySQL will not be able to satisfy the query from the index alone (“Using index” in the explain plan). It may be slower, especially if MySQL has to select large columns and the data is not cached.

In addition, if we select a text or blob column and MySQL needs to create a temporary table, this temporary table will be created on disk. I described this scenario during the webinar.
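
As an illustration using the world database from the earlier example (a sketch; the added index is hypothetical):

ALTER TABLE City ADD KEY idx_cc_pop (CountryCode, Population);

-- Covered: both selected columns are in the index, so Extra shows "Using index"
EXPLAIN SELECT CountryCode, Population FROM City WHERE CountryCode = 'USA';

-- Not covered: Name is not in the index, so MySQL must also read the rows
EXPLAIN SELECT CountryCode, Population, Name FROM City WHERE CountryCode = 'USA';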


John Cesario of Go Daddy on Percona MySQL Training

Senior MySQL DBA John Cesario of Go Daddy spoke at Percona Live in April and shared his views on Percona MySQL Training. Here’s what he had to say:

“Go Daddy is passionate about helping our customers. Percona Innodb training gives us another tool to elevate our customer support to the next level. It gives us the kind of insight that only comes from understanding MySQL at a code level. This helps Go Daddy understand MySQL internals, so we can increase performance and reduce troubleshooting time. The instructors are top-notch and leverage their years of field work, providing curriculum that is both relevant and immediately practical. I highly recommend Percona training for all aspects of MySQL consumption: from developers to operational administrators.”

You can watch John’s presentation here:

John Cesario of Go Daddy speaking about Percona Training at Percona Live 2013

Percona offers public training as well as custom training to customers around the world.


Percona Server 5.6 Webinar follow-up and Q&A

Good news everyone! I recently presented a webinar: Percona Server 5.6: Enterprise Grade MySQL. It was also recorded, so you can watch along or view the slide deck. As with all my talks, I am not simply reading the slides, so it really is worth listening to the audio rather than just glancing through the slide deck.

There were a number of great questions asked which I’ll answer below:

Q: How does Stewart feel about this version of 5.6 taking into consideration “Stewart’s .20 rule?” (ref 2013 Percona Live Conference).

A: For those who aren’t familiar with it, I have a rule which I call “Stewart’s dot twenty rule” which I’ve posted a few times about on my personal blog. It states: “a piece of software is never really mature until a dot twenty release.” I would say that MySQL 5.6 (and Percona Server 5.6) are both in really good states currently.

I strongly recommend the excellent series of “Fun With Bugs” posts by Valeriy Kravchuk. The latest Fun With Bugs post is Fun with Bugs #20 – welcome MySQL 5.6.13! and is certainly worth a read. I’m rather safe in saying that the first GA release of MySQL 5.6 was by far the best first GA release of any MySQL version ever, and subsequent MySQL 5.6 releases have improved upon that. It is quite likely that 5.6 will work perfectly for you today.

If you are really conservative with software upgrades and want as few surprises as possible, then you can of course wait – but I’d certainly recommend kicking the tyres of 5.6 over the next few months and starting to plan a migration.

Q: Any estimate on availability of XtraDB Cluster using 5.6?

A: Since Percona XtraDB Cluster is built upon both Percona Server and Galera it’s only natural to build upon a GA release of Percona Server and a GA release of Galera.

Q: What’s the bird’s name?

Spike the cockatiel

A: (Background: at one point during the webinar you could hear one of our pet birds burst into song.) I’m glad you asked, as it gives me an excellent opportunity to include gratuitous photos of our birds! They’re both cockatiels. People will often think cockatoo (specifically the Sulphur-Crested Cockatoo) and not cockatiel. A cockatoo is any of the 21 species belonging to the bird family Cacatuidae, and the cockatiel is the smallest of the 21 species.

Beaker the cockatiel helping out with Percona Server 5.6

We have both a boy (Spike) and a girl (Beaker). Spike is the one who sings (while Beaker, like the muppet, goes meep) and could be heard for a moment during the webinar. Beaker has also been spotted helping with Percona Server 5.6 releases.

Q: The ‘first in Percona Server’ optimizations, did Oracle implement Percona code or write their own?

A: It would be accurate to say that there are changes in MySQL 5.6 that have been inspired by our work, and previously there has been Percona code that has made its way into MySQL (see COPYING.Percona in the MySQL bzr repository). For a multitude of reasons that aren’t worth going into here, it has historically been problematic getting code into MySQL if you didn’t work for the company that owned MySQL. This has been true of MySQL AB, Sun and Oracle and is certainly nothing new or unique to Oracle. What is different now is that things seem to be changing for the better and there is likely to be more cooperation with Oracle going forward.

Q: Has HandlerSocket been cooked into your 5.6 releases yet? Have there been any other improvements on that front?

A: We don’t currently have HandlerSocket in Percona Server 5.6. There has been a very small amount of adoption of HandlerSocket and we’ve taken the approach that we’ll see if the HandlerSocket team ports to 5.6 and if there is adequate demand for HandlerSocket in 5.6. So far, you’re the first person to request it.

Q: What Oracle 5.6 features have not yet been copied or reimplemented in Percona 5.6?

A: Everything in Oracle MySQL 5.6 is in Percona Server 5.6 and has been from the very first Percona Server 5.6 release.

Q: Were InnoDB fake changes picked up by Oracle?

A: No, at least not yet :)

Q: Has Percona developed or found some solutions for migrating a production Percona server 5.5 to a production Percona server 5.6 without any downtime. Previously I solved this by making a newer version of Percona server as a replica of an older version of either Percona server or mysql db. Then I would point the application servers to the new replica to complete the deployment with a trivial downtime. It seems like this approach is not valid given the new replication design.

A: You can do the old replication trick.

Q: Can Xtrabackup 5.6 be used on a system running Percona Server 5.5?

A: Percona XtraBackup 2.1 (the current stable release, which works with MySQL 5.6 and Percona Server 5.6) will also work with MySQL 5.5, Percona Server 5.5, Percona Server 5.1 and MySQL 5.1 running the innodb plugin. There is also support for various MariaDB versions.

Q: A question on replication: my database has no partitioned tables, so multi-threaded replication (a feature of 5.6) is not going to help. Am I right?

A: Currently the multi-threaded replication slave partitions work across database schemas. It doesn’t matter whether your tables are partitioned or not; it matters which database (schema) they’re in. If all your tables are in the same schema, then the parallel slave will not currently help.
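
If your writes do span multiple schemas, the relevant MySQL 5.6 setting can be applied like this (a hedged sketch):

SET GLOBAL slave_parallel_workers = 4;  -- 0 disables the multi-threaded slave
STOP SLAVE SQL_THREAD;
START SLAVE SQL_THREAD;                 -- restart the SQL thread so the change takes effect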

Q: Is Percona Server 5.6 a drop in replacement for 5.5 or is there an upgrade process? If so, what is involved to roll back to 5.5 if necessary?

A: The upgrade process should be fairly painless and could well be a simple drop-in replacement. It does, of course, depend on what features you may be using along with the type and size of workload. We have an In-Place upgrading from Percona Server 5.5 to Percona Server 5.6 section in our Percona Server 5.6 manual, and along with the Changed in Percona Server 5.6 section this should provide a fair amount of insight into what you may expect from the Percona Server side of things. There is also the Upgrading from MySQL 5.5 to 5.6 section of the MySQL manual which is well worth a read.

There is a section in the MySQL manual on downgrading from 5.6 to 5.5 and I don’t think there should be any extra limitations imposed by Percona Server on going from 5.6 back to 5.5. That being said, downgrading is certainly not as well tested as upgrading and I would consider it more of a last resort than something to jump to quickly.

Q: When does production Percona server 5.6 release?

A: Soon. The current Percona Server 5.6 releases are fairly solid and I can certainly recommend trialling them.

Q: Are there any known mysqllib binding issues or deprecations for 5.6?

A: None that I’m aware of.

Q: Is there a white paper or other docs on migrating from Percona Server 5.1 to Percona Server 5.6?

A: Not currently. Generally, the recommended practice is to go through each major version (going through 5.5 before heading to 5.6). There is upgrade documentation for upgrading 5.1 to 5.5 and for 5.5 to 5.6 – and you can certainly run 5.5 for only a few minutes before upgrading to 5.6.

Q: Will you offer training on 5.6?

A: Yes! There is a Moving to MySQL 5.6 training course offered by Percona which covers both MySQL 5.6 and Percona Server 5.6.

Q: I didn’t notice any mention of the improved NUMA support in PS 5.5 (http://www.percona.com/doc/percona-server/5.5/performance/innodb_numa_support.html). Is this carried over to Oracle and/or Percona 5.6?

A: Yes it has made it into Percona Server 5.6. See http://www.percona.com/doc/percona-server/5.6/performance/innodb_numa_support.html for the 5.6 documentation on it. I am not aware of Oracle having implemented it though.

Q: Have you made tests of user_stats overhead compared to performance_schema in 5.6?

A: I’m not aware of any published benchmarks for 5.6 although it would be great to see some.

Q: Does this release support live table changes?

A: For some types of changes, yes.

Q: Is “Warning: Using a password on the command line interface can be insecure.” error being filtered out in the Percona release?

A: No. It’s not a good idea to provide passwords on the command line.

Q: He also promised a migration blog post ;-)

A: As promised, I am right now going to pester people about writing various posts on migrating from 5.5 to 5.6.


Big Data with MySQL and Hadoop at MySQL Connect 2013

I will be talking about Big Data with MySQL and Hadoop at MySQL Connect 2013 (Sept. 21-22) in San Francisco as well as at Percona University in Washington, DC (September 12, 2013). Apache Hadoop is a very popular Big Data solution and we can nowadays easily integrate it with MySQL. I will start with a brief introduction of Apache Hadoop and its components (HDFS, Map/Reduce, Hive, HBase/HCatalog, Flume, Sqoop, etc.). Next I will show 2 major Big Data scenarios:

  • From file to Hadoop to MySQL. This is an example of an “ELT” process: Extract data from an external source; Load data into Hadoop; Transform/analyze the data; Extract the results to MySQL. It is similar to the original Data Warehouse ETL (Extract; Transform; Load) process; however, instead of “transforming” data before loading it into the Data Warehouse, we load it “as is” and then run the data analysis. As a result of this analysis (a map/reduce process) we can generate a report and load it into MySQL (using Sqoop export). To illustrate this process I will show 2 classical examples: clickstream analysis and Twitter feed analysis. On top of those examples I will also show how to use MySQL / full-text search solutions to produce near real-time reports from HBase.

Picture 1: ELT pipeline, from File to Hadoop to MySQL


  • From OLTP MySQL to Hadoop to MySQL reporting. In this scenario we extract data (potentially close to real time) from MySQL, load it into Hadoop for storage and analysis, and later generate reports to load into another MySQL instance (reporting), which can be used to generate and display graphs.

Picture 2: From OLTP MySQL to Hadoop to MySQL reporting.


Note: The reason why we need additional storage for the MySQL reports is that it may take a long time to generate a Hive report (it is executed with Map/Reduce, which reads all the files; there are no indexes). So it makes sense to “offload” common report results into separate storage (MySQL).

In both scenarios we will need a way to integrate Hadoop and MySQL. In my previous post, MySQL and Hadoop integration, I demonstrated how to integrate Hadoop and MySQL with Sqoop and the Hadoop Applier for MySQL. This case is similar; however, we can use a different toolset. In scenario 1 (the clickstream example) we can use Apache Flume to grab files (or read “events”) and load them into Hadoop. With Flume we can define a “source” and a “sink”. Flume supports a range of different sources, including HTTP requests, syslog, TCP, etc. The HTTP source is interesting, as we can convert all (or a number of) HTTP requests (the “source”) into “events” which can be loaded into Hadoop (the “sink”).

During my presentation I will show the exact configurations for the sample Clickstream process, including:

  1. Flume configuration
  2. HiveQL queries to generate a report (a sample query follows this list)
  3. Sqoop export queries to load the report into MySQL
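
As a taste of item 2, a HiveQL report query might look like this (a hedged sketch; the table and column names are invented, not from the talk):

SELECT url, COUNT(*) AS hits
FROM clickstream_events
GROUP BY url
ORDER BY hits DESC
LIMIT 100;

Under the hood, Hive compiles such a query into Map/Reduce jobs that scan the files loaded by Flume.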

See you at MySQL Connect 2013!


The top 5 proactive measures to minimize MySQL downtime

I’m happy to announce that the recording for my recent webinar “5 Proactive Measures to Minimize MySQL Downtime” is now available, along with the slides. They can both be found here.

My webinar focused on the top 5 operational measures that prevent or reduce downtime and the related business impact, drawn from a significant number of customer emergency scenarios.

As a senior consultant on Percona’s 24×7 Emergency Consulting team, I’ve helped resolve a myriad of client emergencies related to MySQL downtime and know that every emergency situation is unique and has its own set of challenges. However, when cases are systematically studied and analyzed, patterns of what typically causes MySQL downtime and how it can be best avoided emerge. And that’s what my webinar focused on. Thanks to everyone who attended, asked questions, and sent me thank you emails after the fact.

If you were not able to attend but are interested in the material, be sure to watch the recording, as the slides include only a small part of the information presented. If you have questions, please leave them in the comments section below and I’ll answer them as best I can.
