Experimental GIT Mirrors of Percona XtraBackup, Percona Server plus Oracle MySQL trees

I recently blogged on setting up an experimental Git mirror of the Oracle MySQL trees on GitHub. I’m now happy to announce that there are also mirrors for Percona XtraBackup and Percona Server.

I’ve also updated the Oracle MySQL GIT mirror to include MySQL 5.7 and the (now abandoned) MySQL 6.0. I include the abandoned 6.0 tree as it can provide useful archaeology as to where some code came from.

I’d love to hear about any positive/negative experiences using these mirrors. Hopefully shortly I’ll fix up the last two bits of missing metadata in the transfer: bzr branch names for each commit and the “bugs fixed” by each commit (what’s added with “bzr commit --fixes lp:1234”).


Want to archive tables? Use Percona Toolkit’s pt-archiver

Percona Toolkit’s pt-archiver is one of the best utilities for archiving records from large tables to other tables or files. One interesting thing to note is that pt-archiver is a read-write tool: it deletes data from the source by default, so after archiving you don’t need to delete it separately.

Because deletion is the default behavior, you should take care before actually running it on the production server. You can test your archiving jobs with the --dry-run option, or you can use the --no-delete option if you’re not sure. The purpose of this tool is mainly to archive old data from a table without impacting OLTP queries, and to insert the data into another table on the same or a different server, or into a file in a format suitable for LOAD DATA INFILE.
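For example, a dry run that only prints the statements pt-archiver would execute, without reading or writing any rows, might look like this (the DSN and WHERE clause here are hypothetical):

pt-archiver --source h=localhost,D=nil,t=test --file '/tmp/%Y-%m-%d-test' --where "created_at < '2013-01-01'" --limit=1000 --dry-run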

How does pt-archiver select records to archive? 

pt-archiver uses an index to select records from the table. The index is used to optimize repeated accesses to the table: pt-archiver remembers the last row it retrieved from each SELECT statement and uses it to construct the WHERE clause of the next one, using the columns in the specified index. This allows MySQL to start the next SELECT where the last one ended, rather than potentially scanning from the beginning of the table with each successive SELECT.

If you want to run pt-archiver with a specific index you can use the “i” option in the --source DSN. The “i” option tells pt-archiver which index it should scan to archive. This appears in a FORCE INDEX or USE INDEX hint in the SELECT statements that are used to fetch rows to archive. If you don’t specify anything, pt-archiver will auto-discover a good index, preferring the PRIMARY KEY if one exists. Most of the time, pt-archiver works well without the “i” option.
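For instance, to force a particular index via the DSN (the index name idx_created_at is hypothetical):

pt-archiver --source h=localhost,D=nil,t=test,i=idx_created_at --file '/tmp/%Y-%m-%d-test' --where "created_at < '2013-01-01'" --limit=1000 --dry-run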

How to run pt-archiver?

To archive records into a file, you can run something like:

pt-archiver --source h=localhost,D=nil,t=test --file '/home/nilnandan/%Y-%m-%d-tabname' --where "name='nil'" --limit=1000

To archive records from one table to another table, on the same server or a different one, you can run something like:

pt-archiver --source h=localhost,D=nil,t=test --dest h=fedora.vm --where "name='nil'" --limit=1000

Please check this before you use the default file option (-F) in --source: http://www.percona.com/doc/percona-toolkit/2.1/pt-archiver.html#cmdoption-pt-archiver--dest

Archiving in a replication environment:

In a replication environment it’s really important that the slave does not lag for a long time. pt-archiver provides two options you can use while archiving to control the lag on the slave server.

--check-slave-lag : Pause archiving until the specified DSN’s slave lag is less than --max-lag. With this option you give the connection details of the slave whose lag should be checked (e.g. --check-slave-lag h=localhost,S=/tmp/mysql_sandbox29784.sock).

--max-lag : Pause archiving if the slave given by --check-slave-lag lags by more than this many seconds.

This option causes pt-archiver to look at the slave every time it is about to fetch another row. If the slave’s lag is greater than the option’s value, or if the slave isn’t running (so its lag is NULL), pt-archiver sleeps for --check-interval seconds and then looks at the lag again. It repeats until the slave is caught up, then proceeds to fetch and archive the row.

Some useful options for pt-archiver:

--for-update / --share-lock : Adds the FOR UPDATE / LOCK IN SHARE MODE modifier to SELECT statements.

--no-delete : Do not delete archived rows.

--plugin : Perl module name to use as a generic plugin.

--progress : Print progress information every X rows.

--statistics : Collect and print timing statistics.

--where : WHERE clause to limit which rows to archive (required).

nilnandan@nil:~$ pt-archiver --source h=localhost,D=nil,t=test,S=/tmp/mysql_sandbox29783.sock --file '/home/nilnandan/%Y-%m-%d-tabname' --where "name='nilnandan'" --limit=50000 --progress=50000 --txn-size=50000 --statistics --bulk-delete --max-lag=1 --check-interval=15 --check-slave-lag h=localhost,S=/tmp/mysql_sandbox29784.sock
TIME                  ELAPSED   COUNT
2013-08-08T10:08:39         0       0
2013-08-08T10:09:25        46   50000
2013-08-08T10:10:32       113  100000
2013-08-08T10:11:41       182  148576
Started at 2013-08-08T10:08:39, ended at 2013-08-08T10:11:59
Source: D=nil,S=/tmp/mysql_sandbox29783.sock,h=localhost,t=test
SELECT 148576
INSERT 0
DELETE 148576
Action          Count      Time    Pct
print_file     148576   18.2674   9.12
bulk_deleting       3    8.9535   4.47
select              4    2.9204   1.46
commit              3    0.0005   0.00
other               0  170.0719  84.95
nilnandan@nil:~$

Percona Toolkit’s pt-archiver works with Percona XtraDB Cluster (PXC) 5.5.28-23.7 and newer, but there are three limitations you should consider before archiving on a cluster. You can get more information here.

pt-archiver is extensible via a plugin mechanism. You can inject your own code to add advanced archiving logic that could be useful for archiving dependent data, applying complex business rules, or building a data warehouse during the archiving process. Follow this URL for more info on that.
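To give a rough idea, a plugin is just a Perl module that pt-archiver loads by class name via --plugin. The sketch below follows the method names from the pt-archiver documentation, but treat it as an illustrative outline rather than a tested implementation:

package MyArchivePlugin;
use strict;
use warnings;

# pt-archiver constructs the plugin with a database handle plus db/table names.
sub new {
    my ( $class, %args ) = @_;
    return bless { %args }, $class;
}

# Called once before archiving begins; receives the column lists.
sub before_begin {
    my ( $self, %args ) = @_;
    $self->{allcols} = $args{allcols};
    return;
}

# Business-rule hook: return true if the given row may be archived.
sub is_archivable {
    my ( $self, %args ) = @_;
    my $row = $args{row};
    return 1;    # this sketch archives every row it is offered
}

sub before_insert { return; }    # e.g. transform rows bound for a warehouse
sub before_delete { return; }    # e.g. archive dependent rows first

# Called once after pt-archiver finishes.
sub after_finish { return; }

1;

The module must be on Perl’s search path (for example, in the current directory) so that it can be loaded with --plugin MyArchivePlugin.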

Bugs related to pt-archiver: https://bugs.launchpad.net/percona-toolkit/+bugs?field.tag=pt-archiver

More details about pt-archiver: http://www.percona.com/doc/percona-toolkit/2.2/pt-archiver.html


Getting Percona PAM to work with Percona Server & its client apps

Percona Server is bundled with the PAM plugin, which opens up a plethora of ways to authenticate to MySQL, such as restricting the times when users can connect, authenticating via a USB key, authenticating against an external system such as LDAP, and many, many more PAM-compatible mechanisms.

If you want to use PAM authentication with the community version of MySQL, you may follow the instructions here to get it working on your system. If you want to test PAM authentication, the simplest way is to authenticate via /etc/shadow. The steps to do so can be found here, or you can follow the steps below.

Here’s a primer for setting up Percona PAM on CentOS 6 to authenticate via /etc/shadow:

1. Install Percona yum repository

# rpm -Uvh http://www.percona.com/downloads/percona-release/percona-release-0.0-1.x86_64.rpm
Retrieving http://www.percona.com/downloads/percona-release/percona-release-0.0-1.x86_64.rpm
Preparing...                ########################################### [100%]
   1:percona-release        ########################################### [100%]

2. Install Percona Server 5.5

# yum install Percona-Server-server-55 Percona-Server-client-55
…
snipped for brevity
…
Installed:
  Percona-Server-client-55.x86_64 0:5.5.32-rel31.0.549.rhel6 Percona-Server-server-55.x86_64 0:5.5.32-rel31.0.549.rhel6
Dependency Installed:
  Percona-Server-shared-55.x86_64 0:5.5.32-rel31.0.549.rhel6   perl.x86_64 4:5.10.1-131.el6_4   perl-Module-Pluggable.x86_64 1:3.90-131.el6_4   perl-Pod-Escapes.x86_64 1:1.04-131.el6_4   perl-Pod-Simple.x86_64 1:3.13-131.el6_4   perl-libs.x86_64 4:5.10.1-131.el6_4
  perl-version.x86_64 3:0.77-131.el6_4
Complete!

3. Start Percona Server 5.5

# service mysql start
Starting MySQL (Percona Server)...... SUCCESS!

4. From the mysql console, enable auth_pam and auth_pam_compat plugins. These PAM plugins will be discussed in detail later.

mysql> INSTALL PLUGIN auth_pam SONAME 'auth_pam.so';
Query OK, 0 rows affected (0.00 sec)
mysql> INSTALL PLUGIN auth_pam_compat SONAME 'auth_pam_compat.so';
Query OK, 0 rows affected (0.00 sec)
mysql> SHOW PLUGINS;
+--------------------------------+----------+--------------------+--------------------+---------+
| Name                           | Status   | Type               | Library            | License |
+--------------------------------+----------+--------------------+--------------------+---------+
| binlog                         | ACTIVE   | STORAGE ENGINE     | NULL               | GPL     |
…
snipped for brevity
…
| auth_pam                       | ACTIVE   | AUTHENTICATION     | auth_pam.so        | GPL     |
| auth_pam_compat                | ACTIVE   | AUTHENTICATION     | auth_pam_compat.so | GPL     |
+--------------------------------+----------+--------------------+--------------------+---------+
42 rows in set (0.01 sec)

5. From the MySQL console, create two users that will authenticate using auth_pam and auth_pam_compat respectively. You also need to delete anonymous users:

mysql> CREATE USER ap_user IDENTIFIED WITH auth_pam;
Query OK, 0 rows affected (0.00 sec)
mysql> CREATE USER apc_user IDENTIFIED WITH auth_pam_compat;
Query OK, 0 rows affected (0.00 sec)
mysql> DELETE FROM mysql.user WHERE USER='';
Query OK, 2 rows affected (0.00 sec)
mysql> FLUSH PRIVILEGES;
Query OK, 0 rows affected (0.00 sec)

6. Configure Percona Server to authenticate via /etc/shadow by creating a PAM config file in /etc/pam.d/mysqld with the following content:

auth       required     pam_warn.so
auth       required     pam_unix.so audit
account    required     pam_unix.so audit

7. Ensure Percona Server can read /etc/shadow by changing the group ownership and permissions of it

#chgrp mysql /etc/shadow
#chmod g+r /etc/shadow

8. Create system users and respective passwords. The usernames should match the users created from the MySQL console

#useradd ap_user
#passwd ap_user
#useradd apc_user
#passwd apc_user

9. Test if you can connect to Percona Server using the Unix passwords of ap_user and apc_user:

# mysql -u ap_user -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 9
Server version: 5.5.32-31.0 Percona Server (GPL), Release rel31.0, Revision 549
Copyright (c) 2009-2013 Percona Ireland Ltd.
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> SELECT USER(), CURRENT_USER();
+-------------------+----------------+
| USER()            | CURRENT_USER() |
+-------------------+----------------+
| ap_user@localhost | ap_user@%      |
+-------------------+----------------+
1 row in set (0.00 sec)
# mysql --enable-cleartext-plugin -u apc_user -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 11
Server version: 5.5.32-31.0 Percona Server (GPL), Release rel31.0, Revision 549
Copyright (c) 2009-2013 Percona Ireland Ltd.
Copyright (c) 2000, 2013, Oracle and/or its affiliates. All rights reserved.
Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective
owners.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
mysql> SELECT USER(), CURRENT_USER();
+--------------------+----------------+
| USER()             | CURRENT_USER() |
+--------------------+----------------+
| apc_user@localhost | apc_user@%     |
+--------------------+----------------+
1 row in set (0.00 sec)

10. Done.

The Percona PAM authentication plugin comes in two variants. The first is the auth_pam plugin, a full-featured implementation of the PAM plugin that relies on the client-side dialog plugin. The other is auth_pam_compat, which uses the mysql_clear_password plugin that Oracle provides.

The drawback of the latter is that it can only accept a password as input, and credentials are sent in cleartext, which is not secure. Now, why would you need to choose between one or the other? It all depends on whether the client supports either of these plugins or, worse, none at all. To create a user that authenticates via auth_pam, run this on the MySQL console:
CREATE USER <auth_pam_user> IDENTIFIED WITH auth_pam;
For auth_pam_compat, run:
CREATE USER <auth_pam_compat_user> IDENTIFIED WITH auth_pam_compat;

As of now, only Percona Server’s mysql client and an older version of HeidiSQL (version 7), a GUI MySQL client for Windows, are able to authenticate over PAM via the auth_pam plugin by default.

So, if you try to connect to MySQL using Perl, PHP, Ruby, Python and the like, you will receive this error: “Client does not support authentication protocol requested by server; consider upgrading MySQL client.”

The good news is that if the client uses the libmysqlclient library to connect to MySQL, you can recompile the client’s source code against Percona Server’s libmysqlclient to make it compatible. This involves installing the Percona Server development library, compiler tools, and development libraries, followed by compiling and installing the client’s source code. This may be an easy task for a single server, but if you have a large deployment of servers and clients, it would be wiser to build RPMs or DEBs instead.

To give you an idea of which clients depend on libmysqlclient, if you’re using CentOS and yum-utils is installed, you can run: repoquery -q --whatrequires mysql-libs

For Ubuntu, just run: apt-cache rdepends libmysqlclient18

Below is an example of rebuilding and installing perl’s DBD MySQL package to make it compatible with the auth_pam plugin:

1. Install Percona yum repository and Percona Server 5.5 development library:

# rpm -Uvh http://www.percona.com/downloads/percona-release/percona-release-0.0-1.x86_64.rpm
# yum install Percona-Server-devel-55

2. Install rpm-build, the package used to build RPMs

# yum install rpm-build

3. Download and install perl-DBD-MySQL Source RPM.

# wget http://vault.centos.org/6.4/os/Source/SPackages/perl-DBD-MySQL-4.013-3.el6.src.rpm
# rpm -Uvh perl-DBD-MySQL-4.013-3.el6.src.rpm

The source will be installed under the rpmbuild directory of the current user’s home directory.

4. Install compilers and dependencies:

yum install Percona-Server-client-55 perl perl-DBI zlib-devel perl-ExtUtils-MakeMaker  make gcc openssl-devel

5. Build the RPM file:

cd rpmbuild/SPECS/
rpmbuild -bb perl-DBD-MySQL.spec

6. Install the RPM file:

rpm -Uvh rpmbuild/RPMS/x86_64/perl-DBD-MySQL-4.013-3.el6.x86_64.rpm

7. Test by creating a perl script and running it:
testpam.pl

#!/usr/bin/perl
use DBI;
$database="mysql";
$username="ap_user";
$password="ap_user_password";
$host="127.0.0.1";
$sql="SHOW TABLES";
$table="";
$dbh = DBI->connect("DBI:mysql:$database:$host", $username, $password) or die "$DBI::errstr\n";
$query = $dbh->prepare($sql) or die $dbh->errstr;
$query->execute or die $query->errstr;
while (@row = $query->fetchrow_array()) {
        $table = $row[0];
        print "$table\n";
}

#chmod +x testpam.pl
#perl testpam.pl
DBI connect('mysql:127.0.0.1','ap_user',...) failed: Authentication plugin 'dialog' cannot be loaded: /usr/lib64/mysql/plugin/dialog.so: cannot open shared object file: No such file or directory at test.pl line 10
Authentication plugin 'dialog' cannot be loaded: /usr/lib64/mysql/plugin/dialog.so: cannot open shared object file: No such file or directory

8. Troubleshooting
If you get the error above, it means you need to install the dialog plugin, which is bundled in the Percona-Server-server-55 RPM. If you don’t want to install the RPM, you may opt to download the binary tarball from the Percona website, extract it, and place the plugin in the /usr/lib64/mysql/plugin directory. In this example, we will install the Percona-Server-server-55 RPM:

#yum install Percona-Server-server-55
#perl testpam.pl
columns_priv
db
event
func
general_log
help_category
help_keyword
help_relation
help_topic
host
ndb_binlog_index
plugin
proc
procs_priv
proxies_priv
servers
slow_log
tables_priv
time_zone
time_zone_leap_second
time_zone_name
time_zone_transition
time_zone_transition_type
user

9. Done.

As for the auth_pam_compat plugin, if the client uses libmysqlclient.so.18 or newer, you do not need to recompile the client’s source code as described above.

On Ubuntu 12.04, there’s no need to recompile libdbd-mysql-perl package:

# dpkg -L libdbd-mysql-perl
/.
…
snipped for brevity
…
/usr/lib/perl5/auto/DBD/mysql/mysql.so
#ldd /usr/lib/perl5/auto/DBD/mysql/mysql.so
linux-vdso.so.1 =>  (0x00007fff943ff000)
libmysqlclient.so.18 => /usr/lib/libmysqlclient.so.18 (0x00007f20517bf000)
…
snipped for brevity
…
libz.so.1 => /lib/x86_64-linux-gnu/libz.so.1 (0x00007f20504aa000)

On the other hand, perl-DBD-MySQL of CentOS 6 needs to be recompiled because it depends on libmysqlclient.so.16:

# rpm -ql perl-DBD-MySQL
/usr/lib64/perl5/Bundle
…
snipped for brevity
…
/usr/lib64/perl5/auto/DBD/mysql/mysql.so
…
snipped for brevity
…
/usr/share/man/man3/DBD::mysql::INSTALL.3pm.gz
# ldd /usr/lib64/perl5/auto/DBD/mysql/mysql.so
linux-vdso.so.1 =>  (0x00007fff847ff000)
libmysqlclient.so.16 => /usr/lib64/libmysqlclient.so.16 (0x00007fa2471c3000)
…
snipped for brevity
…
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007fa244730000)

Once you’ve resolved the issue above, depending on your client, you can use the auth_pam_compat plugin by enabling the cleartext plugin via an environment variable, a program option, or a setting in my.cnf:

Examples:
Using the environment variable:

$ export LIBMYSQL_ENABLE_CLEARTEXT_PLUGIN=1
$ mysql -h 127.0.0.1 -u apc_user

Enable the plugin as an option. It’s supported on mysql, mysqladmin and mysqlslap.

$ mysql --enable-cleartext-plugin -h 127.0.0.1 -u apc_user

As a my.cnf config:
/etc/my.cnf

[client]
enable-cleartext-plugin

$ mysql -h 127.0.0.1 -u apc_user

For more information on the cleartext authentication, click here.

A practical example of using auth_pam_compat is being able to use MySQL Workbench 5.2 over PAM on a Mac:

$ export LIBMYSQL_ENABLE_CLEARTEXT_PLUGIN=1
$ /Applications/MySQLWorkbench.app/Contents/MacOS/MySQLWorkbench

and on Linux:

$ export LIBMYSQL_ENABLE_CLEARTEXT_PLUGIN=1
$ mysql-workbench

Do note that on MySQL Workbench 5.2, you need to store the password in the keychain for PAM authentication to work. Also, sadly, Percona PAM does not work with MySQL Workbench 5.2 on Windows, as confirmed in the bug I reported back in February 2013.

As for the latest release of MySQL Workbench 6.0, there’s an advanced option to enable the cleartext authentication plugin when setting up connections. However, not all features work with auth_pam_compat. For example, the SQL Editor does not connect successfully with the auth_pam_compat plugin, but the Reverse Engineer tool works fine with it.

Conclusion
If you want to use Percona PAM, you need to test first whether your MySQL clients support it, and most likely you will need to recompile those clients to make it work. If your only choice is auth_pam_compat but you are wary of using it because credentials are sent in cleartext, you can add a layer of security by enabling SSL encryption on MySQL. One such example is JDBC, where cleartext authentication is permissible, but only if the connection is encrypted.
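For instance, forcing encrypted connections for a cleartext-authenticating account could look like this (a sketch that assumes SSL is already configured on the server; the account is the one from the examples above):

mysql> GRANT USAGE ON *.* TO 'apc_user'@'%' REQUIRE SSL;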


Percona University at Washington, D.C. – Sept. 12

Following our events earlier this year in Raleigh, Montevideo, Buenos Aires, Toronto and Portland, we bring Percona University to Washington, D.C. on September 12.

This free one-day technical education event will cover many MySQL ecosystem technologies, including: Percona Server, MariaDB, MySQL 5.6, Percona XtraDB Cluster, Tungsten Replicator, Percona XtraBackup, MySQL Cluster and Percona Toolkit. We will also learn about Apache Hadoop and how it can be most successfully used with MySQL environments.

Matt Yonkovit, VP of Percona MySQL Consulting, will give his great talk on the “5 minute DBA,” drawing from a wealth of practical DBA tips for developers and system administrators – professionals who must step up to the plate on occasion and perform DBA tasks.

Ryan Huddleston, director of Percona Remote DBA, will talk about “MySQL Backups in the Real World,” sharing best practices you’ll want to follow if you never want to lose your data again – as well as being able to recover from various data incidents as quickly as possible.

Our special guest for this event is Baron Schwartz, co-founder and CEO of VividCortex, who will be speaking about using MySQL with Go. The Go programming language is quickly gaining popularity, and we’re excited to bring the most up-to-date information about using Go with the MySQL database.

If you will be in the Washington, D.C. area on September 12, I’d love to see you at this event – space is limited though, so please register now.  And if you’re interested in Percona University events and would like to bring one to your city, please fill out this form to let us know – especially if you can help us by hosting or promoting the event.


MySQL Security: Armoring Your Dolphin

My colleague and teammate Ernie Souhrada will be presenting a webinar on Wednesday, August 21, 2013 at 10 a.m. PDT titled “MySQL Security: Armoring Your Dolphin.”

This is a popular topic, with news breaking routinely that yet another Internet company has leaked private data of one form or another. Ernie’s webinar will be a great overview of securing MySQL from top to bottom, including changes related to security in the 5.6 release.

Topics to be covered include:

  • Basic security concepts
  • Security above the MySQL layer (network, hardware, OS, etc.)
  • Tips for application design
  • A more secure MySQL configuration
  • Security-related changes in MySQL 5.6

Attendees will leave this presentation knowing where to start when identifying vulnerabilities in their systems.

Be sure to register for the webinar in advance!


InnoDB Full-text Search in MySQL 5.6: Part 3, Performance

This is part 3 of a 3-part series covering the new InnoDB full-text search features in MySQL 5.6. To catch up on the previous parts, see part 1 or part 2.

Some of you may recall a few months ago that I promised a third part in my InnoDB full-text search (FTS) series, in which I’d actually take a look at the performance of InnoDB FTS in MySQL 5.6 versus traditional MyISAM FTS. I hadn’t planned on quite such a gap between part 2 and part 3, but as they say, better late than never. Recall that we have been working with two data sets, one which I call SEO (8000-keyword-stuffed web pages) and the other which I call DIR (800K directory records), and we are comparing MyISAM FTS in MySQL 5.5.30 versus InnoDB FTS in MySQL 5.6.10.

For reference, although this is not really what I would call a benchmark run, the platform I’m using here is a Core i7-2600 3.4GHz, 32GiB of RAM, and 2 Samsung 256GB 830 SSDs in RAID-0. The OS is CentOS 6.4, and the filesystem is XFS with dm-crypt/LUKS. All MySQL settings are their respective defaults, except for innodb_ft_min_token_size, which is set to 4 (instead of the default of 3) to match MyISAM’s default ft_min_word_len.
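For reference, that single non-default setting as it would appear in my.cnf (this variable is not dynamic, so changing it requires a server restart and a rebuild of existing FT indexes):

[mysqld]
# match MyISAM's default ft_min_word_len
innodb_ft_min_token_size = 4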

Also, recall that the table definition for the DIR data set is:

CREATE TABLE dir_test (
  id INT UNSIGNED NOT NULL PRIMARY KEY,
  full_name VARCHAR(100),
  details TEXT
);

The table definition for the SEO data set is:

CREATE TABLE seo_test (
 id INT NOT NULL AUTO_INCREMENT PRIMARY KEY,
 title VARCHAR(255),
 body MEDIUMTEXT
);

Table Load / Index Creation

First, let’s try loading data and creating our FT indexes in one pass – i.e., we’ll create the FT indexes as part of the original table definition itself. In particular, this means adding “FULLTEXT KEY (full_name, details)” to our DIR tables and adding “FULLTEXT KEY (title, body)” to the SEO tables. We’ll then drop these tables, drop our file cache, restart MySQL, and try the same process in two passes: first we’ll load the table, and then we’ll do an ALTER to add the FT indexes. All times in seconds.

Engine   Data Set   one-pass (load)   two-pass (load, alter)
MyISAM   SEO        3.91              3.96 (0.76, 3.20)
InnoDB   SEO        3.77              7.32 (1.53, 5.79)
MyISAM   DIR        43.159            44.93 (6.99, 37.94)
InnoDB   DIR        330.76            56.99 (12.70, 44.29)

Interesting. For MyISAM, we might say that it really doesn’t make too much difference which way you proceed, as the numbers from the one-pass load and the two-pass load are within a few percent of each other, but for InnoDB, we have mixed behavior. With the smaller SEO data set, it makes more sense to do it in a one-pass process, but with the larger DIR data set, the two-pass load is much faster.

Recall that when adding the first FT index to an InnoDB table, the table itself has to be rebuilt to add the FTS_DOC_ID column, so I suspect that the size of the table when it gets rebuilt has a lot to do with the performance difference on the smaller data set. The SEO data set fits completely into the buffer pool, the DIR data set does not. That also suggests that it’s worth comparing the time required to add a second FT index (this time we will just index each table’s TEXT/MEDIUMTEXT field). While we’re at it, let’s look at the time required to drop the second FT index as well. Again, all times in seconds.

Engine   Data Set   FT Index Create Time   FT Index Drop Time
MyISAM   SEO        6.34                   3.17
InnoDB   SEO        3.26                   0.01
MyISAM   DIR        74.96                  37.82
InnoDB   DIR        24.59                  0.01

InnoDB wins this second test all around. I’d attribute InnoDB’s win here partially to not having to rebuild the whole table for the second (and subsequent) indexes, but also to the fact that at least some of the InnoDB data was already in the buffer pool from when the first FT index was created. Also, we know that InnoDB generally drops indexes extremely quickly, whereas MyISAM requires a rebuild of the .MYI file, so InnoDB’s win on the drop test isn’t surprising.

Query Performance

Recall the queries that were used in the previous post from this series:

1. SELECT id, title, MATCH(title, body) AGAINST ('arizona business records'
   IN NATURAL LANGUAGE MODE) AS score FROM seo_test_{myisam,innodb} ORDER BY 3
   DESC LIMIT 5;
2. SELECT id, title, MATCH(title, body) AGAINST ('corporation commission forms'
   IN NATURAL LANGUAGE MODE) AS score FROM seo_test_{myisam,innodb} ORDER BY 3 DESC
   LIMIT 5;
3. SELECT id, full_name, MATCH(full_name, details) AGAINST ('+james +peterson +arizona'
   IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 5;
4. SELECT id, full_name, MATCH(full_name, details) AGAINST ('+james +peterson arizona'
   IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 5;
5. SELECT id, full_name, MATCH(full_name, details) AGAINST ('"Thomas B Smith"'
   IN BOOLEAN MODE) AS score FROM dir_test_{myisam,innodb} ORDER BY 3 DESC LIMIT 1;

The queries were run consecutively from top to bottom, a total of 10 times each. Here are the results in tabular format:

Query #   Engine   Min. Execution Time   Avg. Execution Time   Max. Execution Time
1         MyISAM   0.007953              0.008102              0.008409
1         InnoDB   0.014986              0.015331              0.016243
2         MyISAM   0.001815              0.001893              0.001998
2         InnoDB   0.001987              0.002077              0.002156
3         MyISAM   0.000748              0.000817              0.000871
3         InnoDB   0.670110              0.676540              0.684837
4         MyISAM   0.001199              0.001283              0.001372
4         InnoDB   0.055479              0.056256              0.060985
5         MyISAM   0.008471              0.008597              0.008817
5         InnoDB   0.624305              0.630959              0.641415

Not a lot of variance in execution times for a given query, so that’s good, but InnoDB is always coming back slower than MyISAM. In general, I’m not that surprised that MyISAM tends to be faster; this is a simple single-threaded, read-only test, so none of the areas where InnoDB shines (e.g., concurrent read/write access) are being exercised here, but I am quite surprised by queries #3 and #5, where InnoDB is just getting smoked.

I ran both versions of query 5 with profiling enabled, and for the most part, the time spent in each query state was identical between the InnoDB and MyISAM versions of the query, with one exception.

InnoDB: | Creating sort index | 0.626529 |
MyISAM: | Creating sort index | 0.014588 |

That’s where the bulk of the execution time is. According to the docs, this thread state means that the thread is processing a SELECT which required an internal temporary table. Ok, sure, that makes sense, but it doesn’t really explain why InnoDB is taking so much longer, and here’s where things get a bit interesting. If you recall part 2 in this series, query 5 actually returned 0 results when run against InnoDB with the default configuration because of the middle initial “B”, and I had to set innodb_ft_min_token_size to 1 in order to get results back. For the sake of completeness, I did that again here, then restarted the server and recreated my FT index. The results? Execution time dropped by 50% and ‘Creating sort index’ didn’t even appear in the query profile:

mysql [localhost] {msandbox} (test): SELECT id, full_name, MATCH(full_name, details) AGAINST
('"Thomas B Smith"' IN BOOLEAN MODE) AS score FROM dir_test_innodb ORDER BY 3 DESC LIMIT 1;
+-------+----------------+-------------------+
| id    | full_name      | score             |
+-------+----------------+-------------------+
| 62633 | Thomas B Smith | 32.89915466308594 |
+-------+----------------+-------------------+
1 row in set (0.31 sec)
mysql [localhost] {msandbox} (test): show profile;
+-------------------------+----------+
| Status                  | Duration |
+-------------------------+----------+
| starting                | 0.000090 |
| checking permissions    | 0.000007 |
| Opening tables          | 0.000017 |
| init                    | 0.000034 |
| System lock             | 0.000012 |
| optimizing              | 0.000008 |
| statistics              | 0.000027 |
| preparing               | 0.000012 |
| FULLTEXT initialization | 0.304933 |
| executing               | 0.000008 |
| Sending data            | 0.000684 |
| end                     | 0.000006 |
| query end               | 0.000006 |
| closing tables          | 0.000011 |
| freeing items           | 0.000019 |
| cleaning up             | 0.000003 |
+-------------------------+----------+

Hm. It’s still slower than MyISAM by quite a bit, but much faster than before. The reason it’s faster is because it found an exact match and I only asked for one row, but if I change LIMIT 1 to LIMIT 2 (or limit N>1), then ‘Creating sort index’ returns to the tune of roughly 0.5 to 0.6 seconds, and ‘FULLTEXT initialization’ remains at 0.3 seconds. So this answers another lingering question: there is a significant performance impact to using a lower innodb_ft_min_token_size (ifmts), and it can work for you or against you, depending upon your queries and how many rows you’re searching for. The time spent in “Creating sort index” doesn’t vary too much (maybe 0.05s) between ifmts=1 and ifmts=4, but the time spent in FULLTEXT initialization with ifmts=4 was typically only a few milliseconds, as opposed to the 300ms seen here.

Finally, I tried experimenting with different buffer pool sizes, temporary table sizes, per-thread buffer sizes, and I also tried changing from Antelope (ROW_FORMAT=COMPACT) to Barracuda (ROW_FORMAT=DYNAMIC) and switching character sets from utf8 to latin1, but none of these made any difference. The only thing which seemed to provide a bit of a performance improvement was upgrading to 5.6.12. The execution times for the InnoDB FTS queries under 5.6.12 were about 5-10 percent faster than with 5.6.10, and query #2 actually performed a bit better under InnoDB than MyISAM (average execution time 0.00075 seconds faster), but other than that, MyISAM still wins on raw SELECT performance.
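For reference, the Antelope-to-Barracuda experiment was along these lines (a sketch; it assumes innodb_file_per_table is enabled and uses the table name from the examples above):

mysql> SET GLOBAL innodb_file_format = 'Barracuda';
mysql> ALTER TABLE dir_test_innodb ROW_FORMAT=DYNAMIC;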

Three blog posts later, then, what’s my overall take on InnoDB FTS in MySQL 5.6? I don’t think it’s great, but it’s serviceable. The performance for BOOLEAN MODE queries definitely leaves something to be desired, but I think InnoDB FTS fills a need for those people who want the features and capabilities of InnoDB but can’t modify their existing applications or who just don’t have enough FTS traffic to justify building out a Sphinx/Solr/Lucene-based solution.


Schema Design in MongoDB vs Schema Design in MySQL

For people used to relational databases, using NoSQL solutions such as MongoDB brings interesting challenges. One of them is schema design: while in the relational world, normalization is a good way to start, how should we design our collections when creating a new MongoDB application?

Let’s see with a simple example how we would create a data structure for MySQL (or any relational database) and for MongoDB. We will assume in this post that we want to store people information (their name) and the details from their passport (country and validity date).

Relational Design

In the relational world, the basic idea is to try to stick to the 3rd normal form and create two tables (I’ll omit indexes and foreign keys for clarity – MongoDB supports indexes but not foreign keys):

mysql> select * from people;
+----+------------+
| id | name       |
+----+------------+
|  1 | Stephane   |
|  2 | John       |
|  3 | Michael    |
|  4 | Cinderella |
+----+------------+
mysql> select * from passports;
+----+-----------+---------+-------------+
| id | people_id | country | valid_until |
+----+-----------+---------+-------------+
|  4 |         1 | FR      | 2020-01-01  |
|  5 |         2 | US      | 2020-01-01  |
|  6 |         3 | RU      | 2020-01-01  |
+----+-----------+---------+-------------+

One of the good things with such a design is that it’s equally easy to run any query (as long as we don’t consider joins as something difficult to use):

  • Do you want the number of people?
    SELECT count(*) FROM people
  • Do you want to know the validity date of Stephane’s passport?
    SELECT valid_until from passports ps join people pl ON ps.people_id = pl.id WHERE name = 'Stephane'
  • Do you want to know which people do not have a passport? Run
    SELECT name FROM people pl LEFT JOIN passports ps ON ps.people_id = pl.id WHERE ps.id IS NULL
  • etc

MongoDB design

Now how should we design our collections in MongoDB to make querying easy?

Using the 3rd normal form is of course possible, but that would probably be inefficient, as all joins would have to be done in the application. Out of the 3 queries above, only query #1 could be run easily. So which other designs could we have?

A first option would be to store everything in the same collection:

> db.people_all.find().pretty()
{
	"_id" : ObjectId("51f7be1cd6189a56c399d3bf"),
	"name" : "Stephane",
	"country" : "FR",
	"valid_until" : ISODate("2019-12-31T23:00:00Z")
}
{
	"_id" : ObjectId("51f7be3fd6189a56c399d3c0"),
	"name" : "John",
	"country" : "US",
	"valid_until" : ISODate("2019-12-31T23:00:00Z")
}
{
	"_id" : ObjectId("51f7be4dd6189a56c399d3c1"),
	"name" : "Michael",
	"country" : "RU",
	"valid_until" : ISODate("2019-12-31T23:00:00Z")
}
{ "_id" : ObjectId("51f7be5cd6189a56c399d3c2"), "name" : "Cinderella" }

By the way, we can see here that MongoDB is schemaless: there is no problem in storing documents that do not have the same structure.

The drawback is that it is no longer clear which attributes belong to the passport, so if you want to get all passport information for Michael, you will need to correctly understand the whole data structure.

A second option would be to embed passport information inside people information – MongoDB supports rich documents:

> db.people_embed.find().pretty()
{
	"_id" : ObjectId("51f7c0048ded44d5ebb83774"),
	"name" : "Stephane",
	"passport" : {
		"country" : "FR",
		"valid_until" : ISODate("2019-12-31T23:00:00Z")
	}
}
{
	"_id" : ObjectId("51f7c70e8ded44d5ebb83775"),
	"name" : "John",
	"passport" : {
		"country" : "US",
		"valid_until" : ISODate("2019-12-31T23:00:00Z")
	}
}
{
	"_id" : ObjectId("51f7c71b8ded44d5ebb83776"),
	"name" : "Michael",
	"passport" : {
		"country" : "RU",
		"valid_until" : ISODate("2019-12-31T23:00:00Z")
	}
}
{ "_id" : ObjectId("51f7c7258ded44d5ebb83777"), "name" : "Cinderella" }

Or we could embed the other way around (however, this looks a bit dubious, as some people may not have a passport, like Cinderella in our example):

> db.passports_embed.find().pretty()
{
	"_id" : ObjectId("51f7c7e58ded44d5ebb8377b"),
	"country" : "FR",
	"valid_until" : ISODate("2019-12-31T23:00:00Z"),
	"person" : {
		"name" : "Stephane"
	}
}
{
	"_id" : ObjectId("51f7c7ec8ded44d5ebb8377c"),
	"country" : "US",
	"valid_until" : ISODate("2019-12-31T23:00:00Z"),
	"person" : {
		"name" : "John"
	}
}
{
	"_id" : ObjectId("51f7c7fa8ded44d5ebb8377d"),
	"country" : "RU",
	"valid_until" : ISODate("2019-12-31T23:00:00Z"),
	"person" : {
		"name" : "Michael"
	}
}
{
	"_id" : ObjectId("51f7c8058ded44d5ebb8377e"),
	"person" : {
		"name" : "Cinderella"
	}
}

That’s a lot of options! How can we choose? Here is where you should be aware of a fundamental difference between MongoDB and relational databases when it comes to schema design:

Collections inside MongoDB should be designed with the most frequent access patterns of the application in mind, while in the relational world, you can forget how data will be accessed if your tables are normalized.

So…

  • If you read people information 99% of the time, having 2 separate collections can be a good solution: it avoids keeping in memory data that is almost never used (passport information), and when you need all the information for a given person, it may be acceptable to do the join in the application (see the sketch after this list).
  • Same thing if you want to display the name of people on one screen and the passport information on another screen.
  • But if you want to display all information for a given person, storing everything in the same collection (with embedding or with a flat structure) is likely to be the best solution.
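For illustration, the application-side join mentioned above could look like this in the mongo shell (collection and field names are hypothetical, mirroring the relational schema):

> var p = db.people.findOne({ "name" : "Stephane" })
> db.passports.findOne({ "people_id" : p._id })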

Conclusion

We saw in this post one of the fundamental differences between MySQL and MongoDB when it comes to creating the right data structure for an application: with MongoDB, you need to know the data access patterns of the application. This should not be neglected, as a wrong schema design is a recipe for disaster: queries will be difficult to write and to optimize, they will be slow, and they will sometimes need to be replaced by custom code. All of that can lead to low performance and frustration.

The next question is: which way is better? And of course, there is no definite answer: MongoDB fans will say that by treating all access patterns as equal, normalization makes them all equally bad, while normalization fans will say that a normalized schema provides good performance for most applications and that you can always denormalize to help a few queries run faster.


Percona celebrates its 7th anniversary by giving to open source ecosystem

Today we’re celebrating Percona’s 7th anniversary. A lot has changed in these past 7 years – we have grown from a two-person outfit focused exclusively on consulting to a 100-person company with teammates in 22 different countries and 18 different states, now providing Support, Consulting, RemoteDBA, Server Development and Training services.

We also made our mark in open source software development, creating some of the most popular products for the MySQL ecosystem – Percona Toolkit, Percona XtraBackup, Percona XtraDB Cluster, Percona Server and others. Additionally, we’re into our second year of hosting the Percona Live conference series for the MySQL community. We have grown to serve over 2,000 customers, and I’m proud to say we did it all in bootstrap mode, without attracting outside investors, keeping the company owned by its employees.

So how are we celebrating our anniversary? We decided to celebrate by supporting the open source ecosystem, making donations to a number of open source initiatives that have helped us through all these years. We would not be here without you!

As such we’re supporting:

  • MariaDB Foundation for supporting MariaDB, one of the MySQL alternatives that we fully support at Percona.
  • Free Software Foundation as an organization instrumental to the success of the open source movement.
  • Linux Foundation for supporting Linux, by far the most popular platform among our customers.
  • Debian for creating a foundation for some of the most popular Linux distributions out there.
  • Jenkins for the Continuous Integration server we use for our development projects.
  • OpenSSH for software that helps us to access customer systems securely.
  • Drupal for powering our website as well as the websites of many of our customers.

We’re happy to enjoy the growth that’s allowing us to support other projects in our ecosystem. If you have the chance, I encourage you to do the same. There is a tremendous amount of work going into open source software, which is free to use but by no means free to create and maintain.


Advanced MySQL Query Tuning: Webinar followup Q&A

Thanks to all who attended my “MySQL Query Tuning” webinar on July 24. If you missed it, you can download the slides and also watch the recorded video. Thank you for the excellent questions after the webinar as well. Query tuning is a big topic and, due to the limited time, I had to skip some material, especially some of the monitoring. I would like, however, to answer all the questions I did not get to during the webinar session.

Q: Did you reset the query cache before doing your benchmark on your query? 0.00 seconds sounds too good 

A: (This is in response to a couple of slides where the time showed as 0.00.) Yes, MySQL was running with the query cache disabled. The 0.00 just means that the query was executed in less than 0.004 sec; MySQL does not show higher precision if you run the query from the mysql monitor. There are a couple of ways to get exact query times:

  • MySQL 5.0 +: Use “profiling” feature: http://dev.mysql.com/doc/refman/5.5/en/show-profile.html
  • MySQL 5.1 +: Enable the slow query log with microsecond precision and log the query. To log all queries in the slow query log, you can temporarily set long_query_time = 0 (see the sketch after this list).
  • MySQL 5.6: Use the new performance_schema counters
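A minimal sketch of the slow-query-log approach from the list above (both variables are dynamic, so no restart is needed; remember to restore the previous values afterwards):

mysql> SET GLOBAL slow_query_log = 1;
mysql> SET GLOBAL long_query_time = 0;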

Here is the profile for an example query; the query shows 0.00 seconds:

mysql> show profile;
+----------------------+----------+
| Status               | Duration |
+----------------------+----------+
| starting             | 0.000064 |
| checking permissions | 0.000003 |
| checking permissions | 0.000006 |
| Opening tables       | 0.000019 |
| System lock          | 0.000011 |
| init                 | 0.000031 |
| optimizing           | 0.000011 |
| statistics           | 0.000014 |
| preparing            | 0.000011 |
| executing            | 0.000002 |
| Sending data         | 0.002161 |
| end                  | 0.000004 |
| query end            | 0.000002 |
| closing tables       | 0.000007 |
| freeing items        | 0.000012 |
| logging slow query   | 0.000001 |
| cleaning up          | 0.000002 |
+----------------------+----------+

As we can see, sending data is actually 0.002 seconds.

Q: Do you ever see doing a seminar that shows how to leverage parallelization (openCL or CUDA) with databases and the performance differences?

A: MySQL does not support it right now. Usually OpenCL/CUDA does not help with disk-bound applications like databases. However, some projects in the OLAP space can actually utilize OpenCL/CUDA; for example, Alenka is a column store that is massively parallel. Scanning, aggregation, sorting, etc. are done in a data-flow manner via CUDA processing.

Q: Is it possible to use this [covered index for ORDER BY – A.R.] with a join? For example, if we want to filter with WHERE on table A and sort by a column from table B.

A: Unfortunately, MySQL does not support that with a covered index. MySQL will only use the index to filter on the WHERE condition (to limit the number of rows), plus a filesort. However, if we have a LIMIT clause, MySQL may be able to use the index for ORDER BY and stop after finding N rows that match the condition. It may not be faster though (as I showed during the webinar), and you may have to use index hints to tell MySQL to use the exact index, which may not be the best approach, as in some cases that index is not the best choice for the query. Example:

mysql> explain select * from City ct join Country cn on (ct.CountryCode = cn.Code) where Continent = 'North America' order by ct.population desc limit 10\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ct
type: index
possible_keys: NULL
key: Population
key_len: 4
ref: NULL
rows: 10
Extra:
*************************** 2. row ***************************
id: 1
select_type: SIMPLE
table: cn
type: eq_ref
possible_keys: PRIMARY
key: PRIMARY
key_len: 3
ref: world.ct.CountryCode
rows: 1
Extra: Using where

As we can see, MySQL will use the index and avoid a sort for the ORDER BY.

Q: Why are hash indexes not available for the InnoDB engine? Any plans to bring in hash indexes?

A: InnoDB uses hash indexes internally for its “Adaptive Hash Index” feature, but it does not support hash indexes as normal table indexes. We are not aware of any plans by Oracle’s InnoDB team to bring this feature in.

Please note: MySQL will allow you to use the “USING HASH” keywords when creating an index on an InnoDB table; however, it will create a B-tree index instead.
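A quick sketch demonstrating this behavior (the table is hypothetical; SHOW INDEX reports the index type that was actually created):

mysql> CREATE TABLE t (id INT, val INT, KEY idx_val (val) USING HASH) ENGINE=InnoDB;
mysql> SHOW INDEX FROM t; -- Index_type is reported as BTREE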

Q: Will foreign key constraints slow down my queries?

A: It may slow down the queries, as InnoDB will have to

  1. Check the foreign key constraint table
  2. Place a shared lock on the row it will read: 

If a FOREIGN KEY constraint is defined on a table, any insert, update, or delete that requires the constraint condition to be checked sets shared record-level locks on the records that it looks at to check the constraint. InnoDB also sets these locks in the case where the constraint fails. (http://dev.mysql.com/doc/refman/5.5/en/innodb-locks-set.html)

Q: How does use of index vary with the number of columns selected in a select query?

A: If we are talking about a covered index: if we select a column which is not part of the covered index, MySQL will not be able to satisfy the query from the index alone (“Using index” in the explain plan). It may be slower, especially if MySQL has to read large columns and the data is not cached.

In addition, if we select a TEXT or BLOB column and MySQL needs to create a temporary table, that temporary table will be created on disk. I described this scenario during the webinar.
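To illustrate the covered-index point, a small sketch with a hypothetical table and index:

mysql> CREATE TABLE t (a INT, b INT, c TEXT, KEY idx_ab (a, b)) ENGINE=InnoDB;
mysql> EXPLAIN SELECT a, b FROM t WHERE a = 1;    -- Extra: Using index (served from the index alone)
mysql> EXPLAIN SELECT a, b, c FROM t WHERE a = 1; -- must read row data, so no "Using index"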


John Cesario of Go Daddy on Percona MySQL Training

Senior MySQL DBA John Cesario of Go Daddy spoke at Percona Live in April and shared his views on Percona MySQL Training. Here’s what he had to say:

“Go Daddy is passionate about helping our customers. Percona Innodb training gives us another tool to elevate our customer support to the next level. It gives us the kind of insight that only comes from understanding MySQL at a code level. This helps Go Daddy understand MySQL internals, so we can increase performance and reduce troubleshooting time. The instructors are top-notch and leverage their years of field work, providing curriculum that is both relevant and immediately practical. I highly recommend Percona training for all aspects of MySQL consumption: from developers to operational administrators.”

You can watch John’s presentation here:

John Cesario of Go Daddy speaking about Percona Training at Percona Live 2013

Percona offers public training as well as custom training to customers around the world.
