ScaleArc: Benchmarking with sysbench

ScaleArc recently hired Percona to perform various tests on its database traffic management product. This post is the outcome of the benchmarks carried out by Uday Sawant (ScaleArc) and myself. You can also download the report directly as a PDF here.

The goal of these benchmarks is to identify the potential overhead of the ScaleArc software itself and the potential benefits of caching. The benchmarks were carried out with the trunk version of sysbench. For this reason, we used a very small set of data, so the measurements will be fast, and it’s known that caching has huge benefits when the queries themselves are rather expensive. We decided that we would rather show that benefit with a real-world application, which is coming later is this series. And if you’re in the Silicon Valley area, be sure to join us this evening at the first-ever Open Source Appreciation Day – I’d be happy to discuss the findings presented here in this post. Admission is free but due to limited space you should register now. I’ll also be available throughout the Percona Live MySQL Conference and Expo all this week.

sysbench_image1.2

In this summary graph it’s visible that in terms of throughput (read-only benchmark, which is relevant for read mostly applications), ScaleArc doesn’t have any significant overhead, while caching can have potentially huge benefits.

sysbench_image2

The situation is pretty similar with response times. ScaleArc doesn’t add any significant overhead, and caching can mean huge benefit in terms of response time as well.

In case of this particular workload (which is read only sysbench), using caching means a roughly 3x increase in throughput and a roughly 80% drop in response time.

Overall, ScaleArc is a good product in terms of performance and features as well. I would definitely recommend it.

About ScaleArc for MySQL
ScaleArc for MySQL is a software appliance that drops in transparently between applications and databases to improve application availability and performance. It requires no changes to applications or databases and delivers:

  • Instant scale up – transparent connection pooling and multiplexing, TTL-based transparent caching, surge protection
  • Transparent scale out – read/write split, load balancing, query routing, sharding
  • Automatic high availability – automatic failover
  • Real-time actionable analytics

Benchmarking setup
The client machines are running the benchmarking software like sysbench in case of these benchmarks.

CPU: 2 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz (6 cores, chip multithreading off)
Memory: 64G

We used 2 clients. The results of the 2 clients are graphed separately, so it’s visible that they put the same amount of workload on the database or ScaleArc software.

Database machines
CPU: 2 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz (6 cores, chip multithreading off)
Memory: 64G

Running MySQL Community Edition 5.6.15

MySQL configuration

[mysqld]
   max_allowed_packet = 64M
   thread_cache = 256
   query_cache_size = 0
   query_cache_type = 0
   max_connections = 20020
   max_user_connections = 20000
   max_connect_errors = 99999999
   wait_timeout = 28800
   interactive_timeout = 28800
   log-error=/var/lib/mysql/mysql.err
   back_log=60000
   innodb_buffer_pool_size = 3G
   innodb_additional_mem_pool_size = 16M
   innodb_log_buffer_size = 8M
   innodb_flush_log_at_trx_commit = 0
   innodb_flush_method = O_DIRECT
   innodb_open_files = 2000
   innodb_file_per_table
   innodb_log_file_size=2G
   innodb_log_files_in_group=2
   innodb_purge_threads=1
   innodb_max_purge_lag=0
   innodb_support_xa=0
   innodb_locks_unsafe_for_binlog = 1
   innodb_buffer_pool_instances=8
   sql_mode=NO_ENGINE_SUBSTITUTION,STRICT_TRANS_TABLES

The buffer pool of the database is intentionally small, so it’s easy to generate a disk-bound workload.

Please note that the following settings are not recommended in production.

innodb_support_xa=0
   innodb_locks_unsafe_for_binlog = 1

 

We used these settings to drive the node to its peak performance, avoiding any possible overhead which might be required on a production system. In typical production settings, these are not set, and binary logging is enabled, which potentially reduces ScaleArc’s overhead further.

ScaleArc software appliances
CPU: 1 x Intel(R) Xeon(R) CPU E5-2620 v2 @ 2.10GHz (6 cores, chip multithreading off)
Memory: 64G

The machines were running ScaleArc for MySQL 3.0.

Network
The machines were connected using 10G connections.

Measurements
All of the measurements were done with a very small database that completely fits in memory.

--oltp-table-size=10000
  --oltp-tables-count=64

 

In these benchmarks, we expected both the database and ScaleArc to be CPU bound. In case of a disk-bound workload, ScaleArc would shine even more than in this benchmark. If the queries are more expensive (they have to hit storage), the overhead in % is smaller, and in case of caching the query-by-query benefit is bigger.

We measured 3 different setups, both on read-only and read-write cases. These are the following.

  • Direct connection to the database.
  • Connection to the database through ScaleArc, where ScaleArc only acts as a pass-through filter (since it’s a load balancer that speaks the MySQL wire protocol, all the mechanics for that are still in place). Please note that this setup doesn’t make sense in real life. The purpose of this setup is to show the potential overhead of using ScaleArc and uncovering potential limitations of the ScaleArc software itself.
  • Connection to database through ScaleArc, where ScaleArc is allowed to cache. Caching in ScaleArc is TTL (Time To Live)-based caching, meaning that a read query’s results are cached in ScaleArc. If that read query is seen again before expiring, the query is not run again on the database server but rather served from the cache. Once the timer for the cached query expires, the query will be issued on the database again. Caching of course only works for reads, which are not in an explicit transaction (autocommit is on and no START TRANSACTION is issued). Because of that, we used –oltp-skip-trx during cached benchmarks (read-only case). In case of these benchmarks, the TTL was 1 hour, because we wanted to saturate the ScaleArc software while serving cached queries. An 1 hour TTL might be unrealistic for some applications, while for other applications even an 1 day TTL is something they can live with for some queries. In this case, we wanted to measure the cache’s performance, so we wanted the queries to be cached during the entire benchmark run to show the potential gain even in case of very small queries.

TTL-based caching
It’s important to note that the cache’s expiration is controlled by a TTL value – there is no other invalidation, so it’s possible to read stale data when the query results is changed, but the cache is not expired. Reading stale data alone is ok for most applications, it can happen with a regular, asynchronous slave if it’s lagging behind the master (and it always lagging behind somewhat). Otherwise, the cache is pretty similar to MySQL’s query cache, which doesn’t suffer from the stale read problem, but it has a coarse invalidation (if a table is written, the cache entries belonging to the given table are flushed). While the cache is flushed, the query cache mutex is held, which blocks reads even. Because of the mutex, the built-in query cache is a very usual performance bottleneck. ScaleArc’s cache doesn’t suffer from this.

It’s important to note that ScaleArc caches nothing by default. Also, there are other ways to invlidate cache entries apart from waiting for the TTL to expire.

  • API Call based invalidation (you can clear the cache for an entire query pattern rule with one API call)
  • Query comment based invalidation (you can put a comment /*wipe*/ before a query and wipe and refresh the cache)
  • Cache Bypass (you can send a comment /*nocache*/ and bypass the cache for that specific query)

Read-only
Sysbench throughput

sysbench_image6

In the lower region of threads (up to 32), we see that the TPS value significantly drops in case of going through ScaleArc. That’s nothing to be surprised about, the reason for that is network roundtrips. Because ScaleArc is a software appliance, it adds a hop between the database and the application, which introduces latency. If the number of threads is higher (32 and up), this starts to matter less and less, and performance is almost identical which is very impressive. It means that around the optimal degree of parallelism for these machines, ScaleArc introduces very little (barely measurable) overhead.

Sysbench response time
sysbench_image7
This graph contains the response times belonging to the previous benchmarks. This is really hard to read because at 4096 threads, the system is overloaded, and the response time is much more than in the maximum throughput region. Because it’s multiple orders of magnitude higher, the interesting response times are not readable from this graph.

sysbench_image8

The following graph is the same as above, except that the y axis is limited to 250 ms, so the region which is not visible on the graph above is visible here. What we see there regarding the overhead is pretty much the same as we saw in case of the throughput graph, which means that ScaleArc by itself introduces immeasurably low latency (which explains the difference in cases when parallelism is low). Usually applications which are utilizing the database server are using significantly more than one thread (in MySQL a single query always uses a single thread, in other words there is no intra-query parallelism). The latency from 32 threads above is actually somewhat lower when going through ScaleArc (the exact tipping point can be different here based on the number of CPUs). The reason for that is ScaleArc itself uses an event loop to connect to MySQL, so at a high concurrency, and can schedule sending the traffic to MySQL differently. This only matters when otherwise the MySQL server is saturated CPU-wise.

CPU utilization

sysbench_image9

Last but not least, this graph contains the CPU utilization of the different setups. The left-hand side shows the CPU utilization when connecting directly to the database, and the right-hand side shows connecting through ScaleArc. In both cases, the database server’s CPU is the bottleneck. It’s visible that the client node’s CPU is more than 75% idle (only client1 is graphed to improve readability, client2 is practically the same). From 32 threads and up, the blue bar (CPU user%) is relatively high on the database servers, as is the green (CPU sys%). From 64 threads, the idle time is practically 0, until the systems are overloaded. On the right hand side, we can see that ScaleArc at this load still had 50% idle CPU, which means that we could practically do the same benchmark on another set of boxes through the very same ScaleArc, and only then it will be fully utilized. We are talking about 3000 sysbench tps here. One more interesting thing to note is the relatively high system time of ibd. This is also because of the way ScaleArc connects to the database (see the previous paragraph).

[  17s] threads: 64, tps: 3001.98, reads/s: 41962.70, writes/s: 0.00, response time: 35.22ms (95%)

 

These threads are from a single client, which means that ScaleArc could keep up with parsing roughly 84000 statements / second with utilizing half of its CPU, which is impressive. Please note that the ScaleArc software in this case was tuned towards this type of workload, which means we had more query processing threads. In case of caching, we will have more cache handler threads.

Effects of caching on read-only workload
Sysbench throughput

The next set of graphs will compare the cases when cache is used and not used.

sysbench_image11

The preceding TPS graph contains reads / second (because we measured with –oltp-skip-trx), so roughly 42000 reads corresponds to roughly 3000 transactions in the earlier setup (14 reads in a transaction). On the left-hand side of the graph, the cached throughput is visible with green – on the right-hand side, the non-cached throughput is visible with red (direct access) and blue (access through ScaleArc as a pass-through filter). It’s visible that caching improves the speed drastically, but when ScaleArc becomes overloaded (8192 client threads, 4096 from each client), the performance becomes somewhat inconsistent, which is understandable considering how few cores ScaleArc was running on. On the graph, the dots are translucent, which means the colors are brighter in the areas that have more samples. Even in the overloaded case, the majority of the samples are in the region of 100k+ reads / second across two clients, which means that the performance degrades very gracefully even under heavy load.

Sysbench response time

sysbench_image12

Like in the case of a non-cached workload, the response times are not too readable because of the very high response times when the systems are overloaded. But from the overloaded response times visible, it seems like using caching doesn’t make response times worse.

sysbench_Image13

Like in the case of non-cached workload, this graph is the zoomed version of the previous one. Here the maximum of the y axis is 100 ms. From this graph, it’s visible that at lower concurrency and at the optimal throughput, caching actually helps response time. This is understandable, since in case of a cache hit, ScaleArc can serve the results, and the client (in our case here sysbench) doesn’t have to go to the database, so a roundtrip and database processing time is spared. It’s also worth mentioning that the data “comes from memory,” it doesn’t matter if we hit the ScaleArc cache of the database. When the ScaleArc cache is used, the response time is lower because the additional roundtrip to the database and potential database work (like parsing SQL) is avoided. This means that caching can have benefits even if the database fits in the buffer pool. The improvement is always subject to the workload – caching helps the most when it can cache relatively expensive queries like aggregations and queries hitting the storage.

CPU utilization

sysbench_image14

Similarly to the previous case, the preceding graph shows CPU utilization of the various components. In case of the cached workload, the client itself is much more utilized (since it gets responses sooner, it has to generate the traffic faster). With this kind of workload, when using only one client, we would hit the client’s CPU as the performance bottleneck. The database is interesting too. With caching, its CPU is barely used. This is because if a query is served from the cache, it never gets to the database, so the database’s CPU utilization will be lower. In other words, using the cache helps to offload the database. If offloading is visible on ScaleArc’s graphs, when caching is used, the CPU on the server hosting ScaleArc is much more utilized. For this benchmark, the ScaleArc software was tuned to handle a cached workload, which means more cache handler threads.

Read-write
For read-write benchmarks, we had to create oltp_nontran.lua, which is the same sysbench benchmark as oltp.lua, except that it does the reads outside of the transaction and does only the writes in transaction, so caching can have an effect on read. The rest of the benchmarking setup is the same as the read-only case.

Sysbench throughput

sysbench_image15

Similarly to the read-only case, at a low concurrency, the overhead of ScaleArc is coming from the additional network roundtrip. At the optimal concurrency, the overhead is barely measurable (the dots are plotted practically on top of each other).

Sysbench response time

sysbench_image16

sysbench_image17

The case is pretty similar with the response times as in the read-only case. Similarly, the second graph is a zoomed version of the first one, which a 250 ms maximum.

CPU utilization

sysbench_image18

The CPU utilization graph shows that in this case, the database server’s CPU is the bottleneck. What is interesting is that ScaleArc is using less CPU than in the read-only case. This is understandable, since a transaction now contains writes as well, which are expensive on the database side, but they are still just statements to route on the ScaleArc side.

Effects of caching on read-write workload
Measuring caching here is interesting because the workload is no longer read-only of mostly reads. We have a very significant amount of writes.

sysbench_image19

For 30k reads, we get 8,5k writes. It’s expected that caching won’t help as much as in the previous case, because writes can’t be cached and while they are in process, the benchmarking threads can’t proceed with reads. Please note that this means that roughly 25% of the traffic is write, a typical application scaling out with additional slaves for reads doesn’t have this kind of read-to-write ratio.

Sysbench throughput

sysbench_image20

The first graph shows that in terms of total throughput, caching still helps.

Sysbench response time

sysbench_image21

sysbench_image22

Similarly to the read-only case, caching also helps response time, because it reduces the time needed for the read part of the workload.

CPU utilization

sysbench_image24

This test really stresses the database server’s CPU when not caching. With caching on, similarly to the read-only case, the client’s workload increases somewhat (but not as much), and the database server’s CPU usage decreases significantly. In the last row, the CPU utilization of ScaleArc shows that although it’s somewhat higher with caching, it’s still not that much higher.

From these tests it’s visible that caching can still be beneficial even if the write ratio is as high as in this test.

Conclusion
Engineering is always about making the right tradeoffs. If one wants features that needs a protocol-level load balancer like ScaleArc, the price should be paid in the overhead of Layer 7 parsing and decision making. ScaleArc’s engineering team did a great job minimizing this overhead. ScaleArc itself is very well tunable for different workload types (if caching is important, ScaleArc can be tuned for caching – if query rewriting, ScaleArc can be tuned for that).

The post ScaleArc: Benchmarking with sysbench appeared first on MySQL Performance Blog.

How to add an existing Percona XtraDB Cluster to Percona ClusterControl

In my last blog post I explained how to use Percona ClusterControl to create a new Percona XtraDB Cluster from scratch. That’s a good option when you want to create a testing environment in just some mouse clicks. In this case I’m going to show you how to add your existing cluster to Percona ClusterControl so you can manage and monitor it on the web interface.

The environment will be pretty similar, we will have UI, CMON and 3 XtraDB Cluster nodes. The cluster should be already running and Percona ClusterControl also installed.

Adding an existing Cluster

The ClusterControl web interface is empty, there are no clusters on it. To add an existing one we need to click on “Add existing Galera Cluster.” (Click on the image for an enlarged view).

 

 

Add Existing ClusterA new form will be shown pretty similar to the one we saw last time when we were creating a new cluster. We can divide the form in two parts. First we need to give information about our Cluster. The info requires is the Linux distribution and version, IP of PXC nodes and MySQL root passwords. Pretty easy:

 

 

 

 

FromIn the second part we have the SSH configuration. There is one pre-requisite, the UI server should be able to connect to all servers using a SSH key. Therefore, our first step is to create a SSH key pair in our UI server and copy the public one to all other servers.

 

 

It is also necessary to add the private key in the web interface. You can do it using the form shown after clicking on “Add Key Pair”:

Add key pair

Once the key is added, we can verify the access:

Access Check

As we can see here, everything works as expected and all servers are reachable by SSH. The parameter “Create shared SSH key” also needs to be enabled. That option will make ClusterControl to create a new SSH key pair on CMON node so this one can also connect to PXC nodes with passwordless SSH.

Now everything is prepared. We can proceed with the deployment. Just click on “Add cluster” and the installation process will start. While the installation is in progress you will see this notification:

 

 

Deployment notification 

 

Clicking on it we can see the progress of the deployment:

Progress

After some minutes our PXC is shown in the Percona ClusterControl UI:

Cluster

Now we can monitor it, get alerts, clone, run backups and everything from the web interface. You can also add multiple clusters and create new ones.

The post How to add an existing Percona XtraDB Cluster to Percona ClusterControl appeared first on MySQL Performance Blog.

Facebook’s Yoshinori Matsunobu on MySQL, WebScaleSQL & Percona Live

Facebook's Yoshinori Matsunobu

Facebook’s Yoshinori Matsunobu

I spoke with Facebook database engineer Yoshinori Matsunobu here at Percona Live 2014 today about a range of topics, including MySQL at Facebook, the company’s recent move to WebScaleSQL, new MySQL flash storage technologies – and why attending the Percona Live MySQL Conference and Expo each year is very important to him.

Facebook engineers are hosting several sessions at this year’s conference and all have been standing room only. That’s not too surprising considering that Facebook, the undisputed king of online social networks, has 1.23 billion monthly active users collectively contributing to an ocean of data-intensive tasks – making the company one of the world’s top MySQL users. And they are sharing what they’ve learned in doing so this week.

You can read my previous post where I interviewed five Facebook MySQL experts (Steaphan Greene, Evan Elias, Shlomo Priymak, Yoshinori Matsunobu and Mark Callaghan) and below is my short video interview with Yoshi, who will lead his third and final session of the conference today at 12:50 p.m. Pacific time titled, “Global Transaction ID at Facebook.”

The post Facebook’s Yoshinori Matsunobu on MySQL, WebScaleSQL & Percona Live appeared first on MySQL Performance Blog.

Optimizing MySQL Performance: Choosing the Right Tool for the Job

Next Wednesday, I will present a webinar about MySQL performance profiling tools that every MySQL DBA should know.

Application performance is a key aspect of ensuring a good experience for your end users. But finding and fixing performance bottlenecks is difficult in the complex systems that define today’s web applications. Having a method and knowing how to use the tools available can significantly reduce the amount of time between problems manifesting and fixes being deployed.

In the webinar, titled “Optimizing MySQL Performance: Choosing the Right Tool for the Job,” we’ll start with the basic top, iostat, and vmstat then move onto advanced tools like GDB, Oprofile, and Strace.

I’m looking forward to this webinar and invite you to join us April 16th at 10 a.m. Pacific time. You can learn more and also register here to reserve your spot. I also invite you to submit questions ahead of time by leaving them in the comments section below. Thanks for reading and see you next Wednesday!

The post Optimizing MySQL Performance: Choosing the Right Tool for the Job appeared first on MySQL Performance Blog.

OpenSSL heartbleed CVE-2014-0160 – Data leaks make my heart bleed

The heartbleed bug was introduced in OpenSSL 1.0.1 and is present in

  • 1.0.1
  • 1.0.1a
  • 1.0.1b
  • 1.0.1c
  • 1.0.1d
  • 1.0.1e
  • 1.0.1f

The bug is not present in 1.0.1g, nor is it present in the 1.0.0 branch nor the 0.9.8 branch of OpenSSL some sources report 1.0.2-beta is also affected by this bug at the time of writing, however it is a beta product and I would really recommend not to use beta quality releases for something as fundamentally important as OpenSSL in production.

The bug itself is within the heartbeat extension of OpenSSL (RFC6520). The bug allows an attacker to leak the memory in up to 64k chunks, this is not to say the data being leaked is limited to 64k as the attacker can continually abuse this bug to leak data, until they are satisfied with what has been recovered.

At worst the attacker can retrieve the private keys, the implications for which is that the attacker now has the keys to decrypt the encrypted data, as such the only way to be 100% certain of protection against this bug is to first update OpenSSL (>= 1.0.1g) and then revoke and regenerate new keys and certificates, expect to see a tirade of revocations and re-issuing of CA certs and the like in the coming days.

So how does this affect you as a MySQL user?

Taking Percona Server as an example, this is linked against OpenSSL, meaning if you want to use TLS for your client connections and/or your replication connections you’re going to need to have openSSL installed.

You can find your version easily via your package manager for example:

  • rpm -q openssl
  • dpkg-query -W openssl

If you’re running a vulnerable installation of OpenSSL an update will be required.

  • update OpenSSL >= 1.0.1g
  1. 1.0.1e-2+deb7u5 is reported as patched on debian,
  2. 1.0.1e-16.el6_5.7 is reported as patched in RedHat and CentOS
  3. 1.0.1e-37.66 changelogs note this has been patched on Amazon AMI
  • shutdown mysqld
  • regenerate keys and certs used by mysql for TLS connections (revoking the old certs if possible to do so)
  • start mysqld

You can read more about the heartbleed bug at heartbleed.com Redhat Bugzilla Mitre CVE filing Ubuntu Security Notice

The post OpenSSL heartbleed CVE-2014-0160 – Data leaks make my heart bleed appeared first on MySQL Performance Blog.

Heartbleed: Separating FAQ From FUD

If you’ve been following this blog (my colleague, David Busby, posted about it yesterday) or any tech news outlet in the past few days, you’ve probably seen some mention of the “Heartbleed” vulnerability in certain versions of the OpenSSL library.

So what is ‘Heartbleed’, really?

In short, Heartbleed is an information-leak issue. An attacker can exploit this bug to retrieve the contents of a server’s memory without any need for local access. According to the researchers that discovered it, this can be done without leaving any trace of compromise on the system. In other words, if you’re vulnerable, they can steal your keys and you won’t even notice that they’ve gone missing. I use the word “keys” literally here; by being able to access the contents of the impacted service’s memory, the attacker is able to retrieve, among other things, private encryption keys for SSL/TLS-based services, which means that the attacker would be able to decrypt the communications, impersonate other users (see here, for example, for a session hijacking attack based on this bug), and generally gain access to data which is otherwise believed to be secure. This is a big deal. It isn’t often that bugs have their own dedicated websites and domain names, but this one does: http://www.heartbleed.com

Why is it such a big deal?

One, because it has apparently been in existence since at least 2012. Two, because SSL encryption is widespread across the Internet. And three, because there’s no way to know if your keys have been compromised. The best detection that currently exists for this are some Snort rules, but if you’re not using Snort or some other IDS, then you’re basically in the dark.

What kinds of services are impacted?

Any software that uses the SSL/TLS features of a vulnerable version of OpenSSL. This means Apache, NGINX, Percona Server, MariaDB, the commercial version of MySQL 5.6.6+, Dovecot, Postfix, SSL/TLS VPN software (OpenVPN, for example), instant-messaging clients, and many more. Also, software packages that bundle their own vulnerable version of SSL rather than relying on the system’s version, which might be patched. In other words, it’s probably easier to explain what isn’t affected.

What’s NOT impacted?

SSH does not use SSL/TLS, so you’re OK there. If you downloaded a binary installation of MySQL community from Oracle, you’re also OK, because the community builds use yaSSL, which is not known to be vulnerable to this bug. Obviously, any service which doesn’t use SSL/TLS is not going to be vulnerable, either, since the salient code paths aren’t going to be executed. So, for example, if you don’t use SSL for your MySQL connections, then this bug isn’t going to affect your database server, although it probably still impacts you in other ways (e.g., your web servers).

What about Amazon cloud services?

According to Amazon’s security bulletin on the issue, all vulnerable services have been patched, but they still recommend that you rotate your SSL certificates.

Do I need to upgrade Percona Server, MySQL, NGINX, Apache, or other server software?

No, not unless you built any of these and statically-linked them with a vulnerable version of OpenSSL. This is not common. 99% of affected users can fix this issue by upgrading their OpenSSL libraries and cycling their keys/certificates.

What about client-level tools, like Percona Toolkit or XtraBackup?

Again, no. The client sides of Percona Toolkit, Percona XtraBackup, and Percona Cloud Tools (PCT) are not impacted. Moreover, the PCT website has already been patched. Encrypted backups are still secure.

There are some conflicting reports out there about exactly how much information leakage this bug allows. What’s the deal?

Some of the news articles and blogs claim that with this bug, any piece of the server’s memory can be accessed. Others have stated that the disclosure is limited to memory space owned by processes using a vulnerable version of OpenSSL. As far as we are aware, and as reported in CERT’s Vulnerability Notes Database, the impact of the bug is the latter; i.e., it is NOT possible for an attacker to use this exploit to retrieve arbitrary bits of your server’s memory, only bits of memory from your vulnerable services. That said, your vulnerable services could still leak information that attackers could exploit in other ways.

How do I know if I’m affected?

You can test your web server at http://filippo.io/Heartbleed/ – enter your site’s URL and see what it says. If you have Go installed, you can also get a command-line tool from Github and test from the privacy of your own workstation. There’s also a Python implementation. You can also check the version of OpenSSL that’s installed on your servers. If you’re running OpenSSL 1.0.1 through 1.0.1f or 1.0.2-beta, you’re vulnerable. (Side note here: some distributions, such as RHEL/CentOS, have patched the issue without actually updating the OpenSSL version number; on RPM-based systems you can view the changelog to see if it’s been fixed, for example:

rpm -q --changelog openssl | head -2
* Mon Apr 07 2014 Tomáš Mráz <tmraz@redhat.com> 1.0.1e-16.7
- fix CVE-2014-0160 - information disclosure in TLS heartbeat extension

Also, note that versions of OpenSSL prior to 1.0.1 are not known to be vulnerable. If you’re still unsure, we would be happy to assist you in determining the extent of the issue in your environment and taking any required corrective action. Just give us a call.

How can I know if I’ve been compromised?

If you’ve already been compromised, you won’t. However, if you use Snort as an IDS, you can use some rules developed by Fox-IT to detect and defer new attacks; other IDS/IPS vendors may have similar rule updates available. Without some sort of IDS in place, however, attacks can and will go unnoticed.

Are there any exploits for this currently out there?

Currently, yes, there are some proofs-of-concept floating around out there, although before this week, that was uncertain. But given that this is likely a 2-year-old bug, I would be quite surprised if someone, somewhere (you came to my talk at Percona Live last week, didn’t you? Remember what I said about assuming that you’re already owned?) didn’t have a solid, viable exploit.

So, then, what should I do?

Ubuntu, RedHat/CentOS, Amazon, and Fedora have already released patched versions of the OpenSSL library. Upgrade now. Do it now, as in right now. If you’ve compiled your own version of OpenSSL from source, upgrade to 1.0.1g or rebuild your existing source with the -DOPENSSL_NO_HEARTBEATS flag.

Once that’s been done, stop any certificate-using services, regenerate the private keys for any services that use SSL (Apache, MySQL, whatever), and then obtain a new SSL certificate from whatever certificate authority you’re using. Once the new certificates are installed, restart your services. You can also, of course, do the private key regeneration and certificate cycling on a separate machine, and only bring the service down long enough to install the new keys and certificates. Yes, you will need to restart MySQL. Or Apache. Or whatever. Full stop, full start, no SIGHUP (or equivalent).

Unfortunately, that’s still not all. Keeping in mind the nature of this bug, you should also change / reset any user passwords and/or user sessions that were in effect prior to patching your system and recycling your keys. See, for example, the session hijacking exploit referenced above. Also note that Google services were, prior to their patching of the bug, vulnerable to this issue, which means that it’s entirely possible that your Gmail login (or any other Google-related login) could have been compromised.

Can I get away with just upgrading OpenSSL?

NO. At a bare minimum, you will need to restart your services, but in order to really be sure you’ve fixed the problem, you also need to cycle your keys and certificates (and revoke your old ones, if possible). This is the time-consuming part, but since you have no way of knowing whether or not someone has previously compromised your private keys (and you can bet that now that the bug is public, there will be a lot of would-be miscreants out there looking for servers to abuse), the only safe thing to do is cycle them. You wouldn’t leave your locks unchanged after being burgled, would you?

Also note that once you do upgrade OpenSSL, you can get a quick list of the services that need to be restarted by running the following:

sudo lsof | grep ssl | grep DEL

Where can I get help and/or more information?

In addition to the assorted links already mentioned, you can read up on the nuts and bolts of this bug, or various news articles such as this or this. There are a lot of articles out there right now, most of which are characterizing this as a major issue. It is. Test your vulnerability, upgrade your OpenSSL and rotate your private keys and certificates, and then change your passwords.

The post Heartbleed: Separating FAQ From FUD appeared first on MySQL Performance Blog.

Percona Monitoring Plugins 1.1.2, now with Amazon RDS support

Percona is glad to announce the release of Percona Monitoring Plugins 1.1.2.

Changelog:

* Added Nagios plugin and Cacti template for Amazon RDS
* Added Nagios config template to the documentation
* Added an option to pmp-check-pt-table-checksum to check MAX(ts) of latest checksum
* Added generic Nagios plugin for PT tables
* Extended pmp-check-mysql-processlist with max user connections check
* Zabbix MySQL.running-slave item failed with MySQL 5.6 (bug 1272358)
* ss_get_mysql_stats and MariaDB does not use have_response_time_distribution (bug 1285888)
* Cacti Monitoring plugins and SNMP via TCP (bug 1268552)

Shortly about the new features.

Amazon RDS support in Nagios and Cacti.
amazon-web-services We have created the Nagios plugin and Cacti template to give you a chance to add Amazon RDS instances into your monitoring system. The plugins are getting various RDS metrics from Amazon CloudWatch. With Nagios you can have the following checks per instance: RDS Status, RDS Load Average, RDS Free Storage, RDS Free Memory. Also check out the examples of Cacti RDS graphs.

Nagios config template. We never have a good template of Nagios config in the documentation with examples on how to put together the plugins. So we have updated Configuration Best Practices section with it.

Freshness check for checksumming. A new option of the plugin pmp-check-pt-table-checksum allows you to specify an interval in days over which to ensure pt-table-checksum was run. It is useful in cases when the cron job doing the checksumming suddenly stopped working.

New generic Nagios plugin for PT tables. pmp-check-mysql-ts-count looks at a table and counts the number of rows since the last interval, and alerts if this exceeds the threshold. This could be the table referenced by tools from Percona Toolkit such as pt-deadlock-logger, pt-kill, pt-fk-error-logger, or a custom table supplied.

MySQL max user connections check. The plugin pmp-check-mysql-processlist has got a new option to alert when @@max_user_connections is configured on MySQL and any user reaches this limit.

A new tarball is available from downloads area or RPM and DEB packages from our software repositories. The plugins are fully supported for customers with a Percona Support contract and free installation services are provided as part of some contracts. In addition as part of Percona’s Remote DBA installation and setup of these tools are included with our services. You can find links to the documentation, forums and more at the project homepage.

Percona Monitoring PluginsAbout Percona Monitoring Plugins
Percona Monitoring Plugins are high-quality components to add enterprise-grade MySQL monitoring and graphing capabilities to your existing in-house, on-premises monitoring solutions. The components are designed to integrate seamlessly with widely deployed solutions such as Nagios, Cacti and Zabbix and are delivered in the form of templates, plugins, and scripts which make it easy to monitor MySQL performance.

The post Percona Monitoring Plugins 1.1.2, now with Amazon RDS support appeared first on MySQL Performance Blog.

Tools and tips for analysis of MySQL’s Slow Query Log

MySQL's Slow Query LogMySQL has a nice feature, slow query log, which allows you to log all queries that exceed a predefined about of time to execute. Peter Zaitsev first wrote about this back in 2006 – there have been a few other posts here on the MySQL Performance Blog since then (check this and this, too) but I wanted to revisit his original subject in today’s post.

Query optimization is essential for good database server performance and usually DBAs need to ensure the top performance possible for all queries. In MySQL, the desirable way is to generate a query log for all running queries within a specific time period and then run a query analysis tool to identify the bad queries. Percona Toolkit’s pt-query-digest is one of the most powerful tools for SQL analysis. That’s because pt-query-digest can generate a very comprehensive report that spots problematic queries very efficiently. It works equally well with Oracle MySQL server. This post will focus mainly on pt-query-digest.

Slow query log is great at spotting really slow queries that are good candidates for optimization. Beginning with MySQL 5.1.21, the minimum value is 0 for long_query_time, and the value can be specified to a resolution of microseconds. In Percona Server additional statistics may be output to the slow query log. You can find the full details here. For our clients, we often need to identify queries that impact an application the most. It does not always have to be the slowest queries – queries that runs more frequently with lower execution time per call put more load on a server than queries running with lower frequency. We of course want to get rid of really slow queries but to really optimize application throughput, we also need to investigate queries that generate most of the load. Further, if you enable option log_queries_not_using_indexes  then MySQL will log queries doing full table scans which doesn’t always reflect that the query is slow, because in some situations the query optimizer chooses full table scan rather than using any available index or probably showing all records from a small table.

Our usual recommendation is to generate the slow log with long_query_time=0. This will record all the traffic but this will be I/O intensive and will eat up disk space very quickly depending on your workload. So beware of running with long_query_time=0 for only a specific period of time and revert it back to logging only very slow queries. In Percona Server there is nice option where you can limit the rate of logging, log_slow_rate_limit is the option to handle it. Filtering slow query log is very helpful too in some cases e.g. if we know the main performance issue is table scans we can log queries only doing full table scans or if we see I/O is bottleneck we can collect queries doing full scans and queries creating on disk temporary tables. Again, this is only possible in Percona Server with the log_slow_filter option. Also, you may want to collect everything on slow query log and then filter with pt-query-digest. Depending on I/O capacity, you might prefer one or another way, as collecting everything in slow query log allows us to investigate other queries too if needed. Finally, use pt-query-digest to generate an aggregate report over slow query log which highlights the problematic part very efficiently. Again, pt-query-digest can bring up server load high so our usual recommendation on it is to move slow query log to some staging/dev server and run pt-query-digest over there to generate the report.

Note: changing the long_query_time parameter value only affects newly created connections to log queries exceeds long_query_time threshold. In Percona Server there is feature which changes variable scope to global instead of local. Enabling slow_query_log_use_global_control  log queries for connected sessions too after changing long_query_time parameter threshold. You can read more about this patch here.

I am not going to show you a detailed report of pt-query-digest and explain each part of it here, because it is well defined already by my colleague Ovais Tariq in this post. However, I will show you some of the other aspects of pt-query-digest tool here.

Let me show you code snippets that enable slow query log for only a specific time period with long_query_time=0 and log_slow_verbosity to ‘full’. log_slow_verbosity is a Percona Server variable which logs extra stats such as information on query cache, Filesort, temporary tables, InnoDB statistics etc. Once you are done collecting logs, revert back the values for long_query_time to the previous value, and finally run pt-query-digest on the log to generate report. Note: run the below code in same MySQL session.

-- Save previous settings
mysql> SELECT @@global.log_slow_verbosity INTO @__log_slow_verbosity;
mysql> SELECT @@global.long_query_time INTO @__long_query_time;
mysql> SELECT @@global.slow_query_log INTO @__slow_query_log;
mysql> SELECT @@global.log_slow_slave_statements INTO @__log_slow_slave_statements;
-- Keep this in safe place, we'll need to run pt-query-digest
mysql> SELECT NOW() AS "Time Since";
-- Set values to enable query collection
mysql> SET GLOBAL slow_query_log_use_global_control='log_slow_verbosity,long_query_time';
mysql> SET GLOBAL log_slow_verbosity='full';
mysql> SET GLOBAL slow_query_log=1;
mysql> SET GLOBAL long_query_time=0;
mysql> SET GLOBAL log_slow_slave_statements=1;
-- Verify settings are OK
mysql> SELECT @@global.long_query_time, @@global.slow_query_log, @@global.log_slow_verbosity;
-- wait for 30 - 60 minutes
-- Keep this one too, also for pt-query-digest
mysql> SELECT NOW() AS "Time Until";
-- Revert to previous values
mysql> SET GLOBAL slow_query_log=@__slow_query_log;
mysql> SET GLOBAL long_query_time=@__long_query_time;
mysql> SET GLOBAL log_slow_verbosity=@__log_slow_verbosity; -- if percona server
mysql> SET GLOBAL log_slow_slave_statements=@__log_slow_slave_statements;
-- Verify settings are back to previous values
mysql> SELECT @@global.long_query_time, @@global.slow_query_log, @@global.log_slow_verbosity, @@global.slow_query_log_file;
-- Then with pt-query-digest run like (replace values for time-since, time-until and log name)
$ pt-query-digest --since='<time-since>' --until='<time-until>' --limit=100% /path/to/slow_query_log_file.log > /path/to/report.out
-- If you're not using Percona Server then you need to remove all references to log_slow_verbosity, slow_query_log_use_global_control and log_slow_slave_statements (priot MySQL 5.6).

My colleague Bill Karwin wrote bash script that does almost the same as the above code. You can find the script to collect slow logs here. This script doesn’t hold connection to the database session while you wait for logs to accumulate and it sets all the variables back to the state they were before. For full documentation view this.

Further, you can also get explain output into the report from the pt-query-digest tool. For that you need to use –explain parameter similar to as follows.

$ pt-query-digest --explain u=<user>,p=<password>,h=<hostname> /path/to/slow.log > /path/to/report.out

Explain output in query report will get you all the information for query execution plan and explain output signal towards how that particular query going to be executed. Note that, if you execute pt-query-digest over slow query log other than originated server of slow query log as I mentioned above e.g. staging/dev you may get different execution path for the query in the report or lower number of rows to examined, etc., because usually staging/dev servers has different data distribution, different MySQL versions, or different indexes. MySQL explain adds overhead as queries needs to be prepared on the server to generate intended query execution path. For this reason, you may want to run pt-query-digest with –explain on a production replica.

It’s worth mentioning that logging queries with log_slow_verbosity in Percona Server is really handy as it shows lots of additional statistics and it is more helpful in situations when the explain plan reports a different execution path than when the query is executed. On that particular topic, you may want to check this nice post.

pt-query-digest also supports filters. You can read more about it here. Let me show you an example. The following command will discard everything apart from insert/update/delete queries in pt-query-digest output report.

$ pt-query-digest --filter '$event->{arg} =~ m/^(insert|update|delete)/i' --since='<time-since>' --until='<time-until>' --limit=100% /path/to/slow_query_log_file.log > /path/to/report.out

If you’re looking for some GUI tools for pt-query-digest then I would recommend reading this nice blogpost from my colleague Roman. Further, our CEO Peter Zaitsev also wrote a post recently where he shows the comparison between performance_schema and slow query log. Check here for details.

In related new, Percona recently announced Percona Cloud Tools, the next generation of tools for MySQL. It runs a client-side agent (pt-agent) which runs pt-query-digest on the server with some intervals and uploads the aggregated data to the Percona Cloud Tools API which process it further.  Query Analytics is one tool from the Percona Cloud Tools that provides advanced query metrics. It  is a nice visualization tool. You may be interested to learn more about it here, and it’s also worth viewing this related webinar about Percona Cloud Tools from our CTO Vadim Tkachenko.

Conclusion:
pt-query-digest from Percona Toolkit is a versatile (and free) tool for slow query log analysis. It provides good insight about every individual query, especially in Percona Server with log_slow_verbosity enabled, e.g. log queries with microsecond precision, log information about the query’s execution plan. On top of that, Percona Cloud Tools includes Query Analytics which provides you with good visuals about query performance and also provides a view of historical data.

The post Tools and tips for analysis of MySQL’s Slow Query Log appeared first on MySQL Performance Blog.

Percona Server with TokuDB: Packing 15TB into local SSDs

Two weeks ago we released an Alpha release of Percona Server with TokuDB. Right now I am on a final stage of evaluation of TokuDB for using in our project Percona Cloud Tools and it looks promising.

What is the most attractive in TokuDB? For me it is compression, but not just compression: TokuDB provides great performance over compressed data.

In my synthetic tests I saw a compression ratio of 10:1 (TokuDB LZMA to InnoDB uncompressed), in the real production data it is less, 6:1, but still impressive.

In our servers we have 4 x SSD Crucial M500 960GB combined in RAID5, which give 2877.0 GB of usable space. With TokuDB we should be able to pack around 15TB of raw data. Of course we can try InnoDB compression, but the best we can get is 2x compression without sacrificing performance.

And of course TokuDB is transaction, fully ACID-compliant with automatic crash-recovery storage engine.

This all makes TokuDB a very attractive choice for handling terabytes of data (or as it popular to say nowadays “BigData”).

One of first operational questions we have is how to handle backups.
For backups we use LVM partitions and the mylvmbackup tool. Unfortunately Percona XtraBackup is not able to handle TokuDB tables (and probably won’t be able anytime soon). The other choice is to use TokuDB Hot back-up, available by Tokutek Enterprise Subscription. I did not test it myself, so I can’t provide any feedback.

And of course there are things which I do not fully like in TokuDB:

  • No Foreign Keys support. It is not a big issue for us, but I know for some users this is a showstopper.
  • Time-based checkpoints. You may not notice a direct effect from this, but we clearly see it in our benchmarks. Every 60 sec (default timeperiod between checkpoints) we see a drop in throughput during write-intensive benchmarks. It is very similar to drops in InnoDB we tried to solve (and are still trying), for example see Adaptive flushing in MySQL 5.6. My advice to the Tokutek team would be to also look into a fuzzy check-pointing, instead of time-based.
  • All TokuDB files are stored in a single directory, sometime with mangled filenames. This especially becomes bad in sharding or multi-tenant environments when tens of thousands of files are in the same directory

Well, I guess for now, we will take these limitations as TokuDB specific and will deal with them.

Next week we plan on a Beta release of Percona Server with TokuDB, so stay tuned!

The post Percona Server with TokuDB: Packing 15TB into local SSDs appeared first on MySQL Performance Blog.

Percona Server 5.6.16-64.1 is now available

Percona Server version 5.6.16-64.1

Percona Server version 5.6.16-64.1

Percona is glad to announce the release of Percona Server 5.6.16-64.1 on March 17th, 2014 (Downloads are available here and from the Percona Software Repositories.

Based on MySQL 5.6.16, including all the bug fixes in it, Percona Server 5.6.16-64.1 is the current GA release in the Percona Server 5.6 series. All of Percona’s software is open-source and free, all the details of the release can be found in the 5.6.16-64.1 milestone at Launchpad.

Bugs Fixed:

  • After installing the auth_socket plugin any local user might get root access to the server. If you’re using this plugin upgrade is advised. This is a regression, introduced in Percona Server 5.6.11-60.3. Bug fixed #1289599
  • The new client and server packages included files with paths that were conflicting with the ones in mysql-libs package on CentOS. Bug fixed #1278516.
  • A clean installation of Percona-Server-server-55 on CentOS would fail due to a typo in mysql_install_db call. Bug fixed #1291247.
  • libperconaserverclient18.1 Debian/Ubuntu packages depended on multiarch-support, which is not available on all the supported distribution versions. Bug fixed #1291628.
  • The InnoDB file system mutex was being locked incorrectly if Atomic write support for Fusion-io devices was enabled. Bug fixed #1287098.
  • Slave I/O thread wouldn’t attempt to automatically reconnect to the master after a network time-out (error: 1159). Bug fixed #1268729 (upstream #71374).

Renaming the libmysqlclient to libperconaserverclient

This release fixes some of the issues caused by the libmysqlclient rename to libperconaserverclient in Percona Server 5.6.16-64.0. The old name was conflicting with the upstream libmysqlclient.

Except for packaging, libmysqlclient and libperconaserverclient of the same version do not have any differences. Users who previously compiled software against Percona-provided libmysqlclient will either need to install the corresponding package of their distribution, such as distribution or Oracle-provided package for CentOS and libmysqlclient18 for Ubuntu/Debian or recompile against libperconaserverclient. Another workaround option is to create a symlink from libperconaserverclient.so.18.0.0 to libmysqlclient.so.18.0.0.

Release notes for Percona Server 5.6.16-64.1 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server 5.6.16-64.1 is now available appeared first on MySQL Performance Blog.