Percona XtraDB Cluster 5.6.19-25.6 is now available

Percona XtraDB Cluster 5.6.19-25.6Percona is glad to announce the new release of Percona XtraDB Cluster 5.6 on July 21st 2014. Binaries are available from downloads area or from our software repositories. We’re also happy to announce that Ubuntu 14.04 LTS users can now download, install, and upgrade Percona XtraDB Cluster 5.6 from Percona’s software repositories.

Based on Percona Server 5.6.19-67.0 including all the bug fixes in it, Galera Replicator 3.6, and on Codership wsrep API 25.6, Percona XtraDB Cluster 5.6.19-25.6 is now the current General Availability release. All of Percona‘s software is open-source and free, and all the details of the release can be found in the 5.6.19-25.6 milestone at Launchpad.

New Features:

  • Percona XtraDB Cluster now supports storing the Primary Component state to disk by setting the pc.recovery variable to true. The Primary Component can then recover automatically when all nodes that were part of the last saved state reestablish communications with each other. This feature can be used for automatic recovery from full cluster crashes, such as in the case of a data center power outage and graceful full cluster restarts without the need for explicitly bootstrapping a new Primary Component.
  • When joining the cluster, the state message exchange provides us with gcache seqno limits. That information is now used to choose a donor through IST first, and, if this is not possible, only then SST is attempted. The wsrep_sst_donor setting is honored, though, and it is also segment aware.
  • An asynchronous replication slave thread was stopped when the node tried to apply the next replication event while the node was in non-primary state. But it would then remain stopped after the node successfully re-joined the cluster. A new variable, wsrep_restart_slave, has been implemented which controls if the MySQL slave should be restarted automatically when the node re-joins the cluster.
  • Handling install message and install state message processing has been improved to make group forming a more stable process in cases when many nodes are joining the cluster.
  • A new wsrep_evs_repl_latency status variable has been implemented which provides the group communication replication latency information.
  • Node consistency issues with foreign key grammar have been fixed. This fix introduces two new variables: wsrep_slave_FK_checks and wsrep_slave_UK_checks. These variables are set to TRUE and FALSE respectively by default. They control whether Foreign Key and Unique Key checking is done for applier threads.

Bugs Fixed:

  • Fixed the race condition in Foreign Key processing that could cause assertion. Bug fixed #1342959.
  • The restart sequence in scripts/mysql.server would fail to capture and return if the start call failed to start the server. As a result, a restart could occur that failed upon start-up, and the script would still return 0 as if it worked without any issues. Bug fixed #1339894.
  • Updating a unique key value could cause the server to hang if a slave node had enabled parallel slaves. Bug fixed #1280896.
  • Percona XtraDB Cluster has implemented threadpool scheduling fixes. Bug fixed #1333348.
  • garbd was returning an incorrect return code, ie. when garbd was already started, return code was 0. Bug fixed #1308103.
  • rsync SST would silently fail on joiner when the rsync server port was already taken. Bug fixed #1099783.
  • When gmcast.listen_addr was configured to a certain address, the local connection point for outgoing connections was not bound to the listen address. This would happen if the OS has multiple interfaces with IP addresses in the same subnet. The OS would pick the wrong IP for a local connection point and other nodes would see connections originating from an IP address which was not listened to. Bug fixed #1240964.
  • An issue with re-setting galera provider (in wsrep_provider_options) has been fixed. Bug fixed #1260283.
  • Variable wsrep_provider_options couldn’t be set in runtime if no provider was loaded. Bug fixed #1260290.
  • Percona XtraDB Cluster couldn’t be built with Bison 3.0. Bug fixed #1262439.
  • MySQL wasn’t handling exceeding the max writeset size wsrep error correctly. Bug fixed #1270920.
  • Fixed the issue which caused a node to hang/fail when SELECTs/SHOW STATUS was run after FLUSH TABLES WITH READ LOCK was used on a node with wsrep_causal_reads set to 1 while there was a DML on other nodes. Bug fixed #1271177.
  • Lowest group communication layer (evs) would fail to handle the situation properly when a large number of nodes would suddenly start recognizing each other. Bugs fixed #1271918 and #1249805.
  • Percona XtraBackup SST would fail if the progress option was used with a large number of files. Bug fixed #1294431.

NOTE: When performing an upgrade from an older 5.6 version on Debian/Ubuntu systems, in order to upgrade the Galera package correctly, you’ll need to pin the Percona repository and run: apt-get install percona-xtradb-cluster-56. This is required because older Galera deb packages have an incorrect version number. The correct wsrep_provider_version after upgrade should be 3.6(r3a949e6).

This release contains 50 fixed bugs. The complete list of fixed bugs can be found in our release notes.

Release notes for Percona XtraDB Cluster 5.6.19-25.6 are available in our online documentation along with the installation and upgrade instructions.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

Percona XtraDB Cluster Errata can be found in our documentation.

[UPDATE 2014-07-24]: Package Percona-XtraDB-Cluster-client-56-5.6.19-25.6.824.el6.x86_64.rpm has been updated to resolve the conflict with Percona-XtraDB-Cluster-devel package.

The post Percona XtraDB Cluster 5.6.19-25.6 is now available appeared first on MySQL Performance Blog.

Reference architecture for a write-intensive MySQL deployment

We designed Percona Cloud Tools (both hardware and software setup) to handle a very high-intensive MySQL write workload. For example, we already observe inserts of 1bln+ datapoints per day. So I wanted to share what kind of hardware we use to achieve this result.

Let me describe what we use, and later I will explain why.

Server:

  • Chassis: Supermicro SC825TQ-R740LPB 2U Rackmount Chassis
  • Motherboard: Supermicro X9DRI-F dual socket
  • CPU: Dual Intel Xeon Ivy Bridge E5-2643v2 (6x 3.5Ghz cores, 12x HT cores, 25M L3)
  • Memory: 256GB (16x 16GB 256-bit quad-channel) ECC registered DDR3-1600
  • Raid: LSI MegaRAID 9260-4i 4-port 6G/s hardware RAID controller, 512M buffer
  • MainStorage: PCIe SSD HGST FlashMAX II 4.8TB
  • Secondary Storage (OS, logs): RAID 1 over 2x 3TB hard drives

Software:

When selecting hardware for your application, you need to look at many aspects – typically you’re looking for a solution for which you already have experience in working with and has also proved to be the most efficient option. For us it has been as follows:

Cloud vs Bare Metal
We have experience having hardware hosted at the data center as well as cash for upfront investments in hardware so we decided to go for physical self-hosted hardware instead of the cloud. Going this route also gave us maximum flexibility in choosing a hardware setup that was the most optimal for our application rather than selecting one of the stock options.

Scale Up vs Scale Out
We have designed a system from scratch to be able to utilize multiple servers through sharding – so our main concern is choosing the most optimal configuration for the server and provisioning servers as needed. In addition to raw performance we also need to consider power usage and overhead of managing many servers which typically makes having slightly more high-end hardware worth it.

Resource Usage
Every application uses resources in different ways so an optimal configuration will be different depending on your application. Yet all applications use the same resources you need to consider. Typically you want to plan for all of your resources to be substantially used – providing some margin for spikes and maintenance.

CPU

  • Our application processes a lot of data and uses the TokuDB storage engine which uses a lot of CPU for compression, so we needed powerful CPUs.
  • Many MySQL functions are not parallel, think executing single query or Alter table so we’re going for CPU with faster cores rather than larger amount of cores. The resulting configuration with 2 sockets giving 12 cores and 24 threads is good enough for our workloads.
  • Lower end CPUs such as Xeon E3 have very attractive price/performance but only support 32GB of memory which was not enough for our application.

Memory

  • For database boxes memory is mainly used as a cache, so depending on your application you may be better off investing in memory or storage for optimal performance. Check out this blog post for more details.
  • Accessing data in memory is much faster than even on the fastest flash storage so it is still important.
    For our workload having recent data in memory is very important so we get as much “cheap” memory as we can populating all 16 slots with 16GB dimms which have attractive cost per GB at this point.

Storage
There are multiple uses for the storage so there are many variables to consider

  • Bandwidth
    • We need to be able access data on the storage device quickly and with stable response time. HGST FlashMax II has been able to meet these very demanding needs.
  • Endurance
    • When using flash storage you need to worry about endurance – how much beating with writes flash storage can handle before it wears out. Some low cost MLC SSDs would wear out in the time frame of weeks if being written with maximum speed. HGST FlashMax II has endurance rating of 10 Petabytes written (for a random workload) – 30 Petabytes written (for a sequential workload)
    • We also use TokuDB storage engine which significantly reduces amount of writes compared to Innodb.
  • Durability
    • Does the storage provide true durability with data guaranteed to be persisted when write is acknowledged at the operating system level when power goes down or is loss possible?
      We do not want to risk database corruption in case of power failure so we were looking for storage solution which guarantees durability.
      HGST FlashMax II guarantees durability which has been confirmed by our stress tests.
  • Size
    • To scale application storage demands you need to scale both number of IO operations storage can handle and storage size. For flash storage it is often the size which becomes limiting factor.
      HGST FlashMax II 4.8 TB capacity is best available on the market which allows us to go “All Flash” and achieve very quick data access to all our data set.
  • Secondary Storage
    • Not every application need requires flash storage properties.
    • We have secondary storage with conventional drives for operating system and logs.
      Sequential read/write pattern works well with low cost conventional drives and also allow us to increase flash life time, having it handling less writes.
    • We’re using RAID with BBU for secondary storage to be able to have fully durable binary logs without paying high performance penalty.

Why PCIe SSD over SATA SSD?
There are arguments that SATA SSD provides just a good enough performance for MySQL and there is no need for PCIe. While these arguments are valid in one dimension, there are several more to consider.

First, like I said PCIe SSD still provides a best absolute response time and it is an important factor for an end user experience in SaaS systems like Percona Cloud Tools.
Second, consider maintenance operations like backup, ALTER TABLES or slave setups. While these operations are boring and do not get as much attention as a response time or throughput in benchmarks, it is still operations that DBAs performs basically daily, and it is very important to finish a backup or ALTER TABLE in a predictable time, especially on 3-4TB datasize range. And this is where PCIe SSD performs much better than SATA SSDs. For SATA SSD, especially bigger size, write endurance is another point of concern.

Why TokuDB engine?
The TokuDB engine is the best when it comes to insert operations to a huge dataset, and few more factors makes it a no-brainer:

  • TokuDB compression is a huge win. I estimate into this storage ( FlashMAX II 4.8TB) we will fit about 20-30TB of raw data.
  • TokuDB is SSD friendly, as it performs much less data writes per INSERT operation than InnoDB, which greatly extends SSD (which is, well, expensive to say the least) lifetime.

The post Reference architecture for a write-intensive MySQL deployment appeared first on MySQL Performance Blog.

Why TokuDB hates Transparent HugePages

If you try to install the TokuDB storage engine on a modern Linux distribution it might fail with following error message:

2014-07-17 19:02:55 13865 [ERROR] TokuDB will not run with transparent huge pages enabled.
2014-07-17 19:02:55 13865 [ERROR] Please disable them to continue.
2014-07-17 19:02:55 13865 [ERROR] (echo never > /sys/kernel/mm/transparent_hugepage/enabled)

You might be curious why TokuDB refuses to start with Transparent HugePages. Are they not a good thing… allowing smaller kernel page tables and less TLB misses when accessing data in the buffer pool? I was curious, so I asked Tim Callaghan this very question.

This problem originates with TokuDB using jemalloc memory allocator, which uses a particular trick to deal with memory fragmentation. The classical problem with memory allocators is fragmentation – if you allocated a say 2MB chunk from the operating system (typically using mmap),  as the process runs it is likely some of that 2MB memory block will become free but not all of it, hence it can’t be given back to operating system completely. jemalloc uses a clever trick being able to give back portions of memory allocated in such a way through madvise(…, MADV_DONTNEED) call.

Now what happens when you use transparent huge pages? In this case the operating system (and CPU, really) works with pages of a much larger size which only can be unmapped from the address space in its entirety – which does not work when smaller objects are freed which produce smaller free “holes.”

As a result, without being able to free memory efficiently the amount of allocated memory may grow unbound until the process starts to swap out – and in the end being killed by “out of memory” killer at least under some workloads. This is not a behavior you want to see from the database server. As such requiring to disable huge pages is a better choice.

Having said that this is pretty crude requirement/solution – disabling huge pages on complete operating system image to make one application work while others might be negatively impacted. I hope with a future jemalloc version/kernel releases there will be solution where jemalloc simply prevents huge pages usage for its allocations.

Using jemalloc and its approach to remove pages from resident space also makes TokuDB a lot different than typical MySQL instances running Innodb from the process space. With Innodb VSZ and RSS are often close. In fact we often monitor VSZ to ensure it is not excessively large to avoid danger of process starting to swap actively or be killed with OOM killer. TokuDB however often can look like this

[root@smt1 mysql]# ps aux | grep mysqld
mysql 14604 21.8 50.6 12922416 4083016 pts/0 Sl Jul17 1453:27 /usr/sbin/mysqld –basedir=/usr –datadir=/var/lib/mysql –plugin-dir=/usr/lib64/mysql/plugin –user=mysql –log-error=/var/lib/mysql/smt1.pz.percona.com.err –pid-file=/var/lib/mysql/smt1.pz.percona.com.pid
root 28937 0.0 0.0 103244 852 pts/2 S+ 10:38 0:00 grep mysqld

In this case TokuDB is run with defaults on 8GB system – it takes approximately 50% of memory in terms of RSS size, however the VSZ of the process is over 12GB – this is a lot more than memory available.

This is completely fine for TokuDB. If I would not have Transparent HugePages disabled, though, my RSS would be a lot closer to VSZ causing intense swapping or even process killed by OOM killer.

In addition to explaining this to me, Tim Callaghan was also kind enough to share some links on this issue from other companies such as Oracle, NuoDB , Splunk, SAP, SAP(2), which provide more background information on this topic.

The post Why TokuDB hates Transparent HugePages appeared first on MySQL Performance Blog.

DBaaS, OpenStack and Trove 101: Introduction to the basics

We’ll be publishing a series of posts on OpenStack and Trove over the next few weeks, diving into their usage and purpose. For readers who are already familiar with these technologies, there should be no doubt as to why we are incredibly excited about them, but for those who aren’t, consider this a small introduction to the basics and concepts.

What is Database as a Service (DBaaS)?
In a nutshell, DBaaS – as it is frequently referred to – is a loose moniker to the concept of providing a managed cloud-based database environment accessible by users, applications or developers. Its aim is to provide a full-fledged database environment, while minimizing the administrative turmoil and pains of managing the surrounding infrastructure.

Real life example: Imagine you are working on a new application that has to be accessible from multiple regions. Building and maintaining a large multiregion setup can be very expensive. Furthermore, it introduces additional complexity and strain on your system engineers once timezones start to come into play. The challenge of having to manage machines in multiple datacenters won’t simplify your release cycle, nor increase your engineers’ happiness.

Let’s take a look at some of the questions DBaaS could answer in a situation like this:

– How do I need to size my machines, and where should I locate them?
Small environments require less computing power and can be a good starting point, although this also means they may not be as well-prepared for future growth. Buying larger-scale and more expensive hardware and hosting can be very expensive and can be a big stumbling block for a brand new development project. Hosting machines in multiple DC’s could also introduce administrative difficulties, like having different SLA’s and potential issues setting up WAN or VPN communications. DBaaS introduces an abstraction layer, so these consideration aren’t yours, but those of the company offering it, while you get to reap all the rewards.

– Who will manage my environment from an operational standpoint?
Staffing considerations and taking on the required knowledge to properly maintain a production database are often either temporarily sweeped under the rug or, when the situation turns out badly, a cause for the untimely demise of quite a few young projects. Rather than think about how long ago you should have applied that security patch, wouldn’t it be nice to just focus on managing the data itself, and be otherwise confident that the layers beyond it are managed responsibly?

– Have a sudden need to scale out?
Once you’re up and running, enjoying the success of a growing use base, your environment will need to scale accordingly. Rather than think long and hard on the many options available, as well as the logistics attached to those changes, your DBaaS provider could handle this transparently.

Popular public options: Here are a few names of public services you may have come across already that fall under the DBaaS moniker:

– Amazon RDS
– Rackspace cloud databases
– Microsoft SQLAzure
– Heroku
– Clustrix DBaaS

What differentiates these services from a standard remote database is the abstraction layer that fully automates their backend, while still offering an environment that is familiar to what your development team is used to (be it MySQL, MongoDB, Microsoft SQLServer, or otherwise). A big tradeoff to using these services is that you are effectively trusting an external company with all of your data, which might make your legal team a bit nervous.

Private cloud options?
What if you could offer your team the best of both worlds? Or even provide a similar type of service to your own customers? Over the years, a lot of platforms have been popping up to allow effective management and automation of virtual environments such as these, allowing you to effectively “roll your own” DBaaS. To get there, there are two important layers to consider:

  • Infrastructure Management, also referred to as Infrastructure-as-a-Service (IaaS), focusing on the logistics of spinning up virtual machines and keeping their required software packages running.
  • Database Management, previously referred to DBaaS, transparently coordinating multiple database instances to work together and present themselves as a single, coherent data repository.

Examples of IaaS products:
– OpenStack
– OpenQRM

Ecample of DBaaS:
– Trove

Main Advantages of DBaaS
For reference, the main reasons why you might want to consider using an existing DBaaS are as follows:

Reduced Database management costs

DBaaS removes the amount of maintenance you need to perform on isolated DB instances. You offload the system administration of hardware, OS and database to either a dedicated service provider, or in the case where you are rolling your own, allow your database team to more efficiently manage and scale the platform (public vs private DBaaS).

– Simplifies certain security aspects

If you are opting to use a DBaaS platform, the responsibility of worrying about this or that patch being applied falls to your service provider, and you can generally assume that they’ll keep your platform secure from the software perspective.

– Centralized management

One system to rule them all. A guarantee of no nasty surprises concerning that one ancient server that should have been replaced years ago, but you never got around to it. As a user of DBaaS, all you need to worry about is how you interface with the database itself.

– Easy provisioning

Scaling of the environment happens transparently, with minimal additional management.

– Choice of backends

Typically, DBaas providers offer you the choice of a multitude of database flavors, so you can mix and match according to your needs.

Main Disadvantages
– Reduced visibility of the backend

Releasing control of the backend requires a good amount of trust in your DBaaS provider. There is limited or no visibility into how backups are run and maintained, which configuration modifications are applied, or even when and which updates will be implemented. Just as you offload your responsibilities, you in turn need to rely on an SLA contract.

– Potentially harder to recover from catastrophic failures

Similarly to the above, unless your service providers have maintained thorough backups on your behalf, the lack of direct access to the host machines means that it could be much harder to recover from database failure.

– Reduced performance for specific applications

There’s a good chance that you are working on a shared environment. This means the amount of workload-specific performance tuning options is limited.

– Privacy and Security concerns

Although it is much easier to maintain and patch your environment. Having a centralized system also means you’re more prone to potential attacks targeting your dataset. Whichever provider you go with, make sure you are intimately aware of the measures they take to protect you from that, and what is expected from your side to help keep it safe.

Conclusion: While DBaaS is an interesting concept that introduces a completely new way of approaching an application’s database infrastructure, and can bring enterprises easily scalable, and financially flexible platforms, it should not be considered a silver bullet. Some big tradeoffs need to be considered carefully from the business perspective, and any move there should be accompanied with careful planning and investigation of options.

Embracing the immense flexibility these platforms offer, though, opens up a lot of interesting perspectives too. More and more companies are looking at ways to roll their own “as-a-Service”, provisioning completely automated hosted platforms for customers on-demand, and abstracting their management layers to allow them to be serviced by smaller, highly focused technical teams.

Stay tuned: Over the next few weeks we’ll be publishing a series of posts focusing on the combination of two technologies that allow for this type of flexibility: OpenStack and Trove.

The post DBaaS, OpenStack and Trove 101: Introduction to the basics appeared first on MySQL Performance Blog.

Putting MySQL Fabric to Use: July 30 webinar

Percona MySQL webinarsMartin and I have recently been blogging together about MySQL Fabric (in case you’ve missed this, you can find the first post of the series here), and on July 30th, we’re going to be presenting a webinar on this topic titled “Putting MySQL Fabric to Use.”

The focus of the webinar is to help you get started quickly on this technology, so we’ll include very few slides (mostly just a diagram or two) and then jump straight into shared screen mode, with lots of live console and source code examples.

In order to make the best use of time, we won’t show you how to install and configure MySQL Fabric. However, we can point you to a few resources to help you get ready and even follow our examples as we go:

  • The official manual is an obvious starting point
  • Our second post in the series includes configuration instructions
  • This git repo contains the test environment we’ll use to run our demos. Specifically, we’ll use the sharding branch, so if you intend to follow our examples as we go, we recommend checking that one out.

If you’re interested, you can register for this webinar here, and if there’s something specific you’d like to see (we had a request for PHP examples in the comments to our last post) feel free to post that as a comment. We can’t promise we’ll be able to cover all requests during the webinar, but we’ll incorporate examples to the repo as time allows.

Hope to see you then!

The post Putting MySQL Fabric to Use: July 30 webinar appeared first on MySQL Performance Blog.

Monitoring MySQL flow control in Percona XtraDB Cluster 5.6

Monitoring flow control in a Galera cluster is very important. If you do not, you will not understand why writes may sometimes be stalled. Percona XtraDB Cluster 5.6 provides 2 status variables for such monitoring: wsrep_flow_control_paused and wsrep_flow_control_paused_ns. Which one should you use?

What is flow control?

Flow control does not exist with regular MySQL replication, but only with Galera replication. It is simply the mechanism nodes are using when they are not able to keep up with the write load: to keep replication synchronous, the node that is starting to lag instructs the other nodes that writes should be paused for some time so it does not get too far behind.

If you are not familiar with this notion, you should read this blogpost.

Triggering flow control and graphing it

For this test, we’ll use a 3-node Percona XtraDB Cluster 5.6 cluster. On node 3, we will adjust gcs.fc_limit so that flow control is triggered very quickly and then we will lock the node:

pxc3> set global wsrep_provider_options="gcs.fc_limit=1";
pxc3> flush tables with read lock;

Now we will use sysbench to insert rows on node 1:

$ sysbench --test=oltp --oltp-table-size=50000 --mysql-user=root --mysql-socket=/tmp/pxc1.sock prepare

Because of flow control, writes will be stalled and sysbench will hang. So after some time, we will release the lock on node 3:

pxc3> unlock tables;

During the whole process, wsrep_flow_control_paused and wsrep_flow_control_paused_ns are recorded every second with mysqladmin ext -i1. We can then build a graph of the evolution of both variables:

wsrep_flow_control_pxc3

While we can clearly see when flow control was triggered on both graphs, it is much easier to know when flow control was stopped with wsrep_flow_control_paused_ns. It would be even more obvious if we have had several timeframes when flow control is in effect.

Conclusion

Monitoring a server is obviously necessary if you want to be able to catch issues. But you need to look at the right metrics. So don’t be scared if you are seeing that wsrep_flow_control_paused is not 0: it simply means that flow control has been triggered at some point since the server started up. If you want to know what is happening right now, prefer wsrep_flow_control_paused_ns.

The post Monitoring MySQL flow control in Percona XtraDB Cluster 5.6 appeared first on MySQL Performance Blog.

Apple: We’ll unleash OS X Yosemite beta on the MASSES July 24

Starting today, regular fanbois will be guinea pigs, it tells Reg

Anyone eager to take Apple’s OS X Yosemite for a spin will be able to get their hands on it from today, the firm has promised. Paid-up developers have been able to test-drive the desktop operating system for the past few weeks.…