The language of compression

Leif Walsh & friends

Leif Walsh will talk about the language of compression at Percona Live Amsterdam

Storage. Everyone needs it. Whether your data is in MySQL, a NoSQL, or somewhere in the cloud, with ever-growing data volumes – along with the need for SSDs to cut latency and replication to provide insurance – an organization’s storage footprint is an important place to look for savings. That’s where compression comes in (squeeze!) to save disk space.

Two Sigma software engineer Leif Walsh speaks the language of compression. Fluently. In fact, he’ll be speaking on

that exact subject September 22 during the Percona Live conference in Amsterdam.

I asked him about his talk, and about Amsterdam, the other day. Here’s what he had to say.

* * *

Tom: Hi Leif, how will your talk help IT decision-makers cut through the marketing mumbo-jumbo on what’s important to focus on and what is not
Leif: My talk will have three lessons aimed at those making storage decisions for their company:

  1. What are the key factors to consider when evaluating storage options, and how can they affect your bottom line?  This is not only how storage tech influences your hardware, operations, and management costs, but also how it can facilitate new development initiatives and cut time-to-market for your products.
  2. How should you read benchmarks and marketing materials about storage technology?  You’ll learn what to look for in promotional material, and how to think critically about whether that material is applicable to your business needs.
  3. What’s the most effective way to communicate with storage vendors about your application’s requirements?  A lot of time can be spent in the early stages of a relationship in finding a common language for users and vendors to have meaningful discussions about users’ needs and vendors’ capacity to meet those needs.  With the tools you’ll learn in my talk, you’ll be able to accelerate quickly to the high-bandwidth conversations you need to have in order to make the right decision, and consequently, you’ll be empowered to evaluate more choices to find the best one faster.

Tom: In addition to IT decision-makers, who else should attend your session and what will they take away afterward?
Leif: My talk is primarily about the language that everyone in the storage community should be using to communicate. Therefore, storage vendors should attend to get ideas for how  to express their benchmarks and their system’s properties more effectively, and application developers and operations people will learn strategies for getting better support and for making a convincing case to the decision makers in their own company.

Tom: Which session(s) are you most looking forward to besides your own?
Leif: Sam Kottler is a good friend and an intensely experienced systems engineer with a dynamic and boisterous personality, so I can’t wait to hear more about his experiences with Linux tuning.

As one of the original developers of TokuMX, I’ll absolutely have to check out Stephane’s talk about it, but I promise not to heckle. Charity Majors is always hilarious and has great experiences and insights to share, so I’ll definitely check out her talk too.

* * *

Catch Leif’s talk at Percona Live in Amsterdam September 21-23. Enter the promo code “BlogInterview” at registration and save €20! Register now!

The post The language of compression appeared first on MySQL Performance Blog.

Percona Live Europe 2015! Call for speakers; registration open

Percona Live Europe 2015! Call for speakers; registration now openPercona Live is moving from London to Amsterdam this year and the event is also expanding to three full days. Percona Live Europe 2015, September 21-23, will be at the Mövenpick Hotel Amsterdam City Centre. The call for speakers and Super Saver registration are now open. Hurry though because the deadline for submitting a speaking proposal is June 21st and Super Saver registration ends July 5th!

This year’s conference will feature one day of tutorials and two days of keynote talks and breakout sessions related to MySQL, NoSQL and Data in the Cloud. You’ll get briefed on the hottest topics, learn about operating a high-performing deployment and hear from top-industry leaders describe the future of the ecosystem – encompassing MySQL, MariaDB, Percona Server, MongoDB (and more). Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

Have something to say? Why not lead a breakout session or a tutorial?

Breakout sessions are 50 minutes including a Q&A. Tutorial sessions focus on an immediate and practical application of in-depth MySQL and NoSQL knowledge. Tutorial speakers should assume that attendees will have laptops to work through detailed and potentially hands-on presentations. Tutorials are typically three hours long including a Q&A, however, if you have content for a full day, submissions for 6-hour-long tutorials are also being accepted. If your tutorial or breakout session is approved, you’ll receive a complimentary full-conference pass.

Huge thanks to our Conference Committee!

There would be no Percona Live without the hard work of our conference committees. Meet this year’s Percona Live Europe 2015 Conference Committee – a dedicated group of experts in MySQL, NoSQL and Data in the Cloud:

  • Erik Beebe – Founder and CTO, ObjectRocket
  • Luis Motta Campos – Database Administrator, ebay Classifieds Group
  • Colin Charles – Chief Evangelist, MariaDB
  • César Trigo Esteban – Development Director, Gigigo
  • Kenny Gorman – Chief Technologist; Data. Office of the CTO, Rackspace
  • Amrith Kumar – Founder and CTO, Tesora
  • Giuseppe Maxia – Quality Assurance Architect, VMWare
  • Shlomi Noach – Senior Systems Engineer, Booking.com
  • Konstantin Osipov – Engineering Manager, Mail.Ru
  • Morgan Tocker – MySQL Community Manager, Oracle
  • Art van Scheppingen – Head of Database Engineering, Spil Games
  • Charity Majors- Production Engineering Manager, Facebook
  • Peter Zaitsev – Co-founder and CEO, Percona

Sponsorships

Sponsorship opportunities for Percona Live Europe 2015 are now available. Sponsors become part of a dynamic and fast-growing ecosystem and interact with hundreds of DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event. This year’s conference will feature expanded accommodations and turnkey kiosks.

Planning to attend?

Super Saver registration discounts for Percona Live Europe 2015 are available through July 5th (at 11:30 p.m. CEST). Visit the Percona Live Europe 2015 website for more information about the conference. Interested community members can also register to receive email updates about Percona Live Europe 2015.

Percona has also negotiated a special hotel rate at the Mövenpick Hotel Amsterdam City Centre. If you book your hotel before July 6th your delicious breakfast is included.

I hope to see you in Amsterdam!

The post Percona Live Europe 2015! Call for speakers; registration open appeared first on MySQL Performance Blog.

Schema Design in MongoDB vs Schema Design in MySQL

For people used to relational databases, using NoSQL solutions such as MongoDB brings interesting challenges. One of them is schema design: while in the relational world, normalization is a good way to start, how should we design our collections when creating a new MongoDB application?

Let’s see with a simple example how we would create a data structure for MySQL (or any relational database) and for MongoDB. We will assume in this post that we want to store people information (their name) and the details from their passport (country and validity date).

Relational Design

In the relational world, the basic idea is to try to stick to the 3rd normal form and create two tables (I’ll omit indexes and foreign keys for clarity – MongoDB supports indexes but not foreign keys):

mysql> select * from people;
+----+------------+
| id | name       |
+----+------------+
|  1 | Stephane   |
|  2 | John       |
|  3 | Michael    |
|  4 | Cinderella |
+----+------------+
mysql> select * from passports;
+----+-----------+---------+-------------+
| id | people_id | country | valid_until |
+----+-----------+---------+-------------+
|  4 |         1 | FR      | 2020-01-01  |
|  5 |         2 | US      | 2020-01-01  |
|  6 |         3 | RU      | 2020-01-01  |
+----+-----------+---------+-------------+

One of the good things with such a design is that it’s equally easy to run any query (as long as we don’t consider joins as something difficult to use):

  • Do you want the number of people?
    SELECT count(*) FROM people
  • Do you want to know the validity date of Stephane’s passport?
    SELECT valid_until from passports ps join people pl ON ps.people_id = pl.id WHERE name = 'Stephane'
  • Do you want to know how many people do not have a passport? Run
    SELECT name FROM people pl LEFT JOIN passports ps ON ps.people_id = pl.id WHERE ps.id IS NULL
  • etc

MongoDB design

Now how should we design our collections in MongoDB to make querying easy?

Using the 3rd normal form is of course possible, but that would probably be inefficient as all joins should be done in the application. So out of the 3 queries above, only the query #1 could be easily run. So which other designs could we have?

A first option would be to store everything in the same collection:

> db.people_all.find().pretty()
{
	"_id" : ObjectId("51f7be1cd6189a56c399d3bf"),
	"name" : "Stephane",
	"country" : "FR",
	"valid_until" : ISODate("2019-12-31T23:00:00Z")
}
{
	"_id" : ObjectId("51f7be3fd6189a56c399d3c0"),
	"name" : "John",
	"country" : "US",
	"valid_until" : ISODate("2019-12-31T23:00:00Z")
}
{
	"_id" : ObjectId("51f7be4dd6189a56c399d3c1"),
	"name" : "Michael",
	"country" : "RU",
	"valid_until" : ISODate("2019-12-31T23:00:00Z")
}
{ "_id" : ObjectId("51f7be5cd6189a56c399d3c2"), "name" : "Cinderella" }

By the way, we can see here that MongoDB is schemaless: there is no problem in storing documents that do not have the same structure.

The drawback is that it is no longer clear which attributes belong to the passport, so if you want to get all passport information for Michael, you will need to correctly understand the whole data structure.

A second option would be to embed passport information inside people information – MongoDB supports rich documents:

> db.people_embed.find().pretty()
{
	"_id" : ObjectId("51f7c0048ded44d5ebb83774"),
	"name" : "Stephane",
	"passport" : {
		"country" : "FR",
		"valid_until" : ISODate("2019-12-31T23:00:00Z")
	}
}
{
	"_id" : ObjectId("51f7c70e8ded44d5ebb83775"),
	"name" : "John",
	"passport" : {
		"country" : "US",
		"valid_until" : ISODate("2019-12-31T23:00:00Z")
	}
}
{
	"_id" : ObjectId("51f7c71b8ded44d5ebb83776"),
	"name" : "Michael",
	"passport" : {
		"country" : "RU",
		"valid_until" : ISODate("2019-12-31T23:00:00Z")
	}
}
{ "_id" : ObjectId("51f7c7258ded44d5ebb83777"), "name" : "Cinderella" }

Or we could embed the other way (however this looks a bit dubious as some people may not have a passport like Cinderella in our example):

> db.passports_embed.find().pretty()
{
	"_id" : ObjectId("51f7c7e58ded44d5ebb8377b"),
	"country" : "FR",
	"valid_until" : ISODate("2019-12-31T23:00:00Z"),
	"person" : {
		"name" : "Stephane"
	}
}
{
	"_id" : ObjectId("51f7c7ec8ded44d5ebb8377c"),
	"country" : "US",
	"valid_until" : ISODate("2019-12-31T23:00:00Z"),
	"person" : {
		"name" : "John"
	}
}
{
	"_id" : ObjectId("51f7c7fa8ded44d5ebb8377d"),
	"country" : "RU",
	"valid_until" : ISODate("2019-12-31T23:00:00Z"),
	"person" : {
		"name" : "Michael"
	}
}
{
	"_id" : ObjectId("51f7c8058ded44d5ebb8377e"),
	"person" : {
		"name" : "Cinderella"
	}
}

That’s a lot of options! How can we choose? Here is where you should be aware of a fundamental difference between MongoDB and relational databases when it comes to schema design:

Collections inside MongoDB should be designed with the most frequent access patterns of the application in mind, while in the relational world, you can forget how data will be accessed if your tables are normalized.

So…

  • If you read people information 99% of the time, having 2 separate collections can be a good solution: it avoids keeping in memory data is almost never used (passport information) and when you need to have all information for a given person, it may be acceptable to do the join in the application.
  • Same thing if you want to display the name of people on one screen and the passport information on another screen.
  • But if you want to display all information for a given person, storing everything in the same collection (with embedding or with a flat structure) is likely to be the best solution.

Conclusion

We saw in this post one of the fundamental differences between MySQL and MongoDB when it comes to creating the right data structure for an application: with MongoDB, you need to know the data access pattern of the application. This should not be neglected as creating a wrong schema design is a recipe for disaster: queries will be difficult to write and to optimize, they will be slow and they will sometimes need to be replaced by custom code. All that can lead to low performance and frustration.

The next question is: which way is better? And of course, there is no definite answer: MongoDB fans will say that by making all access patterns equal, normalization make them equally bad, and normalization fans will say that a normalized schema provides good performance for most applications and that you can always denormalize to help a few queries run faster.

The post Schema Design in MongoDB vs Schema Design in MySQL appeared first on MySQL Performance Blog.