-
New England’s Victory (for Big Data)
While it might not have been New England’s weekend on the Big Gridiron, it was certainly New England’s day for Big Data at the New England Database Summit on Friday at MIT.
The summit was well attended, with 350 registrants and keynotes from prominent MySQL users such as Mark Callaghan. The coverage was quite broad, with presentations running the gamut from grad students (complete with bodyguards and intimidating academic advisors) to established companies such as StreamBase. The sponsor list was an A-list this year as well, with EMC and Microsoft being the two biggest backers.
There were far too many and diverse topics to write about all of them. That said, here were a few of the notable ones.
Keynote #1: Johannes Gehrke (Cornell): Declarative Data-Driven Coordination
Johannes Gehrke of Cornell kicked off with the first keynote on Declarative-Driven Coordination. His methodology shed light on an alternative to out-of-band communication. The presentation focused on how to successfully handle entangled queries.
More Sleep for Tom and Meg if They Can Just Coordinate
In brief, what he showed is a way for someone to see if their friend is on a flight and have the database go about satisfying mutual constraints. With a proof that is outlined in his Sigmod paper, his main theorem is that any schedule that is entangled-isolated is also oracle-serializable. It’s a clever approach, as long as one’s set of friends being entangled remains small.
Keynote #2: Mark Callaghan (Facebook): Performance is Overrated
The room got a little quiet when Mark took the stage. Some people were expecting a possible rehash of this summer’s brouhaha between Mike Stonebraker and Facebook on the fate of MySQL. But, instead Mark jumped into some very practical discussions about managing MySQL at scale.
First, he noted that manageability needs more attention since…
The cost of extra hardware can be predicted
The cost of downtime cannot
Downtime comes in many forms (server down and server too busy)
For Mark, manageability has a number of meanings. This includes the rate of interrupts/server for the operations team. Mark finds that while the server count grows quickly, his operations team grows slowly. Hence, it is imperative that the quality-of-service improve over time (i.e., Does work get done? Does work get done on time?).
Mark and his team use MySQL for a number of reasons. First, it was there when Mark arrived. Second, Mark and his team made it scale 10x. Finally, Mark likes MySQL for OLTP.
As Facebook has grown though, so have the number of servers. This is due to “Big Data” x high QPS. Hence, they have had to add servers to add IOPs. To address this, Mark noted that flash memory (SSD) is very interesting as are (we blush) write-optimized databases.
The last part of his presentation focused on advice for scaling: More Data, More QPS. His tips were quite straightforward:
Fix stalls to make use of capacity
Don’t make MySQL faster, make it less slow
Improve efficiency to use less
Repeat
Additional details can be found in Sheeri’s excellent live blog of the presentation.
New Tools and Systems Session: Willis Lang (University of Wisconsin): Energy-Conscious Data Management Systems
Just as Mark stressed that performance isn’t everything when he spoke about management, Willis Lang pointed out another key concern. His slides noted that “three decades of database research has optimized for the highest possible performance possible regardless of energy consumption.” (We agree and have written about this topic as well).
Willis and his team have been looking at various techniques for addressing this such as using variable speed disks. He has been systematically studying the power/performance trade-offs of hardware components. The preliminary memory-based results showed that interesting trade-off opportunities exist if one rethinks database design principles. His presentation focused on the improvements that can be seen with memory parking. Additional details on his research can be found here.
As mentioned previously, there were many good talks — much more could be written about the event. Other interesting speakers included David Karger who introduced Dido, which seeks to make database manipulation as easy as document editing, and Alvin Cheung whose Pyxis project eases application development with automatic code partitioning based on application and server characteristics.
Kudos to Samuel Madden (MIT) and Ugur Cetintemel (Brown University) for organizing the event. Additional details can also be found via the Twitter hashtag #nedb12 and the event homepage.
-
Building XtraDB on Windows
As you may or may not know, Windows is not yet a supported platform for XtraDB.
I thought I’d try to build it, and see what happens:
Download XtraDB 5.5 Source
cd C:\xtradb-5.5
mkdir bld
cd bld
cmake ..
VS08: File -> Open -> Solution -> C:\xtradb-5.5\bld\MySQL.sln
Build Ended With:
========== Build: 70 succeeded, 17 failed, 2 up-to-date, 10 skipped ==========
The first failure had to do with innobase:
18>Generating Code...
18>Build log was saved at "file://c:\..\innobase.dir\Debug\BuildLog.htm"
18>innobase - 9 error(s), 3 warning(s)
I checked the innobase build log and found this:
sql_prepare.cc
..\sql_prepare.cc(2199) : error C3861: 'gettimeofday': identifier not found
..\sql_prepare.cc(2232) : error C3861: 'gettimeofday': identifier not found
..\sql_prepare.cc(2639) : error C3861: 'gettimeofday': identifier not found
..\sql_prepare.cc(2672) : error C3861: 'gettimeofday': identifier not found
..\sql_prepare.cc(2808) : error C3861: 'gettimeofday': identifier not found
..\sql_prepare.cc(2848) : error C3861: 'gettimeofday': identifier not found
..\sql_prepare.cc(2940) : error C3861: 'gettimeofday': identifier not found
..\sql_prepare.cc(2973) : error C3861: 'gettimeofday': identifier not found
...
sql_parse.cc
..\sql_parse.cc(5750) : error C3861: 'gettimeofday': identifier not found
..\sql_parse.cc(5827) : error C3861: 'gettimeofday': identifier not found
The ‘gettimeofday’ function is a Unix-only (i.e., there is no direct analog of the gettimeofday() in Windows) .. more details here.
So, this part of the code has simply not been ported to Windows yet.
I got to looking, and it is a reported bug already:
https://bugs.launchpad.net/percona-server/+bug/737895
https://bugs.launchpad.net/percona-patches/+bug/421925
https://bugs.launchpad.net/percona-patches/+bug/390156
But, do not worry. If you want to take advantage of the InnoDB enhancements found in XtraDB on Windows, you can simply use MariaDB.
There are pre-built MariaDB binaries for Windows, and MariaDB contains XtraDB+ (which is XtraDB plus some additional enhancements on top of it).
Hope this helps.
-
Speaking at MySQL Meetup in Charlotte,NC
February is going to be a busy meetup month for me. In addition to speaking in Raleigh I will visit Charlotte to speak at Meetup out there on February 23rd. Last year I visited Charlotte meetup was in the great place and we had great food and great crowd of people showing up. I will talk about Optimizing MySQL Configuration which I believe is a great topic for this meetup as it will be helpful for developers to learn basics MySQL configuration as well as for Advanced MySQL DBAs to learn a trick or two they did not know. As an extra treat for attendees I will bring exclusive discount to MySQL Conference and Expo as well as some signed High Performance MySQL 2nd Edition books to give away. See you there.
-
On datatypes, domains and why I think it's time we reconsidered
What's in a datatype then? A MySQL SMALLINT? A C int? An Oracle BLOB? One thing is for sure, they are not very well standardized, not even within the same environment, and much less so across them. And what does it means, really? When should I use a BLOB, when a BINARY VARCHAR and when to use a long long?A datatype defines many attributes:What data I can store in it: Only numbers? Printable alaphanumeric characters? Unicode? Binary data? An object?What I can do with a value of that particular type and how does it behave? Can I concatenate two values? Add them (that is NOT the same as concatenate! The same operator (+) may be used, but it's not the same thing!)? Save it to a file?How is it stored and represented in technical terms. And this can be argued, a C int doesn't define how it is stored on disk, but a MySQL INT does! And a BLOB is a type that is largely there just because it defines how to handle things on disk!To link this up with MySQL, let's look at the built-in MySQL datatypes. Some of them are rarely used, and for all intents and purposes, many of them are overlapping and in some cases they are there JUST BECAUSE they have different semantics and behaviour! I am, for example, not a big fan of all the different INT types that MySQL supports; tinyint, smallint mediuming, bigint all in signed and unsigned shape. Why is this useful? Yes, I know a smallint is smaller on disk so there is 2 bytes less to read from disk compared to an int. Yeah right, big deal: Look now, if that is a big deal for your application, then you have bigger problems. Which doesn't mean that using a smallint isn't a means of, in some cases, improve performance a small bit. And that this is sometimes useful. But really, this should be hidden somewhere, we just have too many datatypes. The same goes for all the different BLOB datatypes! The BLOB datatypes has another issue by the way, the names of them a a bit silly: "TINYBLOB" is a contradiction, as BLOB stands for Binary LARGE Object! A Large object that is Tiny? And a something that is at the most 255 bytes isn't really "Large" anymore, that was way back in the 1970's or so.No, I'm not a big fan of determining which datatype to use by considering storage requirements to optimize performance. That is something you might do as a performance enhancing measure after the job is done, possibly. In C, which is a language I have used longer than I care to remember, int seems to serve the role of being the default numeric integer datatype, which is not a bad thing, in particular not in C as C is a language used to write low-level stuff in, stuff that is performance sensitive, and a C int is linked to hardware specific integers.FLOAT is another issue altogether. In very few business applications is there a need for FLOAT or DOUBLE in terms of IEEE floating point values. Really! This is a scientific number format, that is also supported by the Floating point unit in most computers of today, so operations on them are pretty fast these days. But I have seen too many business applications where the developer uses a FLOAT in the database for any kind of numeric value that is not an integer, which is a distinctly bad idea! It works as long as the numbers aren't too big, and when the operations on them are reasonably simple. This is not to say a FLOAT or DOUBLE isn’t useful, just that they are more targeted for scientific programming (Hey, they are after all defined by IEEE!). What most people really should use is DECIMAL (I'm using MySQL datatypes for reference, but you know what I mean). This is a datatype that accepts (hey, big surprise coming up here) decimal numbers, fixed point decimal numbers! Operations on DECIMAL are slower though, as they aren't really supported directly by either C (which is the language that MySQL is largely written in) or by the CPU / FPU. But they are better for business applications (for scientific applications, use FLOAT or DOUBLE. And continue to write your code in Fortran by all means!)But all these datatype are increasingly used also to define the domain of the data in them! The SQL DOMAIN feature is in the SQL Standard but is largely forgotten. It is there in PostgreSQL for you to try. The advantage of this, in my mind, isn't just that you can define in more exact terms what data should go into a database column, without writing code to do that, and to make it apparent in the database schema what data is expected, it also means that we can get away from the wide array of different built-in datatypes in, say, MySQL. Instead we can stick with a few optimized ones, let them have a simple inherent domain and then we can define the application specific domains in application code. Anyone for a IP_ADDRESS datatype? A ZIPCODE datatype? An URL type. All those a reasonable requests from the application POV, but it really shouldn't (in my mind) be defined by the database server (As: What are the semantics of a ZIP-code? They are different in different countries, and the post-office can change their minds (although that is something that takes a long time I guess)). Why? As this is application specific domain! And application specifics should be in application code! Simple as that!So what datatypes are useful then, the way I look at it:RAW Data - This should be a basic type. As long as can be, or as short. Any data can be stored in it, and nearly as much as you would like.Integers - Although an integer can be considered to be a special case of a more generic number, the integer domain is so generic, and the integer is such a common and basic type that is deserves a place here.Decimal – Fixed point decimal numbers.Float - Now we are getting close to the application domain here, but these guys deserve a place here anyway, as they are so common and hooked up to the programming languages and hardware.DATETIME - Only one of them is really needed. One of DATETIME, DATE, YEAR, MONTH, TIMESTAMP, TIME etc etc. that is. Just one, very generic base type, where the specific domain is, you've heard it before, application specific!Text - Reluctantly I add this to the list. Although this is just a RAW with some limitations (usually only UTF8 or ISO8859-15 or something characters), there are some operations and attributes that are so tied up with text strings that a TEXT type is reasonable to include.And that's about it! Yes! We could add a few standard domains of top of all this, like FLOAT and DOUBLE, SMALLINT, TINYINT etc etc. But as for the basic types, this is what I would like to see. As for the weirdo TIMESTAMP semantics in MySQL, add them as triggers on your table, in application code or, best, as a specific domain (I do know that not all weirdiness of TIMESTAMP can be represented by a standard SQL DOMAIN, which is both a sign of just how weird TIMESTAMP is, and of some of the limitation of the SQL DOMAIN).I have now mentioned SQL DOMAINs a few times, so lets spend some quality time with them right now before closing up, here is a simple session using PostgreSQL 8.4:First we create a domain:CREATE DOMAIN yesno AS CHAR(1)DEFAULT 'N'NOT NULL CHECK (VALUE = 'Y' OR VALUE = 'N');Then we create a table that uses that domain:CREATE TABLE user_active(username CHAR(10), is_active YESNO);And then we check if it works using a few SQL statements:INSERT INTO user_active VALUES('foo', NULL);ERROR: domain yesno does not allow null valuesINSERT INTO user_active VALUES('foo', 'Y');INSERT 0 1INSERT INTO user_active VALUES('foo', 'N');INSERT 0 1INSERT INTO user_active VALUES('foo', 'A');ERROR: value for domain yesno violates check constraint "yesno_check"Before I wrap this up, I want to say one more thing on the issue of SQL Domains: There is an alternative n PostgreSQL, which is user defined datatypes using a CREATE TYPE SQL command. To begin with, I like domains much better as they are declarative, which means I don’t have to read a whole bunch of (usually uncommented and undocumented) code to understand what they type does. Secondly, this little warning in the PostgreSQL 9.1 manual isn’t too encouraging (page 1289): “To create a new base type, you must be a superuser. (This restriction is made because an erroneous type definition could confuse or even crash the server.)”I have more to say on types, and I will follow up with a new post on this subject eventually, but this is enough for now./Karlsson
-
The Agony of Big Transactions in the MySQL Binlog
Databases fail in interesting ways. About a week ago I was called in on a support case for Tungsten Enterprise. We were getting failures in which the master MySQL 5.1.50 server would run out of connections, block applications for 5 to 10 minutes and eventually trigger automated failover to a slave. Running out of database connections is a classic symptom of blocking on a shared resource inside the server. The blocked transactions hold onto their connections, which are quickly exhausted if new transactions constantly arrive from applications. So where was the hold-up?Our first clue was to notice that Tungsten Replicator was processing a huge transaction at the time of one of the failovers. To find out more I dug into the MySQL binlog using the handy mysqlbinlog utility and looked at transactions committed around the time of the failure. The following pattern quickly became apparent across hundreds of transactions. The timestamps show that the commit occurred over 8 minutes after the first statement. #120201 11:57:59 server id 313819 end_log_pos 215822862 Query thread_id=55165463 exec_time=0 error_code=0SET TIMESTAMP=1328119079/*!*/;BEGIN/*!*/;# at 215822862#120201 11:49:23 server id 313819 end_log_pos 215823092 Query thread_id=55165463 exec_time=1 error_code=0SET TIMESTAMP=1328118563/*!*/;(SQL statement)/*!*/;# at 215823092#120201 11:57:59 server id 313819 end_log_pos 215823119 Xid = 32444278496COMMIT/*!*/;The root cause turned out to be simple. The large transaction we found in the Tungsten log resulted in over 1.3Gb of data in the MySQL binlog. Mysqld took several minutes to write this into the log. Meanwhile, since transactions apply serially into the binlog, every other transaction had to wait instead of just committing and releasing its connection. New transactions quickly consumed the spare connections up to the max_connections limit. Problem explained. The MySQL binlog is a wonderful thing but the fact that it is a serial resource has important consequences for busy servers that use replication. First, if you run out of connections, look at the binlog. Tungsten makes large transactions fairly easy to detect because they are fragmented in the replicator log. We noticed a transaction with 1300 fragments, which tipped us off to the problem. However, you can also find the problem by looking at the binlog directly as I did above. Or you can use any of several tools to analyze the binlog and look for periods of time with no transactions committed. Second, big transactions are a problem for MySQL. Admin operations or application bulk loads through web APIs can easily create huge numbers of updates. Break them up into pieces and do a chunk at a time. This is also better for slaves, which may end up lagging badly when large transactions reach them. Increasing max_connections to a high number is incidentally not a solution. It just makes problems with transactions less visible and will not help anyway if you have high transaction throughput. Third, if you have updates or deletes that affect many rows, use statement replication. Row replication is a great thing. We use it regularly in Tungsten deployments, especially for heterogeneous replication, e.g., from MySQL to Oracle. However, we have also seen customers write 5Gb or more into the binlog without even realizing it. Statement replication can write as little as a single statement. This flexibility is one of the outstanding features of MySQL replication.A possible fourth suggestion is to put the binlog on faster storage. However, it is unclear whether that would help--the binlogs in this case were already on fast, separate disk storage with large caches and sync_binlog=0. It seems MySQL writes large transactions rather slowly into the binlog for reasons that may not have much to do with storage speed. I hope to understand why that is at some later time. Perhaps somebody reading this article can suggest a reason.
|