Just Say No to Materialized MySQL Views

I'm a big fan of MySQL, but I ran into an interesting performance issue this weekend regarding views. MySQL added view support in 5.0, and I've used it to great effect, but it's not all roses.

Within MySQL, there are several ways to categorize views, but the one I care about is materialized vs. rewritten/merged. Materialized views are views that, as you might guess, are materialized when they're needed. In simpler terms, the view definition is used to create a temporary table and then that temporary table is used for the actual query. Rewritten or merged views are combined with the actual query by the SQL engine and executed as a single query.

The performance problem I ran across was with the former view type. It seems that the internals of MySQL manage materializing in a very inefficient way. I can't provide any sort of hard data to support that, but what I experienced was definitely points to the conclusion.

I had a view that had some fairly intensive stuff in it (both structurally and computationally). I knew it was materialized and was fine with a little bit of a slowdown, but I was seeing times in the 15 second range for certain queries that used it on my personal server (which, admittedly, is a only 3GHz Celeron). Manually materializing the view (i.e. running the view statement to create my own table) and then running the queries in question took a few hundred milliseconds tops, counting both the generation of the temp table and the actual query time (with most of it being the former).

The only explanation I can come up with was that the materializing of the view was happening repeatedly. Very repeatedly. Recasting the "view" as a normal table that I repopulate as needed has eliminated all the performance issues, and the queries are now typically running in tens of milliseconds.

I should mention that I've never had any performance issues with rewritten/merged views. MySQL seems to do a fantastic job efficiency-wise with them, and I continue to use them.

Interestingly, I had a very similar problem with Oracle a few months ago. The main difference was that I wasn't using views, but rather subqueries in the FROM clause (i.e., inline views). In that app, creating separate views was both impossible (we didn't own the database), and unnecessary (only one query used the view), and we were able to get around it with "factored subqueries" (the WITH clause).

7 Responses to “Just Say No to Materialized MySQL Views”


  1. 1 Jordan CLark

    Isn't this form of "materializing" data from different sources into a new table a form of denormalization? How much benefit is there when you weigh the cost of building these tables and storing the data in duplicate? Compared to just pulled out fresh data with a normal rewritten/merged view? I thought a big part of using normalizing data was that it was more efficient, with less data there is less to store and index, less to scan, less to update, and more can fit in fast RAM?

    Thanks!

  2. 2 barneyb

    Jordan,

    Yeah, it's absolutely denormalization.

    When MySQL does it, it's an implementation detail of the server; if they changed to something else the DB client wouldn't care. As such, it's not a denormalization you'd care about.

    When I do it manually, it's a much bigger deal. I'd love to have just changed my view to be of the "rewritten/merged" style, but with the current MySQL implementation, that's impossible. With the database not able to directly address my problem, it's up to me to choose between performance and maintenance overhead.

    For the cost of having to rebuild my "view" tables when data is updated and the potential to have slightly outdated statistics in the "views", I can drop a significant number of queries from the 10-15 second range down to 40-100 milliseconds. Often there are multiple instance per page request as well. So the tradeoff seems a no-brainer to me.

    I should also have mentioned that this is a CPU-bound limitation. On my laptop (slower disks, faster CPU, equivalent free RAM), the queries with the views run around a second a piece, compared with 10-15 seconds on the server.

  3. 3 Jordan CLark

    What part of the operation makes it CPU bound? Is the view aggregating data, or running functions to produce the output? Is it possible to denormalize just some fields manually into a new table and join, instead of duplicating everything? Or does this reduce the performance gains?

    Also just out of curiosity how much data is this working with, thousands, tens of thousands of rows, more? Does this size matter, like the larger it is, the more you can save?

    P.S. I wish your blog had a comment email subscription option.

  4. 4 barneyb

    Jordan,

    Sadly, the row count is tiny: less than a thousand rows.

    I didn't do exceptionally granular testing, but the main offenders were aggregating subqueries. It would be possible to duplicate the subqueries in the "view" throughout the queries that leverage the view to get most of the performance benefit, but the maintenance downside was significat. That's actually the path I went down first, because I didn't want to jack up my database, but it wasn't worth it. The app is very SELECT-centric with relatively few updates, so manually managing the "view" tables isn't actually much of a headache.

    There is a feed link at the bottom for subscribing to comments. I'd also naively assumed that WordPress would email thread participants when new comments were posted (like MovableType did), but I just looked it up, and that doesn't seem to be the case. I found a plugin that will provide that functionality; I'll be installing it in just a sec.

  5. 5 Mark Cahill

    I had a similar problem with views on our web reporting platform. The culprit that time was that the view was not using the index from the underlying table. I forced a primary key on the ID, and the query times went down 1000%. Just something to look into…

  6. 6 barneyb

    Mark,

    I'm pretty sure that the issue in my case was that the view engine was rematerializing the view over and over again within the same query. If it had just materialized it once and reused it, it would have been fine, even without an index. But still a good thought to keep in mind.

  7. 7 Dom

    Thanks for your article! I seem to have hit the same problem with a test database on MySQL 5.1. In particular I wanted to mimic the behaviour of queries in Access (from which I am wishing to migrate the data). In Access you can 'stack' queries on top of one another and supply parameter values to the top level query which are inherited and used by lower level queries.

    MySQL does not allow parameter values (aka user variables) with views, but there is a workaround using a stored function (which can access and then return the user variable). But the performance is terrible.

    Previously I have used sub-queries (in MySQL 4.1), though not with user variables, and they have always been impressively fast (but I had to write my own VBA code to nest sub-query definitions into higher-level queries). Views should be neater but with their inability to access user variables (except inefficiently as it seems through a stored function), and the performance hit you describe, I guess I must stick to coded sub-queries for now.

Leave a Reply