SQL Is Your Friend, Part 4

So far we’ve talked about the JSON datatype with the first part. Postgres 9.4 brings some new features that enhance it’s JSON capabilities even more, but I haven’t had a chance to play with the new shiny yet.

In part 2 we talked about using window functions. We also made our own functions in both to help extend functionality or DRY up repetitive queries.

I went over a couple of subquery techniques in part 3. These are exciting to me because it gives me even more capabilities to make sense of data at the data level. With windowing and subquerying I can give context to the data in-place, without having to pull it down and write piles of app server code. I like that.

Today we are going to talk about a performance optimization. What do you do when you’ve got the data indexed well, but the query still doesn’t respond in a quick enough time? As a measuring stick, I give a critical eye to anything that takes more than 500 milliseconds. Once it nears a full second I start looking for options.


Materialized Views


Postgres gives you the ability to save (materialize) the results from a query. It’s not a temp table, or even a table, but you can give the data indexes and query against it the same as a table.

-- Store off the query results
CREATE MATERIALIZED VIEW readings_mv AS
SELECT *, LEAD(minor) OVER (ORDER BY reading_timestamp) AS next_minor
FROM readings
WHERE reading_timestamp >= NOW() - '1 day'::INTERVAL;

-- Add some indexes
CREATE INDEX index_readings_mv_on_major ON readings_mv USING btree(major ASC NULLS LAST);
CREATE INDEX index_readings_mv_on_phone ON readings_mv USING btree(phone COLLATE "default" ASC NULLS LAST);
CREATE INDEX index_readings_mv_on_reading_timestamp ON readings_mv USING btree(reading_timestamp ASC NULLS LAST);

Note the name: readings_mv. It’s a materialized view that consists of readings table data. That’s a convention that I picked up along the way to help communicate what database objects are being used. This is helping out your team and also future you.

In this particular example the query needs to run against the past 24 hours of data and do some calculations (windowing). Now we’ve taken the heavy lifting and calculating and done that in advance out of band from the request/response cycle. We can also do more efficient subqueries that rely upon this same data. You don’t have to index the materialized view, but I tend to mirror table indexes (or add new ones where it makes sense).

Check out the SQL Fiddle. Scroll all the way down in the left pane to see the creation of the materialized view.


Refresh The Data


That’s great, but I said that I wanted to query against the past 24 hours and time is a moving target.

No worries, you can refresh the data with a simple query:

REFRESH MATERIALIZED VIEW readings_mv;

Easy peasy. It blocks, though, so while the materialized view is being updated you have to wait. Or you had to wait. Postgres 9.4, which just dropped this week, added the ability to refresh a view CONCURRENTLY.

REFRESH MATERIALIZED VIEW CONCURRENTLY readings_mv;

Here is the Postgres 9.4 changelog (of sorts) and a little more explanation on the REFRESH CONCURRENTLY functionality as it was being included in trunk.

You can put this into a rake task as well:

namespace :db do
  namespace :heroku do
    desc "Load the PROD database from Heroku to the local dev database"
    task :load => :environment do
      if Rails.env == 'development'
        Bundler.with_clean_env do
          config = Rails.configuration.database_configuration[Rails.env]
          system <<-CMD
            heroku pgbackups:capture --expire
            curl -o latest.dump `heroku pgbackups:url`
            pg_restore --verbose --clean --no-acl --no-owner -h localhost \
              -U #{config["username"]} -d #{config["database"]} latest.dump
            rm -rf latest.dump
          CMD
        end
      end
    end

    # https://devcenter.heroku.com/articles/scheduler
    desc 'Update the materialized view(s)'
    task :update_materialized_view => :environment do
      sql = 'REFRESH MATERIALIZED VIEW beacon_readings_lead_lag_mv;'
      ActiveRecord::Base.connection.execute(sql)
    end
  end
end

If you use Heroku you can this up with scheduler to run every 10 minutes. I included an additional task that I find useful to populate my local database with live data.


kthxbai


These are techniques that I find myself using when an app goes from simple CRUD to having to make sense of complex data. I hope that you found this series interesting and useful. I also give you permission to occasionally use raw SQL to take advantage of the features that your database brings to bear. You chose it for a reason, so use it.

Now go make big, interesting things.

1 Comment

Filed under programming, ruby, SQL