Tag Archives: ruby

My Twitter Bot

Almost a year ago I wrote about putting a Twitter Bot On Heroku. A few months ago my bot stopped responding. This was about the time that Heroku changed their billing plans, but to be honest I don’t know why it went offline.

Today I moved it over to EC2, and I am going to lift the veil and reveal who my Twitter bot is.

EC2 Setup

I picked an Ubuntu image because I prefer Ubuntu over CentOS/RHEL. This doesn’t need a lot of horsepower, so the free tier should be just fine. This is also just a silly Twitter bot.

So let’s get the foundation in place:

sudo apt-get update
sudo apt-get install git-core curl zlib1g-dev build-essential libssl-dev libreadline-dev libyaml-dev libsqlite3-dev sqlite3 libxml2-dev libxslt1-dev libcurl4-openssl-dev python-software-properties libffi-dev

And now we need to install ruby. I don’t like installing a ruby version manager on a server. This is a dedicated host, and I control the deployment environment. So I’ll just install ruby and gems at the system level.

cd
wget http://ftp.ruby-lang.org/pub/ruby/2.2/ruby-2.2.3.tar.gz
tar -xzvf ruby-2.2.3.tar.gz
cd ruby-2.2.3/
./configure
make
sudo make install
sudo gem install bundler --no-rdoc --no-ri

Super. Now we can ruby.

The Code

You could do some fancy deploy with Docker, and I thought about doing that. But this is just a silly Twitter bot. So instead I have just a single standalone EC2 instance, and I’ll clone the repo.

Yes, I wrote a Taylor Swift Twitter Bot.

Change into the project directory and run sudo bundle to install the gems. Bundler will complain about running as root, but we don’t care. We are running with scissors on Twitter.

Make It Go

Now we just need to make this thing run, and do it in such a way that it keeps running if the server reboots. We use an init.d script for that.

I put this simple script in /etc/init.d:

#!/bin/sh
cd /home/ubuntu/swift-bot-tay-tay
dotenv rake

I still have the Rakefile from the Heroku setup, and I put a .env file on the box with the credentials. This is a cheesy script, and there are no “environments” so the dotenv gem works just fine.

Disclaimer: that may not, in fact, be a proper init.d script. I actually kicked the script off from the command line using nohup dotenv rake &, and if the server bounces we’ll just see what happens. We are running with scissors, after all.

Celebrate

It’s fun to play with a completely ridiculous project. I’m also amazed how something as simple and silly as a Twitter bot elicits as much of a reaction as it does. @SwiftBotTayTay has nearly 900 followers and regularly gets responses to her tweets. Most of them are positive, too.

That makes me smile.

Leave a Comment

Filed under programming, ruby

Language Comparison Showdown

In my day job I work in a lab environment where we constantly experiment and explore. We recently did a project where we had several calls out to different APIs. Some of the calls were dependent, and some could be concurrent. We typically reach for Ruby when we need to stand up a server because it’s a language that we can move quickly in. When you work on several different projects in a quarter, moving fast is something we value and optimize for.

However, Ruby is not good at concurrency — at least not in the way we needed for that project. We reached for Scala in that project, and it worked great. One of our team members has done a fair amount of work in Java-land and Scala, so we had a good bridge and it was pretty fun.

When that project wrapped I was asked to take a look at how that server would look in a couple of alternative languages. That kicked off some conversations that ended up with this apples-to-apples comparison across several languages.

The project for this experiment is a simple one that is the foundation of most of what we do on the back-end in the lab: consume some API and do something with the JSON response.

I created a simple Rack app. Rack is a very thin Ruby webserver interface. The app serves a JSON payload with a string, an integer, a floating point number, and a date. It’s not a fancy multi-threaded webserver, and it’s running locally. The response time for a single request is 0m0.030s (according to time curl http://localhost:9292).

Here is the server repo: https://github.com/barrettclark/language-comparison-server
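For a sense of scale, a server along those lines can be sketched in a handful of lines of Rack-style Ruby. This is a sketch, not the actual app from the repo above; the payload field names are made up:

```ruby
require 'json'
require 'time'

# A minimal Rack-style app: any object that responds to #call(env) and
# returns [status, headers, body]. The payload mirrors the description
# in the post: a string, an integer, a float, and a date.
PAYLOAD_APP = lambda do |_env|
  payload = {
    string:  'hello',
    integer: 42,
    float:   3.14,
    date:    Time.now.utc.iso8601
  }
  [200, { 'Content-Type' => 'application/json' }, [JSON.generate(payload)]]
end
```

Dropped into a config.ru as run PAYLOAD_APP, rackup serves it on the default port 9292, which matches the curl URL above.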

I wrote clients in the following languages:
* Clojure
* Elixir
* Go
* Node
* Python
* Rust
* Scala (Finagle)
* Swift 2.0

I think all of these languages handle concurrency to one degree or another. I didn’t really test that aspect of the languages in this exploration. This was more about just getting up and running, and seeing how it felt to just issue the request and handle the response. For the most part these are not languages that I have actually used.

That bears repeating: I don’t really know most of these languages. I’m just some guy who made a simple thing in a lot of languages. I wanted to see how easy it is to get up and running with these. The example isn’t completely contrived, but it’s also not necessarily how you would actually do stuff in the given language. I’m OK with that. I was looking to write code that was as idiomatic as I could muster using the standard lib to the extent possible to compare as apples to apples as possible.

And with that, I give you a big table:

| Language | Typed | Compiled | Paradigm (Wikipedia) | Ceremony | Project Setup | 100 Requests |
| --- | --- | --- | --- | --- | --- | --- |
| Clojure | No | Yes | Functional | High | lein new app foo | 0m3.049s |
| Elixir | No | Yes | Functional | Low | mix new foo | 0m0.633s |
| Go | Yes | Yes | Imperative | Low | Manually create file(s) | 0m1.359s |
| Node | No | No | Event-driven | High | npm init is the simplest form | 0m23.823s |
| Python | No | No | Functional, OO, Imperative, Procedural | Low | Manually create file(s) | 0m7.319s |
| Rust | Yes | Yes | Functional, OO, Imperative | Low | cargo new foo --bin | 0m1.287s |
| Scala | Yes | Yes | Functional, OO, Imperative | High | Manually create file(s) | 0m3.978s |
| Swift | Yes | Yes | Functional, OO, Imperative | High | Create a new Xcode project (for now) | 11 ms |

Running the benchmarks:

  • Clojure: lein uberjar && time java -jar target/uberjar/client-0.1.0-SNAPSHOT-standalone.jar
  • Elixir: time ../../repeat.sh mix run lib/client.ex
  • Go: go build && time ../repeat.sh client-go
  • Node: time ../repeat.sh node index.js
  • Python: time ../repeat.sh python client.py
  • Rust: time ../../repeat.sh target/debug/client
  • Scala: ./sbt universal:packageZipTarball && tar xzvf target/universal/client-1.0.tgz && ./repeat.sh ./target/universal/client-1.0/bin/client
  • Swift: ./client.swift

A note on Clojure and Scala: I created loops that iterated 100 times inside the compiled code. The JVM startup time is about 2 seconds, so calling into the compiled code cold each time incurs that additional cost. The total time for each was just over 4 minutes when you include the JVM startup in each of the 100 calls.

Similarly with Elixir: each time you run mix run it fires up the Erlang VM. The total time for 100 full executions was 1m11.299s.

Ceremony refers to the number of hoops you have to jump through to get code running in that environment. A JVM-based language definitely has a lot of hoops. Dealing with Xcode is a lot of ceremony. Node, with all its package and library management, is a lot of ceremony. Rails is also pretty high ceremony now, but I didn’t use Rails in this experiment.

Some Thoughts On All These Languages

Clojure

The repo: https://github.com/barrettclark/language-comparison-client-clojure

I’ve never done LISP. I didn’t use RPN on my HP calculator. I’m not really all that into Java and find that ecosystem a symphony of sadness. In fact, while playing with Clojure I updated openssl, which totally hosed the entire Java ecosystem on my laptop. Leiningen couldn’t pull packages from the Maven repo. I reinstalled Java, and all was right again.

If I worked with someone who was into Clojure I’m sure that would help a lot. Community is super helpful. I don’t, though, and it hasn’t really grabbed me yet.

Elixir

The repo: https://github.com/barrettclark/language-comparison-client-elixir

I first looked at Elixir a couple of years ago when Dave Thomas got excited about it and released his book (as an eBook originally). It was my first taste of functional programming. I did a few exercises on Exercism, but never had a practical application to build. I’ve wanted to come back to it.

On the first pass through this exercise I used HTTPoison to issue requests and Poison to parse the JSON. I found them extremely cumbersome, and felt like maybe Elixir wasn’t for me.

I mentioned that I found Elixir cumbersome on Twitter, and immediately had a handful of replies asking why. They were genuine questions, wondering what it was that didn’t work for me. Remember when I said that community is important? Elixir has a fantastic community who cares deeply about making the language feel good — to the point that confusing error messages are considered bugs.

I took another look with a more simplified solution. It felt good. I also had a handful of people contribute to that repo, which I really appreciated.

Go

The repo: https://github.com/barrettclark/language-comparison-client-go

At first Go didn’t really strike a chord with me. It felt weird how simple and small the language was. I missed having constructs like map and reduce when working with a collection. Go relies heavily on pointers.

Side effects are a first-class citizen in Go.

Go is also the only language I’ve worked with that cares where you put your code.

However, I think I’ve come around on the language. It is simple, and incredibly powerful. It’s designed to help developers move fast. I want to do more with it. It’s also really easy to parse JSON in Go.

Node

The repo: https://github.com/barrettclark/language-comparison-client-node

I don’t hate JavaScript, but I also don’t want to have “one language to rule them all”. I find grunt, gulp, bower, npm, whatevs overly convoluted.

I come from a Ruby background. Convention is a big thing in the Ruby community. I’ve found it really hard to discern a convention in Node.

Python

The repo: https://github.com/barrettclark/language-comparison-client-python

I wanted to go with each language’s standard lib to the extent possible, so I stayed away from Django. I still don’t know how to actually make a “python project”.

The code was pretty easy to write. The standard library looks pretty powerful. The onboarding documentation is a bit of a wall of information (once you find it). It’s a fine language. Being a rubyist, I would probably stick with Ruby when I wanted to use a scripting language, but I believe in horses for courses. There is definitely a place for Python.

Rust

The repo: https://github.com/barrettclark/language-comparison-client-rust

I was blown away with the Rust onboarding documentation. This is another language where you can tell that the community really cares. They also consider confusing error messages a bug in the Rust community. The language is still really young and changing rapidly. It’s been fun to play with.

Scala (Finagle)

The repo: https://github.com/barrettclark/language-comparison-client-scala

Scala, like Clojure, sits on the JVM. It uses a slightly different build process (sbt) that worked better for me than lein. I also work with someone who knows Scala, and that definitely helps take the pain out of lining up the pieces.

From my brief experience with Scala, the base standard library is really just Java. Idiomatic Scala seems to come in with additional layers. We used Finagle at work, and I ended up using that in this experiment.

There is a lot of ceremony and manual setup, but once you get the pieces in place getting and parsing JSON is really nice (especially with Argonaut). A downside of the Scala/Finagle layer cake is that documentation can be a challenge. It was hard for me to discern where to look and how to ask the right questions in The Google.

Swift (2.0)

The repo: https://github.com/barrettclark/language-comparison-client-swift

Later this year we will be able to run Swift on servers. It’s a bit of an odd fit in this comparison. You can run it from the command line, but you wouldn’t. I wasn’t able to get a good time benchmark from the command line, either.

Still, I really like the Swift language. Swift 2.0 has some good improvements. I’ve tried several different approaches to making HTTP calls in Swift. I used NSURLConnection.sendAsynchronousRequest:queue:completionHandler: for a long time. I’ve used Alamofire, which is nice but a really big library. SwiftHTTP is also a nice library that’s a lot simpler. JSON parsing can be done several different ways, too.

A pattern that I really like uses the promises/futures pattern. BrightFutures is a pretty nice library for that. Add in Thoughtbot’s Argo (like their Scala Argonaut) library to parse JSON, and you’ve got a really nice setup.

Xcode still struggles when you go too far into the FP rabbit hole, so there’s that. Still, I’d much rather build a native app than a responsive HTML site, in part because I really like Swift.

Conclusion

It’s fun to look around and see what other communities and languages are doing. I’m still not really comfortable in a FP world, but I like it.

I think my Power Rankings (in order of what I would want to use on my next back-end project) from this experiment are:

  1. Scala Finagle
  2. Go or Elixir
  3. Rust
  4. Node
  5. Python
  6. Clojure

I’m actually torn between Finagle, Go, Elixir, and Rust. I work with people who do Scala and Elixir, so those languages have a bit of an advantage. I also really want to play more with Go and Rust. You can deploy both Go and Rust to Heroku, so maybe I’ll sneak them in on a smaller project. Swift doesn’t make the list because it doesn’t run on the server — yet.

This Is All Arbitrary

A final thought about all this. I did a very simple version of a very common task that I do at work. The benchmarks aren’t perfect. What do I even know about any of these languages? This was an experiment. I appreciate that I get to do these sorts of things.

The next time a project comes up, we will have some more talking points to help guide technology decisions, and maybe you will find this helpful as well.

Thank you.

8 Comments

Filed under programming

Using Ruby To Make iCal Events, Part 2

Earlier I discussed creating calendar events with Ruby to take the pain out of creating a lot of events. I just finished up a script that created 126 events in seconds. Here are the relevant bits (in addition to the earlier post).

Input

My input CSV looks like this (below). It’s Hal Higdon’s Intermediate 1 plan.

Date,Week,Day,Description
6/8/2015,1,MON,Cross
6/9/2015,1,TUE,3 miles
6/10/2015,1,WED,5 miles
6/11/2015,1,THU,3 miles
6/12/2015,1,FRI,Rest
6/13/2015,1,SAT,5 miles (pace)
6/14/2015,1,SUN,8 miles
6/15/2015,2,MON,Cross
6/16/2015,2,TUE,3 miles

I copied the grid at the bottom and pasted it into Excel. I created another tab to handle the ETL. You can highlight a row and paste special (right click shows you this option) to transpose the cells. That allows you to take a row and paste it as a column. Yes, I can Excel pretty good. Excel sucks at regex though, so I let Ruby handle teasing out the meaning of each of the workouts.

The Code

  def read_input(infile)
    CSV.foreach(infile, :headers => true) do |row|
      m, d, y     = row['Date'].split(/\//).map(&:to_i)
      date        = DateTime.civil(y, m, d, 7, 0, 0)
      description = row['Description']
      next if description == "Rest"
      minutes = if description.match(/^\d{1,2}/)
                  description.match(/^\d{1,2}/)[0].to_i * 10
                else
                  30
                end
      event = Icalendar::Event.new
      event.dtstart  = date
      event.duration = "PT#{minutes}M" # RFC 5545 duration: the "T" separates time units from date units
      event.summary  = description
      event.comment  = "Event created with Ruby script"
      @cal.add_event(event)
    end
  end

It’s not pretty. That method is doing a lot. The event comment isn’t saved either. That’s OK. I still got what I need — a whole lot of events that I didn’t have to type myself, and a script (and process) that I can use moving forward.
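The duration rule buried in that method (leading mileage times ten minutes, with a 30-minute default) could be extracted into its own method and tested in isolation. A small sketch, using the same regex as the script above:

```ruby
# Estimate a workout duration in minutes from a plan description like
# "5 miles" or "Cross". Same rule as the script: leading number * 10
# minutes; anything without a leading number gets 30 minutes.
def workout_minutes(description)
  match = description.match(/^\d{1,2}/)
  match ? match[0].to_i * 10 : 30
end
```

That also makes the rule easy to tweak later, say when a 10-minute mile stops being the right pace estimate.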

Win-win.

Leave a Comment

Filed under programming, ruby

Using Ruby To Create iCal Events

Calendars can be so tedious, so I hacked mine.

Anytime we have a new season begin, or I start a new running training plan, I have a bunch of events that need to be created. Usually you just sit down and take the half hour (or whatever) to bang all the events out. You can copy and paste events in iCal from day to day, which helps. But it still sucks.

So I finally sat down and figured out how to script it.

Gemfile:

# A sample Gemfile
source "https://rubygems.org"
ruby '2.2.2'

gem "icalendar"

Script (make_events.rb):

require "rubygems"
require "bundler"
require "csv"
require "date"
Bundler.require

### Create iCal ics file with multiple events
class EventCreator
  attr_reader :cal

  def initialize
    @cal = Icalendar::Calendar.new
    make_events
  end

  def make_events
    dt = DateTime.civil(2015, 6, 6, 7, 0, 0) # year, month, day, hour, minute, second
    3.times do |i|
      event          = Icalendar::Event.new
      event.dtstart  = dt + i
      event.duration = "PT30M" # 30 minutes; RFC 5545 needs the "T" before time units
      event.summary  = "Test Event #{i}!"
      @cal.add_event(event)
    end
  end

  def to_ics
    File.open("scripted.ics", "w") { |f| f.write @cal.to_ical }
  end
end

if __FILE__ == $0
  calendar = EventCreator.new
  calendar.to_ics
end

So all you have to do from the command line, after you’ve run bundle install, is run ruby make_events.rb. It will create a single .ics file that you can import into iCal. When you open the file you will be asked which calendar you want to import the events into.

Note: The gem does not appear to support “full day” events. That’s ok. Next up for me is to create a running training plan given a CSV file. I’ll post that next.

4 Comments

Filed under programming, ruby

RailsConf 2015

I recently had the privilege of being a speaker and a guide at RailsConf in Atlanta.

First, a little about the Guide program. Guides are mentors to Opportunity Scholarship recipients, who are generally new to the Ruby/Rails communities or maybe new to programming in general. The guides are there to help make RailsConf a little easier and less intimidating. There are a lot of people, and I confess that I still get intimidated at times.

We were paired before the conference, and I reached out to my scholar (Eduardo) in advance to get to know him a little. There was also a Guide Mixer the night before the conference began that gave everyone an opportunity to meet each other. It’s always fun to be around a group of people who are really open and want to help. That’s one of the things that I love about the ruby community in general.

I will also confess that I’m generally fairly introverted, and being a guide was also a good way to help me break out of that shell. There were several times when I would see someone and introduce them to Eduardo. It was also great just meeting the fellow guides and having them as a peer set.

The guide program was great. Be on the look out for it at RubyConf.

OK. Now let’s shift our focus to the conference itself.

There were somewhere between 1500 and 2000 people in attendance. I wanted to get out of my introvert/ambivert shell and get the most out of being around so many people in my community. I got to Atlanta on Sunday to see some family and was at the hotel by mid-afternoon. I had already made a connection with someone via Twitter about going for a run during the conference, and we decided to meet in the lobby Sunday afternoon to talk shop and see about some dinner.

This is where the magic really began. At any given point during my time there I was with someone who knew someone who knew someone… Ad hoc groups would form and start talking, and before you knew it you’d made new friends and were heading somewhere to eat, going to a talk together, or making plans to catch up later that evening. You couldn’t turn around in (or around) the hotel without seeing someone (or a group of people) from the conference.

I was a little nervous about my talk. I gave a fairly technical talk that involved a lot of code. Would there be some hotshot in the crowd who wanted to show me up? No. It went well (I think). I had some great questions and conversations after.

The last evening there was a giant mass of people who all wanted to be together, so we split up into 3 large groups. The restaurant wasn’t able to get any of us close, sadly. It was still a lovely evening. I hated saying goodnight to everyone, but I was spent.

There were some great talks throughout the week: six tracks ran at the same time, with one dedicated to hands-on workshops. I’m always inspired seeing what people are talking about and exploring. It was even more fun seeing my friends give their talks. More than any of that, though, was an overwhelming sense of community. I made a lot of new friends and deepened existing friendships.

I look forward to seeing my ruby friends again – sooner than later I hope.

1 Comment

Filed under programming, ruby

Twitter Bot On Heroku

I made a Twitter bot a week or two ago, and it has been a lot of fun watching people’s reactions to it. I won’t tell you what my triggers are or who the bot is because I think that’s part of the game. The bot lives entirely on its own. What I will tell you is how to host a Twitter bot on Heroku.

The Bot Code

You’ll use the Streaming API to search for stuff, and then the REST API to post “responses”. I followed this example.

require 'rubygems'
require 'bundler/setup'
Bundler.require

class Bot
  def self.go
    while true
      begin
        config = {
          :consumer_key        => ENV['TWITTER_CONSUMER_KEY'],
          :consumer_secret     => ENV['TWITTER_CONSUMER_SECRET'],
          :access_token        => ENV['TWITTER_ACCESS_TOKEN'],
          :access_token_secret => ENV['TWITTER_ACCESS_TOKEN_SECRET']
        }
        rClient = Twitter::REST::Client.new(config)
        sClient = Twitter::Streaming::Client.new(config)

        # topics to watch
        topics = ['hello', 'world', '#helloworld']
        sClient.filter(:track => topics.join(',')) do |tweet|
          next if tweet.text.match(/#{rClient.user.screen_name}/i)
          if tweet.is_a?(Twitter::Tweet) && !tweet.text.match(/^RT/) && tweet.lang == 'en'
            p "@#{tweet.user.screen_name}: #{tweet.text} (ID: #{tweet.id}) (#{tweet.lang})"
            rClient.update("@#{tweet.user.screen_name} Pithy response", in_reply_to_status_id: tweet.id)
          end
        end
      rescue Exception => e
        puts "error occurred (#{e.class}), waiting for 5 seconds"
        sleep 5
      end
    end
  end
end

I put the bot in a class, so it’s a classy bot. It also is a bot that will run FOREVER. I am also using a Gemfile and I set some environment variables for all the secret stuff. You can set those from the command line:

heroku config:set TWITTER_CONSUMER_KEY=SOSEKRET
etc…

Gemfile

The Gemfile is really simple. You just need the Twitter gem.

source "https://rubygems.org"
ruby '2.1.4'

gem "twitter"

bundle init is a quick and easy way to get a basic Gemfile that you can start adding stuff to. Those first 3 lines at the top of the script are how you require the contents of the Gemfile in a plain old Ruby script. There is a big world that exists off-Rails, and I recommend you check it out.

Make It Go

You now have everything that you need to make it go. You can do it in irb to see how it performs. I would recommend you comment out the line that makes the response until you’re sure it does what you want.

To run it in irb, set your environment variables, or copy the secret stuff into your script, and then run:

irb -r ./bot.rb

That’ll put you at the command prompt in irb where you can run Bot.go and watch it go.

What About Heroku?

Right, Heroku. If you push this as it is to Heroku, it won’t do anything. You need a Procfile that declares a process type. Since the bot has no web interface, it runs as a worker:

worker: bundle exec rake

Yes really. (You may also need to scale the worker dyno up with heroku ps:scale worker=1.)

Rake is awesome, and custom rake tasks are the bee’s knees. So let’s make one.

require './bot.rb'

desc "Bot!"
task :bot do
  Bot.go
end

task :default => 'bot'

There is a rake task bot, and the default rake task is set to that, so if you simply run bundle exec rake you get a running bot.

Push those 4 files to Heroku, and you’ve got yourself a twitter bot!

Leave a Comment

Filed under programming, ruby

SQL Is Your Friend, Part 4

So far we’ve talked about the JSON datatype in part 1. Postgres 9.4 brings some new features that enhance its JSON capabilities even more, but I haven’t had a chance to play with the new shiny yet.

In part 2 we talked about using window functions. We also made our own functions in both posts to help extend functionality or DRY up repetitive queries.

I went over a couple of subquery techniques in part 3. These are exciting to me because they give me even more capability to make sense of data at the data level. With windowing and subquerying I can give context to the data in place, without having to pull it down and write piles of app server code. I like that.

Today we are going to talk about a performance optimization. What do you do when you’ve got the data indexed well, but the query still doesn’t respond in a quick enough time? As a measuring stick, I give a critical eye to anything that takes more than 500 milliseconds. Once it nears a full second I start looking for options.


Materialized Views


Postgres gives you the ability to save (materialize) the results from a query. It’s not a temp table, or even a table, but you can give the data indexes and query against it the same as a table.

-- Store off the query results
CREATE MATERIALIZED VIEW readings_mv AS
SELECT *, LEAD(minor) OVER (ORDER BY reading_timestamp) AS next_minor
FROM readings
WHERE reading_timestamp >= NOW() - '1 day'::INTERVAL;

-- Add some indexes
CREATE INDEX index_readings_mv_on_major ON readings_mv USING btree(major ASC NULLS LAST);
CREATE INDEX index_readings_mv_on_phone ON readings_mv USING btree(phone COLLATE "default" ASC NULLS LAST);
CREATE INDEX index_readings_mv_on_reading_timestamp ON readings_mv USING btree(reading_timestamp ASC NULLS LAST);

Note the name: readings_mv. It’s a materialized view that consists of readings table data. That’s a convention that I picked up along the way to help communicate what database objects are being used. This is helping out your team and also future you.

In this particular example the query needs to run against the past 24 hours of data and do some calculations (windowing). Now we’ve done the heavy lifting and calculating in advance, out of band from the request/response cycle. We can also run more efficient subqueries that rely upon this same data. You don’t have to index the materialized view, but I tend to mirror the table’s indexes (or add new ones where it makes sense).

Check out the SQL Fiddle. Scroll all the way down in the left pane to see the creation of the materialized view.


Refresh The Data


That’s great, but I said that I wanted to query against the past 24 hours and time is a moving target.

No worries, you can refresh the data with a simple query:

REFRESH MATERIALIZED VIEW readings_mv;

Easy peasy. It blocks, though, so while the materialized view is being updated you have to wait. Or you had to wait. Postgres 9.4, which just dropped this week, added the ability to refresh a view CONCURRENTLY.

REFRESH MATERIALIZED VIEW CONCURRENTLY readings_mv;

Here is the Postgres 9.4 changelog (of sorts) and a little more explanation on the REFRESH CONCURRENTLY functionality as it was being included in trunk.

You can put this into a rake task as well:

namespace :db do
  namespace :heroku do
    desc "Load the PROD database from Heroku to the local dev database"
    task :load => :environment do
      if Rails.env == 'development'
        Bundler.with_clean_env do
          config = Rails.configuration.database_configuration[Rails.env]
          system <<-CMD
            heroku pgbackups:capture --expire
            curl -o latest.dump `heroku pgbackups:url`
            pg_restore --verbose --clean --no-acl --no-owner -h localhost \
              -U #{config["username"]} -d #{config["database"]} latest.dump
            rm -rf latest.dump
          CMD
        end
      end
    end

    # https://devcenter.heroku.com/articles/scheduler
    desc 'Update the materialized view(s)'
    task :update_materialized_view => :environment do
      sql = 'REFRESH MATERIALIZED VIEW beacon_readings_lead_lag_mv;'
      ActiveRecord::Base.connection.execute(sql)
    end
  end
end

If you use Heroku you can set this up with Scheduler to run every 10 minutes. I included an additional task that I find useful for populating my local database with live data.


kthxbai


These are techniques that I find myself using when an app goes from simple CRUD to having to make sense of complex data. I hope that you found this series interesting and useful. I also give you permission to occasionally use raw SQL to take advantage of the features that your database brings to bear. You chose it for a reason, so use it.

Now go make big, interesting things.

1 Comment

Filed under programming, ruby, SQL

SQL Is Your Friend, Part 3

Yo dawg, I heard you like queries. So I put a query in your query so you can query while you query.

Now we are starting to get into some really cool and powerful SQL. Subqueries! Let’s jump right in, shall we?


Subqueries For Change


I want to count up all the times something changed across some set of data. You could pull all the data down and iterate over it in your code, and maybe that runs just fine. You can also let the database do the heavy lifting and pull down just the results that you need.

Using the LEAD windowing function we can look forward one record to see what happens next, and include that in the “current” record. We saw that in yesterday’s post. That’s not quite what we need, though. Fortunately we can query against that query, or use that as a subquery for the query we actually want to run.

SELECT *
FROM (
  SELECT id,
         major,
         minor,
         LEAD(minor) OVER (ORDER BY reading_timestamp) AS next_minor
  FROM readings
) AS amazing_subquery
WHERE minor != next_minor;

Right, so what we have there is a nested subquery. The outer query runs against the output of the inner query. If you were to run an EXPLAIN plan against this you would see how many rows the inner query returns and at what cost. I do tend to run the query plans against any of these and then keep an eye on them as the data scales up.

See the nested subquery for yourself in this SQL Fiddle.


Subqueries For A Field


We can also pull in a specific data point with a subquery and stuff it into a row. Let’s say I just want to know the last reading ID for an event.

SELECT id,
         phone,
         major,
         minor,
         reading_timestamp,
         ( SELECT MAX(id)
           FROM readings r2
           WHERE r2.phone = r1.phone AND r2.id > r1.id
         ) AS last_id
FROM readings r1
ORDER BY 2, 5;

Sorcery! You can see that I refer to the main query in the subquery. That’s pretty handy, but you don’t have to. You could just ask Postgres to give you the max value from this or some other table.

The reading ID may not be the most useful thing, so maybe instead you pull a timestamp. If you did that then you could do math and calculate the duration, which Postgres can also do for you. You can also wrap that up in another outer query. That’s when I start looking to DRY up the base query into a function.

You can play with this some more in this SQL Fiddle


Common Table Expression (CTE)


You can see where all this subquerying can maybe get out of hand. I’ll confess that I have stopped myself when I’ve nested a few queries deep and asked if maybe there is a better way. I try to avoid Russian Dolls when I program. You can also JOIN on a subquery, which is powerful when you need it and also pretty ugly.

Enter the Common Table Expression, or CTE, also known as a WITH query. Let’s take a look at that first query and see how it would look as a CTE.

WITH amazing_subquery AS (
  SELECT id,
         major,
         minor,
         LEAD(minor) OVER (ORDER BY reading_timestamp) AS next_minor
  FROM readings
)
SELECT *
FROM amazing_subquery
WHERE minor != next_minor;

Here is the SQL Fiddle. This is a pretty simplified example, so maybe it doesn’t exactly knock your socks off.

Let’s take this up to 11, shall we?

WITH interval_query AS (
  -- Start by making a series of 12 time intervals
  SELECT (ts ||' hour')::INTERVAL AS hour_interval
  FROM generate_series(0,11) AS ts
), time_series AS (
  -- Then take that interval and convert it into a TIMESTAMP
  SELECT date_trunc('hour', now()) + INTERVAL '60 min' * ROUND(date_part('minute', now()) / 60.0) - interval_query.hour_interval AS start_time
  FROM interval_query
), time_intervals AS (
  -- Then take that TIMESTAMP and add an hour to it to make an hour timebox
  SELECT start_time, start_time + '1 hour'::INTERVAL AS end_time
  FROM time_series
  ORDER BY start_time
)
-- RIGHT JOIN on those times so that we get all the timeboxes even when data is missing
SELECT f.start_time, f.end_time, r1.minor, count(DISTINCT r1.phone) readings
FROM readings r1
RIGHT JOIN time_intervals f
  ON r1.reading_timestamp >= f.start_time AND r1.reading_timestamp < f.end_time
GROUP BY f.start_time, f.end_time, r1.minor
ORDER BY f.start_time, r1.minor;

BOOM. Time series data with counts. Here is that SQL Fiddle.
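To make the moving parts of that CTE concrete, here is a plain-Ruby restating of the same idea: generate twelve hourly windows, then count distinct phones in each window while keeping the empty ones. All names and sample data here are invented:

```ruby
# Fix "now" to a whole hour so the windows line up cleanly.
now = Time.utc(2014, 12, 18, 12, 0)

# Twelve [start, end) hourly timeboxes, oldest first.
timeboxes = (0..11).map do |h|
  start_time = now - (h * 3600)
  [start_time, start_time + 3600]
end.sort

readings = [
  { phone: "555-0100", at: Time.utc(2014, 12, 18, 11, 15) },
  { phone: "555-0199", at: Time.utc(2014, 12, 18, 11, 45) },
  { phone: "555-0100", at: Time.utc(2014, 12, 18, 11, 50) }
]

counts = timeboxes.map do |start_time, end_time|
  inside = readings.select { |r| r[:at] >= start_time && r[:at] < end_time }
  [start_time, inside.map { |r| r[:phone] }.uniq.size]
end
# Every window appears in the output, even with a count of zero,
# which is the job the RIGHT JOIN does in the SQL version.
```

The windows-with-zero-counts behavior is the whole point: an INNER JOIN would silently drop the hours with no readings.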


Now it is entirely possible that there are better ways to do this. I am no DBA. I’m just a simple caveman programmer who likes to take data out for a spin and see what it can do for me.

When I had projects in Sybase and SQL Server I would build these complex queries up over a series of temp tables. You can do that with Postgres, too. I just tend to prefer CTEs and subqueries. However, there is another technique that Postgres brings to the table that I do use from time to time. Tomorrow.

Filed under programming, ruby, SQL

SQL Is Your Friend, Part 2

Yesterday I began to make the case that it’s OK to skirt your ORM and write some raw SQL every once in a while.

I’ll be honest with you: I have no idea how well the ActiveRecord Postgres adapter handles the JSON datatype. I mentioned in that post that I always store entire JSON payloads, but I never select the entire column. I always pick out just what I want, as I showed in that post.

Today, we have a brand spanking new version of Postgres! Version 9.4 shipped today, and it has an update that I had already planned to talk about. But we aren’t quite ready to talk about it yet.

I’m working through this blog series the same way I work through a refactoring process:

  • I have some data, and I want to query it.
  • Huh. That’s a giant pile of SQL. How can I clean that up?
  • Maybe there is a more performant way to access this gnarly data?

So today we will be in the first two bullets.


Windowing Functions


You have a lot of data. You can take that data and fold it on itself to see how one record is related to or changes from another.

Let’s say you want to look and see when someone has moved from one place to another. You can take a row and fold in data from the next row to make that comparison by using the LEAD windowing function. Like so:

SELECT major,
       minor,
       LEAD(minor) OVER (ORDER BY reading_timestamp) AS next_minor
FROM readings
ORDER BY reading_timestamp;

See the query in action with this SQL Fiddle.

So what’s happening here?

We’ve asked for the next record — the LEAD. We use the OVER keyword to tell Postgres how to construct the window. This is how we fold the data. So I’ve said that I want the data sorted by the reading timestamps, and given that same sort I want to pull in the value for the next record’s minor column.
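The folding can be restated in plain Ruby, if that helps the windowing click. This is just a sketch of what LEAD does over made-up rows, not how Postgres implements it:

```ruby
# What LEAD(minor) OVER (ORDER BY reading_timestamp) boils down to:
# sort the rows, then look one row ahead for each of them.
readings = [
  { minor: 2, reading_timestamp: 30 },
  { minor: 1, reading_timestamp: 10 },
  { minor: 1, reading_timestamp: 20 }
]

sorted = readings.sort_by { |r| r[:reading_timestamp] }
with_lead = sorted.each_with_index.map do |r, i|
  nxt = sorted[i + 1]
  r.merge(next_minor: nxt && nxt[:minor])
end
# The final row has no next row, so its next_minor is nil, just as
# LEAD produces NULL on the last row of the window.
```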


Repetition Is Lame


Now imagine that query is the basis for a couple of different queries. You can store that query in the database as a function. You saw how to create a function in the previous post. This is something that I actually do every once in a while.

I will, at this point, take a step back and say that I am not advocating that you stuff all of your business logic in the database. I have fought in that stored procedure holy war on the side of keeping business logic in the application that really needs to know about it.

I see this in a different light, though. This is a code optimization. I’m in control of the code and the logic, and I’ve decided where it makes the most sense for what code to run. Yes, the SQL in the function is compiled and optimized and whatever in Postgres, but I don’t honestly know how important that is. This is just me DRYing up redundant queries and making joins less hairy.

CREATE OR REPLACE FUNCTION readings_with_lead()
RETURNS TABLE (
  id INT, phone TEXT, major INT, minor INT,
  reading_timestamp TIMESTAMP,
  prev_beacon_minor INT, next_beacon_minor INT
)
LANGUAGE sql
AS $$
SELECT
  id,
  phone,
  major,
  minor,
  reading_timestamp,
  lag(minor) OVER (ORDER BY phone, reading_timestamp) AS prev_beacon_minor,
  lead(minor) OVER (ORDER BY phone, reading_timestamp) AS next_beacon_minor
FROM readings
ORDER BY phone, reading_timestamp;
$$;

This gives us a nice way to call on an enriched version of our data. It quacks like a table, too, so you can join on it and do anything else you would do with a normal table.

SELECT *
FROM readings_with_lead()
WHERE next_beacon_minor IS NOT NULL;

See the function in action with this SQL Fiddle.


Leveling Up Your Windowing


Now I want to take the data and start to piece together specific events in sequence, and I can totally do that with a windowing function:

SELECT phone,
       major,
       minor,
       reading_timestamp,
       FIRST_VALUE(reading_timestamp) OVER (PARTITION BY phone, major ORDER BY phone, reading_timestamp)
FROM readings
ORDER BY 1, 4;

I’ve asked Postgres to give me the first reading_timestamp value across every row in a window. That window is defined as a grouping of like phone numbers and major values, and those partitions are also sorted.

See it in action with this SQL Fiddle.

Once you have this information you can start looking at all sorts of interesting things. How long was an event? What happened during that event?
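The duration question is easy to answer once every row carries its partition’s first timestamp. Here is a Ruby analog of FIRST_VALUE with PARTITION BY, over invented sample rows, to show the shape of that math:

```ruby
# Split rows into (phone, major) partitions, stamp each row with the
# partition's earliest timestamp, and duration becomes subtraction.
readings = [
  { phone: "555-0100", major: 1, reading_timestamp: 100 },
  { phone: "555-0100", major: 1, reading_timestamp: 160 },
  { phone: "555-0199", major: 1, reading_timestamp: 130 }
]

annotated = readings
  .group_by { |r| [r[:phone], r[:major]] }
  .flat_map do |_, rows|
    first_value = rows.map { |r| r[:reading_timestamp] }.min
    rows.map { |r| r.merge(first_value: first_value) }
  end

# How far into its event is each reading?
offsets = annotated.map { |r| r[:reading_timestamp] - r[:first_value] }
# offsets => [0, 60, 0]
```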

We will look at some more techniques that make this even more interesting soon. Next up: subqueries.

Filed under programming, ruby, SQL

SQL Is Your Friend, Part 1

This is the kickoff of a series of posts on using Postgres and SQL to do some interesting things that you may not have known were possible. If you typically use an ORM like ActiveRecord, you probably do not write a lot of raw SQL. I like Rails, and ActiveRecord has its place. This is not a rant where I shake my fist in the air and tell ActiveRecord to get off my lawn.

Instead I want to highlight a handful of techniques that I find really useful, and also perhaps suggest that it’s OK to venture off the golden path and do a little raw SQL where it makes sense. The fact of the matter is that you cannot simply switch databases and have the ORM just handle it for you. At least not once your application starts to get interesting. So use the right tool for the job, and I hope to highlight a few tools that Postgres brings to the table.

TL;DR: SQL is your friend.


JSON Datatype


You may already be familiar with the HSTORE datatype that came in Postgres 9.0. It’s nice, but it does not handle nesting. In Postgres 9.2 we got something even awesomer: the JSON datatype. It’s pretty spectacular, and I now use it to store entire raw payloads anytime I consume a resource. You can also use this datatype in ActiveRecord migrations without requiring any additional gems.

What makes the JSON datatype especially nice is that you can reach into the data and pull out specific elements. You can also create custom functions and even operators to do even more powerful things with the data in the JSON column.

Here is an example that takes a column called “beacons” which contains a list of bluetooth beacons that a phone has ranged. We want to look into that list and pull out the minor number from the closest beacon, and the accuracy from the next closest beacon. We also cast values from the JSON datatype to a more meaningful datatype. Yes, the cast to TEXT and then to the actual datatype is a little awkward, but it works.

SELECT beacons->0,
  json_array_length(beacons),
  json_extract_path(beacons->0, 'minor')::TEXT::INT AS minor2,
  json_extract_path(beacons->1, 'accuracy')::TEXT::FLOAT AS accuracy2
FROM readings;

See this code in action with this SQL Fiddle.


Custom Function and Operator


There are a lot of good functions in Postgres to handle the JSON datatype, but let’s say that I want to do something that’s not built in. I want to look at ALL of the values for a given key.

Take a look at this discussion. You can create a function in Postgres, and even create a new operator for that function:

CREATE OR REPLACE FUNCTION json_pluck(j json, field TEXT)
RETURNS TEXT[]
LANGUAGE sql
AS
$$ SELECT array_agg(j->>$2) FROM json_array_elements($1) j $$;

DROP OPERATOR IF EXISTS | (json, text);
CREATE OPERATOR | (procedure = json_pluck, leftarg = 'json', rightarg = 'text');

Wait. What even is that?! You create a function called json_pluck. Then you create a custom operator, |, that lets you invoke json_pluck with the pipe character, telling it which key’s values to return.

Like so:

SELECT beacons->0,
  json_array_length(beacons),
  json_extract_path(beacons->0, 'minor')::TEXT::INT AS minor2,
  json_extract_path(beacons->1, 'accuracy')::TEXT::FLOAT AS accuracy2,
  beacons | 'minor' AS minors
FROM readings;

You’ll note that I use CREATE OR REPLACE wherever possible. You can’t do that with an operator, though, so I precede that with a guard to drop it if it already exists, and then create it.

See this code in action with this SQL Fiddle.
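If you squint, json_pluck is doing what a one-liner does in Ruby: walk a JSON array and collect one field from every element. The beacon payload below is invented, purely for illustration:

```ruby
require "json"

# The same pluck, client-side: parse the stored JSON array and map
# each beacon hash down to its "minor" value.
beacons_json = '[{"minor": 1, "accuracy": 0.5}, {"minor": 7, "accuracy": 2.1}]'

minors = JSON.parse(beacons_json).map { |beacon| beacon["minor"] }
# minors => [1, 7]
```

The difference, of course, is that the Postgres version does the plucking in the database, before the rows ever cross the wire.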

Yeah. That’s pretty cool, no? You can drop all of those into ActiveRecord migrations, too:

class CreateFunctionJsonPluck < ActiveRecord::Migration
  def up
    sql = <<-SQL
      CREATE OR REPLACE FUNCTION json_pluck(j json, field TEXT)
      RETURNS TEXT[]
      LANGUAGE sql
      AS
      $$ SELECT array_agg(j->>$2) FROM json_array_elements($1) j $$;
    SQL
    ActiveRecord::Base.connection.execute(sql)

    sql = <<-SQL
      DROP OPERATOR IF EXISTS | (json, text);
      CREATE OPERATOR | (procedure = json_pluck, leftarg = 'json', rightarg = 'text');
    SQL
    ActiveRecord::Base.connection.execute(sql)
  end

  def down
    sql = "DROP OPERATOR IF EXISTS | (json, text);"
    ActiveRecord::Base.connection.execute(sql)
    sql = "DROP FUNCTION IF EXISTS json_pluck(j json, field TEXT);"
    ActiveRecord::Base.connection.execute(sql)
  end
end

So that’s the JSON datatype. In this series I plan to discuss window functions, subqueries, WITH queries (Common Table Expressions), and materialized views.

Filed under programming, ruby, SQL