Category Archives: ruby

My Twitter Bot

Almost a year ago I wrote about putting a Twitter Bot On Heroku. A few months ago my bot stopped responding. This was about the time that Heroku changed their billing plans, but to be honest I don’t know why it went offline.

Today I moved it over to EC2, and I am going to lift the veil and reveal who my Twitter bot is.

EC2 Setup

I picked an Ubuntu image because I prefer Ubuntu over Centos/RHEL. This doesn’t need a lot of horsepower, so the free tier should be just fine. This is also just a silly twitter bot.

So let’s get the foundation in place:

sudo apt-get update
sudo apt-get install git-core curl zlib1g-dev build-essential libssl-dev libreadline-dev libyaml-dev libsqlite3-dev sqlite3 libxml2-dev libxslt1-dev libcurl4-openssl-dev python-software-properties libffi-dev

And now we need to install ruby. I don’t like installing a ruby version manager on a server. This is a dedicated host, and I control the deployment environment. So I’ll just install ruby and gems at the system level.

cd
wget http://ftp.ruby-lang.org/pub/ruby/2.2/ruby-2.2.3.tar.gz
tar -xzvf ruby-2.2.3.tar.gz
cd ruby-2.2.3/
./configure
make
sudo make install
sudo gem install bundler --no-rdoc --no-ri

Super. Now we can ruby.

The Code

You could do some fancy deploy with Docker, and I thought about doing that. But this is just a silly Twitter bot. So instead I have just a single standalone EC2 instance, and I’ll clone the repo.

Yes, I wrote a Taylor Swift Twitter Bot.

Change into the project directory and run sudo bundle to install the gems. Bundler will complain about running as root, but we don’t care. We are running with scissors on Twitter.

Make It Go

Now we just need to make this thing run, and do that in such a way that if the server reboots that it keeps running. We use an init.d script for that.

I put this simple script in /etc/init.d:

cd home/ubuntu/swift-bot-tay-tay
dotenv rake

I still have the Rakefile from the Heroku setup, and I put a .env file on the box with the credentials. This is a cheesy script, and there are no “environments” so the dotenv gem works just fine.

*Disclaimer: that may not, in fact, be a proper init.d script. I actually kicked the script off from the command line using nohup dotenv rake & and if the server bounces we’ll just see what happens. We are running with scissors, after all.

Celebrate

It’s fun to play with a completely ridiculous project. I’m also amazed how something as simple and silly as a twitter bot elicits as much of a reaction as it does. @SwiftBotTayTay has nearly 900 followers and regularly gets responses to her tweets. Most of them are positive, too.

That makes me smile.

Leave a Comment

Filed under programming, ruby

Using Ruby To Make iCal Events, Part 2

Earlier I discussed creating calendar events with Ruby to take the pain out of creating a lot of events. I just finished up a script that created 126 events in seconds. Here are the relevant bits (in addition to the earlier post).

Input

My input CSV looks like this (below). It’s Hal Higdon’s Intermediate 1 plan.

Date,Week,Day,Description
6/8/2015,1,MON,Cross
6/9/2015,1,TUE,3 miles
6/10/2015,1,WED,5 miles
6/11/2015,1,THU,3 miles
6/12/2015,1,FRI,Rest
6/13/2015,1,SAT,5 miles (pace)
6/14/2015,1,SUN,8 miles
6/15/2015,2,MON,Cross
6/16/2015,2,TUE,3 miles

I copied the grid at the bottom and pasted it into Excel. I created another tab to handle the ETL. You can highlight a row and paste special (right click shows you this option) to transpose the cells. That allows you to take a row and paste it as a column. Yes, I can Excel pretty good. Excel sucks at regex though, so I let Ruby handle teasing out the meaning of each of the workouts.

The Code

  def read_input(infile)
    CSV.foreach(infile, :headers => true) do |row|
      m, d, y     = row['Date'].split(/\//).map(&:to_i)
      date        = DateTime.civil(y, m, d, 7, 0, 0)
      description = row['Description']
      next if description == "Rest"
      minutes = if description.match(/^\d{1,2}/)
                  description.match(/^\d{1,2}/)[0].to_i * 10
                else
                  30
                end
      event = Icalendar::Event.new
      event.dtstart  = date
      event.duration = "P#{minutes}M"
      event.summary  = description
      event.comment  = "Event created with Ruby script"
      @cal.add_event(event)
    end
  end

It’s not pretty. That method is doing a lot. The event comment isn’t saved either. That’s OK. I still got what I need — a whole lot of events that I didn’t have to type myself, and a script (and process) that I can use moving forward.

Win-win.

Leave a Comment

Filed under programming, ruby

Using Ruby To Create iCal Events

Calendars can be so tedious, so I hacked mine.

Anytime we have a new season begin, or I start a new running training plan, I have a bunch of events that need to be created. Usually you just sit down and take the half hour (or whatever) to bang all the events out. You can copy and paste events in iCal from day to day, which helps. But it still sucks.

So I finally sat down and figured out how to script it.

Gemfile:

# A sample Gemfile
source "https://rubygems.org"
ruby '2.2.2'

gem "icalendar"

Script (make_events.rb):

require "rubygems"
require "bundler"
require "csv"
require "date"
Bundler.require

### Create iCal ics file with multiple events
class EventCreator
  attr_reader :cal

  def initialize
    @cal = Icalendar::Calendar.new
    make_events
  end

  def make_events
    dt = DateTime.civil(2015, 6, 6, 0, 7, 0)
    3.times do |i|
      event          = Icalendar::Event.new
      event.dtstart  = dt + i
      event.duration = "P30M"
      event.summary  = "Test Event #{i}!"
      @cal.add_event(event)
    end
  end

  def to_ics
    File.open("scripted.ics", "w") { |f| f.write @cal.to_ical }
  end
end

if __FILE__ == $0
  calendar = EventCreator.new
  calendar.to_ics
end

So all you have to do from the command line, after you’ve run bundle install is ruby make_events.rb. It will create a single ics file that you can import into iCal. When you open the file you will be asked what calendar you want to import the events into.

Note: The gem does not appear to support “full day” events. That’s ok. Next up for me is to create a running training plan given a CSV file. I’ll post that next.

4 Comments

Filed under programming, ruby

RailsConf 2015

I recently had the privilege of being a speaker and a guide at RailsConf in Atlanta.

First, a little about the Guide program. Guides are mentors to Opportunity Scholarship recipients, who are generally new to the Ruby/Rails communities or maybe new to programming in general. The guides are there to help make RailsConf a little easier and less intimidating. There are a lot of people, and I confess that I still get intimidated at times.

We were paired before the conference, and I reached out to my scholar (Eduardo) in advance to get to know him a little. There was also a Guide Mixer the night before the conference began that gave everyone an opportunity to meet each other. It’s always fun to be around a group of people who are really open and want to help. That’s one of the things that I love about the ruby community in general.

I will also confess that I’m generally fairly introverted, and being a guide was also a good way to help me break out of that shell. There were several times when I would see someone and introduce them to Eduardo. It was also great just meeting the fellow guides and having them as a peer set.

The guide program was great. Be on the look out for it at RubyConf.

OK. Now let’s shift our focus to the conference itself.

There were somewhere between 1500 and 2000 people in attendance. I wanted to get out of my introvert/ambivert shell and get the most out of being around so many people in my community. I got to Atlanta on Sunday to see some family and was at the hotel by mid-afternoon. I had already made a connection with someone via Twitter about going for a run during the conference, and we decided to meet in the lobby Sunday afternoon to talk shop and see about some dinner.

This is where the magic really began. At any given point during my time there I was with someone who knew someone who knew someone…. Ad hoc groups would form and start talking and before you knew it you’ve made new friends and are going somewhere to eat, or go to a talk together, or make plans to catch up later that evening. You couldn’t turn around in (or around) the hotel without seeing someone (or a group of people) from the conference.

I was a little nervous about my talk. I gave a fairly technical talk that involved a lot of code. Would there be some hotshot in the crowd who wanted to show me up? No. It went well (I think). I had some great questions and conversations after.

The last evening there was a giant mass of people who all wanted to be together, so we split up into 3 large groups. The restaurant wasn’t able to get any of us close, sadly. It was still a lovely evening. I hated saying goodnight to everyone, but I was spent.

There were some great talks throughout the week. 6 tracks going at the same time with 1 being hands-on workshops. I’m always inspired seeing what people are talking about and exploring. It was even more fun seeing my friends giving their talks. More than any of that, though, was an overwhelming sense of community. I made a lot of new friends and deepened existing friendships.

I look forward to seeing my ruby friends again – sooner than later I hope.

1 Comment

Filed under programming, ruby

Twitter Bot On Heroku

I made a twitter bot a week or two ago, and it has been a lot of fun watching people’s reactions to it. I won’t tell you what my triggers are or who the bot is because I think that’s part of the game. The bot lives entirely on it’s own. What I will tell you is how to host a Twitter bot on Heroku.

The Bot Code

You’ll use the Streaming API to search for stuff, and then the REST API to post “responses”. I followed this example.

require 'rubygems'
require 'bundler/setup'
Bundler.require

class Bot
  def self.go
    while true
      begin
        config = {
          :consumer_key        => ENV['TWITTER_CONSUMER_KEY'],
          :consumer_secret     => ENV['TWITTER_CONSUMER_SECRET'],
          :access_token        => ENV['TWITTER_ACCESS_TOKEN'],
          :access_token_secret => ENV['TWITTER_ACCESS_TOKEN_SECRET']
        }
        rClient = Twitter::REST::Client.new(config)
        sClient = Twitter::Streaming::Client.new(config)

        # topics to watch
        topics = ['hello', 'world', '#helloworld']
        sClient.filter(:track => topics.join(',')) do |tweet|
          next if tweet.text.match(/#{rClient.user.screen_name}/i)
          if tweet.is_a?(Twitter::Tweet) && !tweet.text.match(/^RT/) && tweet.lang == 'en'
            p "@#{tweet.user.screen_name}: #{tweet.text} (ID: #{tweet.id}) (#{tweet.lang})"
            rClient.update("@#{tweet.user.screen_name} Pithy response", in_reply_to_status_id: tweet.id)
          end
        end
      rescue Exception => e
        puts 'error occurred, waiting for 5 seconds' + e.class.to_s
        sleep 5
      end
    end
  end
end

I put the bot in a class, so it’s a classy bot. It also is a bot that will run FOREVER. I am also using a Gemfile and I set some environment variables for all the secret stuff. You can set those from the command line:

heroku config:set TWITTER_CONSUMER_KEY=SOSEKRET
etc…

Gemfile

The Gemfile is really simple. You just need the Twitter gem.

source "https://rubygems.org"
ruby '2.1.4'

gem "twitter"

bundle init is a quick and easy way to get a basic Gemfile that you can start adding stuff to. Those first 3 lines at the top of the script is how you require the contents of the Gemfile in a plain old ruby script. There is a big world that exists off-rails, and I recommend you check it out.

Make It Go

You now have everything that you need to make it go. You can do it in irb to see how it performs. I would recommend you comment out the line that makes the response until you’re sure it does what you want.

To run it in irb, set your environment variables, or copy the secret stuff into your script, and then run:

irb -r ./bot.rb

That’ll put you at the command prompt in irb where you can run Bot.go and watch it go.

What About Heroku?

Right, Heroku. If you push this as it is to Heroku, it won’t do anything. You need a Procfile. It looks like this:

bundle exec rake

Yes really.

Rake is awesome, and custom rake tasks are the bee’s knees. So let’s make one.

require './bot.rb'

desc "Bot!"
task :bot do
  Bot.go
end

task :default => 'bot'

There is a rake task bot, and the default rake task is set to that, so if you simply run bundle exec rake you get a running bot.

Push those 4 files to Heroku, and you’ve got yourself a twitter bot!

Leave a Comment

Filed under programming, ruby

SQL Is Your Friend, Part 4

So far we’ve talked about the JSON datatype with the first part. Postgres 9.4 brings some new features that enhance it’s JSON capabilities even more, but I haven’t had a chance to play with the new shiny yet.

In part 2 we talked about using window functions. We also made our own functions in both to help extend functionality or DRY up repetitive queries.

I went over a couple of subquery techniques in part 3. These are exciting to me because it gives me even more capabilities to make sense of data at the data level. With windowing and subquerying I can give context to the data in-place, without having to pull it down and write piles of app server code. I like that.

Today we are going to talk about a performance optimization. What do you do when you’ve got the data indexed well, but the query still doesn’t respond in a quick enough time? As a measuring stick, I give a critical eye to anything that takes more than 500 milliseconds. Once it nears a full second I start looking for options.


Materialized Views


Postgres gives you the ability to save (materialize) the results from a query. It’s not a temp table, or even a table, but you can give the data indexes and query against it the same as a table.

-- Store off the query results
CREATE MATERIALIZED VIEW readings_mv AS
SELECT *, LEAD(minor) OVER (ORDER BY reading_timestamp) AS next_minor
FROM readings
WHERE reading_timestamp >= NOW() - '1 day'::INTERVAL;

-- Add some indexes
CREATE INDEX index_readings_mv_on_major ON readings_mv USING btree(major ASC NULLS LAST);
CREATE INDEX index_readings_mv_on_phone ON readings_mv USING btree(phone COLLATE "default" ASC NULLS LAST);
CREATE INDEX index_readings_mv_on_reading_timestamp ON readings_mv USING btree(reading_timestamp ASC NULLS LAST);

Note the name: readings_mv. It’s a materialized view that consists of readings table data. That’s a convention that I picked up along the way to help communicate what database objects are being used. This is helping out your team and also future you.

In this particular example the query needs to run against the past 24 hours of data and do some calculations (windowing). Now we’ve taken the heavy lifting and calculating and done that in advance out of band from the request/response cycle. We can also do more efficient subqueries that rely upon this same data. You don’t have to index the materialized view, but I tend to mirror table indexes (or add new ones where it makes sense).

Check out the SQL Fiddle. Scroll all the way down in the left pane to see the creation of the materialized view.


Refresh The Data


That’s great, but I said that I wanted to query against the past 24 hours and time is a moving target.

No worries, you can refresh the data with a simple query:

REFRESH MATERIALIZED VIEW readings_mv;

Easy peasy. It blocks, though, so while the materialized view is being updated you have to wait. Or you had to wait. Postgres 9.4, which just dropped this week, added the ability to refresh a view CONCURRENTLY.

REFRESH MATERIALIZED VIEW CONCURRENTLY readings_mv;

Here is the Postgres 9.4 changelog (of sorts) and a little more explanation on the REFRESH CONCURRENTLY functionality as it was being included in trunk.

You can put this into a rake task as well:

namespace :db do
  namespace :heroku do
    desc "Load the PROD database from Heroku to the local dev database"
    task :load => :environment do
      if Rails.env == 'development'
        Bundler.with_clean_env do
          config = Rails.configuration.database_configuration[Rails.env]
          system <<-CMD
            heroku pgbackups:capture --expire
            curl -o latest.dump `heroku pgbackups:url`
            pg_restore --verbose --clean --no-acl --no-owner -h localhost \
              -U #{config["username"]} -d #{config["database"]} latest.dump
            rm -rf latest.dump
          CMD
        end
      end
    end

    # https://devcenter.heroku.com/articles/scheduler
    desc 'Update the materialized view(s)'
    task :update_materialized_view => :environment do
      sql = 'REFRESH MATERIALIZED VIEW beacon_readings_lead_lag_mv;'
      ActiveRecord::Base.connection.execute(sql)
    end
  end
end

If you use Heroku you can this up with scheduler to run every 10 minutes. I included an additional task that I find useful to populate my local database with live data.


kthxbai


These are techniques that I find myself using when an app goes from simple CRUD to having to make sense of complex data. I hope that you found this series interesting and useful. I also give you permission to occasionally use raw SQL to take advantage of the features that your database brings to bear. You chose it for a reason, so use it.

Now go make big, interesting things.

1 Comment

Filed under programming, ruby, SQL

SQL Is Your Friend, Part 3

Yo dawg, I heard you like queries. So I put a query in your query so you can query while you query.

Now we are starting to get into some really cool and powerful SQL. Subqueries! Let’s jump right in, shall we?


Subqueries For Change


I want to count up all the times something changed across some set of data. You could pull all the data down and iterate over it in your code, and maybe that runs just fine. You can also let the database do the heavy lifting and pull down just the results that you need.

Using the LEAD windowing function we can look forward one record to see what happens next, and include that in the “current” record. We saw that in yesterday’s post. That’s not quite what we need, though. Fortunately we can query against that query, or use that as a subquery for the query we actually want to run.

SELECT *
FROM (
  SELECT id,
         major,
         minor,
         LEAD(minor) OVER (ORDER BY reading_timestamp) AS next_minor
  FROM readings
) AS amazing_subquery
WHERE minor != next_minor;

Right, so what we have there is a nested subquery. The outer query runs against the output of the inner query. If you were to run an EXPLAIN plan against this you would see how many rows the inner query returns and at what cost. I do tend to run the query plans against any of these and then keep an eye on them as the data scales up.

See the nested subquery for yourself in this SQL Fiddle.


Subqueries For A Field


We can also pull in a specific data point with a subquery and stuff it into a row. Let’s say I just want to know the last reading ID for an event.

SELECT id,
         phone,
         major,
         minor,
         reading_timestamp,
         ( SELECT MAX(id)
           FROM readings r2
           WHERE r2.phone = r1.phone AND r2.id > r1.id
         ) AS last_id
FROM readings r1
ORDER BY 2, 5;

Sorcery! You can see that I refer to the main query in the subquery. That’s pretty handy, but you don’t have to. You could just ask Postgres to give you the max value from this or some other table.

The reading ID may not be the most useful thing, so maybe instead you pull a timestamp. If you did that then you could do math and calculate the duration, which Postgres can also do for you. You can also wrap that up in another outer query. That’s when I start looking to DRY up the base query into a function.

You can play with this some more in this SQL Fiddle


Common Table Expression (CTE)


You can see where all this subquerying can maybe get out of hand. I’ll confess that I have stopped myself when I’ve nested a few queries deep and asked if maybe there is a better way. I try to avoid Russian Dolls when I program. You can also JOIN on a subquery, which is powerful when you need it and also pretty ugly.

Enter the Common Table Expression, or CTE. This is also known as a WITH query. Let’s take a look at that first query and see how that would look in CTE.

WITH amazing_subquery AS (
  SELECT id,
         major,
         minor,
         LEAD(minor) OVER (ORDER BY reading_timestamp) AS next_minor
  FROM readings
)
SELECT *
FROM amazing_subquery
WHERE minor != next_minor;

Here is the SQL Fiddle. This is a pretty simplified example, so maybe that doesn’t so much knock your socks off.

Let’s take this up to 11, shall we?

WITH interval_query AS (
  -- Start by making a series of 12 time intervals
  SELECT (ts ||' hour')::INTERVAL AS hour_interval
  FROM generate_series(0,11) AS ts
), time_series AS (
  -- Then take that interval and convert it into a TIMESTAMP
  SELECT date_trunc('hour', now()) + INTERVAL '60 min' * ROUND(date_part('minute', now()) / 60.0) - interval_query.hour_interval AS start_time
  FROM interval_query
), time_intervals AS (
  -- Then take that TIMESTAMP and add an hour to it to make an hour timebox
  SELECT start_time, start_time + '1 hour'::INTERVAL AS end_time
  FROM time_series
  ORDER BY start_time
)
-- RIGHT JOIN on those times s.t. we get all the times even when data is missing
SELECT f.start_time, f.end_time, r1.minor, count(DISTINCT r1.phone) readings
FROM readings r1
RIGHT JOIN time_intervals f
  ON r1.reading_timestamp >= f.start_time AND r1.reading_timestamp < f.end_time
GROUP BY f.start_time, f.end_time, r1.minor
ORDER BY f.start_time, r1.minor

BOOM. Time series data with counts. Here is that SQL Fiddle.


55502354


Now it is entirely possible that there are better ways to do this. I am no DBA. I’m just a simple caveman programmer who likes to take data out for a spin and see what it can do for me.

When I had projects in Sybase and SQL Server I would build these complex queries up over a series of temp tables. You can do that with Postgres, too. I just tend to prefer CTE and subqueries. However, there is another technique that Postgres brings to the table that I do use from time to time. Tomorrow.

1 Comment

Filed under programming, ruby, SQL

SQL Is Your Friend, Part 2

Yesterday I began to make the case that it’s OK to skirt your ORM and write some raw SQL every once in a while.

I’ll be honest with you. I have no idea how well the ActiveRecord postgres adapter handles the JSON datatype. I mentioned in that post that I always store entire JSON payloads. I also never select the entire column. I always pick what I want out, like I showed in the post.

Today, we have a brand spanking new version of Postgres! Version 9.4 shipped today, and it has an update that I had already planned to talk about. But we aren’t quite ready to talk about it yet.

I’m working through this blog series the same way I work through a refactoring process:

  • I have some data, and I want to query it.
  • Huh. That’s a giant pile of SQL. How can I clean that up?
  • Maybe there is a more performant way to access this gnarly data?

So today we will be in the first 2 bullets.


Windowing Functions


You have a lot of data. You can take that data and fold it on itself to see how one record is related to or changes from another.

Let’s say you want to look and see when someone has moved from one place to another. You can take a row and fold in data from the next row to make that comparison by using the LEAD windowing function. Like so:

SELECT major,
       minor,
       LEAD(minor) OVER (ORDER BY reading_timestamp) AS next_minor
FROM readings
ORDER BY reading_timestamp;

See the query in action with this SQL Fiddle

So what’s happening here?

We’ve asked for the next record — the LEAD. We use the OVER keyword to tell Postgres how to construct the window. This is how we fold the data. So I’ve said that I want the data sorted by the reading timestamps, and given that same sort I want to pull in the value for the next record’s minor column.


Repetition Is Lame


Now imagine that query is the basis for a couple of different queries. You can store that query in the database in a function. You saw how to create a function in the previous post. This is something that I actually do every once in a while.

I will, at this point, take a step back and say that I am not advocating that you stuff all of your business logic in the database. I have fought in that stored procedure holy war on the side of keeping business logic in the application that really needs to know about it.

I see this in a different light, though. This is a code optimization. I’m in control of the code and the logic, and I’ve decided where it makes the most sense for what code to run. Yes, the SQL in the function is compiled and optimized and whatever in Postgres, but I don’t honestly know how important that is. This is just me DRYing up redundant queries and making joins less hairy.

CREATE OR REPLACE FUNCTION readings_with_lead()
RETURNS TABLE (
  id INT, phone TEXT, major INT, minor INT,
  reading_timestamp TIMESTAMP,
  prev_beacon_minor INT, next_beacon_minor INT
)
LANGUAGE sql
AS $$
SELECT
  id,
  phone,
  major,
  minor,
  reading_timestamp,
  lag(minor) OVER (ORDER BY phone, reading_timestamp) AS prev_beacon_minor,
  lead(minor) OVER (ORDER BY phone, reading_timestamp) AS next_beacon_minor
FROM readings
ORDER BY phone, reading_timestamp;$$

This gives us a nice way to call on an enriched version of our data. It quacks like a table, too, so you can join on it and do anything else you would do with a normal table.

SELECT *
FROM readings_with_lead()
WHERE next_beacon_minor IS NOT NULL;

See the function in action with this SQL Fiddle


Leveling Up Your Windowing


Now I want to take the data and start to piece together sequences of events. What I really want to do is put together specific events in sequence. And I can totally do that with a windowing function:

SELECT phone,
       major,
       minor,
       reading_timestamp,
       FIRST_VALUE(reading_timestamp) OVER (PARTITION BY phone, major ORDER BY phone, reading_timestamp)
FROM readings
ORDER BY 1, 4;

I’ve asked Postgres to give me the first reading_timestamp value across every row in a window. That window is defined as a grouping of like phone numbers and major values, and those partitions are also sorted.

See it in action with this SQL Fiddle

Once you have this information you can start looking at all sorts of interesting things. How long was an event? What happened in that event?

We will look at some more techniques that make this even more interesting soon. Next up: subqueries.

8 Comments

Filed under programming, ruby, SQL

SQL Is Your Friend, Part 1

This is the kickoff of a series of posts on using Postgres and SQL to do some interesting things that you may not have known were possible. If you typically use an ORM, like ActiveRecord, you probably do not write a lot of raw SQL. I like Rails, and ActiveRecord has it’s place. This is not a rant where I shake my fist in the air and tell ActiveRecord to get off my lawn.

Instead I want to highlight a handful of techniques that I find really useful, and also perhaps suggest that it’s OK to venture off the golden path and do a little raw SQL where it makes sense. The fact of the matter is that you cannot simply switch databases and have the ORM just handle it for you. At least not once your application starts to get interesting. So use the right tool for the job, and I hope to highlight a few tools that Postgres brings to the table.

TL;DR: SQL is your friend.


JSON Datatype


You may already be familiar with the HSTORE datatype that came in Postgres 9.0. It’s nice, but does not handle nesting. In Postgres 9.2 we got something even awesomer: the JSON datatype. It’s pretty spectacular, and I have used it to store entire raw payloads now anytime I consume resources.  You can also use this datatype in ActiveRecord migrations without requiring any additional gems.

What makes the JSON datatype especially nice is that you can reach into the data and pull out specific elements. You can also create custom functions and even operators to do even more powerful things with the data in the JSON column.

Here is an example that takes a column called “beacons” which contains a list of bluetooth beacons that a phone has ranged. We want to look into that list and pull out the minor number from the closest beacon, and the accuracy from the next closest beacon. We also cast values from the JSON datatype to a more meaningful datatype. Yes, the cast to TEXT and then to the actual datatype is a little awkward, but it works.

SELECT beacons->0,
  json_array_length(beacons),
  json_extract_path(beacons->0, 'minor')::TEXT::INT AS minor2,
  json_extract_path(beacons->1, 'accuracy')::TEXT::FLOAT AS accuracy2
FROM readings;

See this code in action with this SQL Fiddle


Custom Function and Operator


There are a lot of good functions in Postgres to handle the JSON datatype, but let’s say that I want to do something that’s not built in. I want to look at ALL of the values for a given key.

Take a look at this discussion. You can create a function in Postgres, and even create a new operator for that function:

CREATE OR REPLACE FUNCTION json_pluck(j json, field TEXT)
RETURNS TEXT[]
LANGUAGE sql
AS
$$ SELECT array_agg(j->>$2) FROM json_array_elements($1) j $$;

DROP OPERATOR IF EXISTS | (json, text);
CREATE OPERATOR | (procedure = json_pluck, leftarg = 'json', rightarg = 'text');

Wait. What even is that?! You create a function called json_pluck. Then you create a custom operator | that allows you to invoke json_pluck by simply calling the pipe character and telling it for what key to return the values.

Like so:

SELECT beacons->0,
  json_array_length(beacons),
  json_extract_path(beacons->0, 'minor')::TEXT::INT AS minor2,
  json_extract_path(beacons->1, 'accuracy')::TEXT::FLOAT AS accuracy2,
  beacons | 'minor' AS minors
FROM readings;

You’ll note that I use CREATE OR REPLACE wherever possible. You can’t do that with an operator, though, so I preface that with a guard to drop it if it already exists, and then create it.

See this code in action with this SQL Fiddle

Yeah. That’s pretty cool, no? You can drop all of those into ActiveRecord migrations, too:

class CreateFunctionJsonPluck < ActiveRecord::Migration
  def up
    sql = <<-SQL
      CREATE OR REPLACE FUNCTION json_pluck(j json, field TEXT)
      RETURNS TEXT[]
      LANGUAGE sql
      AS
      $$ SELECT array_agg(j->>$2) FROM json_array_elements($1) j $$;
    SQL
    ActiveRecord::Base.connection.execute(sql)

    sql = <<-SQL
      DROP OPERATOR IF EXISTS | (json, text);
      CREATE OPERATOR | (procedure = json_pluck, leftarg = 'json', rightarg = 'text');
    SQL
    ActiveRecord::Base.connection.execute(sql)
  end

  def down
    sql = "DROP OPERATOR IF EXISTS | (json, text);"
    ActiveRecord::Base.connection.execute(sql)
    sql = "DROP FUNCTION IF EXISTS json_pluck(j json, field TEXT);"
    ActiveRecord::Base.connection.execute(sql)
  end
end

So that’s the JSON datatype. In this series plan to discuss window functions, subqueries, WITH queries (Common Table Expression), and materialized views.

8 Comments

Filed under programming, ruby, SQL

Remembering Ruby

I’ve seen some discussion over the years on how to onboard new Ruby developers.  What about re-onboarding “old” developers though?

I’ve been a Ruby developer for several years.  I remember the bump to Rails 1.0.  We used to have to hand-crank our app servers while walking uphill.  You kids don’t know how good you have it.  I digress while I shake my cane.
 
As I was saying, I’ve been in the community for a long time, but I have also been away from ruby for several months on other languages.  I consider Ruby my “home” language, though, and the Ruby community is a special community that I enjoy feeling a connection with.
 
So how do I reorient myself?  How do I stay connected with the language and the community that I am not always directly involved in?
  • Twitter
  • Local Ruby Brigade
  • Practice
  • Playing Around
Twitter
 
I follow a lot of developers on Twitter, as one does.  Sometimes I binge add people, and sometimes I cull the list.  I love the community, but I don’t love the drama.  The drama sometimes eats at me, actually.  The things that we get worked up about also eat at me, but sometimes we get a little caught up in the swirl.
 
I like seeing the evolution of the language and ideas.  I like seeing other languages come into play, too.  I don’t think Ruby is necessarily the best language solution for all problems, and I believe that it’s important to have a varied language toolkit at your disposal.
 
Local Ruby Brigade
 
User groups are a fantastic resource.  The Dallas Ruby Brigade has always been very open and nurturing in my experience.  There are monthly meetings where people give talks, and I’ve given a few over the years.  There are also weekly meetups where we code and socialize.  We also have an active mailing list where we discuss current events, upcoming meetings, and also ask and answer questions.
 
I felt safe as an inexperienced developer, and I am tremendously grateful.  A local user group is a really great place to keep your chops up and get experience talking in a group setting (whether in the front or in the audience).
 
After a couple of years I felt like it was my turn to “give back” and get more involved, which has been very fulfilling.
 
Practice
 
Certainly you can practice the wrong things, but I find practice very valuable.  Repetition helps me learn and cements ideas for me.  I also learn by doing.  I need a concept to have a practical application or else I just will not see it.  For the past several months I have done Exercism exercises in several languages.  For a while I was doing Ruby backend services, so I did Javascript exercises to keep up those chops.  Then we started doing a lot of native apps, so I focused on Ruby exercises.  I’ve also looked at a few katas here and there.  Sometimes they’re really hard and I don’t get them.  Sometimes they’re too easy and I don’t get much value out of those either.  Usually they’re just right though.
 
Scratch An Itch (Play Around)
 
I’ll occasionally try to cook up a little project to work on.  I can’t tell you how many times I’ve redone my blog.  Not each of those has seen the light of day, and that’s ok.  It’s more about the process.  I usually also have a new data model in mind, which means that the data needs to be transformed.  I love data, and this is a really fun exercise for me.  ActiveRecord is great, but it can also be pretty limiting.  Sometimes raw SQL really is the best way to express what you need to do.
 
Other
 
I also try to make it to at least one Ruby conference a year, as well as submit talks on various things.  Thinking about what I might be able to shed light on or what things of value I might be able to share helps me take a critical look at my own growth and evolution.
 
Addendum
 
I realized as I was walking around thinking about doing something in Xcode that this isn’t just Ruby for me.  I bounce around languages a lot based on the projects I work on.  One thing that is unique to Ruby, though, is the community.

Leave a Comment

Filed under programming, ruby