Using case statements vs. using a Hash for simple selections in Ruby


The other day I found this piece of code at work:

  case status
  when 'booked'
    MyNamespace::Success
  when 'cancelled', 'canceled'
    MyNamespace::Cancelled
  when 'pending'
    MyNamespace::Pending
  else
    MyNamespace::Unknown
  end

and I remembered that in one of her talks, Sandi Metz used a Ruby Hash to select a class for a factory. Something like this:

  {
    'booked'    => MyNamespace::Success,
    'cancelled' => MyNamespace::Cancelled,
    'canceled'  => MyNamespace::Cancelled,
    'pending'   => MyNamespace::Pending,
  }.fetch(status) { MyNamespace::Unknown }

I guessed that the Hash would be a somewhat faster approach, so I decided to benchmark it.

My constraints for the problem

To test this out, I set some limitations to my scope.

  • I'd use just random data. No fancy statistically accurate distributions.

  • No classes, just regular old strings as values

  • I'd test just a couple of variations

  • Use the benchmark-ips gem

The setup

Here's the setup:

First, the case version, as seen in the original code:

  def case_status_for(status)
    case status
    when 'booked'
      'SUCCESS'
    when 'cancelled', 'canceled'
      'CANCELLED'
    when 'pending'
      'PENDING'
    else
      'UNKNOWN'
    end
  end

Then, a version using regular expressions

  def case_regex_status_for(status)
    case status
    when /booked/
      'SUCCESS'
    when /cancell?ed/
      'CANCELLED'
    when /pending/
      'PENDING'
    else
      'UNKNOWN'
    end
  end

Finally, the Hash approach

  def hash_status_for(status)
    {
      'booked' => 'SUCCESS',
      'cancelled' => 'CANCELLED',
      'canceled' => 'CANCELLED',
      'pending' => 'PENDING',
    }.fetch(status) { 'UNKNOWN' }
  end
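
One variation worth keeping in mind: hash_status_for builds a new Hash from the literal on every call. A minimal sketch, assuming the table never changes at runtime, hoists it into a frozen constant so it is built only once. STATUS_LABELS and constant_hash_status_for are hypothetical names, not part of the original code.

```ruby
# STATUS_LABELS is a hypothetical constant name used only for this sketch.
# The table is built once, when the file is loaded, and frozen against changes.
STATUS_LABELS = {
  'booked'    => 'SUCCESS',
  'cancelled' => 'CANCELLED',
  'canceled'  => 'CANCELLED',
  'pending'   => 'PENDING'
}.freeze

# Same lookup as hash_status_for, minus the per-call Hash allocation.
def constant_hash_status_for(status)
  STATUS_LABELS.fetch(status) { 'UNKNOWN' }
end
```

Calling constant_hash_status_for('canceled') returns 'CANCELLED', and any status missing from the table falls through to 'UNKNOWN'.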

And a helper method for getting a random status from a list

  def status
    %w[booked cancelled canceled pending].sample
  end

The benchmarks

Then, I added the benchmarks

  require 'benchmark/ips'

  Benchmark.ips do |x|
    x.config(time: 25, warmup: 2)

    x.report("case") {
      case_status_for(status)
    }

    x.report("case with regexes") {
      case_regex_status_for(status)
    }

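    # No block argument here: benchmark-ips treats a block that takes an
    # argument as one that runs its own loop that many times.
    x.report("hash") {
      hash_status_for(status)
    }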

    x.compare!
  end

The results

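The results surpassed all my expectations. As I said, I expected the Hash to be faster, but the report shows it more than 25,000 times faster than the case statement, and more than 80,000 times faster than the regex version. Figures that large deserve suspicion, though. benchmark-ips treats a report block that takes an argument as one that runs its own loop that many times, and in the run that produced this output the hash block was declared with |times| but made a single call per batch. Its iterations per second are therefore overstated, and the ratios below should not be taken at face value. The ±18.7% deviation and the missing elapsed time on the hash line point the same way.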

  # coding: utf-8
  # >> Warming up --------------------------------------
  # >>                 case    85.888k i/100ms
  # >>    case with regexes    38.234k i/100ms
  # >>                 hash    56.338k i/100ms
  # >> Calculating -------------------------------------
  # >>                 case      1.507M (± 2.6%) i/s -     37.705M in  25.032657s
  # >>    case with regexes    489.697k (± 0.8%) i/s -     12.273M in  25.064364s
  # >>                 hash     39.847B (±18.7%) i/s -    670.057B
  # >>
  # >> Comparison:
  # >>                 hash: 39846575958.7 i/s
  # >>                 case:  1507310.6 i/s - 26435.54x  slower
  # >>    case with regexes:   489696.9 i/s - 81369.87x  slower
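
Ratios in the tens of thousands are worth cross-checking. The following is a minimal sketch, assuming nothing beyond core Ruby: it times the same three methods over one pre-generated list of inputs using Process.clock_gettime, so the cost of picking a random status stays out of the measurement. STATUSES, ITERATIONS, INPUTS, seconds_for and TIMINGS are names made up for this sketch. No expected timings are shown because they depend on the machine and the Ruby version.

```ruby
def case_status_for(status)
  case status
  when 'booked' then 'SUCCESS'
  when 'cancelled', 'canceled' then 'CANCELLED'
  when 'pending' then 'PENDING'
  else 'UNKNOWN'
  end
end

def case_regex_status_for(status)
  case status
  when /booked/ then 'SUCCESS'
  when /cancell?ed/ then 'CANCELLED'
  when /pending/ then 'PENDING'
  else 'UNKNOWN'
  end
end

def hash_status_for(status)
  {
    'booked' => 'SUCCESS',
    'cancelled' => 'CANCELLED',
    'canceled' => 'CANCELLED',
    'pending' => 'PENDING'
  }.fetch(status) { 'UNKNOWN' }
end

# Hypothetical helper: wall-clock seconds spent running the block once.
def seconds_for
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end

STATUSES   = %w[booked cancelled canceled pending].freeze
ITERATIONS = 100_000
# Generate the inputs up front so every method sees the same data.
INPUTS = Array.new(ITERATIONS) { STATUSES.sample }.freeze

TIMINGS = {
  'case'              => seconds_for { INPUTS.each { |s| case_status_for(s) } },
  'case with regexes' => seconds_for { INPUTS.each { |s| case_regex_status_for(s) } },
  'hash'              => seconds_for { INPUTS.each { |s| hash_status_for(s) } }
}.freeze

TIMINGS.each do |label, seconds|
  puts format('%-18s %.4fs for %d calls', label, seconds, ITERATIONS)
end
```

A single timed pass like this is much cruder than benchmark-ips, with no warmup and no error estimate, but it is enough to tell whether a difference is a few times or thousands of times.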

Pros, cons and other considerations

As with every approach in programming, this is not a silver bullet, and there are a few things to keep in mind.

Readability

I don't mind the way the Hash version reads at all; in fact, I quite like it. But readability is very subjective, and you might find it unreadable (and I understand if that's the case).

Multiple keys and repeated values

In the example I used, the status can come as either 'canceled' or 'cancelled', and the result is the same ('CANCELLED'). In the case statement, both spellings go through the same branch. A Hash needs one entry per spelling, so the value is duplicated. If there are too many of those, the code might become ugly, and a case statement with regular expressions could be a much better option in terms of readability.
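
One way to keep the Hash without repeating values is to declare the spellings in groups and expand them into a flat lookup table once. This is only a sketch; LABELS_BY_SYNONYMS, STATUS_TABLE and grouped_hash_status_for are hypothetical names.

```ruby
# Each key is a list of spellings that share one label (hypothetical layout).
LABELS_BY_SYNONYMS = {
  %w[booked]             => 'SUCCESS',
  %w[cancelled canceled] => 'CANCELLED',
  %w[pending]            => 'PENDING'
}.freeze

# Expand the groups into a flat { spelling => label } table, built once.
STATUS_TABLE = LABELS_BY_SYNONYMS.each_with_object({}) do |(synonyms, label), table|
  synonyms.each { |synonym| table[synonym] = label }
end.freeze

def grouped_hash_status_for(status)
  STATUS_TABLE.fetch(status) { 'UNKNOWN' }
end
```

The declaration now reads much like the when 'cancelled', 'canceled' branch, while the lookup itself is still a single fetch.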

Regular expressions

If we need to switch using regular expressions, a plain Hash lookup is out of the question: fetch finds keys by equality, not by pattern matching.

Complex selection policy

For more complex cases, for example when we need to perform some kind of calculation to make the selection, we might want to use Policy classes and/or lambdas. In that case, a case statement is usually the better fit.
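
As an illustration of lambdas inside a case statement: a when clause is matched with ===, and Proc#=== calls the lambda with the value being tested. The sketch below mixes string literals with two made-up predicates. BLANK, CANCELLED_LIKE and policy_status_for are hypothetical names and rules, not something taken from the original code.

```ruby
# Hypothetical predicates; each receives the value under test via Proc#===.
BLANK          = ->(status) { status.to_s.strip.empty? }
CANCELLED_LIKE = ->(status) { status.to_s.downcase.start_with?('cancel') }

def policy_status_for(status)
  case status
  when BLANK          then 'UNKNOWN'
  when 'booked'       then 'SUCCESS'
  when CANCELLED_LIKE then 'CANCELLED'
  when 'pending'      then 'PENDING'
  else 'UNKNOWN'
  end
end
```

Here policy_status_for('Cancelled by user') returns 'CANCELLED' because the lambda accepts anything that starts with "cancel", which a Hash keyed on exact strings cannot express.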

Speed!!

Now… if you come across a case where the selection is as simple as the one in the example, a Hash can be the faster option. Especially if:

There are a lot of options… the more the better

Hash lookups take constant time on average (O(1)), while a case statement tests its when clauses in order, so in the general case its access time is linear (O(n)). Each extra option adds to the case statement's access time, which tilts the comparison further in favor of the Hash.
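
The in-order testing can be made visible by counting === calls. In this sketch, CountingMatcher and comparisons_for are hypothetical helpers written only for the demonstration. It shows the general rule, not a measurement of the original method, since a Ruby implementation is free to optimize a case made only of literals.

```ruby
# Hypothetical helper: wraps a value and counts how often === is called on it.
class CountingMatcher
  attr_reader :calls

  def initialize(value)
    @value = value
    @calls = 0
  end

  def ===(other)
    @calls += 1
    @value == other
  end
end

# Returns how many when clauses were tested before the case finished.
def comparisons_for(status)
  matchers = %w[booked cancelled canceled pending].map { |value| CountingMatcher.new(value) }
  first, second, third, fourth = matchers

  case status
  when first  then :matched
  when second then :matched
  when third  then :matched
  when fourth then :matched
  end

  matchers.sum(&:calls)
end
```

comparisons_for('booked') returns 1, while comparisons_for('pending') and any unknown status return 4: the later the match, the more clauses are tested.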

Frequently used code

If the code runs frequently, the speed boost will be noticeable.

Thanks for reading.

Saluti