The other day I found this piece of code at work:

case status
when 'booked'
  MyNamespace::Success
when 'cancelled', 'canceled'
  MyNamespace::Cancelled
when 'pending'
  MyNamespace::Pending
else
  MyNamespace::Unknown
end

and I remembered that in one of her talks, Sandi Metz used a Ruby Hash to select a class for a factory. Something like this:

{
  'booked'    => MyNamespace::Success,
  'cancelled' => MyNamespace::Cancelled,
  'canceled'  => MyNamespace::Cancelled,
  'pending'   => MyNamespace::Pending,
}.fetch(status) { MyNamespace::Unknown }

I guessed that the Hash would be a somewhat faster approach, so I decided to benchmark it.

My constraints for the problem

To test this out, I set some limitations to my scope.

  • I’d use just random data. No fancy statistically accurate distributions.
  • No classes, just regular old strings as values
  • I’d test just a couple of variations
  • Use the benchmark-ips gem

The setup

Here’s the setup:

First, the case version as seen in the original code

def case_status_for(status)
  case status
  when 'booked'
    'SUCCESS'
  when 'cancelled', 'canceled'
    'CANCELLED'
  when 'pending'
    'PENDING'
  else
    'UNKNOWN'
  end
end

Then, a version using regular expressions

def case_regex_status_for(status)
  case status
  when /booked/
    'SUCCESS'
  when /cancell?ed/
    'CANCELLED'
  when /pending/
    'PENDING'
  else
    'UNKNOWN'
  end
end

Finally, the Hash approach

def hash_status_for(status)
  {
    'booked' => 'SUCCESS',
    'cancelled' => 'CANCELLED',
    'canceled' => 'CANCELLED',
    'pending' => 'PENDING',
  }.fetch(status) { 'UNKNOWN' }
end

And a helper method for getting a random status from a list

def status
  %w[booked cancelled canceled pending].sample
end

The benchmarks

Then, I added the benchmarks

require 'benchmark/ips'

Benchmark.ips do |x|
  x.config(time: 25, warmup: 2)

  x.report("case") {
    case_status_for(status)
  }

  x.report("case with regexes") {
    case_regex_status_for(status)
  }

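  # Note: benchmark-ips expects a block that takes an argument to run the
  # measured code `times` times itself. This block makes a single call.
  x.report("hash") do |times|
    hash_status_for(status)
  end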

  x.compare!
end

The results

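The results surpassed all my expectations. As I said, I expected the Hash to be faster, but the report shows it as more than 25,000 times faster than the case statement, not to mention more than 80,000 times faster than the regex version! One caveat when reading these numbers: the hash report is the only one whose block takes an argument (|times|), and benchmark-ips expects a block written that way to run the measured code that many times itself. This block calls hash_status_for only once, so the hash figure is heavily inflated and not directly comparable with the other two. In the warm-up lines, the hash is actually behind the plain case.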

# coding: utf-8
# >> Warming up --------------------------------------
# >>                 case    85.888k i/100ms
# >>    case with regexes    38.234k i/100ms
# >>                 hash    56.338k i/100ms
# >> Calculating -------------------------------------
# >>                 case      1.507M (± 2.6%) i/s -     37.705M in  25.032657s
# >>    case with regexes    489.697k (± 0.8%) i/s -     12.273M in  25.064364s
# >>                 hash     39.847B (±18.7%) i/s -    670.057B
# >>
# >> Comparison:
# >>                 hash: 39846575958.7 i/s
# >>                 case:  1507310.6 i/s - 26435.54x  slower
# >>    case with regexes:   489696.9 i/s - 81369.87x  slower
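Because the hash block above is declared differently from the other two, a cross-check that runs every call explicitly can help put the numbers in perspective. The following is a minimal sketch using only the standard library's Benchmark module, not benchmark-ips. The names case_lookup, hash_lookup, STATUS_SAMPLES, ITERATIONS and seconds_for are made up for this illustration, and the iteration count is arbitrary.

```ruby
require 'benchmark'

# The same lookups as in the setup, repeated so the sketch is self-contained.
def case_lookup(status)
  case status
  when 'booked' then 'SUCCESS'
  when 'cancelled', 'canceled' then 'CANCELLED'
  when 'pending' then 'PENDING'
  else 'UNKNOWN'
  end
end

def hash_lookup(status)
  {
    'booked'    => 'SUCCESS',
    'cancelled' => 'CANCELLED',
    'canceled'  => 'CANCELLED',
    'pending'   => 'PENDING',
  }.fetch(status) { 'UNKNOWN' }
end

STATUS_SAMPLES = %w[booked cancelled canceled pending].freeze
ITERATIONS = 200_000 # arbitrary; adjust for your machine

# Returns the wall-clock seconds taken by ITERATIONS calls of the given block.
# Every call is really executed, cycling through the sample statuses.
def seconds_for
  Benchmark.realtime do
    ITERATIONS.times { |i| yield STATUS_SAMPLES[i % STATUS_SAMPLES.size] }
  end
end

case_seconds = seconds_for { |s| case_lookup(s) }
hash_seconds = seconds_for { |s| hash_lookup(s) }

puts format('case: %.4fs  hash: %.4fs', case_seconds, hash_seconds)
```

The timings depend on the machine and the Ruby version, so treat them as a rough sanity check rather than a replacement for the benchmark-ips report.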

Pros, cons and other considerations

As with any approach in programming, this is not a silver bullet, and there are a few things to keep in mind.

Readability

I don’t mind the way the Hash version reads at all; in fact, I quite like it. But readability is a very subjective matter and you might find it unreadable (and I understand if that’s the case).

Multiple keys and repeated values

In the example I used, the status may arrive as either 'canceled' or 'cancelled', and the result is the same ('CANCELLED'). In the case statement, both spellings go through the same branch, but with a Hash we need to duplicate the entry. If there are too many of those, the code might become ugly, and a case statement with regular expressions could be a much better option in terms of readability.
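If the duplicated entries become a nuisance, one possible middle ground is to declare the spellings grouped by result and build the flat lookup Hash from that. This is only a sketch: ALIASES, STATUS_LOOKUP and grouped_status_for are hypothetical names, not part of the original code.

```ruby
# Hypothetical grouping: one entry per result, listing every accepted spelling.
ALIASES = {
  'SUCCESS'   => %w[booked],
  'CANCELLED' => %w[cancelled canceled],
  'PENDING'   => %w[pending],
}.freeze

# Invert the grouping into a flat table keyed by spelling,
# e.g. 'canceled' and 'cancelled' both point at 'CANCELLED'.
STATUS_LOOKUP = ALIASES.each_with_object({}) do |(result, spellings), table|
  spellings.each { |spelling| table[spelling] = result }
end.freeze

def grouped_status_for(status)
  STATUS_LOOKUP.fetch(status) { 'UNKNOWN' }
end
```

Each result is written once, and adding another spelling means touching a single line.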

Regular expressions

If we need to switch using regular expressions, the Hash is definitely out of the question.

Complex selection policy

For more complex cases, for example when we need to perform some kind of calculation to select a branch, we might want to use Policy classes and/or lambdas. In that case, the case statement is usually the better fit.
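To illustrate why the case statement suits this kind of selection: when clauses are matched with ===, and a lambda responds to === by calling itself with the value. The sketch below relies on that. The booking Hash, its fields, the thresholds and the names OVERDUE, HIGH_VALUE and policy_for are all invented for the example.

```ruby
# Hypothetical predicates; the fields and thresholds are made up.
OVERDUE    = ->(booking) { booking[:days_late] > 0 }
HIGH_VALUE = ->(booking) { booking[:amount] >= 1_000 }

def policy_for(booking)
  # Each `when` evaluates `predicate === booking`, which calls the lambda.
  # Clauses are tried in order, so OVERDUE wins when both would match.
  case booking
  when OVERDUE    then 'CHASE'
  when HIGH_VALUE then 'PRIORITY'
  else 'STANDARD'
  end
end
```

The order of the clauses matters here, which is something a plain key lookup cannot express.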

Speed!!

Now… if you come across a case where the selection is as simple as the one in the example, using a Hash may speed up your code. Especially if:

  • There are a lot of options… the more the better. Hash lookups take constant time (O(1)) on average, while a case statement conceptually tests each when clause in order (O(n)), so adding options increases its lookup time and tilts the comparison further in favor of the Hash.
  • The code is used frequently. If the code is called regularly, the speed boost will become noticeable.
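One detail worth keeping in mind: hash_status_for, as written in the setup, builds a new Hash from the literal on every call. A common way to avoid that cost is to hoist the table into a frozen constant, sketched below. STATUS_TABLE and constant_hash_status_for are hypothetical names, and I have not benchmarked this variant here.

```ruby
# Built once when the file is loaded, then shared by every call.
STATUS_TABLE = {
  'booked'    => 'SUCCESS',
  'cancelled' => 'CANCELLED',
  'canceled'  => 'CANCELLED',
  'pending'   => 'PENDING',
}.freeze

def constant_hash_status_for(status)
  # The block is only evaluated when the key is missing.
  STATUS_TABLE.fetch(status) { 'UNKNOWN' }
end
```

Freezing the constant also guards against the table being modified by accident elsewhere in the code.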

Thanks for reading.

Saluti