Friday, March 30, 2012
Some fun with ruby 1.9(.3) and string encoding..
Okay, I should probably start by directing you here: http://blog.grayproductions.net/articles/understanding_m17n If you wanna really get dirty in character encoding in ruby, read up (by the way, i think m17n = multilingualization or somesuch).
Anyway, so I absorbed some percentage of that, but was a little surprised to see some of the default behavior of ruby 1.9, in particular what happens when you do some string concatenation/interpolation with mixed ASCII and UTF-8 encoded strings. Surprisingly, if you combine two such strings, it will sometimes result in an ASCII-encoded string, sometimes UTF-8-encoded string, depending on whether there are multibyte chars or not in the UTF-8 substring!:
> irb
1.9.3p125 :001 > foo = "foo"
=> "foo"
1.9.3p125 :002 > bar = "bar"
=> "bar"
1.9.3p125 :003 > baz = "báz"
=> "báz"
1.9.3p125 :004 > foo.encoding.name
=> "UTF-8"
1.9.3p125 :005 > bar.encoding.name
=> "UTF-8"
1.9.3p125 :006 > baz.encoding.name
=> "UTF-8"
1.9.3p125 :007 > foobar1 = "#{foo.force_encoding(Encoding::US_ASCII)}#{bar}#{bar}"
=> "foobarbar"
1.9.3p125 :008 > foobar2 = "#{foo.force_encoding(Encoding::US_ASCII)}#{bar}#{baz}"
=> "foobarbáz"
1.9.3p125 :009 > foobar1.encoding.name
=> "US-ASCII"
1.9.3p125 :010 > foobar2.encoding.name
=> "UTF-8"
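The same rule can be poked at outside irb. This is only a small sketch: it uses String#+ rather than interpolation, it was not part of the original session, and other ruby versions may treat interpolation differently. The variable names are just for illustration.

```ruby
# "b\u00E1z" is "báz"; the \u escape yields a UTF-8 string whatever the source encoding is.
ascii_foo = "foo".dup.force_encoding(Encoding::US_ASCII)
utf8_bar  = "bar".encode(Encoding::UTF_8)
utf8_baz  = "b\u00E1z"

# Both operands contain only 7-bit characters, so the receiver's encoding is kept.
plain = ascii_foo + utf8_bar
# One operand contains a multibyte character, so the result has to be UTF-8.
mixed = ascii_foo + utf8_baz

puts plain.encoding.name
puts mixed.encoding.name

# Encoding.compatible? reports which encoding a combination would end up with.
puts Encoding.compatible?(ascii_foo, utf8_bar).name
puts Encoding.compatible?(ascii_foo, utf8_baz).name

# ascii_only? is the property that decides between the two outcomes.
puts utf8_bar.ascii_only?
puts utf8_baz.ascii_only?

# Transcoding (rather than relabelling) an ASCII-only string to UTF-8 is always safe.
normalized = plain.encode(Encoding::UTF_8)
puts normalized.encoding.name
```

Unlike force_encoding, encode returns a new string and leaves the original alone.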
As a result, at Goodreads, we had to do some monkey-patching as we were getting some US-ASCII strings back from some rails helper code (pluralize(), number_with_delimiter()) as well as some ruby built-in classes (to_s() from NilClass, Float, Fixnum, Array). There must be a better way, but we've now got this force-utf8 monkey patch file with stuff like this:
module ActionView
  module Helpers
    module NumberHelper
      def number_with_delimiter_with_force_utf8(*args)
        number_with_delimiter_without_force_utf8(*args).force_encoding(Encoding::UTF_8)
      end
      alias_method_chain :number_with_delimiter, :force_utf8
    end
  end
end

# bunch of to_s that need fixing...maybe see if there's a [Class1, Class2].each way of
# doing this that's a little DRYer...
class Array
  def join_with_force_utf8(*args)
    join_without_force_utf8(*args).force_encoding(Encoding::UTF_8)
  end
  alias_method_chain :join, :force_utf8
end

class Fixnum
  def to_s_with_force_utf8(*args)
    to_s_without_force_utf8(*args).force_encoding(Encoding::UTF_8)
  end
  alias_method_chain :to_s, :force_utf8
end

class Float
  def to_s_with_force_utf8(*args)
    to_s_without_force_utf8(*args).force_encoding(Encoding::UTF_8)
  end
  alias_method_chain :to_s, :force_utf8
end

class NilClass
  def to_s_with_force_utf8(*args)
    to_s_without_force_utf8(*args).force_encoding(Encoding::UTF_8)
  end
  alias_method_chain :to_s, :force_utf8
end
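On the "[Class1, Class2].each" idea in the comment above, here is one possible shape for it. To be clear, this is a sketch and not what we run: it swaps alias_method_chain for Module#prepend (which needs ruby 2.0 or later, so not 1.9.3), and LegacyCount and LegacyLabel are made-up stand-in classes so the example doesn't touch any core classes.

```ruby
# Hypothetical stand-ins for classes whose to_s hands back US-ASCII strings.
class LegacyCount
  def initialize(number)
    @number = number
  end

  def to_s
    @number.to_s.dup.force_encoding(Encoding::US_ASCII)
  end
end

class LegacyLabel
  def initialize(text)
    @text = text
  end

  def to_s
    @text.dup.force_encoding(Encoding::US_ASCII)
  end
end

# One module holds the fix; super calls the original to_s of whichever class
# the module is prepended to. dup avoids mutating a frozen or shared string.
module ForceUtf8ToS
  def to_s(*args)
    super.dup.force_encoding(Encoding::UTF_8)
  end
end

# The DRY part: apply the same patch to every class in the list.
[LegacyCount, LegacyLabel].each { |klass| klass.prepend(ForceUtf8ToS) }

puts LegacyCount.new(42).to_s.encoding.name
puts LegacyLabel.new("books").to_s.encoding.name
```

The same loop could in principle list real classes instead of the stand-ins, but patching core to_s methods globally is the risky part either way.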
Note: we also started marking a bunch of our code files with the magic comment (more here) because that seems to be the most effective way to force ruby to default new strings to UTF-8 encoding (there are a few other options, but this has been easiest and most effective). Being an emacser, I tend to propagate the # -*- coding: utf-8 -*- form...
Tuesday, February 21, 2012
How to test code in your ActiveRecord after_commit callbacks disabling transactional fixtures per-test without hiding bugs
Okay, so at Goodreads we're finally moving to rails 3.2, and in the process we've discovered the cool after_commit and after_rollback callbacks that easily let you execute things outside the transaction that wraps an active record save/destroy. This was exactly what we needed for a few cases where we were doing non-mission-critical updates in callbacks that can on occasion take a bit of time (hitting memcached or redis servers when our resque queue was overwhelmed) inside transactions.
So we had reason to try converting some of our after_(save|update|destroy|create) callbacks to after_commit calls. The easy step is figuring out how to deal with code that used to do name_changed? or name_was helpers (they retain their expected behavior only within the transaction; once the commit takes place they all get reset). So all we did was keep some after_xxx methods around that set instance variables, then let the after_commit methods read those instance variables (then reset to some disabled state lest we re-process the same code again if models get saved repeatedly for some reason).
Okay, but what about testing? By default, "transactional fixtures" are enabled. I'm not really sure why that's the terminology, as we use this behavior, but I really, really hate testing with fixtures. In fact, I just spent about 18 hours straight yesterday ripping a ton of them out of our codebase to get this all working. But I digress.
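Since I'm digressing anyway: the instance-variable hand-off between the after_xxx and after_commit methods described above looks roughly like this. It's a plain-ruby sketch with no ActiveRecord involved; FakeModel and everything in it are made up for illustration, and the save method only imitates the order in which the real callbacks would fire.

```ruby
# Plain-Ruby stand-in for a model (hypothetical; not ActiveRecord).
class FakeModel
  attr_accessor :name
  attr_reader :notifications

  def initialize(name)
    @name = name
    @saved_name = name
    @notifications = []
    @name_changed_in_save = false
  end

  # Imitates name_changed?: only meaningful until the save completes.
  def name_changed?
    @name != @saved_name
  end

  def save
    after_save               # runs "inside the transaction"
    @saved_name = @name      # dirty tracking resets once the save completes
    after_commit             # runs "after the commit"
    true
  end

  private

  # Still sees the dirty state, so it records what after_commit will need.
  def after_save
    @name_changed_in_save = name_changed?
  end

  # Reads the recorded flag, then resets it so a repeated save does nothing.
  def after_commit
    return unless @name_changed_in_save
    @name_changed_in_save = false
    @notifications << "name is now #{@name}"
  end
end

model = FakeModel.new("old title")
model.name = "new title"
model.save
model.save # second save: the flag was reset, so nothing is re-processed
puts model.notifications.inspect
```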
The problem is that if we want our after_commit callbacks to fire, they can't be wrapped by transactions around the entire test, because the commit never happens (the transaction gets rolled back after the test runs, pass or fail). So we had to find another way. The only options we could come up with were:
- Hack the callbacks to call the commit callbacks without actually committing the records (a la something like this)
- Disable all transactional fixtures and handle the data cleanup ourselves (set use_transactional_fixtures = false in your TestCase class)
- Hack Test::Unit to not wrap (or unwrap) the tests in transactions
As with any good blog post, we chose door number 3. Reasoning:
- The first option leaves us wide open to bugs, especially of the variety where we might depend on name_changed?-type logic, which may still return true if the transaction hasn't been committed yet...which would result in tests passing even if the code would fail in production (!). This was too much of a risk for me to stomach.
- The second option sounds like a lot of work, a pain every time you add a new model/table, and potentially slow if we're deleting data from every table after each test (could do "TRUNCATE TABLE foo" to be fairly fast, but it seemed to be about 0.5 sec each time we did this for our schema). With thousands of tests, it starts to add up. We're also kind of sick of having "data leakage" between tests, and this approach seems to encourage it. One note: I prefer rspec syntax, but we're kinda stuck with Test::Unit for historical reasons...I'm not sure if rspec's hierarchical structure would allow enabling/disabling transactional fixtures at a more granular level? I think I remember having to do it at the class-level there too (all tests are either transactional or not, not determined at the individual test level).
- So option three: find a way to hack Test::Unit to not wrap the tests in transactions, allowing us to select on a per-test basis whether to wrap in a transaction or not. I actually never dug deep enough to find where that code was...and considered monkey-patching or subclassing to maintain a separate queue of tests that are to be run outside transactions. In the end I took a bit of a shortcut and actually roll back the transaction at the beginning of the test. Yeah, the first thing we do is roll back the transaction, turn off transactional fixtures (so activerecord doesn't try to roll things back), run the test, then restore everything the way it was (minus the transaction...it appears ActiveRecord plays well and doesn't try to roll back transactions that aren't there). Bam, done!
So we're using rails 3.2.1, hopefully this is stable across several revisions, I'd hate to have to chase this down again. :( But here's the key:
# activerecord-3.2.1/lib/active_record/fixtures.rb:
...
module ActiveRecord
  module TestFixtures
    ...
    def run_in_transaction?
      use_transactional_fixtures &&
        !self.class.uses_transaction?(method_name)
    end
    ...
    def teardown_fixtures
      return unless defined?(ActiveRecord) && !ActiveRecord::Base.configurations.blank?
      unless run_in_transaction?
        ActiveRecord::Fixtures.reset_cache
      end
      # Rollback changes if a transaction is active.
      if run_in_transaction?
        @fixture_connections.each do |connection|
          if connection.open_transactions != 0
            connection.rollback_db_transaction
            connection.decrement_open_transactions
          end
        end
        @fixture_connections.clear
      end
      ActiveRecord::Base.clear_active_connections!
    end
    ...
  end
end
So two things:
- run_in_transaction?: by setting use_transactional_fixtures = false, we can force this method to return false
- so in teardown_fixtures, it won't bother trying to do any rollback at all...so we can "safely" roll back the transaction before even beginning our test
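To make those two points concrete, here's a toy model of the interaction in plain ruby. None of this is ActiveRecord: FakeConnection and FakeTestCase are made-up stand-ins that only mimic the flag and the open-transaction counter, and the body of disable_transactional_fixtures here is a guess at the general idea, not the actual module.

```ruby
# Stand-in for a database connection; it only counts transactions and rollbacks.
class FakeConnection
  attr_reader :open_transactions, :rollbacks

  def initialize
    @open_transactions = 0
    @rollbacks = 0
  end

  def begin_db_transaction
    @open_transactions += 1
  end

  # Simplification: the fake folds the counter decrement into the rollback.
  def rollback_db_transaction
    @rollbacks += 1
    @open_transactions -= 1
  end
end

# Stand-in for a test case that mimics the setup/teardown logic quoted above.
class FakeTestCase
  attr_accessor :use_transactional_fixtures
  attr_reader :connection

  def initialize
    @use_transactional_fixtures = true
    @connection = FakeConnection.new
  end

  def run_in_transaction?
    use_transactional_fixtures
  end

  def setup_fixtures
    connection.begin_db_transaction if run_in_transaction?
  end

  def teardown_fixtures
    return unless run_in_transaction?
    connection.rollback_db_transaction if connection.open_transactions != 0
  end

  # Roll back the wrapping transaction up front, then flip the flag so that
  # teardown_fixtures no longer tries to roll anything back.
  def disable_transactional_fixtures
    connection.rollback_db_transaction if connection.open_transactions != 0
    self.use_transactional_fixtures = false
  end
end

test_case = FakeTestCase.new
test_case.setup_fixtures                 # the test starts inside a transaction
test_case.disable_transactional_fixtures # first thing in the test: get out of it
test_case.teardown_fixtures              # teardown now has nothing to roll back
puts test_case.connection.rollbacks
puts test_case.connection.open_transactions
```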
And here's the code I wrote. I created a module that we just monkey-patch/mix in to ActiveSupport::TestCase. I aimed for a one-liner to deactivate fixtures (and not require an explicit cleanup call at the end, cause someone's gonna forget).
gist here
A few notes:
- setup_fixtures is a complement to teardown_fixtures, in ActiveRecord::TestFixtures. I'm just calling it here to restore any fixture data that tests actually need
- delete_everything is a method specific to our code that knows which tables/models to delete. There are a few options for this...maybe just do a "SHOW TABLES" query and delete what you've got (except maybe schema_migrations ;) ). Maybe query all descendant classes of ActiveRecord::Base. We chose to maintain a list of models and tables so we can be a little more selective of which tables we clear out (to save a little processing time). It'll mean a little more maintenance, but we're a little more stable in terms of our schema, so saving a few minutes on running our full set of tests is probably worth it.
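For the "maintain a list and be selective" option, the table-picking part might look something like the sketch below. This is not our delete_everything: cleanup_statements, PROTECTED_TABLES and the table names are all made up, and it only builds the SQL strings rather than running them against a database.

```ruby
# Tables that should never be wiped (assumption: rails' own bookkeeping table).
PROTECTED_TABLES = %w[schema_migrations].freeze

# Returns the SQL statements that would clear the requested tables.
# all_tables: every table we know about; only: optional subset to restrict to.
def cleanup_statements(all_tables, only: nil)
  targets = only ? (all_tables & only) : all_tables
  (targets - PROTECTED_TABLES).map { |table| "DELETE FROM #{table}" }
end

known_tables = %w[books reviews users schema_migrations]

puts cleanup_statements(known_tables).inspect
puts cleanup_statements(known_tables, only: %w[reviews]).inspect
```

Passing a subset is what keeps the per-test cleanup cheap; the full list is the fallback.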
You just need to call a single method at the top of your test (or a setup method if you want to apply it to all tests in a suite--again, here's a place I prefer rspec, as you can effectively have a separate setup method for an arbitrary grouping of tests within a given test suite, but meh):
test "some_method is supposed to do something interesting" do
  disable_transactional_fixtures # optionally pass in args telling specific tables/models to delete
  # do your tests...
  # everything gets deleted magically on its own! :)
end
---
Goodreads is hiring! Please check us out and make the world a better place for readers!
Labels: fixtures, rails 3.2, testing, transactions
Sunday, May 09, 2010
mailx, msmtp and gmail
So we wanted to send automated emails out from our ubuntu server via gmail. We found a recipe and all was fine...till google changed their certificate. We had sort of blindly followed directions, which included downloading a single CA certificate, and pointing msmtp to that one cert.
(From the comment below, I'm guessing our recipe came from http://philogroky.blogspot.com/2009/08/fixing-msmtp-to-send-mails-via-gmail.html, so you may want to start there to see the early steps for mailx and msmtp...though if you're on ubuntu, I think you'll be set with sudo apt-get install msmtp bsd-mailx, or some other flavor of mailx.)
We eventually caught on and realized that was silly when we had a whole slew of CA certificates that we already trusted on the server. So instead, point msmtp to that!
So here's our .msmtprc file and we've been pretty happy ever since!:
account gmail
auth on
host smtp.gmail.com
port 587
user ouraccount@somedomain.com
password somepassword
from ouraccount@somedomain.com
tls on
tls_starttls on
# tls_trust_file argument is the full path to the certificate
# changed to this as suggested on
# http://philogroky.blogspot.com/2009/08/fixing-msmtp-to-send-mails-via-gmail.html
# hopefully never have to update this stupid thing again...
tls_trust_file /etc/ssl/certs/ca-certificates.crt
maildomain gmail.com
account default : gmail
Friday, April 16, 2010
Getting cassandra up to hit with ruby and C++ clients
So at Discovereads we're giving cassandra a spin on the dance floor to see how she moves. We also need to connect via ruby and C++ clients (most of our writing will come from C++ and mostly reading (though some writing as well) from the ruby).
Seems everyone has a blog post like this one where he or she says "it took forever, kept looking at other sites and none of them just worked, so hopefully I'll save someone else some time and put the steps I took here". Well, this is mine. Pretty skeptical it'll actually help anyone else, but I hope it does!
I got a lot of help (on the ruby side) from http://blog.evanweaver.com/articles/2009/07/06/up-and-running-with-cassandra/
And after accomplishing hitting from the ruby client I moved on to trying to hit with the libcassandra lib from posulliv: http://posulliv.github.com/2010/02/22/cpp-cassandra.html , http://github.com/posulliv
First, get cassandra (the ruby way), following evan weaver's instructions (link above)
Now, to get the C++ lib working, there were a few unsatisfied dependencies:
- boost (I used 1.42.0): http://sourceforge.net/projects/boost/files/boost/1.42.0/
- thrift (I used thrift 0.2.0): http://incubator.apache.org/thrift/download/
Cassandra C++ lib
get boost:
- tar xzf boost_1_42_0.tar.gz
- cd boost_1_42_0
- ./bootstrap.sh
- ./bjam
- sudo ./bjam install --prefix=/usr/local (i think i'm being redundant with the /usr/local but oh well)
get thrift:
- tar xzf thrift-0.2.0-incubating.tar.gz
- cd thrift-0.2.0
- (note: i was getting desperate when encountering errors so installed libevent (sudo port install libevent) in here, but i don't think it was necessary for thrift to install. so skip this and see if everything still works....)
- quick check of autoconf version:
- autoconf -V (note the capital "V")
- if version is 2.61 you're fine; otherwise, see http://wiki.apache.org/thrift/ThriftInstallationMacOSX and try to figure something out
- ./configure
- make
- sudo make install
get libcassandra:
- git clone git://github.com/posulliv/libcassandra.git
- cd libcassandra
- ./config/autorun.sh
- ./configure
- make
- It was complaining about variadic macros and C99, so i went into the Makefile and removed the two instances of -Werror. yes, total hack, but it did compile.
- sudo make install
Wednesday, August 20, 2008
boo!