Saturday, January 21, 2012

Shortcomings of aliased field or attribute names in Mongoid - Part 1

NOTE:
  • The behavior and shortcomings explained below apply to Mongoid version 2.4.0 (released on 5 Jan 2012) and earlier releases. A recent commit made on 10 Jan 2012 fixes all of these shortcomings.
  • For those using the affected versions (which includes all Rails 3.0 developers), this monkey patch addresses the shortcomings.

In my previous post I wrote about getting a list of aliased field names. From that post it might be evident that dealing with aliased field names is not that straightforward in Mongoid. I am using Mongoid v2.2.4, which is the latest version working with Rails 3.0; Mongoid v2.3 and later require ActiveModel 3.1 and hence Rails 3.1.

Anyway, aliased field names have these shortcomings:
  1. Accessor methods are defined only with the aliased names and not the actual field names.
  2. Dirty attribute tracking methods are not defined for the aliased names.
  3. attr_protected, if used, should be used with both short and long forms of field names.
Writing about all three in a single post would result in an awfully long post, so I will cover each of them in its own post, starting with the first one here.

Accessor methods are defined only with the aliased names and not the actual field names.


Consider the following model definition:
class User
  include Mongoid::Document

  field :fn, as: :first_name
  field :ln, as: :last_name
end
I would have expected the additional accessor methods first_name, first_name=, last_name and last_name= to be simple wrapper methods that just forward the calls to the original accessor methods fn, fn=, ln and ln=. But Mongoid just doesn't create the shorter form of the accessor methods at all.
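Here is the kind of wrapper I had in mind (a hypothetical sketch, not actual Mongoid code):

class User
  include Mongoid::Document

  field :fn, as: :first_name

  # What I expected aliasing to generate: long-form wrappers
  # that forward to the short-form field accessors.
  def first_name
    fn
  end

  def first_name=(value)
    self.fn = value
  end
end

A quick check confirms that only the long forms exist: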
user = User.new
user.respond_to?(:fn)         # Returns false
user.respond_to?(:ln)         # Returns false
user.respond_to?(:first_name) # Returns true
user.respond_to?(:last_name)  # Returns true
This doesn't appear to be a problem at first sight, because an application developer would use the long form of the methods in the application code. Trouble begins in the dirty tracking methods, which use the actual attribute name and consequently the shorter form of the field names. Take a look at these parts of Mongoid and ActiveModel:
  • Definition of the setter method for any attribute (GitHub link for v2.2.4)
    define_method("#{meth}=") do |value| # meth is the aliased (long) name
      write_attribute(name, value)       # name is the actual (short) field name
    end
    
    Notice that the field name (i.e. the short form) is passed to write_attribute, and from there it eventually reaches ActiveModel's dirty attribute tracking method attribute_will_change!

  • Definition of the ActiveModel method attribute_will_change! (GitHub link for v3.0.11)
    def attribute_will_change!(attr)
      begin
        value = __send__(attr)
        value = value.duplicable? ? value.clone : value
      rescue TypeError, NoMethodError
      end
    
      changed_attributes[attr] = value
    end
    
On the third line (value = __send__(attr)), the method with the same name as the attribute's short name gets invoked. Since Mongoid doesn't define such methods, this mostly results in a NoMethodError, which is caught and swallowed, and nothing happens. That is comparatively harmless. But if a method with that name already exists somewhere, it gets called, and a lot of unwanted things can happen. In the case of the User model above, fn just results in a NoMethodError, whereas ln could resolve to any of the following methods:

Object.ln
FileUtils.ln
Rake::DSL.ln
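
To see how this can bite, here is a rough sketch of the failure mode (hypothetical, and assuming something like Rake has mixed FileUtils' ln into Object, as happens inside a Rakefile):

user = User.new

user.first_name = "John"
# write_attribute(:fn, ...) ends up calling attribute_will_change!("fn").
# __send__("fn") raises NoMethodError, which is rescued and swallowed.

user.last_name = "Smith"
# attribute_will_change!("ln") calls __send__("ln").
# __send__ finds private methods too, so if FileUtils#ln has been mixed
# in, it is invoked with no arguments and fails with something like:
#   ArgumentError: wrong number of arguments (0 for 2)
# an error that seems to have nothing to do with the model.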

That could result in pretty nasty errors about these ln methods, and you wouldn't even know why they are being called! Whether it is good practice to name your attributes in a way that clashes with already defined methods is a totally different discussion. Just remember that the cause of such a weird error is probably aliasing.

Wednesday, January 18, 2012

Getting the list of aliased key/attribute names from a Mongoid model

At some point today, while writing model specs for one of my Mongoid models, I needed the list of all attribute/key names. Mongoid provides a handy "fields" method for this, which returns a hash of key names mapped to Mongoid::Fields::Serializable objects. Getting the list of names from that was easy: Model.fields.keys.

This gives the list of the actual key names. The actual key names in my case are very short strings (1 to 3 characters), and I have long aliases for them in my models. What I eventually realized was that I wanted the list of the longer aliased names. Looking around the Mongoid code did not turn up any direct method. It turns out that aliasing results in nothing more than a few additional wrapper methods (the accessors, dirty methods, etc.); there is no table or hash maintained anywhere that maps the actual key names to the aliased ones. So my current guess is that the list of aliased names is not directly available anywhere.

So I came up with this hackish way of getting that list of aliased names.

p = Post.new
actual_field_names = p.fields.keys
# Collect the names of all <attr>_changed? methods and keep the
# <attr> part; then drop the actual (short) field names.
all_field_names = p.methods.collect { |m| m.to_s.match(/_changed\?$/).try(:pre_match) }.compact
aliased_field_names = all_field_names - actual_field_names
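
Suppose, for illustration, that Post were defined along these lines (hypothetical field names):

class Post
  include Mongoid::Document

  field :t,  as: :title
  field :bd, as: :body
end

Then the snippet above would yield roughly:

actual_field_names   # => ["_id", "t", "bd"]  (the default _id included)
aliased_field_names  # => ["title", "body"]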

As mentioned earlier, this is pretty hackish. If you know of a straightforward way, do let me know.

Note: I eventually found out that I did not actually need this list of aliased names, and I did not use it in my project. Nevertheless, it works just fine.

Sunday, January 1, 2012

MongoDB concurrency - Global write lock and yielding locks

There has been a lot of hue and cry about MongoDB's global write lock. Quite a few people have said (in blog posts, mailing lists, etc.) that this design ties down MongoDB to a great extent in terms of performance. I too was surprised (actually shocked) when I first read that the whole DB is locked whenever a write happens, i.e. a create or update. You can't even read a different document during that time. It did not make any sense to me initially. Before this revelation I had been very pleased that MongoDB didn't have transactions, and thought of that as a design which avoided locking the DB while running expensive transactions. This global lock, however, left me wondering whether MongoDB is worth using at all! I was under the assumption that the art of record-level locking had long been mastered by database developers, which made MongoDB look like a tool from the stone age.

Well, I was wrong. It turns out that record-level locking is not that easy (the reasons warrant a separate post altogether), and from what I understand MongoDB has no plans of implementing it in the near future. However, this doesn't mean the DB stays locked for long durations (long on some scale) on every write operation. MongoDB is designed and implemented differently from other databases, and there are mechanisms in place that avoid delays to a large extent. Here are a couple of things to keep in mind:

MongoDB uses memory-mapped files to access its DB files, so a considerable chunk of your data resides in RAM. That makes for fast access: fast reads all the time, very fast writes without journaling, and pretty fast writes with journaling. It means that for many regular operations, including writes, MongoDB will not hit the disk at all before sending a response. So the global lock is held only for the time needed to update the record in RAM, which is orders of magnitude faster than writing to disk. The DB is locked for a very tiny amount of time, and the global lock is, after all, not as bad as it sounds at first.
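If you want to see how little time the server actually spends holding the lock, the globalLock section of serverStatus reports it. A quick sketch using the plain Ruby driver (the mongo gem that Mongoid 2.x sits on), assuming a server on localhost with the default port:

require 'mongo'

# Ask the admin DB for the server status document.
db = Mongo::Connection.new('localhost', 27017).db('admin')
status = db.command('serverStatus' => 1)

# totalTime and lockTime are reported in microseconds since startup,
# so their ratio is the fraction of time the global lock was held.
global_lock = status['globalLock']
puts global_lock['lockTime'].to_f / global_lock['totalTime']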

But then the entire database cannot be in RAM; only a part of it (often referred to as the working set) is. When a record not present in RAM is requested or updated, MongoDB hits the disk. Oh no, wait... so does that mean the DB is locked while Mongo reads from or writes to that (slow) disk? Definitely not. This is where the "yield" feature comes in. Since version 2.0, MongoDB yields the lock when it has to hit the disk: once Mongo realizes it is going to the disk, it temporarily releases the lock until the data from the disk is loaded and available in RAM.

Although I would still prefer record-level locking in MongoDB, the two features mentioned above are sufficient to restore my respect and love for MongoDB. :)