2006-07-23

Taming the File System Zoo

I'm not quite sure I understand why we have all these classes and modules: Dir, File, FileUtils, FileTest and Pathname. I understand what each of these does, of course, but I don't understand why such clearly related functionality has been spread about.

I think a FileSystem class or module would be in order -- a system we could use in much the same manner as we use the command shell to access our filesystem.


fs = FileSystem.new
fs.cd('/')
entries = fs.ls
entries.each do |e|
  if fs.directory?(e)
    ...
  else
    fs.open(e) { |f| ... }
  end
end


It's not so common that we open files and hold on to them, even less so for directories. So FileSystem can have its own File and Dir classes defined within it, and there's no reason to even instantiate them yourself. FileSystem can do it for you:


fs['afile.txt'] #=> #<FileSystem::File...>
fs['adir/'] #=> #<FileSystem::Dir...>
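
To make the idea a little more concrete, here is a minimal sketch of what such a facade might look like on top of the existing Dir and File classes. Everything in it is an assumption for illustration: FileSystem is not an existing library, the method names simply mirror the shell-like calls above, and the nested File and Dir are bare Struct stand-ins rather than full-featured classes.

```ruby
# Hypothetical FileSystem facade; the class and its methods are illustrative
# only and do not exist in Ruby's standard library.
class FileSystem
  # Lightweight stand-ins for the per-entry objects returned by #[].
  File = Struct.new(:path)
  Dir  = Struct.new(:path)

  attr_reader :pwd

  def initialize(start = ::Dir.pwd)
    @pwd = ::File.expand_path(start)
  end

  # Change the facade's working directory without touching the process cwd.
  def cd(path)
    target = resolve(path)
    raise Errno::ENOENT, target unless ::File.directory?(target)
    @pwd = target
    self
  end

  # List the entries of the current directory, minus "." and "..".
  def ls
    ::Dir.entries(@pwd) - %w[. ..]
  end

  def directory?(path)
    ::File.directory?(resolve(path))
  end

  # Open a file relative to the facade's current directory.
  def open(path, mode = 'r', &block)
    ::File.open(resolve(path), mode, &block)
  end

  # fs['name'] hands back a Dir or File wrapper depending on what is on disk.
  def [](path)
    full = resolve(path)
    ::File.directory?(full) ? Dir.new(full) : File.new(full)
  end

  private

  def resolve(path)
    ::File.expand_path(path, @pwd)
  end
end
```

Keeping the current directory inside the object, rather than calling Dir.chdir, is a deliberate choice in this sketch: several FileSystem instances can then point at different places at once.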


Clear, straightforward and convenient. But best of all, this single point of entry into all things "file system" may lead to more interesting possibilities, in some respects similar to what FUSE offers us in general, only confined to Ruby's realm.

I also wonder how remote file systems might fit into this... interesting considerations all.

2006-07-22

Singleton 2 Legion

For as long as I've coded Ruby there has been some question as to the appropriateness of the term singleton class. This is the term generally used in Ruby parlance to refer to the class returned by the construct class << obj; self; end, which is the context where class methods and module methods are defined.

While the term is fitting in the sense that each object only has one of these classes (hence "single"), an issue arises from a terminology clash with an already well accepted object-oriented programming term: http://en.wikipedia.org/wiki/Singleton_pattern. In OOP-speak a singleton class is a class that can only be instantiated once. Many have argued that the Singleton pattern itself is a flawed concept, since one can just instantiate an object and assign it to a constant to achieve the same end. So why restrict anyone from a second instance of a class if they so choose? That's a reasonable argument, although not always as practical as it might seem. Nonetheless, the terminology clash remains.
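
For readers who haven't met the pattern sense of the word, here is a small sketch of the two approaches just mentioned, using the Singleton module from Ruby's standard library. The class names Registry and Config and the constant CONFIG are made up for the example.

```ruby
require 'singleton'

# Pattern-style singleton: the class itself forbids a second instance.
class Registry
  include Singleton
end

# The alternative: an ordinary class with one instance parked in a constant.
class Config; end
CONFIG = Config.new

Registry.instance.equal?(Registry.instance)  # always the same object
# Registry.new would raise NoMethodError, since Singleton makes .new private,
# whereas Config.new still happily builds a second instance.
```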

In light of this nomenclature issue, many have offered alternative terms for Ruby's singleton class. Indeed, Ruby itself has used two alternatives in the past: virtual class and metaclass, and many people still prefer the latter of these choices. Other suggestions include the well known _whyism 'eigenclass', the abbreviated 'sclass', the pronoun contingents 'myclass' and 'ownclass', as well as my own concoctions 'adhoc', 'quaclass' and 'nonce'. Yet despite all these names, none of which has ever stuck, what is this singly thing really?

Recently, our beloved Pit Capitan put it concisely when he said "there are only two types of methods in Ruby, instance methods and singleton methods". Well said. Unfortunately it's not really true. Instance methods aren't really methods at all; they are just definitions for methods-to-be. That's why when you use #instance_method you get something called an UnboundMethod, not a Method. It only becomes a method when it's bound to an object. To clarify further how that distinction is a bit misleading, consider that when a singleton method is defined it falls into an inheritance hierarchy alongside so-called instance methods. In other words, singleton methods and instance methods exist on the same playing field, and the former can call the latter via super. In fact, we can access these singleton methods in instance fashion via (class<<obj;self;end).instance_method(sym). So these methods are not really distinct in this manner after all. In fact there is no distinction between methods other than bound and unbound. And this distinction arises from the capability of class Module and class Class to harbor a set of method definitions that are not their own, which is of course exactly why these constructs exist in the first place. So when we say singleton method, we are not referring to something different from an instance method. We only mean that these methods are kept in a special "singleton class", made just for a specific object, and consequently are automatically bound to that object.
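
The bound/unbound point can be checked in stock Ruby. The following is a small sketch; the class Greeter and the variable names are made up for the example.

```ruby
class Greeter
  def hello
    "hello"
  end
end

g = Greeter.new

# A singleton method sits in front of the class in the lookup chain,
# so it can reach the so-called instance method through super.
def g.hello
  super + "!"
end

# The definition stored in the class is unbound until tied to an object.
unbound = Greeter.instance_method(:hello)   # an UnboundMethod
bound   = unbound.bind(g)                   # a Method

# The singleton method can be fetched "in instance fashion" as well,
# and it too comes back as an UnboundMethod.
singleton = (class << g; self; end).instance_method(:hello)

g.hello      #=> "hello!"
bound.call   #=> "hello"
```

Note that bound.call skips the singleton layer entirely: it runs the definition taken from Greeter, even though g has its own hello in front of it.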

Now, I'm going to claim that the term 'singleton' is a poor choice for a completely different reason than any given before. It may come as a bit of a surprise, but singletons are not inherently single. They are only made so by an explicit restriction in Ruby's source code. It is quite simple, actually, to comment the if-clause out of the source, recompile Ruby and then do:


o = Object.new
def o.x
  "x"
end
def o.x
  super + "x"
end
o.x #=> "xx"


The reason Ruby makes the per-object classes single is that it would be terribly inefficient to define a whole new class for every new singleton method defined. That's understandable, but it also comes at a cost. We cannot reliably receive an object from the outside world and define our own singleton method on it, because we may be clobbering someone else's singleton method without even knowing it. (NOTE: It doesn't matter so much if you're just redefining a method altogether, but if you're calling super it very much matters.) Generally we don't even think about such things, but truth be told, object singleton methods often fail to survive code refactoring. Object singletons really only exist as a side effect of Ruby's class model (and Smalltalk's), which utilizes the singleton class as a means of separating a module's or class's method definitions from its own actual methods. This could have been done another way of course, but then a class/module would be something wholly different from any other object. The use of the singleton allowed classes and modules to be just like any other object. So singletons are really a bit of cleverness that lies at the very heart of how Ruby works.
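
To see the clobbering concretely, here is the same double definition run on an unpatched interpreter. The rescue wrapper and the variable result are just scaffolding added for this sketch.

```ruby
o = Object.new

def o.x
  "x"
end

# On stock Ruby this second definition lands in the same singleton class
# and silently replaces the first one.
def o.x
  super + "x"
end

# With the first definition gone, super has nothing left to call.
result =
  begin
    o.x
  rescue NoMethodError => e
    e.class
  end

result  #=> NoMethodError
```

The object still reports a single singleton method named x; the earlier definition leaves no trace.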

But does it mean that they have to be single? Could we remove the restriction and open up additional robustness for these per-object classes? We could. In fact, this was the very first hack I ever made to the Ruby source code. My change simply checked to see if the method was already defined in the first "singleton" layer and continued upward until it either found a usable layer or ran out of layers, in which case it created a new one. Combine that with the capability to selectively undefine particular methods of particular layers, and we gain a flexible per-object class hierarchy system that can be used without the caveats that currently make singletons so limited outside of class and module definitions.
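
Something in this spirit can be roughly approximated on an unpatched Ruby. To be clear, this is a swapped-in technique and not the interpreter change described above: each layer is an anonymous module attached with extend, and add_layer is a hypothetical helper invented for this sketch.

```ruby
# Hypothetical helper: every call wraps the given definitions in a fresh
# anonymous module and extends the object with it, so each set of methods
# becomes its own layer in front of the object's class.
def add_layer(obj, &definitions)
  layer = Module.new(&definitions)
  obj.extend(layer)
  layer
end

o = Object.new

first = add_layer(o) do
  def x
    "x"
  end
end

second = add_layer(o) do
  def x
    super + "x"
  end
end

o.x  #=> "xx"

# Layers can be trimmed selectively: dropping x from the second layer
# exposes the first layer's definition again.
second.send(:remove_method, :x)
o.x  #=> "x"
```

Unlike the patched interpreter, this always creates a new layer per call rather than reusing the first layer that has room, so it trades the efficiency concern mentioned earlier for simplicity.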

And then we could no longer call them "singleton" but rather "legion" ;)