2006-12-03

Negative Arrays

My current work involves the creation of an extensive configuration file format representing project information. In a number of cases I have had need of an inclusion list, representing files and file patterns to include for some operation --say, which files to include in a package. In such a case I generally end up with at least two parameters, which I label include and exclude. While include is the list of files to use, exclude is the list of files to omit from the included list. Using exclude makes it easier to specify a large selection and then subsequently omit a file or two. The include list typically has a suitable default value, so a third parameter is sometimes also of use: append, which concatenates to the defaults as opposed to replacing the include parameter outright.

Since these three parameters help define what is essentially one list of data, it would be nice if they could be specified as a single parameter too. So I gave the problem some thought.

Taking inspiration from the notion of a negated symbol (see facets/symbol/not), it occurred to me that any object that can be added or subtracted is taking part in the same "algebraic group" as whole numbers. And just as a whole number can be negative, why not also an array?


a = [:a,:b,:c,:d,:e]
n = -[:d,:e]
a + n #=> [:a,:b,:c]


So this could be very helpful. And it shouldn't be too hard to implement.


class Array
  # Unary minus: toggle the negative flag and return the array itself.
  def -@
    @negative = !@negative
    self
  end

  def negative?
    @negative
  end

  # Keep the original operators reachable under new names.
  alias :add :+
  alias :sub :-

  def +(other)
    if negative?
      if other.negative?
        -add(other)
      else
        -sub(other)
      end
    else
      if other.negative?
        sub(other)
      else
        add(other)
      end
    end
  end

  ...
end
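
For anyone who would rather not reopen Array, the same sign logic can be sketched with a small wrapper. SignedList and its methods are hypothetical names chosen only for illustration; the one deliberate difference is that unary minus returns a new object here instead of flipping a flag on the receiver.

```ruby
# SignedList is a hypothetical wrapper, not part of Ruby or Facets.
class SignedList
  attr_reader :items

  def initialize(items, negative = false)
    @items = items.dup
    @negative = negative
  end

  def negative?
    @negative
  end

  # Unary minus returns a new list with the opposite sign.
  def -@
    SignedList.new(items, !negative?)
  end

  # Same-signed lists concatenate; mixed signs remove other's items from self.
  # The receiver's sign carries over to the result, as in the Array version.
  def +(other)
    combined = negative? == other.negative? ? items + other.items : items - other.items
    SignedList.new(combined, negative?)
  end

  def to_a
    items.dup
  end
end

a = SignedList.new([:a, :b, :c, :d, :e])
n = -SignedList.new([:d, :e])
(a + n).to_a  #=> [:a, :b, :c]
```

Because nothing is mutated, negating the same list twice in different places cannot silently flip it back.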


I'm sure a tighter implementation is possible, but you get the idea. The include and exclude parameters could then be specified in a single parameter.


files = [ '**/*', -[ 'InstalledFiles' ] ]


Neat! But unfortunately it doesn't really solve the whole problem, since YAML doesn't understand this negative listing concept either. It could still be of use in general Ruby scripts though. Notations such as this often prove very powerful. And in fact the idea does move us in a possibly workable direction. There's no reason a string can't be marked as negative as well. After all, it's just a flag. In fact, if we move the core method -@ to Object itself, then any object can be so indicated. The above line could then be written:


files = [ '**/*', -'InstalledFiles' ]


Methods such as Dir.multiglob(*files) (another Facet) could use this extra bit of information to provide the desired results, equivalent to:


files = Dir.glob('**/*') - Dir.glob('InstalledFiles')


Of course, this still doesn't quite help us with the YAML configuration file, but with a little fudging we can get a useful format.


files: [ '**/*', '-InstalledFiles' ]


As for the append parameter that was mentioned at the beginning, we could just add a special notation for this as well, say, '$' to mean defaults.

Okay. So will I use this bit of trickery to reduce three parameters to one? Perhaps. While the result is wonderfully practical in usage, it's not necessarily so simple to implement. Either a filter would have to split the one entry into three parts when loading, or an untold number of methods would have to be augmented to take the trick into consideration. The latter, I imagine, would simply prove too extensive without pre-established support for the negation concept. The former might be reasonable however. I'll give it a try.
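
A loading filter of that kind might look something like the sketch below. PatternList and its two methods are hypothetical names, and the conventions are just the ones floated above: a leading '-' marks an exclusion and '$' stands for the defaults. One obvious limitation is that a file whose name really begins with '-' cannot be included this way.

```ruby
# PatternList is a hypothetical helper, named only for this sketch.
module PatternList
  # Split one list into include and exclude patterns.
  # A leading '-' marks an exclusion; '$' expands to the default patterns.
  def self.split(entries, defaults = [])
    includes = []
    excludes = []
    entries.each do |entry|
      if entry == '$'
        includes.concat(defaults)
      elsif entry.start_with?('-')
        excludes << entry[1..-1]
      else
        includes << entry
      end
    end
    { :include => includes, :exclude => excludes }
  end

  # Resolve the split lists against the file system.
  def self.files(entries, defaults = [])
    parts = split(entries, defaults)
    Dir.glob(parts[:include]) - Dir.glob(parts[:exclude])
  end
end

parts = PatternList.split(['$', 'lib/**/*', '-InstalledFiles'], ['**/*.rb'])
parts[:include]  #=> ["**/*.rb", "lib/**/*"]
parts[:exclude]  #=> ["InstalledFiles"]
```

The rest of the program only ever sees the two plain lists, so nothing else needs to learn about the notation.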

In any case it was an interesting thought experiment. Although perhaps you have a better way to represent this kind of information?

2006-11-11

We all live in a Yellow Submarine

Anyone having read my previous posts, here and on ruby-talk, knows I love to explore ideas. It doesn't matter if they are considered "good" or "bad". Most cannot be judged without first expounding on them anyway. In fact, half the time I have no idea where an idea might lead until I sit down and blog it.

Tonight's idea comes by way of frustration with project organization, specifically module namespaces. A minor issue for small projects; large projects on the other hand... Well, consider my current problem. I have a class called Project. Now related to Project are a number of modularized tool sets. Each tool set can be used independently of the Project class, but typically will be used via it. So where do I locate the tool sets? My first instinct is to go ahead and put them in the class itself.


module MyApp
class Project
module ToolSetA
module ToolSetB


But I find this less than optimal. While it may be a small and unlikely matter, a class is not a bona fide namespace --it is a class. And while it depends on these tool sets, the tool sets do not necessarily depend on it. As such we would never be able to include these tool sets in another namespace --such as the toplevel, if it struck my or some user's fancy. So we are left then to use some alternate organization.


class Project
module ProjectToolsets
module ToolSetA
module ToolSetB


or as a compromise


class Project
module Toolsets
module ToolSetA
module ToolSetB


The downside here, of course, is the long-winded names. But a better solution eludes me, other than one possibility: the use of all capitals for pure namespace modules.


class Project
module PROJECT
module ToolSetA
module ToolSetB


It's a bit strange in appearance but it works well. One quirk however is that this new rule begs for my project's toplevel namespace to be all caps too. Do I want to go there?


module MYAPP
class Project
module PROJECT
module ToolSetA
module ToolSetB


I'm not sure it's the solution I'm after, but to its merit, it does draw a nice distinction between namespace modules and other modules and classes.

In the course of this consideration I began to wonder about the distinction between Class and Module. The difference is almost non-existent in reality. If you peer into the Ruby source code you will find that interoperability between them is purposefully prevented. After all, Class is a subclass of Module. Yet they are made distinct for a good reason. They represent conceptually different ideas. A class represents a data archetype; a module represents a reusable component. In fact, one could easily argue that Module itself could use an additional distinction between Namespace and Mixin. Even so, I could not help but wonder if it might yet be possible to have a single Encapsulation, relegating the differences to the elements within them instead of the encapsulation types themselves. I imagined this:


class Something
def x
def y
mod_def a
mod_def x
class SomethingElse


Instantiating via Something.new or subclassing would provide the instance methods x and y. Including, however, would provide the mod_def methods a and x instead, along with adding SomethingElse to the including namespace. More refined means of controlling namespace become possible. For instance include_constants could limit inclusion to constants only; vice versa with include_methods. Methods could be defined as both instance and module methods. And while we're at it, throw in a class_def as an alternative to def self.x.

It's interesting. I've often thought about the idea of eliminating the distinction between Class and Module. This is the first time it's occurred to me that it could be done while retaining the utility of that distinction by passing responsibility down to the methods themselves.

I suppose now the question is, what are the downsides to this? That'll require further consideration, but one clear point is that methods are less cleanly divided. You could have module methods scattered about your class definitions, weaving in and out of your instance methods. I suspect we would make an effort to nicely organize them however. Besides it means having fewer modules to name --and I'm all for anything that reduces the number of names I have to make up.

It would be interesting to see how far one could go in implementing this in pure Ruby. Some details of Ruby will hold back a perfect implementation, but the essence of it is certainly possible. For starters, here's a neat trick for doing without the distinction between class and module.


class Module
  def new
    mod = self
    Class.new{ include mod }.new
  end
end
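
To see the trick in action, here is a small usage sketch. The module name Greetable is hypothetical, and forwarding arguments and a block is an addition of the sketch, not part of the snippet above. Module.new itself keeps working because Module is an instance of Class, so Class#new is found before this definition.

```ruby
class Module
  # Instantiate a module by mixing it into a throwaway anonymous class.
  def new(*args, &block)
    mod = self
    Class.new { include mod }.new(*args, &block)
  end
end

# Greetable is a hypothetical module used only for illustration.
module Greetable
  def hello
    "hello"
  end
end

g = Greetable.new
g.hello               #=> "hello"
g.is_a?(Greetable)    #=> true

# Creating anonymous modules is unaffected.
Module.new.is_a?(Module)   #=> true
```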


Have fun! Unfortunately I'm not. I'm still stuck on the aforementioned namespace issue! Oh well. Back to the coding board...

2006-11-10

Separation of Church and State

Have you ever had a class so chock full of interrelated data and function members that you had trouble avoiding name clashes between the two? Of course it's a rare problem when you're in full control of the members, but when you're designing extensible classes, it becomes a major issue and you have to resort to some less-than-lovely workaround.

Let me give you a simple scenario.


class Package
  # Release date
  #
  attr :release

  # Release the package.
  #
  def release
    puts "Telling the world on #{@release}..."
    # ...
  end
end


The issue here is clear. On one hand, we want to use release as a noun to represent the date of release. On the other, we want to use it as a verb for releasing the package to the world. Of course, under completely isolated circumstances we could just change one of the names and deal. But when we are working on the basis of extensibility, where these and additional data or functional members may be added readily, say via a plug-in system, then a solution is not as simple.

So what can we do? The bottom line is that in some way or another the two member types must be distinguished from one another.

One could transform one set of the members with a slightly different name via some uniform convention. For instance, all data members could start with "my_", so release as a date would be my_release. Ruby actually makes this a bit nicer in that we can use '?' or '!' appended to method names. A fair solution might then be:


def release?
  @release
end


or


def release!
  puts "Telling the world on #{release?}..."
  #...
end


It's not a perfect solution however, especially as a matter of convention. It goes against the grain. '?' typically indicates a true/false query. And '!' indicates in-place modification or caution. Consider how others will "smell" your code when they see a question mark for every reference to a data member.

The other, more traditional solution is to use delegation. In this case we make a separate class for either or both of the member types. For instance:


class Package

  class Data
    attr :release
  end

  attr :data

  def initialize
    @data = Data.new
  end

  def release
    puts "Telling the world on #{data.release}..."
    #...
  end

end


Albeit a bit longer, it works very well. Delegation is a powerful tool. One could even emulate the former solution via method_missing, trapping method calls that end in '?' and rerouting them to @data. Another advantage is that we can readily pass around the data independent of the function members. On the flip side however, we are relegated to this special data.member interface, and likewise any reverse access by the data members to the functional members, if ever needed, would require us to also pass a reference to the Package instance into the Data instance.
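
The method_missing variation could be sketched roughly as follows. This is only one way to do it: the constructor argument, the respond_to_missing? hook, and returning a string instead of printing are all assumptions made to keep the example short and checkable.

```ruby
class Package
  class Data
    attr_accessor :release
  end

  attr_reader :data

  def initialize(release)
    @data = Data.new
    @data.release = release
  end

  # The verb: release the package (returns the message instead of printing it).
  def release
    "Telling the world on #{release?}..."
  end

  # Route any call ending in '?' to the matching reader on the data object.
  def method_missing(name, *args, &block)
    key = name.to_s
    if key.end_with?('?') && data.respond_to?(key.chomp('?'))
      data.public_send(key.chomp('?'), *args, &block)
    else
      super
    end
  end

  # Keep respond_to? honest about the rerouted names.
  def respond_to_missing?(name, include_private = false)
    key = name.to_s
    (key.end_with?('?') && data.respond_to?(key.chomp('?'))) || super
  end
end

pkg = Package.new("2006-11-10")
pkg.release?   #=> "2006-11-10"
pkg.release    #=> "Telling the world on 2006-11-10..."
```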

In considering all this of course, it becomes apparent that Ruby already has a means of distinguishing data members from functional members via instance variables. Clearly @release references the date. But Ruby does not give us the power to treat "instance members" publicly or programmatically. We can't, for instance, use project.@release to access the release date. Nor can we wrap data members in order to massage their data, say:


def @release
super || Time.now
end
public :@release


I'm sure many readers will take such a notion for simply god-awful. But I think careful consideration at least warrants the fair question: "Is a distinct separation between data and functional members useful?" The mere existence of instance variables indicates that the distinction is in fact useful. In contrast, data members could have been made indistinguishable from functional members, or local variable persistence could be used in their stead. So if the distinction is useful, why hide public access to data members behind functional members acting as mini-delegates?

To be a bit more pragmatic, how would a solution to our example pan out if data members were in fact accessible? Interestingly it could look exactly like the original example. Public access to the release date however would simply come via project.@release or preferably even project@release. And there would be no need for any name (mis)conventions or special-interface delegation.
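
The closest thing available today is reflection, which is a far cry from project.@release but does show the noun and the verb living side by side. A minimal sketch, with DatedPackage as a hypothetical class name:

```ruby
# DatedPackage is a hypothetical name; only standard reflection calls are used.
class DatedPackage
  def initialize(release)
    @release = release
  end

  # The verb keeps the plain name; the noun stays in the instance variable.
  def release
    "Telling the world on #{@release}..."
  end
end

pkg = DatedPackage.new("2006-11-10")
pkg.instance_variable_get(:@release)   #=> "2006-11-10"
pkg.instance_variables                 #=> [:@release]
pkg.release                            #=> "Telling the world on 2006-11-10..."
```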

Of course let's be honest here. '@' itself is the Special Delegate of State to the Ruby "Church". Too bad he's only allowed to preach to the choir.

2006-10-02

I See You


You have to check this out. Okay. I know. I know. It's been blogged by others many times now. In fact, you can check out these other blog posts too: Why_the_lucky_stiff has it on Redhanded, and J'ey put out the word as well. But it's just so damn cool, I just had to blog it too!

Go ahead. Try it at home:


require 'drb'

def brite_shot(text)
  url = "druby://eviladmins.org:9000"
  obj = DRbObject.new_with_uri(url)
  File.open("out.jpg",'wb') do |a|
    a << obj.write_simple(text)
  end
end

brite_shot("I SEE YOU!")


(And if a still picture isn't enough for you, how about an animation?)

And so, a REAL picture is made for me remotely, many thousands of miles away. How far I wonder? Well, with a GeoIP lookup and Google Maps, I found out where this picture was made. The world really is getting smaller!

I See You!

2006-09-21

I No Longer Believe In Methods

Yes, that's right. I no longer believe in methods. Why? Because this is OOP and if the methods aren't honest to goodness objects from the start, without exception, then they are not really methods. They're just glorified functions.

Let me show you what I mean. I have a Project class in which is defined a set of tools that can manipulate it. Clearly the proper implementation abstraction for these tools is the method.


class Project
  def announce
    puts "This is my new project!"
  end
end


But these tools must be distinguishable from other supporting methods. Moreover additional information may be associated with these tools, like help information or valid states of the Project for the tool to be useful. How do we encode this information with Ruby? Since methods aren't first class objects, we are forced to return to functional programming. In which case, it is best to define some sort of DSL.


module ProjectDSL
  # Record help text for a tool.
  def help( name, text )
    @help ||= {}
    @help[name] = text
  end

  # Record a validity test for a tool.
  def valid( name, &test )
    @valid ||= {}
    @valid[name] = test
  end
end

class Project
  extend ProjectDSL

  def announce
    puts "This is my new project!"
  end

  help :announce, "Announces your project to the world!"
  valid(:announce) { @version > "0.0.0" }
end


Now this kind of thing has come up enough in large projects, such as Nitro, that a general means of annotation proved to be most effective.


require 'facet/annotation'

class Project
  def announce
    puts "This is my new project!"
  end

  ann :announce, :help => "Announces your project to the world!",
                 :valid => lambda { @version > "0.0.0" }
end


Annotations work very well, and if you find yourself in need of this kind of "method metadata" it is an excellent approach.
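
For readers without the library at hand, the core of the idea fits in a few lines of plain Ruby. The sketch below is not the Facets implementation: MiniAnnotations, the run method, and the constructor are hypothetical, and the real ann surely handles more cases.

```ruby
# MiniAnnotations is a hypothetical stand-in, not the Facets implementation.
module MiniAnnotations
  # ann(:name, key => value) stores metadata; ann(:name) reads it back.
  def ann(name, data = nil)
    @annotations ||= {}
    entry = (@annotations[name] ||= {})
    entry.update(data) if data
    entry
  end
end

class Project
  extend MiniAnnotations

  def initialize(version)
    @version = version
  end

  def announce
    "This is my new project!"
  end

  ann :announce, :help  => "Announces your project to the world!",
                 :valid => lambda { @version > "0.0.0" }

  # Run a tool only if its :valid check passes for this instance.
  def run(tool)
    check = self.class.ann(tool)[:valid]
    raise ArgumentError, "#{tool} is not valid here" if check && !instance_exec(&check)
    send(tool)
  end
end

Project.ann(:announce)[:help]         #=> "Announces your project to the world!"
Project.new("0.1.0").run(:announce)   #=> "This is my new project!"
```

The check is evaluated with instance_exec so that @version refers to the project being asked, not to the class where the annotation was written.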

But I've worked with all this long enough now to have a wide perspective on it, and it's become very clear to me that the whole "special needs" arises out of the fact that Ruby is still just a functional language in this respect, and not a full OOPL. (And please no objections that method() and instance_method() make it otherwise; these do not provide a persistent object, but return a new object every time they are invoked.) So what might the above look like if it were not so? There would of course be more than one way to go about it, but imagine this:


class Tool < Method
  attr_accessor :help

  def valid( &test )
    @valid = test
  end
end

class Project
  def announce => Tool
    puts "This is my new project!"
  end

  announce.help = "Announces your project to the world!"

  announce.valid { @version > "0.0.0" }
end


I'm taking some liberties with the syntax here for clarity. But in any case it certainly paints a provocative idea. And I would argue that it paints the appropriate OOP idea too.
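
As a footnote to the earlier remark about method(), the lack of a persistent method object is easy to check. SmallProject is a hypothetical class used only for this sketch.

```ruby
# SmallProject is a hypothetical class, named only for this sketch.
class SmallProject
  def announce
    "This is my new project!"
  end
end

project = SmallProject.new
first  = project.method(:announce)
second = project.method(:announce)

first == second        #=> true  (same receiver, same definition)
first.equal?(second)   #=> false (two separate Method objects)
```

Since every lookup hands back a fresh object, anything attached to one of them is gone by the next call, which is why the metadata has to live somewhere else.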

2006-08-26

The Persnickety Order of Self Serving Modules

This one caught me off guard.


irb(main):001:0> module M
irb(main):002:1> def x; "x"; end
irb(main):003:1> end
=> nil
irb(main):005:0> module Q
irb(main):006:1> extend self
irb(main):007:1> include M
irb(main):008:1> end
=> Q
irb(main):009:0> Q.x
NoMethodError: undefined method `x' for Q:Module
from (irb):9
from :0


Why isn't the included #x coming along for the ride in the self extension of Q? This kind of dynamism is vital when dynamically loading behaviors. And so I suspect it must have to do with the dreaded Dynamic Module Inclusion Problem. And it would appear that I am right:


irb(main):012:0> module M
irb(main):013:1> def x; "x"; end
irb(main):014:1> end
=> nil
irb(main):015:0> module Q
irb(main):016:1> include M
irb(main):017:1> extend self
irb(main):018:1> end
=> Q
irb(main):019:0> Q.x
=> "x"


Yes, another of the edge cases. But the preponderance weighs heavy on the Coding Spirit. The Dynamic Module Inclusion Problem is getting old.
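
Until the underlying problem goes away, an order-proof pattern is to include first and extend second, and to extend again whenever a module is added late. The module names below are hypothetical. (I believe Ruby 3.0 and later propagate late includes on their own, but the sketch does not rely on that.)

```ruby
# Helpers, LateHelpers and Toolbox are hypothetical names for illustration.
module Helpers
  def x
    "x"
  end
end

module Toolbox
  include Helpers   # pull in the behavior first...
  extend self       # ...then expose everything as module methods
end

Toolbox.x  #=> "x"

module LateHelpers
  def y
    "y"
  end
end

# A module added after "extend self" is extended explicitly as well.
module Toolbox
  include LateHelpers
  extend LateHelpers
end

Toolbox.y  #=> "y"
```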

2006-08-21

Main Campaign "Out of Object Now!"

The word from Matz on Kernel as toplevel object:


I don't feel that making Kernel as toplevel self is not a good idea,
because:

* toplevel def does not define methods on Kernel, but Object.
* toplevel include does not include modules into Kernel, but Object.
* toplevel private etc. do not work on Kernel, but Object.


I wouldn't call that an explanation exactly, more an explication of the current behavior. I'm sure Matz has his reasons, and we can just assume that he wants to keep the namespace distinct. Fair enough, and of course he can do that. (It's not REALLY a democracy after all!) So Kernel drops out of the race. But Kernel's running mate, Main, is still here campaigning.

You might be surprised to learn (as I was when I first discovered it):


Object.public_instance_methods(false) +
Object.private_instance_methods(false) +
Object.protected_instance_methods(false)
=> []


That's right. There's not a single method defined in Object. All the methods ri tells you belong to Object are actually inherited from Kernel. But from there, any toplevel method we define does end up in Object. Hence the clear separation of namespace I mentioned above.
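
A quick way to check where any given method actually lives is UnboundMethod#owner. In the sketch below the helper is defined on Object explicitly, standing in for a toplevel def, and its name is hypothetical.

```ruby
# UnboundMethod#owner reports the module or class that holds a definition.
Object.instance_method(:puts).owner    #=> Kernel
Object.instance_method(:to_s).owner    #=> Kernel

# Stand-in for a toplevel def: a private method defined on Object itself.
class Object
  private

  def toplevel_style_helper   # hypothetical name, for illustration only
    :ok
  end
end

Object.instance_method(:toplevel_style_helper).owner                      #=> Object
Object.private_instance_methods(false).include?(:toplevel_style_helper)   #=> true
```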

Now a separate module, e.g. Main, would do just as well. Kernel need not be used. And as I've expressed before, Main could be included into Object for all the same effects. The base hierarchy then being Object < Main < Kernel.

But wait a second! Why are all these toplevel methods sneaking into all my objects anyway? I can just as easily add them to Object myself if that's what I want. I don't need some cheap toplevel proxy to do it for me. In fact, that can be a problem too.


module Foo
  def self.method_missing( name, *args )
    super unless require "foo/#{name}" rescue nil
    send( name,*args )
  end
end


Then some unsuspecting nuby comes along (okay I admit, it was I and it happened to me today!) and innocently adds to the top level:


def check( name )
  name == "foo"
end


Well, so much for my lazily required Foo.check routine. It's been whacked from the top down!

You see where I'm going now? Primaries are over. Main has taken Kernel out of the running with a new divisive platform. "Out of Object Now!"

2006-08-18

Vote Kernel for Toplevel Object


As a follow up to my last post on the "pain that is main", I want to offer a potential improvement for Ruby 2.0. I approached the topic on ruby-talk this week and while Matz initially took some interest, he hasn't followed up since his last comment:


matz: Why? If it is really required it's fairly
easy to add toplevel methods like we did for #include.


But it isn't always easy. In order to get my Taskable module to work, for instance, I had to make exceptions for the toplevel case, which is far from ideal and is fragile [Ed- in fact I'm still getting bugs that I haven't yet pinned down]. These subtle difficulties arise because main acts as a partial proxy for the Object class. Anyone who has created a proxy object before knows the subtle issues that can come into play. In this case, only the bare minimum interface is supported --essentially the method #include. Yet, even if we take matz' advice and add in all the missing proxy methods, we still won't be 100% out of the woods. The Object class and main are fundamentally two distinct objects --self is not the same, nor are their singleton classes, &c. In the vast majority of cases this will never present an issue, but the distinction can creep in. Here's a highlight of one way it can:


module Q
  define_method :q do
    base = self
    define_method :r do
      base == self.class
    end
  end
end

class C
  extend Q
  q
end

c = C.new
p c.r

# per matz' direction

def define_method( name, &act )
  Object.class_eval {
    define_method( name, &act )
    private name
  }
end

extend Q
q
p r

produces

true
false


So in effect Ruby is mildly schizophrenic. The false reading is because main != Object. So, you can't necessarily create a DSL for Object to be used in main, and you can't necessarily create a DSL for main to be used in any Object. Hence the devolution to DRYless code.

There is a potentially elegant solution however, and I'd really like to understand others' insights into this (especially Matz' of course): Instead of main being a special proxy object, just let it be a self extended module.


module Main
  extend self
  # programs are written as if in here
end


This would provide all the facilities required of the toplevel without all the proxy troubles. Also, while I'm not so convinced of the merits of every toplevel method becoming a private method of all objects (and with Main that can be easily prevented), it has proven workable in practice so it's not a significant factor of consideration here. Main can simply be included in Object to achieve that effect.


class Object
  include Main
end


But when we do that it becomes very clear what Main appears to be: Kernel. That strikes me as especially interesting. Then again, there may be good reasons to keep the Kernel as a separate module, in which case we'd just have a class hierarchy:


Object.ancestors
=> [Object, Main, Kernel]


Nevertheless, it is clear the Kernel could just as well serve as the toplevel object, which, IMHO, makes this an elegant proposition to consider. Perhaps I'll start a campaign as November elections roll around: "Vote Kernel for Toplevel Object!" ;-)
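
For anyone unfamiliar with the self extended module idiom the proposal rests on, here is a small sketch. MainSketch and Visitor are hypothetical names, and this only illustrates the idiom, not how the interpreter would actually wire up a toplevel.

```ruby
# MainSketch is a hypothetical name; it only illustrates "extend self".
module MainSketch
  extend self

  def greet(name)
    "hello, #{name}"
  end

  def shout(name)
    greet(name).upcase   # methods see each other without a receiver
  end
end

MainSketch.greet("world")   #=> "hello, world"
MainSketch.shout("world")   #=> "HELLO, WORLD"

# Unlike the main proxy, the very same module can be mixed in anywhere.
class Visitor
  include MainSketch
end

Visitor.new.shout("ruby")   #=> "HELLO, RUBY"
```

There is one set of definitions and one self to reason about, which is exactly what the proxy arrangement lacks.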

2006-08-17

Main is a DRY Pain

I have a module that depends on define_method and ancestors. It works great when I include it in other modules or classes. But if I try including it into the toplevel it fails miserably.


NameError: undefined local variable or method `define_method' for main:Object


That seems fairly peculiar when you consider that defining methods at toplevel is perfectly acceptable. One must then wonder, what is this toplevel thing anyway?


self #=> main


Okay, it calls itself "main". Great. But that doesn't really tell us anything. Let's check its class:


self.class #=> Object


Ah. So it's an instance of Object. An instance of Object!? How is that possible? A normal instance of Object and main can't be exactly the same. Indeed, they are not.


(public_methods - Object.public_instance_methods).sort
=> ["include", "private", "public"]
singleton_methods
=> ["include", "private", "public", "to_s"]


Notice include has been defined here specifically for main. So something special's going on when you include a module into the toplevel. Hmm... What about methods defined at the toplevel? If main is an instance of Object then are they singleton methods? Well, no. Turns out they get "magically" defined as private methods of the Object class itself, and main's singleton class space is actually something else entirely.


class << self
  def x; "x"; end
end

class Q
  def q; x; end
end

Q.new.q
=> NameError: undefined local variable or method `x' for #<Q:0xb7ce9a3c>


Which means the sure solution for my problem...


module Kernel
  include MyModule
end


Can you guess?


module M
  def m; "m"; end
end

module Kernel
  include M
end

m
=> NameError: undefined local variable or method `m' for main:Object

class Q
  def q; m; end
end

Q.new.q
=> NameError: undefined local variable or method `m' for #<Q:0xb7cbb324>


Lands me square in the face of the Module Inclusion Problem.

All this leads me to two points. First, I'm stuck! There seems to be no solution to my problem other than rewriting a second version of my module specifically for the toplevel. Talk about lack of DRY! And second, why in the world isn't main a self extended module?
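
For anyone who wants to poke at main from inside a program rather than at a prompt, the toplevel object can be fetched through TOPLEVEL_BINDING. This is a minimal sketch that assumes Ruby 2.2 or later, where Binding#receiver is available.

```ruby
# Binding#receiver requires Ruby 2.2 or later.
main = TOPLEVEL_BINDING.receiver

main.to_s             #=> "main"
main.class            #=> Object
main.equal?(Object)   #=> false

# Whatever main adds over a plain Object lives in its singleton class.
# The exact lists differ between Ruby versions, so no output is shown here.
main.singleton_class.instance_methods(false)
main.singleton_class.private_instance_methods(false)
```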

2006-07-23

Taming the File System Zoo

I'm not quite sure I understand why we have all these classes and modules: Dir, File, FileUtils, FileTest and Pathname. I understand what each of these does, of course, but I don't understand why the clearly related functionality has been spread about.

I think a FileSystem class or module would be in order -- a system we could use in much the same manner as we use the command shell to access our filesystem.


fs = FileSystem.new
fs.cd('/')
entries = fs.ls
entries.each do |e|
  if fs.directory?(e)
    ...
  else
    fs.open(e) { |f| ... }
  end
end


It's not so common that we open files and harbor them, even less so for directories. So the FileSystem can have those defined within it as well, and there's no reason to even draw them up yourself. FileSystem can do it for you:


fs['afile.txt'] #=> <#FileSystem::File...>
fs['adir/'] #=> <#FileSystem::Dir...>


Clear, straightforward and convenient. But best of all, this single point of entry into all things "file system" may lead to more interesting possibilities, in some respect similar to what FUSE offers us in general, only confined to Ruby's realm.
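
A thin facade over the existing classes is enough to try the interface out. Everything in the sketch below is an assumption: the method set, the private resolve helper, keeping a working directory separate from the process's, and returning a Pathname from [] as a stand-in for the FileSystem::File and FileSystem::Dir objects imagined above.

```ruby
require 'pathname'
require 'tmpdir'

# A hypothetical FileSystem facade over Dir, File and Pathname.
class FileSystem
  attr_reader :cwd

  def initialize(start = Dir.pwd)
    @cwd = File.expand_path(start)
  end

  # Change the facade's working directory without touching the process's.
  def cd(path)
    target = resolve(path)
    raise Errno::ENOENT, target unless File.directory?(target)
    @cwd = target
    self
  end

  def ls
    (Dir.entries(@cwd) - %w[. ..]).sort
  end

  def directory?(name)
    File.directory?(resolve(name))
  end

  def open(name, mode = 'r', &block)
    File.open(resolve(name), mode, &block)
  end

  # Return a Pathname as a stand-in for dedicated file and directory objects.
  def [](name)
    Pathname.new(resolve(name))
  end

  private

  # Interpret names relative to the facade's current directory.
  def resolve(name)
    File.expand_path(name, @cwd)
  end
end

Dir.mktmpdir do |tmp|
  Dir.mkdir(File.join(tmp, 'adir'))
  File.write(File.join(tmp, 'afile.txt'), 'hi')

  fs = FileSystem.new(tmp)
  fs.ls                                 #=> ["adir", "afile.txt"]
  fs.directory?('adir')                 #=> true
  fs.open('afile.txt') { |f| f.read }   #=> "hi"
end
```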

I also wonder how remote file systems might fit into this... interesting considerations all.

2006-07-22

Singleton 2 Legion

For as long as I've coded Ruby there has been some question as to the appropriateness of the term singleton class. This is the term generally used in Ruby parlance to refer to the language construct: class << obj; self; end, and is the context of class methods and module methods.

While the term is fitting in the sense that each object only has one of these classes (hence "single"), issue arises from a terminology clash with an already well accepted object-oriented programming term: http://en.wikipedia.org/wiki/Singleton_pattern. In OOP-speak a singleton class is a class that can only be instantiated once. Many have argued that the Singleton pattern itself is a flawed concept, since one can just instantiate an object and assign it to a constant to achieve the same end. So why restrict anyone from a second instance of a class if they so choose? That's a reasonable argument, although not always as practical as it might seem. Nonetheless, the terminology clash still remains.

In light of this nomenclature issue, many have offered alternative terms for Ruby's singleton class. Indeed, Ruby itself has used two alternatives in the past: virtual class and metaclass, and many people still prefer the latter of these choices. Other suggestions include the well known _whyism 'eigenclass', the abbreviated 'sclass', the pronoun contingents 'myclass' and 'ownclass', as well as my own concoctions 'adhoc', 'quaclass' and 'nonce'. Yet despite all these names, none of which ever stick, what is this singly thing really?

Recently, our beloved Pit Capitan put it concisely when he said "there are only two types of methods in Ruby, instance methods and singleton methods". Well said. Unfortunately it's not really true. Instance methods aren't really methods at all, they are just definitions for methods to be. That's why when you use #instance_method you get something called an UnboundMethod, not a Method. It only becomes a method when it's bound to an object. To clarify further how that distinction is a bit misleading, consider that when a singleton method is defined it falls into an inheritance hierarchy alongside so-called instance methods. In other words, the singleton methods and instance methods exist on the same playing field, and the former can call on the latter via #super. In fact, we can access these singleton methods in instance fashion via (class<<obj;self;end).instance_method(sym). So these methods are not really distinct in this manner after all. In fact there is no distinction between methods other than bound and unbound. And this distinction arises from the capability of class Module and class Class to harbor a set of method definitions that are not their own. Which is of course exactly why these constructs exist in the first place. So when we say singleton method, we are not referring to something different from instance method. We only mean that these methods are kept in a special "singleton class", made just for a specific object, and consequently, are automatically bound to that object.
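
The bound/unbound point can be checked directly. Speaker is a hypothetical class, and singleton_class is the modern shorthand for the class << obj; self; end construct.

```ruby
# Speaker is a hypothetical class used only for this sketch.
class Speaker
  def hello
    "hello"
  end
end

speaker = Speaker.new

# A singleton method sits in front of the instance method and reaches it via super.
def speaker.hello
  super + " world"
end

speaker.hello   #=> "hello world"

# Both definitions come back as UnboundMethod until they are bound to an object.
from_class     = Speaker.instance_method(:hello)
from_singleton = speaker.singleton_class.instance_method(:hello)

from_class.class                #=> UnboundMethod
from_singleton.class            #=> UnboundMethod
from_class.bind(speaker).call   #=> "hello"
```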

Now, I'm going to claim that the term 'singleton' is a poor choice for a completely different reason than any given before. It may come as a bit of a surprise, but singletons are not inherently single. They are only made so by an explicit restriction in Ruby's source code. It is quite simple, actually, to comment the if-clause out of the source, recompile Ruby and then do:


o = Object.new
def o.x
  "x"
end
def o.x
  super + "x"
end
o.x #=> "xx"


The reason Ruby makes the per-object classes single is because it would be terribly inefficient to define a whole new class for every new singleton method defined. That's understandable, but it also comes at a cost. We cannot reliably receive an object from the outside world and define our own singleton on it because we may be clobbering someone else's singleton method without even knowing it. (NOTE: It doesn't matter so much if you're just redefining a method altogether, but if you're calling super it very much matters.) Generally we don't even think about such things, but truth be told, object singleton methods often fail to survive code refactoring. Object singletons really only exist as a side effect of Ruby's class model (and Smalltalk's), which utilizes the singleton class as a means of separating a module's or class's method definitions from its own actual methods. This could have been done another way of course, but then a class/module would be something wholly different from any other object. The use of the singleton allowed classes and modules to be just like any other object. So singletons are really a bit of cleverness that lies at the very heart of how Ruby works.

But does it mean that they have to be singly? Could we remove the restriction and open up additional robustness to these per-object classes? We could. In fact, this was the very first hack I ever made to the Ruby source code. My change simply checked to see if the method was already defined in the first "singleton" layer and continued upward until it either found a usable layer or ran out of layers, in which case it created a new one. Combine that with the capability to selectively undefine particular methods of particular layers and we gain a flexible per-object class hierarchy system that can be used without the caveats that currently make singletons so limited outside of class and module definitions.

And then we no longer can call them "singleton" but rather "legion" ;)
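
Short of patching the interpreter, something in the same spirit can be approximated in stock Ruby by extending the object with a fresh anonymous module per layer. This is not the source hack described above, only a rough stand-in, and the Legion helper is a hypothetical name.

```ruby
# Legion is a hypothetical helper; it approximates layers with anonymous modules.
module Legion
  # Each call stacks a new anonymous module in front of the earlier layers.
  def self.add_layer(obj, &definitions)
    obj.extend(Module.new(&definitions))
  end
end

o = Object.new

Legion.add_layer(o) do
  def x
    "x"
  end
end

# A second party can add its own x without clobbering the first.
Legion.add_layer(o) do
  def x
    super + "x"
  end
end

o.x   #=> "xx"
```

A true singleton method defined afterwards with def o.x would still sit in front of every layer, so the approach only helps when all parties agree to use it.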