Tuesday, January 09, 2007

Truthfulness

Cobra copies a lot of syntax and high level semantics from Python, including how to determine truthfulness:

... if stuff
... ... print 'I have some stuff.'

(Sorry for the periods, but Blogger is stripping preceding whitespace. Maybe I'll have to find a new blog site. Suggestions are welcome.)

In Python, as well as the current release of Cobra, "stuff" is considered true if it is non-zero, non-nil, non-empty, or the boolean value true. So even empty strings and empty collections are considered false. That can be useful, since programmers often want to take or avoid an action if a string is blank or a collection is empty. Code becomes more terse:

... if not name
... ... print 'You have no name'
... # vs:
... if name is nil or not name.length
... ... print 'You have no name'


But these semantics also lead to a lot of expressions of the form "stuff is not nil", either because a plain nil check is exactly what you need, or because it is more efficient than a full truthfulness check.
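For readers more familiar with Python than Cobra, here is a minimal sketch of the distinction, with Python's "None" standing in for Cobra's "nil". The names "falsy_values" and "describe" are made up for this illustration.

```python
# Values that Python treats as false when used as a condition.
falsy_values = [None, 0, 0.0, "", [], {}, set(), False]

def describe(value):
    """Return (truthiness, is-not-None) for a value."""
    return bool(value), value is not None

results = [describe(v) for v in falsy_values]

# Every value above is false under plain truthiness...
all_falsy = all(not truthy for truthy, _ in results)

# ...but only None itself fails the "is not None" test.
not_none_count = sum(1 for _, not_none in results if not_none)
```

The two tests only agree on None, which is why code that merely wants to know "is there an object here?" ends up spelling out the longer comparison.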

Keep in mind that determining "non-empty" involves inspecting the object to allow a method to partake in the determination ("__nonzero__" in Python; "length" or "count" in Cobra). That's expensive. Between that expense and the numerous "...is not nil" fragments in my code, I decided to reconsider this semantic.
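A small Python sketch of that object participation follows. It assumes Python 3, where "__nonzero__" was renamed "__bool__" and truthiness falls back to "__len__" when "__bool__" is not defined. The "Inbox" class and its call counter are hypothetical, added only to make the method call visible.

```python
class Inbox:
    """Hypothetical collection-like class, used only for illustration."""

    def __init__(self, messages=None):
        self.messages = list(messages or [])
        self.len_calls = 0  # counts how often truthiness consults __len__

    def __len__(self):
        self.len_calls += 1
        return len(self.messages)

empty = Inbox()
full = Inbox(["hello"])

# Truthiness has to call into the object to ask for its length.
empty_is_true = bool(empty)
full_is_true = bool(full)

# An identity test against None never touches the object's methods.
empty_exists = empty is not None
```

After this runs, "empty.len_calls" is 1: the truthiness test invoked the method, while the "is not None" comparison did not.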

Would it be better to drop the semantics regarding blank strings, empty collections and object participation in truthfulness? I wasn't sure, but felt that an examination of Cobra source code would be telling.

So I augmented the Cobra compiler to count the number of times a reference type was checked for truthfulness vs. the occurrence of "x is not nil". I then applied it to the largest Cobra project to date: the Cobra compiler. The counts came out:
  • 107 - instances of "x is not nil"
  • 509 - instances of simple truthfulness on reference types
So it would appear that high-level truthfulness is quite popular. But as part of doing the count, I printed the expressions that were being considered for their truthfulness.
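The same kind of count can be approximated for Python source with the standard "ast" module. This is only a rough analogue under stated assumptions: it is not the instrumentation added to the Cobra compiler, it has no type information (so it cannot restrict itself to reference types), and it only recognizes a bare name used directly as an "if" or "while" condition. The function name "count_checks" is made up.

```python
import ast

def count_checks(source):
    """Count 'x is not None' comparisons and bare-name condition tests."""
    tree = ast.parse(source)
    is_not_none = 0
    bare_truthiness = 0
    for node in ast.walk(tree):
        # Matches a comparison of the form "x is not None".
        if (isinstance(node, ast.Compare)
                and len(node.ops) == 1
                and isinstance(node.ops[0], ast.IsNot)
                and isinstance(node.comparators[0], ast.Constant)
                and node.comparators[0].value is None):
            is_not_none += 1
        # Matches "if x:", "while x:" and "a if x else b" with a bare name.
        elif (isinstance(node, (ast.If, ast.While, ast.IfExp))
                and isinstance(node.test, ast.Name)):
            bare_truthiness += 1
    return is_not_none, bare_truthiness

sample = '''
if name:
    print(name)
if name is not None:
    print(name)
while items:
    items.pop()
'''

counts = count_checks(sample)
```

For the sample above, "counts" holds one "is not None" comparison and two bare truthiness tests.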

And I was alarmed at how many of them should have been "is not nil" from an efficiency point of view. Basically, if you know that the type of an object does not customize truthfulness with the "count" method, then "x is not nil" is much more efficient than "x". That's because "x" implies checking the type (String, ICollection) and possibly checking for a "count" method.
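To make the cost difference concrete, here is a hypothetical Python model of the two checks. It is a sketch of the general idea (a nil test followed by a series of type tests), not Cobra's actual runtime code; both function names are invented for the example.

```python
from collections.abc import Sized

def is_not_nil(x):
    # A single identity comparison, with no type inspection.
    return x is not None

def generic_truthy(x):
    # Hypothetical general-purpose check: nil first, then a chain of
    # type tests before the object's own length can be consulted.
    if x is None:
        return False
    if isinstance(x, bool):
        return x
    if isinstance(x, (int, float)):
        return x != 0
    if isinstance(x, str):
        return len(x) > 0
    if isinstance(x, Sized):
        return len(x) > 0
    # Any other object is true simply because it exists.
    return True
```

For an object whose type has no length at all, "generic_truthy" walks every branch only to arrive at the same answer "is_not_nil" gives in one step.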

I read through all 509 and counted the ones that should have used the simpler calculation of "x is not nil". That changed the numbers to:
  • 412 - x is not nil
  • 204 - truthfulness on reference types
Not only is "is not nil" the more popular computation (2 to 1), but the current approach leads to slower-running programs if you're not diligent.

Furthermore, Cobra's compile-time nil tracking means that many variables cannot be nil anyway and therefore the expression:

... if s is not nil and s.length

becomes:

... if s.length

when "s" is a "String" rather than a "String?":

... def doSomething(s as String)
... ... # "nil" cannot be passed in for "s"
... ... # because its type is not "String?"
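Python's optional type hints offer a loose parallel, sketched below. One important assumption to keep in mind: Python itself does not enforce these hints, so the guarantee only holds if an external static checker is run over the code, whereas the nil tracking described here happens in the Cobra compiler. The function names are invented for the example.

```python
from typing import Optional

def shout(s: str) -> str:
    # "s" is declared non-optional, so only the length is tested.
    return s.upper() if len(s) else ""

def shout_maybe(s: Optional[str]) -> str:
    # "s" may be None, so the None test has to come before the length test.
    return s.upper() if s is not None and len(s) else ""
```

The declared type of the parameter decides how much checking the body has to spell out, which is the same trade the "String" vs. "String?" distinction makes.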


So I made the change and then updated the Cobra compiler source, test cases, samples and docs. This experience confirmed that the new semantics were usable, writable and readable; in other words, capable. (Btw, this is another advantage of writing a language's compiler, or interpreter as the case may be, in the language itself: you find out fairly quickly which ideas were good and which were bad.) The change will show up in the 0.4 release later this month.

Regarding Cobra's ongoing development, there are certainly major additions in the future, for example, operator overloading. But this is the last change I know of that has a major impact on existing code.