Predicates and Expressions - Part 2
foreach assembly in assembly list
    foreach type in assembly
        clear list of invariant predicates
        clear list of processed methods
        foreach invariant attribute on type
            transform pragmas on predicate
            add pre/post analysis objects to list
            generate code for predicate
            append code to processedInvariants list
        end foreach invariant
        foreach member in type
            clear list of requires attributes
            foreach requires attribute on member
                transform pragmas on predicate
                add pre/post analysis objects to list
                generate code for predicate
                append code to processedRequires list
            end foreach requires attribute
            clear list of ensures attributes
            foreach ensures attribute on member
                transform pragmas on predicate
                extract list of variables to be snapshot
                add to processedSnapshots list
                add pre/post analysis objects to list
                generate code for predicate
                append code to processedEnsures list
            end foreach ensures attribute
            insert snapshots into code template
            insert pre/post analysis objects into template
            insert code for requires guards into template
            insert code for ensures guards into template
            generate code
            add generated code to processed methods list
        end foreach member
        insert code for members into type template
        clear list of invariant predicates
        generate code for type
        append code to processed types list
    end foreach type
    set up C# compiler with config params
    compile all types in processed types list
    store assembly as per instructions
end foreach assembly
This algorithm conveniently glosses over the complexities of the depth-first event-propagation mechanism described in previous posts, but ultimately the outcome is the same.
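The last three steps of the loop (set up the compiler, compile the processed types, store the assembly) map quite directly onto the CodeDOM compiler API. Here is a minimal sketch of that tail end; the ProxyCompiler name and the error handling are my own invention for illustration, not part of the framework:

using System;
using System.CodeDom.Compiler;
using System.Reflection;
using Microsoft.CSharp;

public class ProxyCompiler
{
    // compile the generated proxy source and store the assembly at outputPath
    public static Assembly CompileProxies(string generatedSource, string outputPath)
    {
        ICodeCompiler compiler = new CSharpCodeProvider().CreateCompiler();
        CompilerParameters options = new CompilerParameters();      // "config params"
        options.OutputAssembly = outputPath;                        // "store assembly as per instructions"
        options.ReferencedAssemblies.Add("System.dll");
        CompilerResults results = compiler.CompileAssemblyFromSource(options, generatedSource);
        if (results.Errors.HasErrors)
            throw new ApplicationException("proxy compilation failed: " + results.Errors[0]);
        return results.CompiledAssembly;
    }
}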
Code Generation Patterns - Part 3
This time I'm going to describe the source snippet that I posted last time. In previous posts I showed my implementation of the Observer pattern, which fires off events notifying the code generator about the code structures it has found. The code generator was shown last time, and you can see that there are a whole bunch of event handlers that take note of the code structures and perform code generation tasks as required.
A typical example is the NewType handler, which generates code for classes, structures and interfaces. As you will recall from my last post, I use depth-first recursion in my scanner so that my code generator can generate in one pass. That means there will be a lot of generated code knocking about from all of the invariant predicates, methods, events, fields and properties that were detected before the type event was invoked.
The first thing the CG does is add the namespace of the type to a list of namespaces that is kept for insertion at the top of the proxy's code. Obviously the proxy needs to know about the target object. As a general rule you should use the fully qualified name of any type that you refer to in argument lists, return types and so on. Readability is not much of an issue here, and not having to keep track of all the namespace references that might be needed is worth the stricture.
Following the namespace manipulations, the generated code is added to the context variables for the NVelocity template; these are declared in the Context Variables region of the generator. For common code structures such as assemblies, types and methods the generated code is added to an ArrayList, and when the next most senior CG template needs to include them it just iterates through the list, inserting everything it finds. The Hashtables are an implementation convenience that helps the CG maintain a unique list of variables that need to have snapshots taken.
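Those context variables only work the way the snippet further down this page uses them (assigning null to ProcessedMethods and then calling Add() on it) if the capitalised names are lazily initialising property wrappers over the private fields. The wrappers were chopped from the listing, so this is my reconstruction of what one probably looks like:

using System.Collections;

private ArrayList processedMethods = null;

// Reconstruction, not the original source: a lazily initialising wrapper,
// so handlers can Add() without null checks and reset the list by assigning null.
public ArrayList ProcessedMethods
{
    get
    {
        if (processedMethods == null)
            processedMethods = new ArrayList();   // created on first use
        return processedMethods;
    }
    set { processedMethods = value; }
}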
After the CG has added all of the snippets of code that have been generated so far, it invokes the template for the type itself. The template looks like this:
namespace ${type.Namespace}.Proxies
{
    #region Class ${type.FullName}Proxy
    public class ${type.Name}Proxy
    {
        internal ${type.FullName} TheTargetObject = null;

        public ${type.Name}Proxy(${type.FullName} target)
        {
            TheTargetObject = target;
        }

#foreach($method in $methods)
        $method
#end
#foreach($property in $properties)
        $property
#end
#foreach($field in $fields)
        $field
#end
#foreach($event in $events)
        $event
#end
    } // end class ${type.FullName}Proxy
    #endregion
} // end namespace ${type.Namespace}.Proxies
As you can see the template is minimal. All of the interesting work has been done in other templates or "Pragma processors". I'll be going into pragma processors in depth next time. Work ahead of me includes making the proxy implement the same interfaces as the target object. In the template above the proxy class is generated with exactly the same name as the target class, but with "Proxy" appended. It has the same namespace with ".Proxies" appended. The proxy stores a reference to the target object called TheTargetObject. TheTargetObject is delegated to when all the predicate checks have been performed. The template for a method is where the action is:
#if(!$utils.IsProperty($method))
#if($utils.IsVoidType($method.ReturnType))
#set($returntype = "void")
// found a void return type
#else
#set($returntype = ${method.ReturnType.FullName})
// didn't find a void return type
#end## if IsVoidType
public new $returntype ${method.Name}(#set($comma = "")
#foreach($arg in ${method.GetParameters()})
$comma ${arg.ParameterType.FullName} ${arg.Name}#set($comma = ", ")#end){
    // take the 'before' snapshots
#foreach($snapshot in $ssbefore)
    $snapshot.TypeName ${snapshot.After} = ${snapshot.Before};
#end
    // invariant checks (before the call)
#foreach($inv in $invariants)
    $inv
#end
    // requires checks
#foreach($req in $requires)
    $req
#end
## now the call to the real object
#if(!$utils.IsVoidType($method.ReturnType))
    $returntype result =
#end##if is void
#if($method.IsStatic)
    ${fullclsname}.${method.Name}(#set($comma = "")
#else
    TheTargetObject.${method.Name}(#set($comma = "")
#end
#foreach($arg in ${method.GetParameters()})
$comma${arg.Name}#set($comma = ", ")#end);
    // take the 'after' snapshots
#foreach($snapshot in $ssafter)
    $snapshot.TypeName ${snapshot.After} = ${snapshot.Before};
#end
    // ensures checks
#foreach($ens in $ensures)
    $ens
#end
    // invariant checks (after the call)
#foreach($inv in $invariants)
    $inv
#end
#if(!$utils.IsVoidType($method.ReturnType))
    return result;
#end##if is void
}#end ## if is not property
The templates are pre-loaded with a "utils" object that performs various type-analysis jobs that are beyond the capabilities of NVelocity. We use it to check whether the method is a property or not. Properties in C# are handled as methods in the CLR; if we ignored that and generated properties as methods we would lose a lot of the syntactic flexibility that C# makes available.
We then need to check whether the method returns void. If so, we emit the short form of the void type, since the code will not compile if we use "System.Void". Not sure why that is the case. Answers on a postcard please!
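I can't show the real utils source here, but a minimal version of the two helpers the template uses could look like the sketch below. The CLR facts it leans on are solid: property accessors are compiler-generated methods named get_X/set_X and flagged IsSpecialName, and the void keyword is System.Void under the covers.

using System;
using System.Reflection;

// A guess at the "utils" helper handed to the templates; only the two
// methods used above are sketched.
public class TemplateUtils
{
    public bool IsProperty(MethodInfo method)
    {
        // property accessors are special-name get_/set_ methods in the CLR
        return method.IsSpecialName &&
            (method.Name.StartsWith("get_") || method.Name.StartsWith("set_"));
    }

    public bool IsVoidType(Type type)
    {
        return type == typeof(void);   // i.e. System.Void
    }
}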
Having determined the return type, we can create the method signature. We are only ever notified about public methods, since those are all that external users of the proxy or the target object will ever see; we don't need to wrap protected, private or internal methods, so we can hard-code the public access specifier. Next, the method parameters are inserted. They are a complete copy of the target method's arguments, but with the type names fully expanded.
Now we get to the whole purpose of this exercise: inserting predicate checks into the method body to check parameters, members and other features before and after invocation of the method on the target object. The pattern of the checks is as follows:
Invariants are predicates that must be true at all times: they must hold both before and after the method is called. We therefore bracket the call with the invariant tests, and run the requires checks before the call and the ensures checks after it. For more information on these predicates take a look at the language documentation for Eiffel. Since the invariants are invoked before and after every method call, you should make sure they are not too expensive.
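Put together, the merged template might emit something like the following for a hypothetical Account.Withdraw method. The predicate code itself is the subject of the next post, so the checks shown here (and the exception type) are invented for illustration:

public new void Withdraw(System.Decimal amount)
{
    // 'before' snapshot
    System.Decimal old_Balance = TheTargetObject.Balance;
    // invariant check (before the call)
    if (!(TheTargetObject.Balance >= 0))
        throw new ApplicationException("Invariant failed: Balance >= 0");
    // requires check
    if (!(amount > 0))
        throw new ApplicationException("Require failed: amount > 0");
    // delegate to the target object
    TheTargetObject.Withdraw(amount);
    // ensures check, using the snapshot
    if (!(TheTargetObject.Balance == old_Balance - amount))
        throw new ApplicationException("Ensure failed: Balance == old Balance - amount");
    // invariant check (after the call)
    if (!(TheTargetObject.Balance >= 0))
        throw new ApplicationException("Invariant failed: Balance >= 0");
}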
At the end of the method template the script again checks whether the method returns void, and if so skips returning a result; it also never bothered to collect one in the first place.
Each method is built this way, converted into a string and appended to the ProcessedMethods property in the CG. When the type event is finally emitted, ProcessedMethods contains a wrapper for each method the scanner found. That should be every public method that the target object implements or inherits. Obviously, the invariant predicates of the object must hold across all properties, fields, events and methods, so it is not enough just to wrap the members that have Require and Ensure attributes attached; the Invariant attributes apply to the whole type at all times.
Next time I'll show you how I convert the assertions made in the Invariant, Require and Ensure attributes into working code that can be inserted into the proxy. I'll leave you with this exercise: how would you convert such an assertion into a piece of working code? I'll show you how I do it. Let me know if your approach is better, because I'd be glad to use it!
Code Generation Patterns - Part 2
Last time I described typed and un-typed events, and multicasting events to several listeners, one of which would be a code generator. This time I'll go on to describe some of the infrastructure that such a design requires. Generally this is all about tying the events together: the context in which an event takes place is as important as the event itself.
When you're generating code progressively, you need to keep track of what you have done and what still needs to be done. In this case that means the code already generated, and the messages that have been received but not yet turned into code. Context also tells you what code structure(s) an incoming predicate applies to.
There are two complementary modes of operation that a scanner/code-generator system can use for broadcasting events; they are variations on tree-search algorithms. In the first, the most senior element is broadcast first: an event is generated for a type before the events for its methods are fired. I shall call this (for want of a better name) the "depth-last recursive search". The second approach is a true "depth-first search", since elements that are lowest in the object tree are announced sooner. These two modes support different code-generation plans, and the choice affects what sort of state you have to keep and how long it has to hang around. More on that later.
With depth-first recursion a method event is broadcast before the corresponding type event, and a predicate such as an Ensure attribute is received before the method to which it was attached. Therefore, when you are generating code, you can't just slot what you get into what you already have, because you don't have anywhere to put it yet.
With a depth-first search, context variables track the predicates detected until an event for the method is broadcast, at which point we can generate the whole code for the method. We still have to keep the generated code around until we get the type event! Needless to say, we could hunt for such information when the method event arrives; we could use reflection to navigate up the type object graph as far as the assembly if we wanted to. But if we rely too much on that sort of navigation we break the whole broadcast idiom, and morph the algorithm into a depth-last recursive search (which has its own unique requirements that I'll come onto next).
In the depth-last search the event for the more senior object is fired before those of its subordinates. That means we get the event for the type before that of the method, and the event for the method before those of its predicates. That's helpful because we now have something to hang the subsequent events off of. If you were producing a whole object graph in memory then this would be ideal, since the tree would always be properly connected rather than fractured. This approach is not without its drawbacks, not least of which is that you have to build up an object graph in memory before you can generate the code for it! With depth-first recursion you know that when you get the event for the method there are no more predicates coming, so you know when it's safe to generate code. With the depth-last approach you have to send a "finish" message that says when the stream of sub-objects has stopped. On the whole I've found for this project that depth-first navigation of the assembly object graph works fine, and simplifies the event interface of the listeners that I want to use. In other projects I've done the opposite, and everything has gone OK; it really depends on the size of the data stream and the performance characteristics required. There are drawbacks to either approach, and you should decide on the basis of the criteria just mentioned: the size of the data stream, how much state you must hold and for how long, and the runtime performance you need.
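To make the difference concrete, here is a rough sketch of how the two broadcast orders fall out of a reflection-based scanner. The Fire* methods are stand-ins for the multicast events described last time, not real framework calls:

using System;
using System.Reflection;

public class ScannerSketch
{
    // depth-first: leaves (predicates) are announced before their owners
    public void DepthFirst(Type t)
    {
        foreach (MethodInfo m in t.GetMethods())
        {
            foreach (object attr in m.GetCustomAttributes(true))
                FirePredicateEvent(m, attr);   // predicate events arrive first...
            FireMethodEvent(m);                // ...then the method that owns them
        }
        FireTypeEvent(t);                      // the type event arrives last
    }

    // depth-last: the senior object is announced first, so an explicit
    // "finish" message is needed to say the stream of sub-objects has ended
    public void DepthLast(Type t)
    {
        FireTypeEvent(t);
        foreach (MethodInfo m in t.GetMethods())
        {
            FireMethodEvent(m);
            foreach (object attr in m.GetCustomAttributes(true))
                FirePredicateEvent(m, attr);
        }
        FireTypeFinishedEvent(t);
    }

    void FirePredicateEvent(MethodInfo m, object attr) { }
    void FireMethodEvent(MethodInfo m) { }
    void FireTypeEvent(Type t) { }
    void FireTypeFinishedEvent(Type t) { }
}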
The snippet below shows some of the code I use to process events in the code generator. Much has been chopped out, of course, but from this you should see how all the talk about depth-first searches and events translates into code generation.
public class DbcProxyCodeGenerator : DbcSupertype
{
    public DbcProxyCodeGenerator()
    {
        InitialiseTemplates();
    }

    #region Context Variables
    private ArrayList processedAssembly = null;
    private ArrayList processedTypes = null;
    private ArrayList processedMethods = null;
    private ArrayList processedProperties = null;
    private ArrayList processedFields = null;
    private ArrayList processedEvents = null;
    private Hashtable processedSnapshotsBefore = null;
    private Hashtable processedSnapshotsAfter = null;
    private ArrayList processedInvariants = null;
    private ArrayList processedEnsures = null;
    private ArrayList processedRequires = null;
    private Hashtable processedNamespaces;
    #endregion

    // (the capitalised ProcessedXxx property wrappers over these fields are
    // among the code chopped out of this listing)

    public void NewAssembly(object sender, NewAssemblyEventArgs e)
    {
        vtAssembly.SetAttr("assembly", e.TheAssembly);
        string[] namespaces = new string[ProcessedNamespaces.Keys.Count];
        ProcessedNamespaces.Keys.CopyTo(namespaces, 0);
        vtAssembly.SetAttr("namespaces", namespaces);
        vtAssembly.SetAttr("types", ProcessedTypes);
        ProcessedAssembly.Add(vtAssembly.Merge());
    }

    public void NewType(object sender, NewTypeEventArgs e)
    {
        ProcessedNamespaces.Add(e.TheType.Namespace, null);
        vtType.SetAttr("type", e.TheType);
        vtType.SetAttr("methods", ProcessedMethods);
        vtType.SetAttr("fields", ProcessedFields);
        vtType.SetAttr("properties", ProcessedProperties);
        vtType.SetAttr("events", ProcessedEvents);
        ProcessedTypes.Add(vtType.Merge());
        // reset the per-type context now that the type's code is generated
        ProcessedMethods = null;
        ProcessedFields = null;
        ProcessedProperties = null;
        ProcessedEvents = null;
        ProcessedInvariants = null;
    }

    public void NewMethod(object sender, NewMethodEventArgs e)
    {
        vtMethod.SetAttr("method", e.Method);
        vtMethod.SetAttr("invariants", ProcessedInvariants);
        vtMethod.SetAttr("requires", ProcessedRequires);
        vtMethod.SetAttr("ensures", ProcessedEnsures);
        ArrayList beforeSnapshots = SnapshotProcessor.GetBeforeSnapshots(
            e.Method as MemberInfo, ProcessedSnapshotsBefore.Keys);
        ArrayList afterSnapshots = SnapshotProcessor.GetAfterSnapshots(
            e.Method as MemberInfo, ProcessedSnapshotsAfter.Keys);
        vtMethod.SetAttr("ssbefore", beforeSnapshots);
        vtMethod.SetAttr("ssafter", afterSnapshots);
        ProcessedMethods.Add(vtMethod.Merge());
        // reset the per-method context
        ProcessedEnsures = null;
        ProcessedRequires = null;
    }

    public void NewInvariantAttribute(object sender, NewInvariantAttributeEventArgs e)
    {
        EnsureSpecification es = DbcPragmaProcessor.ProcessEnsure(e.Invariant.Predicate);
        SnapshotProcessor.RegisterSnapshots(es,
            ref this.processedSnapshotsBefore, ref this.processedSnapshotsAfter);
        vtInvariant.SetAttr("invariant", es);
        ProcessedInvariants.Add(vtInvariant.Merge());
    }

    public void NewEnsuresAttribute(object sender, NewEnsuresAttributeEventArgs e)
    {
        EnsureSpecification es = DbcPragmaProcessor.ProcessEnsure(e.Ensure.Predicate);
        SnapshotProcessor.RegisterSnapshots(es,
            ref this.processedSnapshotsBefore, ref this.processedSnapshotsAfter);
        vtEnsure.SetAttr("ensure", es);
        ProcessedEnsures.Add(vtEnsure.Merge());
    }
}
A seemingly simple decision, like the order in which you notify interested parties, leads to significant performance consequences that you need to understand in advance. To illustrate this, consider the following scenario:
I want to use this framework in a number of different scenarios. I want to be able to statically generate the code for a proxy object that I then use as though it were the real object; that proxy will enforce all of my rules at runtime. Projects that use my proxies must reference the proxy assembly as well as the assembly of the target object. That is the simple case. I also need to be able to dispense with that and use dynamic proxies: a situation where I use dynamic code generation at runtime. To do that I need to request access to the object through a class factory that generates the proxy on the fly. In other situations I want the framework to dovetail with aspect-oriented programming frameworks like ACA.NET; there the generated code could be either static or dynamic, but the specifics need to be abstracted into a configuration file.
As you can see from these requirements, our needs can swing between static and dynamic code generation, and we may even want to use both in the same program. Performance must take precedence if we want to use dynamic code generation. Static code generation won't suffer much if we choose a strategy that is biased towards dynamic generation, since its costs are paid at compile time and won't affect runtime performance.
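To ground the dynamic scenario, a class factory along these lines would do the job. CreateProxySource is a hypothetical stand-in for the whole template pipeline described in these posts, and the factory name is invented:

using System;
using System.CodeDom.Compiler;
using Microsoft.CSharp;

public class DbcProxyFactory
{
    public static object CreateProxy(object target)
    {
        Type targetType = target.GetType();
        string source = CreateProxySource(targetType);   // run the NVelocity templates
        CompilerParameters options = new CompilerParameters();
        options.GenerateInMemory = true;                 // dynamic case: no assembly on disk
        CompilerResults results = new CSharpCodeProvider().CreateCompiler()
            .CompileAssemblyFromSource(options, source);
        // the proxy follows the naming convention from the type template
        Type proxyType = results.CompiledAssembly.GetType(
            targetType.Namespace + ".Proxies." + targetType.Name + "Proxy");
        return Activator.CreateInstance(proxyType, new object[] { target });
    }

    private static string CreateProxySource(Type t)
    {
        return null; // stand-in for the template pipeline described above
    }
}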