Wednesday, August 20, 2014

Building Object Oriented Frameworks (1)

Object oriented frameworks are a mainstay of modern software development. Whether you develop in Java, C#, Objective-C, Python, Ruby or Javascript, chances are you're basing your development on some sort of application development framework.
Yet, not many of us are familiar with building application frameworks to fulfill business needs in our organizations. This series of posts illustrates object oriented framework development around a simple (though not trivial) application domain.

  • This first post outlines what a framework is, how it is implemented and how it can be used to build concrete applications.
  • The second post zooms into the core of framework design: identifying unchanging (frozen) and changing (hot) spots of the application domain.
  • The code for these posts, written in the Xtend programming language, is available at https://github.com/xrrocha/xrecords. Eclipse's Xtend is a modern JVM language whose syntax is highly readable to developers acquainted with the various mainstream object oriented languages.

    What is a framework anyway?

    At its essence, a framework is a foundation for developing a particular type of application.
    A framework captures the expertise needed to solve a particular class of problems. In doing so, it provides pre-written code that you can add to in order to build a concrete application.
    This concept is beautifully illustrated by Apple's OSX documentation in the following allegory:
    [Figure from Apple's OS X documentation: "Integrate Your Code with the Frameworks"]
    Unlike an application, a framework is not directly executable. This is so because, for its given application domain, a framework captures what doesn't change and deliberately leaves out what can change. You, the application developer, must provide the bits that change for the framework to become an executable application.
    Because of this, frameworks exhibit a property dubbed inversion of control: it is the framework that calls into your code, not the other way around.
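    To make inversion of control concrete, here is a minimal, generic sketch written in Scala purely for illustration (it is not part of the framework developed below): the framework owns the processing loop and decides when to call the code the application plugs into it.

    // The extension point the application must fill in (the part that changes)
    trait RecordHandler {
      def handle(record: Map[String, Any]): Unit
    }

    // The framework's fixed control flow (the part that doesn't change):
    // it decides when, and how often, application code gets called
    class BatchFramework(records: Seq[Map[String, Any]]) {
      def run(handler: RecordHandler): Unit =
        records.foreach(record => handler.handle(record))
    }

    object InversionOfControlDemo extends App {
      val framework = new BatchFramework(Seq(Map("id" -> 1), Map("id" -> 2)))
      framework.run(new RecordHandler {
        def handle(record: Map[String, Any]): Unit = println(record) // invoked by the framework, not by you
      })
    }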

    What a framework is not

    As follows from the above, a framework is not a library. When you make use of a library you decide when and how to call it. In a framework setup, though, the framework is in control; you supply it with your code for it to execute at a time of its choosing.
    As is often the case in software development, the term framework is somewhat overloaded and is sometimes used with too narrow a meaning. Among web developers, in particular, "framework" has become synonymous with "web application development framework" or "model-view-controller framework". While these development tools are indeed frameworks, the notion of a framework as the foundation for a class of applications is much more general.

    Too abstract! Show me an example

    Sure! Mind you, though, frameworks are abstract.

    Consider a utility to convert between tabular record formats such as:
    • Relational database tables
    • CSV and delimited files
    • Fixed-length files
    • XBase (DBF) files
    • Flat JSON, XML or Yaml files
    This utility uses Yaml as its configuration format. Thus, for example, the following Yaml script populates a database table from a CSV file:
    source: !csvSource
        input: !fromLocation [data/acme-form4269.csv]
        fields: &myFields [
            { name: tariff, format: !integer },
            { name: desc,   format: !string  },
            { name: qty,    format: !integer },
            { name: price,  format: !double ['#,###.##'] },
            { name: origin, format: !string },
            { name: eta,    format: !date [dd/MM/yyyy] }
        ]
    
    filter: !condition [tariff != 0] # javascript
    
    destination: !databaseDestination
        tableName:  form4269
        columns: *myFields # CSV field names match column names
        dataSource: !!org.postgresql.ds.PGSimpleDataSource
            user: load
            password: load123
            serverName: customs.feudalia.gov
            databaseName: forms
    
    The Yaml parser used here is SnakeYAML. Type tags are based on YamlTag.
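    As a rough illustration of how such tags come to life, plain SnakeYAML can map a custom tag onto a class as sketched below in Scala. This is an assumption-laden stand-in (CsvSourceBean is made up for the example), not the repository's actual YamlTag mechanism:

    import org.yaml.snakeyaml.{TypeDescription, Yaml}
    import org.yaml.snakeyaml.constructor.Constructor
    import scala.beans.BeanProperty

    // Hypothetical bean standing in for a framework component
    class CsvSourceBean {
      @BeanProperty var separator: String = ","
      @BeanProperty var headerRecord: Boolean = false
    }

    object TagDemo extends App {
      // Tell SnakeYAML that the !csvSource tag denotes CsvSourceBean
      val constructor = new Constructor(classOf[CsvSourceBean])
      constructor.addTypeDescription(new TypeDescription(classOf[CsvSourceBean], "!csvSource"))

      val loaded: Any = new Yaml(constructor).load("!csvSource { separator: '|', headerRecord: true }")
      val source = loaded.asInstanceOf[CsvSourceBean]
      assert(source.separator == "|" && source.headerRecord)
    }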

    What's our application domain?

    Every framework addresses a particular application domain for which it supports building concrete applications.
    Our utility's application domain is that of tabular file format conversion: it transcribes data from one tabular file format to another.
    [Figure: a sample record converted from CSV to fixed-length format]
    We use the term tabular to indicate that our records are flat in nature: we support scalar fields only, with no provisions for arrays or nested records.
    Incidentally, we treat relational database tables as just another tabular format, on a par with their flat-file cousins.

    What's the domain theory?

    A framework embodies a theory about its problem domain. This is always the case, even when the theory is implied only by the implementation.
    Because frameworks strive to provide the foundation for any application within their stated domain, their theories need to be comprehensive.
    A comprehensive theory about an application domain can only stem from repeated experiences in automating the domain. This is referred to as the Three Examples pattern of framework development.
    Our tabular format conversion domain is simple enough to have a small theory. Within this domain the notion of a concrete application corresponds to a program or script performing a specific conversion. In regard to the three-examples rule, this framework is backed by numerous hand-written, ad-hoc conversions.

    Lingua Franca

    Translating among many formats calls for an intermediate representation such that every format is translated to and from that common representation rather than to every other format.
    This reduces the number of translations from n² - n to a more manageable 2n; with five formats, for example, that is 10 translations instead of 20. In the absence of such a lingua franca we'd be in a Tower of Babel predicament.
    A suitable intermediate representation for tabular records is the format-agnostic, in-memory map whose keys are field names and whose values are field contents:
    class Record {
        val fields = new HashMap<String, Object>
        . . .
    }
    

    General Theory

    • Every conversion has a source and a destination.
    • The Source reads zero or more records encoded in the input tabular format. As each record is read, it is converted to a Record representation.
    • The Destination accepts zero or more Records. As each record is accepted, it is converted to the output tabular format and subsequently written.
    • As each record is read it can be filtered to determine whether it should be included in the output or not. The optional component responsible for this selection is referred to as the Filter.
    • Finally, selected records can be transformed so as to conform with the requirements of the given Destination. The optional component responsible for this is referred to as the Transformer.

    The Framework Model

    The above theory is captured in the following class diagram:
    We had previously stated that a framework captures what doesn't change in its domain. In our case, what doesn't change is the general algorithm followed to copy records from a source to a destination:
    // Copier.xtend
    source.open()
    destination.open()
    
    source.forEach [ in |
      if (filter.matches(in)) {
        val out = transformer.transform(in)
        destination.put(out)
      }
    ]
    
    source.close()
    destination.close()
    
    Because this logic doesn't ever change, it is referred to as a frozen spot.
    The portions of the application that can change are called, correspondingly, hot spots. In our framework they are (sketched in code after the list):
    • Source. Responsible for reading data items and translating them to the intermediate representation
    • Destination. Responsible for translating from the intermediate representation to the output format and writing the result
    • Filter. Responsible for determining whether a given instance of the intermediate representation should be further processed
    • Transformer. Responsible for converting an intermediate representation instance provided by the Source to one suitable for the Destination
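    Sketched as plain interfaces (here in Scala, using the operation names that appear in the Copier frozen spot above; the actual Xtend classes in the repository may differ in detail), the four hot spots look roughly like this:

    object FrameworkSketch {
      type Record = Map[String, Any] // stand-in for the framework's Record class

      trait Source {
        def open(): Unit
        def foreach(action: Record => Unit): Unit // hands each input record to the copier
        def close(): Unit
      }

      trait Destination {
        def open(): Unit
        def put(record: Record): Unit // writes one record in the output format
        def close(): Unit
      }

      trait Filter {
        def matches(record: Record): Boolean // should this record be copied at all?
      }

      trait Transformer {
        def transform(record: Record): Record // adapts the record to the destination's needs
      }
    }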
    The following class diagram depicts the framework implementation for the database (JDBC) and CSV tabular formats:



    Note how, despite their seemingly opposite natures, Source and Destination share a common superclass for each tabular format.
    Thus, for example, CSVSource (which reads comma-separated files) and CSVDestination (which writes comma-separated files) share common attributes such as the separator character, the field-enclosing quote character and whether the file has a header record or not.
    Likewise, the database components JDBCSource and JDBCDestination share a SQL DataSource attribute.
    Note that both Filter and Transformer above have scripting (rather than framework-supplied) implementations. This reflects the fact that logic for record selection and modification is application-specific and, thus, hard to capture in a general, reusable way. Scripting provides a mechanism for developers to pass simple filtering and transformation expressions without having to write framework-aware code. The FieldRenamingTransformer component, on the other hand, satisfies the commonly occurring need to map input field names to different output field names.
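    To see why scripting fits here, consider how little a script-driven filter needs: just an expression evaluated against a record's fields. The sketch below is an illustration built on the standard javax.script API (it is not the framework's actual scripting component) and evaluates a Javascript expression such as tariff != 0:

    import javax.script.{ScriptEngineManager, SimpleBindings}

    // Illustrative only: evaluates a Javascript boolean expression against a record
    class ScriptFilterSketch(expression: String) {
      private val engine = new ScriptEngineManager().getEngineByName("javascript")

      def matches(record: Map[String, Any]): Boolean = {
        val bindings = new SimpleBindings()
        // Expose each field to the script as a variable of the same name
        record.foreach { case (name, value) => bindings.put(name, value.asInstanceOf[AnyRef]) }
        engine.eval(expression, bindings) == java.lang.Boolean.TRUE
      }
    }

    // new ScriptFilterSketch("tariff != 0").matches(Map("tariff" -> 0)) // => false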

    Framework Instantiation

    The process of extending the framework to turn it into an executable application is called framework instantiation.
    Instantiation for most frameworks requires developers to write code extending framework-provided classes and interfaces. Such frameworks are referred to as whitebox frameworks because their internal structure (in terms of classes and interfaces) is visible to application developers.
    Other frameworks (ours included!) provide a repertoire of ready-made components such that framework instantiation no longer requires application code but only framework component configuration. Such frameworks are referred to as blackbox frameworks because their internal implementation is opaque to application developers, who are only concerned with configuring component instances.
    Blackbox frameworks make it possible to write new applications by wiring pre-existing components, like so:
    source: !databaseSource
         # Column labels are used as field names
        sqlText: |
            SELECT *
            FROM   emp
            ORDER BY deptno, empno
        dataSource: !!org.hsqldb.jdbc.JDBCDataSource
            url:    jdbc:hsqldb:file:hsqldb/example;hsqldb;shutdown=true
            user:   sa
    
    filter: !scriptFilter [sal > 1000] # javascript
    
    transformer: !renameFields # Only named fields are included in output record
        - empno: id
        - ename: name
        - sal: salary
    
    destination: !xbaseDestination
        output:  !outputDestination [data/well-paid-emps.dbf]
        fields: [
            { name: id,     format: !integer },
            { name: name,   format: !string },
            { name: salary, format: !double }
        ]
    
    For our framework instantiation we've chosen Yaml to express the application's object graph. For scripting we default to Javascript.
    In addition to Yaml, other forms of blackbox framework instantiation are in common use.
    In the Java world, in particular, the ever-popular Spring IoC container is frequently used as a means of expressing framework instantiation. See the Heritrix settings guide for an example of Spring IoC framework configuration.
    Of course, good ole' source code can be used to enact framework instantiation:
    val copier = new Copier => [
        source = new JDBCRecordSource => [
            sqlText = "SELECT * FROM emp ORDER BY deptno, empno"
            dataSource = new JDBCDataSource => [
                user = "sa"
                url = "jdbc:hsqldb:file:hsqldb/example;hsqldb;shutdown=true"
            ]
        ]
    
        filter = new ScriptingCopierComponent => [
            script = "sal > 1000"
        ]
    
        transformer = new FieldRenamingTransformer => [
            renames = #{
                "empno" -> "id",
                "ename" -> "name",
                "sal"   -> "salary"
            }
        ]
    
        destination = new XBaseRecordDestination => [
            output = new FileLocationOutputStreamProvider => [
                location = "data/well-paid-emps.dbf"
            ]
            fields = #[
                new FormattedField<Integer> => [ parser = new IntegerParser ],
                new FormattedField<String> => [ parser = new StringParser ],
                new FormattedField<Double> => [ parser = new DoubleParser ]
            ]
        ]  
    ]
    

    Conclusion

    The essence of framework design lies in capturing the application logic that doesn't change. Such immutable logic (or frozen spot) expresses the business behavior in terms of one or more abstract components (or hot spots) that model what does change from application to application.
    For each hot spot, multiple alternative implementations may exist. Framework instantiation generally involves selecting which specific hot spot implementations to use as dictated by the application's requirements.
    As we'll see later on, each concrete hot spot implementation can itself be modeled as a framework. This recursive process pervades framework design and implementation.
    Ideally, frameworks evolve towards a blackbox style where new applications can be created with a much simpler, declarative API. Thus, new applications are built by mixing, matching, wiring and configuring pre-existing components.
    This is the approach illustrated by our tabular format conversion utility. By capturing all the moving parts in its domain, this framework can be fully instantiated as an object graph wiring together existing components to assemble a fully working application capable of performing a specific format-to-format conversion.

    Continue to the next entry: Frozen spots, Hot Spots

    Friday, August 8, 2014

    Type-safe Strings with Scala Macros

    This post presents Scala macros in a basic, introductory fashion. Macros are leveraged to provide compile-time type safety for string literals subject to a well-defined syntax.

    Embedded Language Strings are Fragile

    As Scala programmers we're often faced with the need to embed strings written in other languages in our otherwise pristine code:
    // xpath
    webDriver findElement By.xpath("//ul/li[@class = 'description']")
    
    // CSS selectors
    webDriver findElements By.cssSelector("ul > li.description")
    
    // sql
    statement executeQuery "SELECT * FROM emp WHERE sal > 1500"
    
    These "islands" of foreign languages are, by necessity, encoded as strings and are therefore fragile. A typo in, say, an XPath expression will only be detected at runtime when XPath compilation fails.

    Macros to the Rescue

    Scala macros are functions executed at compile time to participate in the compilation process.
    When a macro function invocation is found during compilation, the Scala compiler calls the macro implementation passing the abstract syntax tree (AST) corresponding to the actual function arguments. The AST returned by the macro is then inserted in the compiled code in lieu of the macro's original invocation. Neat!
    All we need now is a macro that parses the embedded language string at compile time so as to ensure its correctness. Such a macro may also replace the original string by a compiled representation.

    A Toy Macro

    To illustrate macro implementation at its simplest let's consider the case where the input string is returned verbatim:
    import scala.language.experimental.macros
    import scala.reflect.macros.blackbox.Context
    
    object Macros {
      def noOp(someString: String) = macro noOpImpl
    
      def noOpImpl(c: Context)(someString: c.Expr[String]): c.Expr[String] = {
        someString
      }
    }
    
    The function noOp is declared to be a macro and an implementation mirroring its signature is specified (noOpImpl). The macro implementation's second argument list matches the function arguments in number, name and type. In our case, the String argument someString becomes c.Expr[String].
    Let's not worry about the Context-dependent types for now. All that matters is that the macro returns the same AST value passed to it through its someString argument.
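    One practical note: Scala 2 macro definitions must be compiled before the code that invokes them, typically by placing Macros in a separate sub-project. With that in place, an invocation is simply replaced at compile time by whatever tree the implementation returns; for noOp that is the original argument:

    // In a different compilation unit (e.g. a separate sbt sub-project) than Macros:
    object NoOpUsage extends App {
      val greeting = Macros.noOp("hello") // expands at compile time to the literal "hello"
      println(greeting)
    }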

    Validating Emails

    Let's extend our macro to do something meaningful: validate email literals against a regular expression.
    First, let's quickly write a (somewhat clumsy) regular function to validate emails:
    val EmailRegex = """^.*@.*(\.[a-z]{2,3})$""".r
    def checkEmail(address: String) = EmailRegex.pattern.matcher(address).matches
    . . .
    test("Checks against email regex") {
      assert(checkEmail("me@here.net"))
      assert(!checkEmail("missingAtSign.com"))
      assert(!checkEmail("me@missingDomainSuffix"))
      assert(!checkEmail("me@tooShortDomainSuffix.x"))
      assert(!checkEmail("me@tooLongDomainSuffix.abcd"))
    }
    

    Compile-time Email Validation

    We can now write a simple macro that validates email literals passed to it at compile time:
    def email(address: String) = macro emailImpl
    
    def emailImpl(c: Context)(address: c.Expr[String]): c.Expr[String] = {
      import c.universe._
    
      address.tree match {
        case Literal(Constant(text: String)) =>
          if (checkEmail(text)) address // Pass, return unchanged literal
          else c.abort(c.enclosingPosition, s"Invalid email: $text")
        case _ => address // Not a literal, can't validate at compile time
      }
    }
    
    Our macro can validate literal Strings (such as "you@there.net") at compile time. However, expressions such as s"$user@$host" are left unchanged as they depend on values known only at runtime.
    Note that the AST corresponding to a literal string has the form Literal(Constant(..)). We deconstruct this form to extract the actual literal value into the text variable (which we subsequently validate by means of checkEmail()).
    If a string fails validation, a compile-time error message will be printed. IDEs such as IntelliJ IDEA and Eclipse will show the error message at editing time. Cool!
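    Assuming email and emailImpl live in the previously compiled Macros object, usage looks like this (the invalid literal is commented out because it would stop compilation):

    object EmailUsage extends App {
      import Macros._

      val ok = email("me@here.net")       // literal: passes checkEmail at compile time
      // email("missingAtSign.com")       // would not compile: "Invalid email: missingAtSign.com"

      val (user, host) = ("me", "here.net")
      val dynamic = email(s"$user@$host") // not a literal: passes through unvalidated
      println(s"$ok / $dynamic")
    }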


    Icing on the Cake: String Interpolator

    So far, we've been using a function to validate our email literals. Scala provides a much more legible construct: string interpolators.
    Thus, instead of:
    val myEmail = email("me@here.net")
    
    we could write:
    val myEmail = email"me@here.net"
    
    Ah! This emphasizes the literal nature of the string and greatly improves readability.
    String interpolators are not necessarily related to macros. They're mostly used in their own right to perform string transformation operations.
    Let's say URL-encoding strings becomes a frequent operation. One may then want to write a string interpolator such that, for instance:
    assert(urlEncode"G√ľnther Frager" == "G%C3%BCnther+Frager")
    
    The code needed to achieve this would be:
    import java.net.URLEncoder
    
    object Interpolators {
        implicit class URLEncode(val sc: StringContext) extends AnyVal {
          def urlEncode(args: Any*) = URLEncoder.encode(sc.s(args: _*), "UTF-8")
        }
    }
    

    Interpolator-based Macro

    To turn our email validation macro into a string interpolator we need the following:
    import scala.language.experimental.macros
    import scala.reflect.macros.blackbox.Context
    
    object Macros {
      val EmailRegex = """^.*@.*(\.[a-z]{2,3})$""".r
      def checkEmail(address: String) = EmailRegex.pattern.matcher(address).matches
    
      implicit class EmailBuilder(val sc: StringContext) {
        def email(args: Any*) = macro scEmailImpl
        def email0(args: Any*) = sc.s(args: _*)
      }
    
      def scEmailImpl(c: Context)(args: c.Expr[Any]*): c.Expr[String] = {
        import c.universe._
        c.prefix.tree match {
          case Apply(_, List(Apply(_, List(literal @Literal(Constant(text: String)))))) =>
            if (checkEmail(text)) reify(c.Expr[String](literal).splice)
            else c.abort(c.enclosingPosition, s"Invalid email: $text")
          case compound =>
            val rts = compound.tpe.decl(TermName("email0"))
            val rt = internal.gen.mkAttributedSelect(compound, rts)
            c.Expr[String](Apply(rt, args.map(_.tree).toList))
        }
      }
    }
    
    This is more involved than our previous version because we're no longer dealing with a single string literal but with a potentially multi-part string context.
    Thus, we match the literal string by means of the uncanny AST expression Apply(_, List(Apply(_, List(literal @Literal(Constant(text: String)))))).
    To handle non-literal expressions (such as email"$user@$host") we invoke the non-macro email0 function.
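    Assuming, again, that the Macros object has been compiled beforehand, both usages behave as described: the literal form is validated at compile time while the interpolated form quietly falls back to email0 at runtime:

    object InterpolatorUsage extends App {
      import Macros._ // brings the EmailBuilder implicit class into scope

      val ok = email"me@here.net"      // literal: checked at compile time
      // email"not-an-email"           // would not compile

      val (user, host) = ("me", "here.net")
      val dynamic = email"$user@$host" // interpolated: routed to email0 at runtime
      println(s"$ok / $dynamic")
    }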

    Conclusion

    Macros are an extremely useful and powerful feature of the Scala language.
    Macro-based string interpolators, in particular, help ensure embedded-language correctness at compile time. For a simple example of XPath and CSS selector validation for WebDriver check this definition and this example usage.
    For a good introduction to the type of Scala macros we've explored today see Def Macros.