Spelling Suggestion on the JVM #5: Implementing Norvig's Algo in Xtend

Wed, Aug 15, 2018 - Read in 7 Min

This tutorial project implements a basic spelling suggestion service with a didactic motivation: becoming familiar with “better Java” languages around a simple but instructional example (Norvig’s approach to spelling correction). Norvig’s algo is first implemented in Java 10. Subsequently, idiomatic implementations are presented in Kotlin, Scala and Xtend. The posts in this series are:

Norvig’s Approach to Spelling Correction
Implementing Norvig’s Algo in Java
Implementing Norvig’s Algo in Kotlin
Implementing Norvig’s Algo in Scala
Implementing Norvig’s Algo in Xtend (this post)

Functional, tested code in the 4 languages is available on the project’s Spellbound repository.

The Dictionary Definition and Cool Xtend Features

Everything having to do with spelling correction revolves around a curated list of valid words we refer to as the the dictionary.

In our dictionary implementation we associate every word with an integer rank indicating how frequently the word is used in relation to all other words. The higher the rank, the more frequently used the word is. Thus, for instance, the has rank 106295 while triose has rank 1.

The skeletal SpellingCorrector Xtend class is:

class SpellingCorrector {
  val Map<String, Integer> dictionary
  
  new(Map<String, Integer> dictionary) {
    checkNotNull(dictionary)
    checkArgument(!dictionary.isEmpty)
    this.dictionary = dictionary
  }  
  . . .
  /**
   * Returns {@code Optional<List<String>>} containing a (possibly empty) list of suggestions.
   * Returns {@code Optional.empty()} if the given word is not a typo (i.e., if it occurs in the
   * dicctionary).
   */
  def Optional<List<String>> getCorrections(String word) {
    // ...
  }
}

Syntactically speaking, Xtend is the closest language to Java. In fact (as opposed to all other “alternative” JVM languages) Xtend compiles to readable Java source code rather than to bytecode. This implies the semantics of Xtend are exactly those of Java. This makes it possible for an experienced Java developer to churn out Xtend code in a matter of days and, at times, even hours!

What’s special about the above skeletal class definition?

`val` and `var`

Field dictionary is declared as a val to indicate it is immutable. An immutable field can only be assigned at declaration site or (once) inside a class constructor. Any attempt to set it to a new value will be rightfully rejected by the compiler.

Xtend’s val is equivalent to Java’s final var. Xtend’s var is equivalent to Java’s vanilla var.

Note we follow the val keyword with the field’s type (Map<String, Integer>). Here, we need to declare the type because initialization takes/place in the constructor’s body.

Xtend, though, features powerful type inference allowing the compiler to discover the type of a value or variable given the value it’s initially assigned to:

// Infers type to be Integer
val answer = 42
// Infers type to be Double
static val PI = 3.141592

Of course, it’s possible to explicitly specify a type, often a more general one:

// Temporal is an interface implemented by LocalDateTime
val Temporal rfn = LocalDateTime.now()

Xtend Constructors: `new`

Notoriously, Xtend constructors don’t use the same class name but instead are always called new. This applies to all class constructors (if there’s more than one, which is infrequent with immutable classes).

class SpellingCorrector {
  val Map<String, Integer> dictionary  
  new(Map<String, Integer> dictionary) {
    checkNotNull(dictionary)
    checkArgument(!dictionary.isEmpty)
    this.dictionary = dictionary
  }  
}

Optional Trailing Semi-colons

Refreshingly, Xtend does without those pestering ending semi-colons. Of course, they can be used if desired and can also be used to separate multiple statements in the same line:

val out = new FileWriter("output.txt")
// ...
out.flush; out.close
// ...

Optional Empty Parenthesis

Worth noting, we check for empty maps using the expression !dictionary.isEmpty instead of the expected, parenthesized Java form !dictionary.isEmpty(). Also, we say out.flush; out.close where Java would demand out.flush(); out.close().

This is possible in Xtend because whenever a parameter-less method is invoked the empty parentheses can be (and indeed are generally) omitted.

Some people feel this blurs the distinction between fields and methods. In this case, Xtend is totally OK with keeping the parenthesized form if preferred. This is also the case with Scala. Kotlin, on the contrary, takes the conservative approach of never allowing function parenthesis omission.

In general, Xtend is all about removing Java’s verbosity. Thus, for example, where in Java we’d say person.getName() in Xtend we can also say person.name using first-class property syntax. This applies to assignment as well: Xtend’s person.name = "John Doe" is equivalent to Java’s person.setName("John Doe");!

Word Splits

As per Norvig’s algorithm, we apply potentially reconstituting edits (delete, transpose, replace, insert) not to typos as such but to their splits. A word’s split is simply its separation into left and right fragments.

Xtend has a commonly used Pair<K, V> class that could arguably be used to represent this and for which there is even convenient syntactic sugar. Here, though, we choose to define an explicit class in order to illustrate a few powerful Xtend constructs.

Thus, to represent word splits as their left and right fragments we declare an inner data class: WordSplit:

/**
 * Data class holding two segments of a word.
 */
@Data
static class WordSplit {
  val String wordLeft
  val String wordRight
}

The magical @Data annotation directs Xtend to generate sensible toString(), hashCode() and equals() method overrides as well as marking the class as Serializable.

@Data also synthesizes an all-field constructor that is omitted from the code as it’s the only possible game in town. Obviously, all fields must be immutable; data classes, in general, are meant to be immutable.

Bear in mind Xtend inner classes are always static and the static qualifier is required on declaration. If you do need access to the enclosing object you can simply supply it as a constructor parameter:
class Outer {
  //...
  static class Inner {
    val Outer outer
    val String name
    new(Outer outer, String name) {
      this.outer = outer
      this.name = name
    }
  }
  //...
  def newInner(String someName) {
    return new Inner(this, someName);
  }
  //...
}

The `wordSplits()` Method

Given the WordSplit class the implementation of method wordSplits() in SpellingCorrector is straightforward:

static def List<WordSplit> wordSplits(String word) {
  (0 .. word.length).map [
    val left = word.substring(0, it)
    val right = word.substring(it)
    new WordSplit(left, right)
  ].toList
}

Note methods are introduced via the def keyword. def goes after the method’s scope and visibility and before the method’s return type.

Note there’s no need to use the return keyword upon exiting the method. Xtends automatically returns the last expression in a method’s body or terminating block.

Also -unless otherwise specified- default visibility is public; another welcome reduction in verbosity. In general, visibility defaults to public in most contexts.

We declare this method to be static since nothing in it depends on SpellingCorrector instance state.

Overriding methods must be marked as override:

override def toString() {
  '''The name is «name.toUppperCase»''' // Look ma: string interpolation!
}

Of special interest is Xtend’s syntax for lambdas (as in the above call to map). Borrowing from Smalltalk, Xtend encloses lambda bodies in brackets. This is unique among JVM languages which stick to C’s tradition of using curly braces to enclose blocks of code.

Here, the range 0 .. word.length is sweet shorthand for for (var i = 0; i <= word.length; i++).

Note the upper range bound is inclusive (0 <= it <= word.length). This fits our method’s needs perfectly as substring lengths are (kinda) one-based.

Note, too, that, unlike Java, we don’t need to take the stream() associated with the collection expression (though we can if so we choose as illustrated below for edits).

When a lambda has only one parameter its name can be omitted and it defaults to it. In our lambda body above, we use it to refer to each subsequent range element. This is a convention other languages also adhere to; in the JVM world it is used by Kotlin and Groovy as well as -somewhat incidentally- by JRuby.

If we wanted to name our lambda parameter (say as pos) the above method implementation would look like:

static def List<WordSplit> wordSplits(String word) {
  (0 .. word.length).map [ pos |
    val left = word.substring(0, pos)
    val right = word.substring(pos)
    new WordSplit(left, right)
  ].toList
}

Like in Smalltalk, a pipe symbol (|) is used to separate lambda parameter names from lambda bodies.

Mapping over the range yields an Iterable<WordSplit>. We convert this iterable to a list by appending a .toList parameter-less method to the lambda’s result.

Note, again, there is no need to use a return statement for the lambda to yield results. The last expression in the lambda body is its result value (in our case, new WordSplit(left, right)).

Ricardo Rocha's Website

Musings on Programming and Programming Languages

Spelling Suggestion on the JVM #5: Implementing Norvig's Algo in Xtend

The Dictionary Definition and Cool Xtend Features

`val` and `var`

Xtend Constructors: `new`

Optional Trailing Semi-colons

Optional Empty Parenthesis

Word Splits

The `wordSplits()` Method

Final words about Xtend

Ricardo Rocha's Website

Musings on Programming and Programming Languages

Spelling Suggestion on the JVM #5: Implementing Norvig's Algo in Xtend

The Dictionary Definition and Cool Xtend Features

val and var

Xtend Constructors: new

Optional Trailing Semi-colons

Optional Empty Parenthesis

Word Splits

The wordSplits() Method

Final words about Xtend

`val` and `var`

Xtend Constructors: `new`

The `wordSplits()` Method