Ricardo Rocha's Website

Musings on Programming and Programming Languages

Email GitHub Twitter LinkedIn

Spelling Suggestion on the JVM #5: Implementing Norvig's Algo in Xtend

This tutorial project implements a basic spelling suggestion service. The motivation is didactic: becoming familiar with “better java” languages around a simple but instructional example. Ease of implementation outranks optimal performance so readers can focus on the JVM languages as such. Examples and new concepts are introduced in Java 10 first so they’re immediately understandable to the experienced Java programmer. Crucially, the Java implementation is accompanied by equivalent (but idiomatic) implementations in today’s most relevant alternative JVM languages: Kotlin, Scala and Xtend. Code is available at Spellbound’s Github repo.

This fifth and last post presents an idiomatic Xtend implementation of Norvig’s spelling corrector.

The previous post (Implementing Norvig’s Algo in Scala) illustrates how to implement Norvig’s spelling corrector in Scala following a functional style.

The Dictionary

Everything having to do with spelling correction revolves around a curated list of valid words we refer to as the the dictionary.

In our dictionary implementation we associate every word with an integer rank indicating how frequently the word is used in relation to all other words. The higher the rank, the more frequently used the word is. Thus, for instance, the has rank 106295 while triose has rank 1.

The dictionary declaration for our Xtend SpellingCorrector class is:

class SpellingCorrector {

  val Map<String, Integer> dictionary

  new(Map<String, Integer> dictionary) {
    checkNotNull(dictionary);
    checkArgument(!dictionary.isEmpty);
    this.dictionary = dictionary
  }
  
  . . .
}

Syntactically speaking, Xtend is the closest language to Java. In fact ( as opposed to all other “alternative” JVM languages) Xtends compiles to readable Java source code rather than to bytecode. This implies the semantics of Xtend are exactly those of Java. This makes it possible for an experienced Java developer to churn out Xtend code in a matter of days and, at times, even hours!

What’s special in the above skeletal class definition?

Field dictionary is declared as a val to indicate it is immutable. An immutable field can only be assigned at declaration site or inside a class constructor. Thus, Xtend’s val is equivalent to Java’s final var.

Notoriously, Xtend constructors don’t use the same class name but instead are always called new. This applies to all class constructors (if there’s more than one, which is infrequent).

Also worth noting, we check for empty maps using !dictionary.isEmtpy instead of the expected Java form !dictionary.isEmpty(). This is possible in Xtend because whenever a parameter-less method is invoked the empty parentheses can be (and are generally) omitted.

In general, Xtend is all about removing Java’s verbosity. Thus, for example, where in Java we’d say person.getName() in Xtend it’s also possible to say person.name using first-class property syntax. This applies to assignment as well; person.name = "John Doe" is equivalent to Java’s person.setName("John Doe").

The wordSplits() method

As per Norvig’s algorithm, we apply potentially reconstituting edits (delete, transpose, replace, insert) not to typos as such but to their splits.

In order to represent word splits as their left and right parts we declare an inner data class WordSplit:

/**
 * Data class holding two segments of a word.
 */
@Data
static class WordSplit {
  val String wordLeft
  val String wordRight
}

The magical @Data annotation hints Xtend to generate sensible toString(), hashCode() and equals() implementations as well as marking the class as Serializable.

@Data also synthesizes an all-field constructor that is omitted from the code as it’s the only possible game in town. Obviously, all fields must be immutable; data classes, in general, are meant to be immutable.

Bear in mind Xtend inner classes are always static. If you need access to the enclosing object you must explicitly supply it as a field. No big deal here…

Given the WordSplit class the implementation of method wordSplits() in SpellingCorrector is straightforward:

static def List<WordSplit> wordSplits(String word) {
  (0 .. word.length).map [
    val left = word.substring(0, it)
    val right = word.substring(it)
    new WordSplit(left, right)
  ].toList
}

Note methods are introduced via the def keyword. Unless otherwise specified their default visibility is public (another welcome reduction in verbosity).

We declare this method to be static as nothing in it depends on SpellingCorrector state.

Of special interest is Xtend’s syntax for lambdas (as in the above call to map). Borrowing from Smalltalk, Xtend encloses lambda bodies in brackets. This is unique among JVM languages which stick to C’s use of curly braces to enclose blocks of code.

The range 0 .. word.length is sweet shorthand for for (var i = 0; i <= word.length; i++).

Note the upper range bound is inclusive (0 <= it <= word.length). This fits our method’s needs perfectly as sub-string indices are off by one.

Note, too, that -unlike Java- we don’t need to take the stream() associated with the collection expression (though we can if so choose as illustrated below).

When a lambda has only one parameter its name can be omitted so it defaults to it. In our lambda body above, we use it to refer to each subsequent range element. This is a convention other languages also adhere to; in the JVM world it is used by Kotlin and Groovy as well as -somewhat indirectly- JRuby.

If we anted to name our variable to say, pos, the above method implementation would look like:

static def List<WordSplit> wordSplits(String word) {
  (0 .. word.length).map [ pos |
    val left = word.substring(0, pos)
    val right = word.substring(pos)
    new WordSplit(left, right)
  ].toList
}

Like in Smalltalk, a pipe symbol (|) is used to separate the lambda parameter names from the lambda’s body.

Mapping over the range yields an Iterable<WordSplit>. We convert this iterable to a list by appending the .toList parameter-less method to the lambda’s result.

Note there is no need to use a return statement for the lambda to yield results. The last expression in the lambda body is its result value (in our case, new WordSplit(left, right)).

Final words about Xtend


comments powered by Disqus