Spelling Suggestion on the JVM #5: Implementing Norvig's Algo in Xtend
- Norvig’s Approach to Spelling Correction
- Implementing Norvig’s Algo in Java
- Implementing Norvig’s Algo in Kotlin
- Implementing Norvig’s Algo in Scala
- Implementing Norvig’s Algo in Xtend (this post)
Functional, tested code in the 4 languages is available on the project’s Spellbound repository.
The Dictionary Definition and Cool Xtend Features
Everything having to do with spelling correction revolves around a curated list of valid words we refer to as the the dictionary.
In our dictionary implementation we associate every word with an integer rank indicating how
frequently the word is used in relation to all other words. The higher the rank, the more
frequently used the word is. Thus, for instance, the has rank 106295
while triose has rank
1
.
The skeletal SpellingCorrector
Xtend class is:
class SpellingCorrector {
val Map<String, Integer> dictionary
new(Map<String, Integer> dictionary) {
checkNotNull(dictionary)
checkArgument(!dictionary.isEmpty)
this.dictionary = dictionary
}
. . .
/**
* Returns {@code Optional<List<String>>} containing a (possibly empty) list of suggestions.
* Returns {@code Optional.empty()} if the given word is not a typo (i.e., if it occurs in the
* dicctionary).
*/
def Optional<List<String>> getCorrections(String word) {
// ...
}
}
Syntactically speaking, Xtend is the closest language to Java. In fact (as opposed to all other “alternative” JVM languages) Xtend compiles to readable Java source code rather than to bytecode. This implies the semantics of Xtend are exactly those of Java. This makes it possible for an experienced Java developer to churn out Xtend code in a matter of days and, at times, even hours!
What’s special about the above skeletal class definition?
val
and var
Field dictionary
is declared as a val
to indicate it is immutable. An immutable field can
only be assigned at declaration site or (once) inside a class constructor. Any attempt to set it to
a new value will be rightfully rejected by the compiler.
Xtend’s val
is equivalent to Java’s final var
. Xtend’s var
is equivalent to Java’s vanilla
var
.
Note we follow the val
keyword with the field’s type (Map<String, Integer>
). Here, we need
to declare the type because initialization takes/place in the constructor’s body.
Xtend, though, features powerful type inference allowing the compiler to discover the type of a value or variable given the value it’s initially assigned to:
// Infers type to be Integer
val answer = 42
// Infers type to be Double
static val PI = 3.141592
Of course, it’s possible to explicitly specify a type, often a more general one:
// Temporal is an interface implemented by LocalDateTime
val Temporal rfn = LocalDateTime.now()
Xtend Constructors: new
Notoriously, Xtend constructors don’t use the same class name but instead are always called new
.
This applies to all class constructors (if there’s more than one, which is infrequent with
immutable classes).
class SpellingCorrector {
val Map<String, Integer> dictionary
new(Map<String, Integer> dictionary) {
checkNotNull(dictionary)
checkArgument(!dictionary.isEmpty)
this.dictionary = dictionary
}
}
Optional Trailing Semi-colons
Refreshingly, Xtend does without those pestering ending semi-colons. Of course, they can be used if desired and can also be used to separate multiple statements in the same line:
val out = new FileWriter("output.txt")
// ...
out.flush; out.close
// ...
Optional Empty Parenthesis
Worth noting, we check for empty maps using the expression !dictionary.isEmpty
instead of the expected, parenthesized Java form !dictionary.isEmpty()
. Also, we say
out.flush; out.close
where Java would demand out.flush(); out.close()
.
This is possible in Xtend because whenever a parameter-less method is invoked the empty parentheses can be (and indeed are generally) omitted.
Some people feel this blurs the distinction between fields and methods. In this case, Xtend is totally OK with keeping the parenthesized form if preferred. This is also the case with Scala. Kotlin, on the contrary, takes the conservative approach of never allowing function parenthesis omission.
In general, Xtend is all about removing Java’s verbosity. Thus, for example, where in Java we’d say
person.getName()
in Xtend we can also say person.name
using first-class property syntax.
This applies to assignment as well: Xtend’s person.name = "John Doe"
is equivalent to Java’s
person.setName("John Doe");
!
Word Splits
As per Norvig’s algorithm, we apply potentially reconstituting edits (delete, transpose, replace, insert) not to typos as such but to their splits. A word’s split is simply its separation into left and right fragments.
Xtend has a commonly used Pair<K, V>
class that could arguably be used to represent this and for
which there is even convenient syntactic sugar. Here, though, we choose to define an explicit class
in order to illustrate a few powerful Xtend constructs.
Thus, to represent word splits as their left and right fragments we declare an inner data class:
WordSplit
:
/**
* Data class holding two segments of a word.
*/
@Data
static class WordSplit {
val String wordLeft
val String wordRight
}
The magical @Data
annotation directs Xtend to generate sensible toString()
, hashCode()
and
equals()
method overrides as well as marking the class as Serializable
.
@Data
also synthesizes an all-field constructor that is omitted from the code as it’s the only
possible game in town. Obviously, all fields must be immutable;
data classes, in general, are meant to be immutable.
Bear in mind Xtend inner classes are always static and the
static
qualifier is required on declaration. If you do need access to the enclosing object you can simply supply it as a constructor parameter:class Outer { //... static class Inner { val Outer outer val String name new(Outer outer, String name) { this.outer = outer this.name = name } } //... def newInner(String someName) { return new Inner(this, someName); } //... }
The wordSplits()
Method
Given the WordSplit
class the implementation of method wordSplits()
in SpellingCorrector
is straightforward:
static def List<WordSplit> wordSplits(String word) {
(0 .. word.length).map [
val left = word.substring(0, it)
val right = word.substring(it)
new WordSplit(left, right)
].toList
}
Note methods are introduced via the def
keyword. def
goes after the method’s scope and
visibility and before the method’s return type.
Note there’s no need to use the return
keyword upon exiting the method. Xtends automatically
returns the last expression in a method’s body or terminating block.
Also -unless otherwise specified- default visibility is public
; another welcome reduction
in verbosity. In general, visibility defaults to public
in most contexts.
We declare this method to be static
since nothing in it depends on SpellingCorrector
instance state.
Overriding methods must be marked as
override
:override def toString() { '''The name is «name.toUppperCase»''' // Look ma: string interpolation! }
Of special interest is Xtend’s syntax for lambdas (as in the above call to map
). Borrowing from
Smalltalk, Xtend encloses lambda bodies in brackets.
This is unique among JVM languages which stick to C’s tradition of using curly braces to enclose
blocks of code.
Here, the range 0 .. word.length
is sweet shorthand for for (var i = 0; i <= word.length; i++)
.
Note the upper range bound is inclusive (0 <= it <= word.length
). This fits our method’s needs
perfectly as substring lengths are (kinda) one-based.
Note, too, that, unlike Java, we don’t need to take the stream()
associated with the collection
expression (though we can if so we choose as illustrated below for edits).
When a lambda has only one parameter its name can be omitted and it defaults to it
. In our lambda
body above, we use it
to refer to each subsequent range element. This is a convention other
languages also adhere to; in the JVM world it
is used by Kotlin and Groovy as well as -somewhat
incidentally- by JRuby.
If we wanted to name our lambda parameter (say as pos
) the above method implementation would look
like:
static def List<WordSplit> wordSplits(String word) {
(0 .. word.length).map [ pos |
val left = word.substring(0, pos)
val right = word.substring(pos)
new WordSplit(left, right)
].toList
}
Like in Smalltalk, a pipe symbol (|
) is used to separate lambda parameter names from lambda
bodies.
Mapping over the range yields an Iterable<WordSplit>
. We convert this iterable to a list by
appending a .toList
parameter-less method to the lambda’s result.
Note, again, there is no need to use a return
statement for the lambda to yield results. The last
expression in the lambda body is its result value (in our case, new WordSplit(left, right)
).