Type-safe Strings with Scala Macros
This post is a basic introduction to Scala’s def macros. We use them to provide compile-time safety for string literals that have a well-defined syntax.
Embedded Language Strings are Fragile
As Scala programmers we’re often faced with the need to embed typed strings in our otherwise pristine code:
// xpath
webDriver findElement By.xpath("//ul/li[@class = 'description']")
// CSS selectors
webDriver findElements By.cssSelector("ul > li.description")
// sql
statement executeQuery "SELECT * FROM emp WHERE sal > 1500"
These “islands” of foreign languages are, by necessity, encoded as strings and are therefore fragile. A typo in, say, an XPath expression will only be detected at runtime when XPath compilation fails.
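To make the fragility concrete, here is a minimal sketch that uses the JDK's own javax.xml.xpath API instead of WebDriver (an assumption made only to keep the example self-contained). The object name XPathAtRuntime and the helper compiles are hypothetical. The point is that a malformed expression is an ordinary String as far as the Scala compiler is concerned, and it only fails when it is compiled at runtime.

```scala
import javax.xml.xpath.XPathFactory
import scala.util.Try

object XPathAtRuntime {
  private val xpath = XPathFactory.newInstance().newXPath()

  // True when the expression compiles. A malformed expression is only
  // detected here, at runtime, never by the Scala compiler.
  def compiles(expression: String): Boolean =
    Try(xpath.compile(expression)).isSuccess

  val wellFormed: Boolean = compiles("//ul/li[@class = 'description']")
  // Same expression with the closing bracket missing: it is still a valid Scala String.
  val malformed: Boolean = compiles("//ul/li[@class = 'description'")
}
```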
Macros to the Rescue
Scala macros are functions executed at compile time to participate in the compilation process.
When a macro function invocation is found during compilation, the Scala compiler calls the macro implementation passing the abstract syntax tree (AST) corresponding to the actual function arguments. The AST returned by the macro is then inserted in the compiled code in lieu of the macro’s original invocation. Neat!
All we need now is a macro that parses the embedded language string at compile time to ensure its correctness. Such a macro may also replace the original string with a compiled representation.
A Toy Macro
To illustrate macro implementation at its simplest let’s consider the case where the input string is returned verbatim:
import scala.language.experimental.macros
import scala.reflect.macros.blackbox.Context
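object Macros {
  // Macro definition: recent Scala 2 compilers require the explicit return type
  def noOp(someString: String): String = macro noOpImpl
  // Macro implementation: receives the argument's AST and returns it untouched
  def noOpImpl(c: Context)(someString: c.Expr[String]): c.Expr[String] = {
    someString
  }
}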
The function noOp is declared to be a macro and an implementation mirroring its signature is specified (noOpImpl). The macro implementation’s second argument list matches the function arguments in number, name and type. In our case, the String argument someString becomes c.Expr[String].
Let’s not worry about the Context dependent types for now. All that matters is that the macro returns the same AST value passed to it through its someString argument.
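To try the toy macro, two practical points matter: the scala-reflect library must be on the classpath, and a macro implementation has to be compiled before the code that calls it. A minimal sbt sketch follows, assuming Scala 2.13 and two hypothetical modules named macros and app (neither name nor the version comes from the original project).

```scala
// build.sbt (sketch; module names and Scala version are assumptions)
ThisBuild / scalaVersion := "2.13.12"

// Holds the Macros object; needs scala-reflect for scala.reflect.macros.blackbox.Context
lazy val macros = (project in file("macros"))
  .settings(
    libraryDependencies += "org.scala-lang" % "scala-reflect" % scalaVersion.value
  )

// Code that calls the macros; compiled after the macros module
lazy val app = (project in file("app"))
  .dependsOn(macros)
```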
Validating Emails
Let’s extend our macro to do something meaningful: validate email literals against a regular expression.
First, let’s quickly write a (somewhat clumsy) regular function to validate emails:
val EmailRegex = """^.*@.*(\.[a-z]{2,3})$""".r
def checkEmail(address: String) = EmailRegex.pattern.matcher(address).matches
// stuff...
test("Checks against email regex") {
  assert(checkEmail("me@here.net"))
  assert(!checkEmail("missingAtSign.com"))
  assert(!checkEmail("me@missingDomainSuffix"))
  assert(!checkEmail("me@tooShortDomainSuffix.x"))
  assert(!checkEmail("me@tooLongDomainSuffix.abcdefghijk"))
}
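Why “somewhat clumsy”? The sketch below repeats the same pattern so that it is self-contained and probes a few edge cases. The object name EmailRegexLimits and the value names are hypothetical. The regex is good enough to illustrate macros, but it is not a real email validator.

```scala
object EmailRegexLimits {
  // Same pattern and helper as above, repeated so the sketch stands alone.
  val EmailRegex = """^.*@.*(\.[a-z]{2,3})$""".r
  def checkEmail(address: String): Boolean =
    EmailRegex.pattern.matcher(address).matches

  // Accepted even though the local part and the domain name are empty.
  val acceptsEmptyParts: Boolean = checkEmail("@.ab")
  // Accepted even though it contains spaces: ".*" matches any character.
  val acceptsSpaces: Boolean = checkEmail("a b@c d.com")
  // Rejected: the suffix class [a-z] only covers lowercase letters.
  val rejectsUppercaseSuffix: Boolean = !checkEmail("ME@HERE.NET")
  // Rejected: suffixes longer than three letters never match.
  val rejectsFourLetterSuffix: Boolean = !checkEmail("me@here.info")
}
```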
Compile-time Email Validation
We can now write a simple macro that validates email literals passed to it at compile time:
// Macro definition: recent Scala 2 compilers require the explicit return type
def email(address: String): String = macro emailImpl
def emailImpl(c: Context)(address: c.Expr[String]): c.Expr[String] = {
  import c.universe._
  address.tree match {
    case Literal(Constant(text: String)) =>
      if (checkEmail(text)) address // Pass, return unchanged literal
      else c.abort(c.enclosingPosition, s"Invalid email: $text")
    case _ => address // Not a literal, can't validate at compile time
  }
}
Our macro can validate literal Strings (such as "you@there.net") at compile time. However, expressions such as s"$user@$host" are left unchanged as they depend on values known only at runtime.
Note that the AST corresponding to a literal string has the form Literal(Constant(..)). We deconstruct this form to extract the actual literal value into the text variable (which we subsequently validate by means of checkEmail()).
If a string fails validation, the compiler reports an error. IDEs such as IntelliJ IDEA and Eclipse will show the error message at editing time. Cool!
Icing on the Cake: String Interpolator
So far, we’ve been using a function to validate our email literals. Scala provides a much more legible construct: string interpolators.
Thus, instead of:
val myEmail = email("me@here.net")
we could write:
val myEmail = email"me@here.net"
Ah! This emphasizes the literal nature of the string and greatly improves readability.
String interpolators are not necessarily related to macros. They’re mostly used in their own right to perform string transformation operations.
Let’s say URL-encoding strings becomes a frequent operation. One may then want to write a string interpolator such that, for instance:
assert(urlEncode"Günther Frager" == "G%C3%BCnther+Frager")
The code needed to achieve this would be:
import java.net.URLEncoder
object Interpolators {
  implicit class URLEncode(val sc: StringContext) extends AnyVal {
    // Assemble the string as s"..." would, then URL-encode the result
    def urlEncode(args: Any*) = URLEncoder.encode(sc.s(args: _*), "UTF-8")
  }
}
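It helps to see what the compiler does with an interpolated string: the literal parts go into a StringContext and the interpolated values become the method arguments. The sketch below spells out that rewriting by hand. The object name InterpolatorDesugaring and the sample values are hypothetical, and AnyVal is omitted only to keep the sketch free of placement restrictions.

```scala
import java.net.URLEncoder

object InterpolatorDesugaring {
  implicit class URLEncode(val sc: StringContext) {
    def urlEncode(args: Any*): String =
      URLEncoder.encode(sc.s(args: _*), "UTF-8")
  }

  val name = "b&c"

  // What we write...
  val viaInterpolator: String = urlEncode"a $name"

  // ...and roughly what the compiler turns it into: the literal parts
  // ("a " and the empty tail) form the StringContext, name is the argument.
  val viaExplicitCall: String =
    new URLEncode(StringContext("a ", "")).urlEncode(name)
}
```

Both values hold the same encoded string, which is why a macro attached to the interpolator method can inspect the StringContext parts at compile time.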
Interpolator-based Macro
To turn our email validation macro into a string interpolator we need the following:
import scala.language.experimental.macros
import scala.reflect.macros.blackbox.Context
object Macros {
  val EmailRegex = """^.*@.*(\.[a-z]{2,3})$""".r
  def checkEmail(address: String) = EmailRegex.pattern.matcher(address).matches
  implicit class EmailBuilder(val sc: StringContext) {
    // Macro entry point: recent Scala 2 compilers require the explicit return type
    def email(args: Any*): String = macro scEmailImpl
    // Plain runtime fallback used for non-literal (interpolated) strings
    def email0(args: Any*) = sc.s(args: _*)
  }
  def scEmailImpl(c: Context)(args: c.Expr[Any]*): c.Expr[String] = {
    import c.universe._
    c.prefix.tree match {
      // A single literal part and no interpolated arguments: validate at compile time
      case Apply(_, List(Apply(_, List(literal @Literal(Constant(text: String)))))) =>
        if (checkEmail(text)) reify(c.Expr[String](literal).splice)
        else c.abort(c.enclosingPosition, s"Invalid email: $text")
      // Anything else: generate a call to email0 on the same prefix
      case compound =>
        val rts = compound.tpe.decl(TermName("email0"))
        val rt = internal.gen.mkAttributedSelect(compound, rts)
        c.Expr[String](Apply(rt, args.map(_.tree).toList))
    }
  }
}
This is more involved than our previous version because we’re no longer dealing with a single string literal but with a potentially multi-part string context.
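Thus, we match the literal string by means of the uncanny AST expression Apply(_, List(Apply(_, List(literal @Literal(Constant(text: String)))))). The outer Apply is the call that wraps the string context in EmailBuilder; the inner Apply is the StringContext(..) constructor call, which here carries exactly one literal part.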
To handle non-literal expressions (such as email"$user@$host") we invoke the non-macro email0 function.
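As written, email0 simply assembles the string, so interpolated emails are never validated. One possible refinement, which is not part of the original code, is to run the same check at runtime in the fallback. The sketch below shows that idea without any macro involved. The names RuntimeCheckedEmail, CheckedEmailBuilder and emailChecked are hypothetical.

```scala
import scala.util.Try

object RuntimeCheckedEmail {
  // Same pattern and helper as above, repeated so the sketch stands alone.
  val EmailRegex = """^.*@.*(\.[a-z]{2,3})$""".r
  def checkEmail(address: String): Boolean =
    EmailRegex.pattern.matcher(address).matches

  implicit class CheckedEmailBuilder(val sc: StringContext) {
    // Hypothetical stand-in for email0: assemble the string, then validate it at runtime.
    def emailChecked(args: Any*): String = {
      val text = sc.s(args: _*)
      require(checkEmail(text), s"Invalid email: $text")
      text
    }
  }

  val user = "me"
  val host = "here.net"

  // Assembled from runtime values, passes the check.
  val ok: String = emailChecked"$user@$host"
  // No domain suffix: require throws, captured here as a Failure.
  val bad: Try[String] = Try(emailChecked"$user@localhost")
}
```

The trade-off is that invalid interpolated emails now fail with an exception at runtime instead of passing silently; literals would still be caught earlier by the macro.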
Conclusion
Macros are a powerful feature of the Scala language.
Macro-based string interpolators are a good fit for checking embedded-language strings at compile time. For a simple example of XPath and CSS selector validation for WebDriver check this definition and this example usage.
For the official introduction to the Scala def macros we’ve explored today, see Def Macros.