Skip to content

How Do You Apply Polyglotism?

18
Aug
2008

For the past two years or so, there has been an increasing meme across the developer blogosphere encouraging the application of the polyglot methodology.  For those of you who have been living under a rock, the idea behind polyglot programming is that each section of a given project should use whatever language happens to be most applicable to the problem in question.  This makes for a great topic for arm-chair bloggers, leading to endless pontification and flame-wars on forum after forum, but it seems to be a bit more difficult to apply in the real world.

The fact is that very few companies are open to the idea of diversity in language selection.  Just look at Google, one of the most open-minded and developer-friendly companies around.  They employ some of the smartest people I know, programmers who have actually invented languages with wide-scale adoption.  However, this same company mandates the use of a very small set of languages including Python, Java, C++ and JavaScript.  If a company like Google can’t even bring itself to dabble in language diversity, what hope do we have for the Apples of the world?

A few months ago, I received an internal email from the startup company where I work.  This email was putting forth a new policy which would restrict all future developments to one of two languages: PHP or Java.  In fact, this policy went on to push for the eventual rewrite of all legacy projects which had been written in other languages including Objective-C, Ruby, Python and a fair number of shell scripts.  I was utterly flabbergasted (to say the least).  A few swift emails later, we were able to come to a more moderate position, but the prevailing attitude remains extremely focused on minimizing the choice of languages.

To my knowledge, this sort of policy is fairly common in the industry.  Companies (particularly those employing consultants) seem to prefer to keep the technologies employed to a minimum, focusing on the least-common denominator so as to reduce the requirements for incoming developer skill sets.  This is rather distressing to me, because I get a great deal of pleasure out of solving problems differently using alternative languages.  For example, I would have loved to build the clustering system at my company using the highly-scalable actor model with Scala, but the idea was shot down right out of the gate because it involved a non-mainstream language.  To be fair to my colleagues, the overall design involved was given more serious consideration, but it was always within the confines of Java, rather than the original actor-driven concept.

There is actually another aspect to this question: assuming you are allowed to use a variety of languages to "get the job done", how do you apply them?  Ola Bini has talked about the various layers of a system, but this is harder to see in practice than it would seem.  How do you define where to "draw the line" between using Java and Scala, or even the more dramatic differences between Java and JRuby or Groovy?  Of course, we can base our decision strictly on lines of code, but in that case, Scala would trump Java every time.  For that matter, Ruby would probably beat out the two of them, and I’m certainly not writing my next large-scale enterprise app exclusively in a dynamic language.

I realize this is somewhat of a cop-out post, just asking a question and never arriving at a satisfactory conclusion, but I would really like to know how other developers approach this issue.  What criteria do you weigh in making the decision to go with a particular language?  What sorts of languages work well for which tasks?  And above all, how do you convince your boss that this is the right way to go?  The floor is open, please enlighten me!  :-)

Case Classes Are Cool

11
Aug
2008

Of all of Scala’s many features, this one has probably taken the most flack over the past year or so.  Not immutable data structures or even structural types, but rather a minor variation on a standard object-oriented construct.  This is more than a little surprising, especially considering how much work they can save when properly employed.

Quick Primer

Before we get into why they’re so nice, we should probably look at what they are and how to use them.  Syntactically, case classes are standard classes with a special modifier: case.  This modifier signals the compiler to assume certain things about the class and to define certain boiler-plate based on those assumptions.  Specifically:

  • Constructor parameters become public “fields” (Scala-style, which means that they really just have an associated accessor/mutator method pair)
  • Methods toString(), equals() and hashCode() are defined based on the constructor fields
  • A companion object containing:
    • An apply() constructor based on the class constructor
    • An extractor based on constructor fields

What this means is that we can write code like the following:

case class Person(firstName: String, lastName: String)
 
val me = Person("Daniel", "Spiewak")
val first = me.firstName
val last = me.lastName
 
if (me == Person(first, last)) {
  println("Found myself!")
  println(me)
}

The output of the above is as follows:

Found myself!
Person(Daniel,Spiewak)

Notice that we’re glossing over the issue of pattern matching and extractors for the moment.  To the regular-Joe object-oriented developer, the really interesting bits are the equals() method and the automatic conversion of the constructor parameters into fields.  Considering how many times I have built “Java Bean” classes solely for the purpose of wrapping data up in a nice neat package, it is easy to see where this sort of syntax sugar could be useful.

However, the above does deserve some qualification: the compiler hasn’t actually generated both the accessors and the mutators for the constructor fields, only the accessors.  This comes back to Scala’s convention of “immutability first”.  As we all know, Scala is more than capable of expressing standard imperative idioms with all of their mutable gore, but it tries to encourage the use of a more functional style.  In a sense, case classes are really more of a counterpart to type constructors in languages like ML or Haskell than they are to Java Beans.  Nevertheless, it is still possible to make use of the syntax sugar provided by case classes without giving up mutability:

case class Person(var firstName: String, var lastName: String)
 
val me = Person("Daniel", "Spiewak")
me.firstName = "Christopher"   // call to a mutator

By prefixing each constructor field with the var keyword, we are effectively instructing the compiler to generate a mutator as well as an accessor method.  It does require a bit more syntactic bulk than the immutable default, but it also provides more flexibility.  Note that we may also use this var-prefixed parameter syntax on standard classes to define constructor fields, but the compiler will only auto-generate an equals() (as well as hashCode() and toString()) method on a case class.

Why Are They Useful?

All of this sounds quite nice, so why are case classes so overly-maligned?  Cedric Beust, the creator of the TestNG framework, even went so far as to call case classes “…a failed experiment”.

From my understanding of Scala’s history, case classes were added in an attempt to support pattern matching, but after thinking about the consequences of the points I just gave, it’s hard for me to see case classes as anything but a failure. Not only do they fail to capture the powerful pattern matching mechanisms that Prolog and Haskell have made popular, but they are actually a step backward from an OO standpoint, something that I know Martin [Odersky] feels very strongly about and that is a full part of Scala’s mission statement.

Well, he’s right…at least as far as the pattern matching bit is involved.  Case classes are almost essential for useful pattern matching.  I say “almost” because it is possible to have pattern matching in Scala without ever using a single case class, thanks to the powerful extractors mechanism.  Case classes just provide some nice, auto-generated magic to speed things along, as well as allowing the compiler to do a bit more checking than would be otherwise possible.

The point that I think Cedric (and others) have missed entirely is that case classes are far more than just a means to get at pattern matching.  Even the most stringent object-oriented developer has to admit that a slick syntax for declaring a data container (like a bean) would be a nice thing to have.  What’s more, Scala’s automatic generation of a companion object for every case class lends itself very nicely to some convenient abstractions.  Consider a scenario I ran into a few months back:

class MainWindow(parent: Shell) extends Composite(parent, SWT.NONE) {
  private lazy val display = parent.getDisplay
 
  private val panels = Map("Foreground" -> ForegroundPanel, 
                           "Background" -> BackgroundPanel, 
                           "Font" -> FontPanel)
 
  setLayout(new FillLayout())
 
  val folder = new TabFolder(this, SWT.BORDER)
  for ((text, make) <- panels) {
    val item = new TabItem(folder, SWT.NONE)
    val panel = make(folder)
 
    item.setText(text)
    item.setControl(panel)
  }
 
  def this() = this(new Shell(new Display()))
 
  def open() {
    parent.open()
    layout()
 
    while (!parent.isDisposed) {
      if (!display.readAndDispatch()) {
        display.sleep()
      }
    }
  }
}
 
case class ForegroundPanel(parent: Composite) extends Composite(parent, SWT.NONE) {
  ...
}
 
case class BackgroundPanel(parent: Composite) extends Composite(parent, SWT.NONE) {
  ...
}
 
case class FontPanel(parent: Composite) extends Composite(parent, SWT.NONE) {
  ...
}

If you ignore the SWT boiler-plate, the really interesting bits here are the Map of panels and the initialization loop for the TabItem(s).  In essence, I am making use of a cute little trick with the companion objects of each of the panel case classes.  These objects are automatically generated by the compiler extending function type: (Composite)=>ForegroundPanel, where ForegroundPanel is replaced by the case class in question.  Because each of these classes extends Composite, the inferred type of panels will be: Map[String, (Composite)=>Composite](actually, I’m cheating a bit and not giving the precise inference, only its effective equivalent)

This definition allows the iteration over the elements of panels, generating a new instance by using the value element as a function taking a Composite and returning a new Composite instance: the desired child panel.  It’s all statically typed without giving up either the convenience of a natural configuration syntax (in the panels declaration) or the familiarity of a class definition for each panel.  This sort of thing would certainly be possible without case classes, but more work would be required on my part to properly declare each companion object by hand.

Conclusion

I think the reason that a lot of staid object-oriented developers tend to frown on case classes is their close connection to pattern matching, a more powerful relative of the much-despised switch/case mechanism.  What these developers fail to realize is that case classes are really much more than that, freeing us from the boiler-plate tyranny of endless getter/setter declarations and the manual labor of proper equals() and toString() methods.  Case classes are the object-oriented developer’s best friend, just no one seems to realize it yet.

Scala Collections for the Easily Bored Part 3: All at Once

4
Aug
2008

In the previous installment of this series, we looked at how Scala’s collections provide common mechanisms for iteration, as well as many higher-order operations in the same conceptual vein.  In this, the third and final part of the series, we will examine some mechanisms for conceptually operating on the collection as a whole.  Thus, rather than transforming individual collection elements, we will be looking at ways to inspect and modify the data structure itself.

Filtration

Stepping out of Scala for a moment, let’s consider the common design paradigm of the relational database.  Imagine that we have defined a table, people, which contains hundreds of thousands of records, assembled by some despotic government.  Now perhaps the secret service in this government wants to dispatch legal enforcers to the residences of all adults with the “wrong” political leaning.  The most natural way to accomplish this task would be to perform an SQL SELECT, filtering by age and politics:

SELECT * FROM people WHERE age > 18 AND affiliation = 'Green Party'

This query would of course return a data set which could be iterated over, performing the necessary operations (in this case, incarceration) for each record.  As it turns out, this sort of use-case is not confined solely to databases.  It is often necessary to restrict or filter a data structure based on certain criteria.  As a trivial example, imagine that we have been passed a List of Int(s) and we want to remove all odd numbers.  We could accomplish this by using the filter method and passing a function value to describe the criterion:

def onlyEven(nums: List[Int]) = {
  nums.filter((n) => n % 2 == 0)
}

If we call the onlyEven method, passing List(1, 2, 3, 4, 5), the result will be List(2, 4), since 2 and 4 are the only numbers into which 2 divides evenly.  As with many of the other common collection operations, Scala includes a syntax sugar for filter.  We can rewrite the previous example using for-comprehensions in the following way:

def onlyEven(nums: List[Int]) = {
  for {
    n <- nums
    if n % 2 == 0
  } yield n
}

This is a little different from the for-comprehensions we have seen already in that we have placed the yield statement outside the braces with the comprehension definition within.  Believe it or not, this syntax is perfectly valid, coming from Haskell’s do-notation.  This construct will be translated at compile time into a corresponding invocation of filter (and map, but that isn’t terribly relevant here) prior to type checking.  This sort of notation can be very convenient for many tasks, similar to LINQ in the .NET languages.

Partition

In the previous example, we looked at how to selectively remove elements from a collection based on a single criterion.  However, sometimes the requirement is not to remove elements, but rather to separate them into different collections.  For example, we might not want to simply throw away all odd elements in a list, it might be useful to keep both even and odd lists on hand for further operation.  This can be accomplished using the partition method:

def separateEven(nums: List[Int]) = {
  nums.partition((n) => n % 2 == 0)
}

You will notice that the signature for partition looks remarkably similar to filter.  This uniformity in the API is by design, making usage both easier to remember and reducing the number of changes necessary to switch between similar operations.  However, despite the similarity in dispatch, partition returns a very different result:

val numbers = List(1, 2, 3, 4, 5, 6, 7)
val sep = separateEven(numbers)
 
sep == (List(2, 4, 6), List(1, 3, 5, 7))    // true

Literally, the partition method splits elements into two different lists, based on the boolean value of the given predicate (in this case, even or odd).  These lists are returned as a tuple, Scala’s idiom for returning multiple values from a single method.

Searching

Having the capability to filter and split an entire list is all well and good, but it is perhaps even more common to need to find a specific element within a collection.  This is most often seen when dealing with sets or maps, but it can also be useful in the context of linear structures such as list and array.  A simple example might be a trivial caching mechanism for database lookups:

val cache = new HashMap[Int, String]
 
def readData(id: Int) = {
  if (cache.contains(id)) {    // gonna find her
    cache(id)
  } else {
    val conn = createConnection()
 
    val back = try {
      val stmt = conn.prepareStatement("SELECT value FROM dataTab WHERE id = ?")
      stmt.setInt(1, id)
 
      val res = stmt.executeQuery()
      if (res.next()) {
        res.getString(1)
      } else {
        null
      }
    } finally {
      conn.close()
    }
 
    cache += (id -> back)
    back
  }
}

Unlike Java, Scala’s collections API is extremely consistent in what methods correspond to what functionality.  The contains method on a Map does in fact search based on key, not value.  However, sometimes the situation calls for a solution which is not so specific.  Looking for a particular element is great (and very efficient on maps and sets), but it isn’t the most general implementation.  A more flexible form of searching would be to match based on a higher-order function (just like filter), rather than an explicit value.  This not only allows searching for a specific element, but it also provides the ability to look for a range.  More generally, the exists method makes it possible to check to see if an element of a given collection satisfies a certain property.

val nums = List(1, 2, 3, 4, 5, 6)
nums.exists((n) => (3 to 5).contains(n))

In this example, we are literally checking the list, nums, for any values which are in the mathematical range [3, 5].  The exists method calls the predicate (our function parameter - or “lambda” if you prefer) for each element in the list until it returns true, at which point the iteration is short circuited.  The predicate itself creates a new Range and checks to see if the specified value is within that range.  As it turns out, Range itself is a collection just like any other, defining the same methods that we’ve come to know at love.

There is one final variation on the “search” theme that is worth examining: find.  While it’s great to know that some element within a collection satisfies a certain property, sometimes it is even more useful to be able to ascertain what element that was.  Thus, rather than returning a Boolean, the find method returns an instance of the Option monad containing the first satisfactory element find, or None if the property remains unsatisfied.  Adapting our example from above yields the following code and its associated result:

val nums = List(1, 2, 3, 4, 5, 6)
val elem = nums.find((n) => (3 to 5).contains(n))
 
elem == Some(3)   // true

Conclusion

Well, it’s been a rather short series, but hopefully still worth reading.  In truth, I skipped a great deal of detail and idiomatic beauty that can be found within the Scala collections API.  While the type definitions could certainly stand improvement in some areas, it is already without a doubt the best-designed collections framework I have ever used (in any language).  Literally, once you figure out how to best employ one collection type, you will have learned them all.  For further reading on the topic, you can always peruse the scaladoc, or alternatively just play around in the Scala interpreter.  Have fun!

Scala Collections for the Easily Bored Part 2: One at a Time

28
Jul
2008

As I hinted previously, this series is intended to delve into Scala’s extensive collections API and the many ways in which it can make your life easier.  Probably the most important operations you could ever perform on collections are those which examine each element, one at a time.  After all, what’s a more common array idiom than looping over all values?

In that vein, this article starts by looking at foreach, the imperative programmer’s bread-and-butter when it comes to types like Array and List.  But rather than just stopping there, we also will look at more powerful, higher-order operations like fold, map and the ever-mysterious: flatMap.

Iterating

As I said above, looping over every item in a collection is probably the most heavily used operation in a programmer’s repertoire.  In fact, this pattern is so common in imperative languages that Java 5 saw the introduction of a special construct at the language level, just to make things a little easier.  For example:

String[] names = { "Daniel", "Chris", "Joseph" };
for (String name : names) {
    System.out.println(name);
}

This code should be old hat to anyone coming from a Java background.  Under the surface, this code is compiled into a while-loop with an iterator and an increment operation.  The code steps through the array, assigning each successive element to name.  Most statically typed languages have a construct similar to this.  For example, C# offers the foreach statement, which is almost precisely similar to Java’s enhanced for-loop, but with a slightly different syntax.  However, some languages (like Ruby) idiomatically eschew loops and rely instead on higher-order methods.  Translating the above into Ruby yields the following:

names = ['Daniel', 'Chris', 'Joseph']
names.each do |name|
  puts name
end

In this case, we aren’t using a loop of any kind, but rather creating a block (Ruby’s name for a closure) which takes a single parameter and passes it to the built-in puts method.  This block is then passed as an object to the each method of class Array, which calls the block once for each element in series.  Certainly, there is a language construct which encapsulates this, but using the each method directly is considered more “Ruby”.

The same approach is taken in Scala.  Rather than define a special construct for iteration, Scala simply provides the syntactic tools needed to construct higher-order methods like Ruby’s each.  Every collection in Scala’s library defines (or inherits) a foreach method (taking after its C# ancestry) which behaves precisely like Ruby’s each.  To show how, we will translate our example once more, this time into Scala:

val names = Array("Daniel", "Chris", "Joseph")
names.foreach { name =>
  println(name)
}

Here we define an anonymous method (Scala’s name for a closure) which takes a single parameter.  As in Ruby, this closure is passed to foreach and invoked for each array element.  In this way, foreach is a a so-called “higher-order” method, due to the fact that it accepts a parameter which is itself another method.  Scala’s implementation of this concept is actually a bit more general than Ruby’s, allowing us to shorten our example into the following:

val names = Array("Daniel", "Chris", "Joseph")
names.foreach(println)

This time, instead of creating an anonymous method to pass to foreach, we simply pass the println method itself.  The only criterion that foreach imposes on its parameter is that it is a method which accepts a single parameter of type String (the element type of the array).  Since we already have just such a method (println), there is no need to define another simply for encapsulation.

Unfortunately, despite its flexibility, foreach is not always the most syntactically concise way to iterate over a collection.  There are times that we just want to use a syntax which is similar to the for-loops available in other languages.  Well, fear not!  Scala’s for-comprehensions are more than capable of providing just such a syntax.  Consider the example of imperatively summing the values of a list:

val nums = List(1, 2, 3, 4, 5)
 
var sum = 0
for (n <- nums) {
  sum += n
}

At compile-time, the above is actually translated into an equivalent call to foreach, passing the body of the loop as the anonymous method.  This means that any class which correctly declares a foreach method may be used in a for-comprehension in this way, greatly reducing the syntactic overhead.

Folding

Looping is nice, but sometimes there are situations where it is necessary to somehow combine or examine every element in a collection, producing a single value as a result.  An example of this could be our previous example of summing a list.  Using foreach, we had to make use of a mutable variable (sum) and produce the result as a side effect.  This is fine for hybrid languages like Scala, but some languages actually lack mutable variables altogether.  In the previous post, I mentioned ML, which is a pure-functional language (almost).  Since pure-functional languages lack mutable state, the gods of computing needed to come up with some other way to accommodate this particular case.  The solution is called “folding”.

Folding a collection is literally the process of looking at each element in addition to a current accumulator and returning some value.  To make things more clear, let’s re-write our list summing example in a functional style:

val nums = List(1, 2, 3, 4, 5)
val sum = nums.foldLeft(0) { (total, n) =>
  total + n
}

It may seem a bit disguised, but this too is a loop, just like foreach.  For each element in the list (starting from the left), the foldLeft method will call the anonymous method we passed to it, providing both the current element, as well as the total accumulated from the previous call.  Since there was no previous call when the first element is processed, we must specify an initial value for total - in this case, 0.  Literally, the above can be flattened into the following:

val sum = 0 + 1 + 2 + 3 + 4 + 5

Of course, we would never want to hard-code a list in this fashion, but it serves as a sufficient illustration of the functionality.  I know from experience that when you first discover fold it’s difficult to see why anyone would want to use a construct so limited.  After all, doesn’t it just serve to obscure the true meaning of the code?  Well, take my word for it, fold is an almost indispensable tool…once you get to know it a little better.  Try keeping an eye out for times in your own code where a fold might be useful instead of a conventional loop.  You’ll be surprised how often it can be used to solve a problem, sometimes one not even intuitively related to accumulation.

There’s no special language-level syntax for fold, but Scala’s powerful operator overloading mechanism has allowed the designers of the collections API to create a special operator of rather dubious readability.  To illustrate, here’s our “summing a list” example once more:

val nums = List(1, 2, 3, 4, 5)
(0 /: nums) {_+_}

Yeah, I can’t read it either.  This example is semantically equivalent to the previous fold, but its meaning is a bit obfuscated by a) the bizarre right-associative operator, and b) a mysterious cameo by Scala’s ubiquitous underscore.  While I don’t have a problem using the underscore syntax in my own code, I don’t think that the fold operator improves anything other than number of characters.  I suppose it’s a matter of taste though.

Reduce

Fold has a closely related operation in Scala called “reduce” which can be extremely helpful in merging the elements of a sequence where leading or trailing values might be a problem.  Consider the ever-popular example of transforming a list of String(s) into a single, comma-delimited value:

val names = List("Daniel", "Chris", "Joseph")
val str = names.foldLeft("") { (acc, n) =>
  acc + ", " + n
}
 
println(str)

If you compile and run this code, you will actually arrive at a result which looks like the following:

, Daniel, Chris, Joseph

This is because folding a list requires a leading value, but this means that we have an extra separator running wild.  We could try a foldRight, but this would merely move the same problem to the tail of the string.  Interestingly enough, this problem of leading/trailing separators is hardly specific to folding.  I can’t tell you how many times I ran into this issue when constructing highly-imperative query synthesis algorithms for ActiveObjects.

The easiest way to solve this problem in Scala is to simply use a reduce, rather than a fold.  As a rule, any collection which defines foldLeft will also define reduceLeft (and the corresponding right methods).  Reduce distinguishes itself from fold in that it does not require an initial value to “prime the sequence” as it were.  Rather, it starts with the very first element in the sequence and moves on to the end.  Thus, the following code will produce the desired result for our names example:

val names = List("Daniel", "Chris", "Joseph")
val str = names.reduceLeft[String] { (acc, n) =>
  acc + ", " + n
}
 
println(str)

There are of course a few small problems with this approach.  Firstly, it is not as general as a fold.  Reduce is designed primarily for the iterate/accumulate pattern, whereas fold may be applied to many problems (such as searching a list).  Also, the reduce method will throw an exception if the target collection is empty.  Finally, Scala’s type inference isn’t quite clever enough to figure out what’s going on without the explicit [String] type parameter (since our result is of type String).  Still, even with all these shortcomings, reduce can be a very powerful tool in the right hands.

Mapping

We’ve seen how fold can be an extremely useful tool for applying a computation to each element in a collection and arriving at a single result, but what if we want to apply a method to every element in a collection in-place (as it were), creating a new collection of the same type with the modified elements?  Coming from an imperative background, this probably sounds a little abstract, but like fold, map can be extremely useful in many common scenarios.  Consider the example of transforming a list of String(s) into a list of Int(s):

val strs = List("1", "2", "3", "4", "5")
val nums = strs.map { s =>
  s.toInt
}
 
nums == List(1, 2, 3, 4, 5)   // true

The final expression in this snippet is just to illustrate what really happens to the list elements when map is called.  Literally, the map method walks through each element in the list, calls the provided method and then stores the result in the corresponding index of a new list.  (list is immutable, remember?)  If you think about it, this is very similar to looping with foreach except that at each iteration we produce a value which is saved for future use.

Another common application of this technique might be to cast an entire array from one type to another.  I often make use of XMLRPC, which has the unfortunate property of stripping all type information from its composite types.  Thus, I often find myself writing code like this:

public void rpcMethod(Object[] params) {
    String[] strParams = new String[params.length];
    for (int i = 0; i < params.length; i++) {
        strParams[i] = (String) params[i];
    }
}

It’s a lot of trouble to go through, but I really don’t know any better way.  We can’t just cast the array to String[], since the array itself is not of type String[], only its elements match that type.  Java doesn’t support higher-order operations such as map, but fortunately Scala does.  We can translate the above into a functional style and gain tremendously in both readability and conciseness:

def rpcMethod(params: Array[Object]) {
  val strParams = params.map { _.asInstanceOf[String] }
}

For the sake of brevity, you’ll notice that I made use of the underscore syntax as a placeholder for the anonymous method parameter.  This syntax works remarkably well for short operations like the above, where all we need to do is take the input value and cast it to a new type.

As it turns out, mapping over a collection is a phenomenally common operation, perhaps even more so than fold.  For that reason, the creators of Scala decided that it was worth adding a special syntax sugar built into the powerful for-comprehension mechanism.  With a little bit of tweaking, we can transform our casting example into an arguably more readable form:

def rpcMethod(params: Array[Object]) {
  val strParams = for (p <- params) yield p.asInstanceOf[String]
}

At compile-time, these two forms are literally equivalent.  In some ways it is a matter of taste as to which is better.  I personally tend to favor directly calling map for simple, non-combinatorial operations like this, but to each his own.

Binding

Actually, the name “bind” comes from Haskell.  Scala’s term for this operation is “flatMap” because the operation may be viewed as a combination of the map and flatten methods.  Of all of the techniques discussed so far, this one probably has the richest theoretical implications.  Coming straight from the menacing jungles of category theory and the perplexing wasteland of monads, flatMap is both intriguing and apparently useless (at first glance anyway).

Like map, flatMap walks through every element in a collection and applies a given function, saving the value for later use.  However, unlike map, flatMap expects the return type of the specified function to be the same as the enclosing collection with an optionally different type parameter.  If we look at this in terms of our number-converting example from previously, this means that our anonymous method must not return a value of type Int, but rather of type List[Int].  Once flatMap has all of the resultant List[Int] values, it flattens them into a single list containing all of the elements from each of the inner lists.

Ok, that was utterly un-helpful.  Maybe the method signature would be more illuminating:

class List[A] {   // don't try this at home
  def flatMap[B](f: (A)=>List[B]): List[B]
}

Other than forcing order of evaluation, I can’t personally think of too many common cases where this sort of operation is useful.  However, one contrived example does spring to mind.  Consider once more the example of converting a list of String(s) into a list of Int(s), but this time assume that some of the String elements might not nicely convert into integer values:

val strs = List("1", "two", "3", "four", "five")
val nums = strs.flatMap { s =>
  try {
    List(s.toInt)
  } catch {
    case _ => Nil
  }
}
 
nums == List(1, 3)    // true

Recall that in Scala, everything is an expression - including try/catch blocks - therefore, everything has a value.  This code literally walks through the entire list and tries to convert each element into an integer and wrap the result in a List.  If the conversion fails (for whatever reason), an empty list is returned (Nil).  Because we return an empty list for those elements which cannot be converted, flatMap literally resolves those results out of existence, leading to a List which only contains two Int(s).  For the monadically uninclined among us, this is precisely the reason why Nil is referred to as the “zero” of the List monad.  However, that’s a topic for an entirely different series

Conclusion

Ok, so this article was a bit longer than I really wanted to run, but there’s a lot of material to cover!  Scala’s collections framework shows how even operations steeped in mountains of theory can still prove useful in solving common problems.  Now, every time I use collections in Java (or even Ruby), I find myself reaching for many of these same methods, only to find them either unavailable or less powerful than I would like.  Scala provides a welcome refuge for all those of us who desire more powerful collection types in our daily programming.

Be with us next time for filter, forall, exists and more!

Scala Collections for the Easily Bored Part 1: A Tale of Two Flavors

21
Jul
2008

One of the most obvious things to a Java developer first coming into Scala-land is the radically different Collections API included as part of the standard library.  For the most part, we use the same frameworks and APIs in Scala as are available in Java.  This is natural, thanks to the extremely tight integration between the two languages.  So why is this one area such a startling departure from Scala’s heritage?

The answer has to do with what the Scala language is syntactically capable of handling.  Scala isn’t just an object-oriented language, it is also highly functional.  It is only natural that such an integral part of the core libraries would reflect this fact.  Unfortunately, most developers fail to take full advantage of the power offered by the collections API.  Despite the available power, most code written using Scala’s collections tends to look a lot like Java in disguise.

I had actually planned on addressing this topic in a single article.  However, Scala’s collections are so vast and powerful (sounds like one of Roddenberry’s alien consciousness beings) that it really would overrun the limits of conventional blogging to attempt to cover it all in a single post.  Despite the fact that I’ve been creating mammoth anthologies of late, I think it’s probably better to break it into bite-sized chunks.  First up: the confusing dual nature of Scala’s collections API!

A Tale of Two Flavors

The very first thing developers notice when looking at Scala’s collections is a (seemingly) odd redundancy in the specification.  Looking under the scala.collection package, we see not one, but three separate sub-packages, each containing what seem to be reimplementations of the same types.  For example, consider the following three traits:

I don’t know about you, but this confused the heck out of me the first time I really dug into Scala’s standard API.  Actually, it gets even worse when you discover that there is also a trait (and companion object), scala.collection.Map, which is actually a super-trait of the three listed above.  Seems like Dr. Odersky discovered the magic of separated namespaces and reacted like a two-year-old on espresso.

As it turns out, there’s a very logical reason for having these separated and seemingly-redundant collections packages.  Of the three, one of them can be discounted immediately as uninteresting.  The jcl package contains collections, but they are merely wrappers for the corresponding Java collections.  This allows more efficient transmutation between the Java collections API and Scala.  It is almost never necessary to use this package directly, since a number of implicit conversions are built into Scala to make the process essentially transparent.

Of course, this still leaves the immutable and mutable packages.  This distinction traces back to some of Scala’s functional roots.  As you are likely aware, Scala supports both mutable and immutable variables, as denoted by the var and val keywords, respectively.  While there are some significant differences at compile-time, conceptually, the only distinction between these two types is the former allows reassignment whereas the latter does not.  Mutable variables are a common - and indeed, essential - feature in imperative languages (Java, C++, Ruby, etc).  For example, here’s how we would sum an array of integers in Java:

int[] numbers = ...
 
int sum = 0;
for (int n : numbers) {
    sum += n;   // reassign sum to accumulate n
}

In this case, sum is a mutable variable which accumulates the total value of all numbers in the array, numbers.  Theoretical disputes aside, this style of programming is simply impossible in certain languages.  For example, SML provides no mechanism for declaring mutable variables.  So if we want to sum the values in a list of ints, we have to do it in some other way (code provided for the curious):

fun sumList ls = foldl (op +) 0 ls

In Scala, both of these techniques are available to us.  However, despite providing for mutable state, Scala does encourage developers to avoid it.  The reasoning behind this is that mutable state has a tendency to make code more difficult to reason about, making testing much harder.  Also, as any experienced developer will tell you, mutable state kills concurrency.

Not only does Scala encourage the use of immutable variables, it also encourages the use of immutable collections.  This concept may seem a little bizarre to those of you coming from an imperative background (I know it did to me), but it actually works.  While a map container which cannot be modified probably seems a little useless, it actually provides a startling degree of freedom from concerns like concurrent locking and unintended side-effects.  With immutable collections, you are able to manipulate objects with perfect assuredness that no method call will “accidentally” alter its contents.  How many times have you cloned a collection prior to returning it?  Or how often have you dug through someone else’s code just to assure yourself that it is safe to call a given method passing an instance of List?  Immutable data structures completely solve that problem.

Naturally, immutable collections require very different patterns and idioms than those which are mutable.  My ML code from above illustrates this to a very small degree, using a fold to traverse an immutable list.  As a general rule, immutable data structures are only useful when being passed around via method calls.  A common pattern is to build up a data structure recursively, creating a new instance with one more element at each invocation:

def toSet[T](list: List[T]) = {
  def traverse(list: List[T])(set: Set[T]): Set[T] = list match {
    case hd :: tail => traverse(tail)(set + hd)   // create a new Set, adding hd
    case Nil => set
  }
 
  traverse(list)(Set[T]())
}
 
val names = List("Daniel", "Chris", "Joseph", "Renee")
val nameSet = toSet(names)

At each recursive invocation of traverse, a new Set is created, based on the contents of the old one, set, but with one additional member, hd.  At the same time, we are deconstructing an immutable List, list, selecting its first element and whatever remains at each step.  Whenever you work with immutable data structures, you will see a lot of code which looks like this.

Of course, the natural question which comes to mind is, what about performance?  If each invocation actually creates a brand new Set for every recursive call, doesn’t that require a lot of inefficient object copying and heap operations?  Well, as it turns out, this isn’t really the case.  Yes, a new instance must be created at each turn, which is is a comparatively expensive operation on the JVM, but almost nothing is being copied.  All of Scala’s immutable data structures have a property known as persistence, which means that you don’t copy the data out of the old container when creating a new, you just have the new reference the old and treat all of its contents as if they were its own.  A linked list is a good example of this, since each node of a list contains exactly one element, as well as a reference to another node.  If we think of each node as a representative of a list starting with itself and traversing to the end, then list suddenly becomes a fully persistent data structure (since the new list contains a sub-list in its entirety and additive operations require no data copying).  Rich Hickey, the creator of Clojure (a Lisp dialect running on the JVM), has an excellent presentation which explains some of the hows and whys behind this technique (as well as some other interesting topics).  Chapter 19 of Programming in Scala (Odersky, Spoon & Venners) also has a good example of a persistent immutable queue.

Conclusion

I happen to like immutable data, so most of this series uses the scala.collection.immutable package.  However, there are certainly situations where mutable data structures are the only way to go, either because of system requirements or for performance reasons.  Fortunately, Scala’s mutable collections have an almost identical interface to its immutable collections.  Thus, most of the in formation presented here is applicable to both branches of the library.

Now that we have laid a basic foundation regarding the fundamentals of Scala’s collections framework, we can move onto more interesting things.  The next installment will deal with fold and map, the bread and butter of every collection.