Another Tour of Scala

ForComprehensions

The Gist

Sequence Comprehensions describes Scala’s for “loop”

My Interpretation

You can read the wikipedia entry on list comprehensions if you like, but what we’re talking about here is the many ways in which Scala allows you to traverse and process a list.

We’ve seen some of this in ScalaFunctions and XmlLiterals , but these deal with more specific means of traversing a list (e.g. to find a specific element or map elements to other elements). These are all specialized versions of the so-called “for-comprehension”, which is a fancy name for a for loop.

Suppose we have a list of U.S. States that have been tagged with their general location in the U.S. (e.g. “east” vs. “midwest” vs. “south”). Now suppose we wish to get a list of all the states that aren’t on the east, except for Washington, DC, which, for some reason, we don’t consider east coast

The expression state <- states is a generator, which assigns a value from states to the value state in succession. Each time this happens, the “guard condition” that starts with if is evaluated. If it evaluates to true (or if there is no guard condition), we “mark” this item for iteration. Once the collection has been processed, we iterate over each “marked” item executing the body of the for-comprehension (this is the part that comes after the final paren). In this case, we use the yield keyword, which essentially means “yield this value back to the list we are creating”. Yes, we are creating a list, here. The end result of this is a list of states that matched our guard condition.

Note the subtle difference here; in Java we would iterate over the entire collection; in Scala we are building up a collection over which to iterate via the guard condition and then executing the body of the for-comprehension. This means that, in general, conditions inside the for-comprehension body cannot affect conditions in the guard condition:

var done = false
for (state <- states if !done) if state.code == "DC" done = true

When we hit DC, we will not stop; this is because the guard condition has already been evaluated for every item in the list before we got to the body. Yikes (See this mind-blowing article on the crazy subtleties of the for loop)

If we didn’t want to actually yield a list, we can omit the yield keyword alltogether:

for( state <- states if state.location != 'east || state.code == "DC")
  println(state)

Nested Loops

Instead of nesting for loops, we can simply add expressions to the comprehension. Suppose we wish to find pairs of states that could be “sister” states; states that aren’t in the same geographic area.

val sisters = for( state1 <- states; 
                   state2 <- states if state1.location != state2.location) 
                     yield (state1,state2)

Here, we loop through each pair of states, yield a tuple of states that don’t have the same location.

What about map, filter, etc.?

In reality, the for comprehensions are translated by the compiler into calls to map, filter, and flatMap. Section 10.3 of Scala By Example details this.

My Thoughts on this Feature

I thought I understood this feature until I read this article . Now, I’m a bit lost, mostly as to why it works this way. Perhaps the name for is misleading; in most programming languages, for means “do something for everything in the list”. I guess the definition of the “list” is what’s unclear. I find it hard to know what a particular for statement in Scala will actually do. That doesn’t seem good.

I suppose once I’ve internalized how it works, it will seem more obvious.