The Scala Iterator 'drop' Method Generates a Matryoshka Class Nesting
The Scala Iterator drop method has a complexity bug that shows up when one calls drop repeatedly, for example when traversing over an iterator in a loop.
The nature of the problem is that drop, under the hood, invokes slice, which returns a new anonymous subclass of AbstractIterator containing an instance of the input class, which can be seen in this code excerpt from Iterator.scala:
def drop(n: Int): Iterator[A] = slice(n, Int.MaxValue)
// ... comments excised ...
def slice(from: Int, until: Int): Iterator[A] = {
val lo = from max 0
var toDrop = lo
while (toDrop > 0 && self.hasNext) {
self.next()
toDrop -= 1
}
// I am a ticking quadratic time bomb:
new AbstractIterator[A] {
private var remaining = until - lo
def hasNext = remaining > 0 && self.hasNext
def next(): A =
if (remaining > 0) {
remaining -= 1
self.next()
}
else empty.next()
}
}
In the case where one is only calling drop once, this is not very consequential, but when the same method is used in a loop, the nesting is repeated, generating a nesting of anonymous classes that is ever-deeper – rather like Matryoshka dolls:

This can be a substantial problem, as it generates quadratic complexity in what is logically a linear operation. A simple example of looping code that can cause this nesting:
def process_nth_elements[T](itr: Iterator[T], n: Int = 1) {
var iter = itr
while (iter.hasNext) {
val nxt = iter.next
// ... process next element ...
// skip to next element
iter = iter.drop(n-1)
// this becomes more and more expensive as iterator classes
// become nested deeper
}
}
A simple example program, which can be found here, demonstrates this nesting directly:
import java.io.{StringWriter, PrintWriter}
import scala.reflect.ClassTag
def tracehead(e: Exception, substr: String = "slice"): String = {
val sw = new StringWriter()
e.printStackTrace(new PrintWriter(sw))
sw.toString.split('\n').takeWhile((s:String)=> !s.contains(substr)).drop(1).mkString("\n")
}
class TestIterator[T: ClassTag](val iter: Iterator[T]) extends Iterator[T] {
override def hasNext = iter.hasNext
override def next = {
println(tracehead(new Exception))
iter.next
}
}
def drop_test[T](itr: Iterator[T]) {
var n = 0
var iter = itr
while (iter.hasNext) {
n += 1
println(s"\ndrop # $n")
iter = iter.drop(1)
}
}
When the drop_test function is run on an instance of TestIterator, the stack trace output shows the Matryoshka nesting directly:
scala> drop_test(new TestIterator(List(1,2,3,4,5).iterator))
drop # 1
at $line18.$read$$iw$$iw$$iw$$iw$TestIterator.next(<console>:19)
drop # 2
at $line18.$read$$iw$$iw$$iw$$iw$TestIterator.next(<console>:19)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
drop # 3
at $line18.$read$$iw$$iw$$iw$$iw$TestIterator.next(<console>:19)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
drop # 4
at $line18.$read$$iw$$iw$$iw$$iw$TestIterator.next(<console>:19)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
drop # 5
at $line18.$read$$iw$$iw$$iw$$iw$TestIterator.next(<console>:19)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
One would expect this quadratic behavior to show up in benchmarking, and it does. Consider this simple timing test:
def drop_time[T](itr: Iterator[T]) {
val t0 = System.currentTimeMillis()
var iter = itr
while (iter.hasNext) {
iter = iter.drop(1)
}
println(s"Time: ${System.currentTimeMillis() - t0}")
}
One would expect this function to be linear in the length of the iterator, but we see the following behavior:
scala> drop_time((1 to 5000 * 1).toList.iterator)
Time: 106
scala> drop_time((1 to 5000 * 2).toList.iterator)
Time: 475
scala> drop_time((1 to 5000 * 3).toList.iterator)
Time: 1108
scala> drop_time((1 to 5000 * 4).toList.iterator)
Time: 2037
scala> drop_time((1 to 5000 * 5).toList.iterator)
Time: 3234
scala> drop_time((1 to 5000 * 6).toList.iterator)
Time: 4717
scala> drop_time((1 to 5000 * 7).toList.iterator)
Time: 6447
scala> drop_time((1 to 5000 * 8).toList.iterator)
java.lang.StackOverflowError
at scala.collection.Iterator$$anon$10.next(Iterator.scala:312)
The corresponding plot shows the quadratic cost:

Given the official semantics of drop, which state that the method invalidates the iterator it was called on, this nesting problem should be avoidable by implementing the method more like this:
def drop(n: Int): Iterator[A] = {
var j = 0
while (j < n) {
this.next
j += 1
}
this
}