Add lazyFoldLeft #18

Merged
merged 1 commit into from
Jul 30, 2019

Conversation

adamgfraser
Contributor

This PR adds a new lazyFoldLeft implicit enrichment method to Scala standard collections as discussed here. lazyFoldLeft is like foldLeft but the combination function op is non-strict in its second parameter. If op(b, a) chooses not to evaluate a and returns b, this terminates the traversal early.

This method addresses a tension that currently exists with using folds in Scala. Lazy right folds are preferable in many ways as a building block for specific folds because they support efficient implementation of methods like exists that allow early termination. They also allow traversals of infinite collections. However, right folds are not tail recursive and thus are not stack safe for large collections, a property we usually want to guarantee.
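
To make the trade-off concrete, here is a minimal sketch of a naive lazy right fold, assuming Scala 2.13 or later for `LazyList`. The object name `LazyRightFoldSketch` is hypothetical and this is not code from the library; it only illustrates why such a fold supports early termination and infinite collections while still consuming stack.

```scala
object LazyRightFoldSketch {
  // Naive lazy right fold: the combining function receives the rest of the
  // fold as a by-name argument and may choose never to force it.
  def foldRight[A, B](as: Iterable[A])(z: => B)(f: (A, => B) => B): B =
    if (as.isEmpty) z
    else f(as.head, foldRight(as.tail)(z)(f))

  // `||` forces its right operand only when `f(a)` is false, so the
  // recursion stops at the first matching element.
  def exists[A](as: Iterable[A])(f: A => Boolean): Boolean =
    foldRight(as)(false)((a, rest) => f(a) || rest)

  // Terminates on an infinite collection: only the elements 1 to 11 are visited.
  val foundInInfinite: Boolean = exists(LazyList.from(1))(_ > 10)

  // Not tail recursive: every visited element adds a stack frame, so a long
  // traversal with no early exit can overflow the stack.
}
```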

This problem has been addressed in various ways, including variations of folds that allow the caller to explicitly signal termination (e.g. foldSomeLeft or lazyFoldRight in this library) or evaluation of the right fold in the context of a stack safe monad (e.g. foldRight in Cats with Eval). These solutions are less than ideal for two reasons.

First, they can require more work from the caller to explicitly specify early termination conditions when it should be implied by the operation itself. For example, when implementing exists with foldSomeLeft we could use None to explicitly signal termination.

def exists[A](as: Iterable[A])(f: A => Boolean): Boolean =
  as.foldSomeLeft(false)((b, a) => if (b) None else Some(f(a)))

But the early terminating nature of the fold is really inherent in the || operator, which can be seen much more clearly without the Option.

def exists[A](as: Iterable[A])(f: A => Boolean): Boolean =
  as.lazyFoldLeft(false)(_ || f(_))

Second, these solutions require some additional boxing, whether through Option, Either, or Eval, which negatively impacts performance and reduces the practical utility of these higher order functions in implementing more specific collection operations.

This implementation of lazyFoldLeft has a couple of attractive properties.

In terms of soundness, we can show that for any pure function f: (B, => A) => B, if as.foldLeft(z)(f) terminates, then as.lazyFoldLeft(z)(f) terminates with the same value. The argument is that lazyFoldLeft only terminates early if f(b, a) chooses not to evaluate a and returns b. If, when traversing a given element, f did not evaluate a, then for the given b, f is effectively a function of type B => B. The second termination condition is that f(b, a) returns b. So in that case calling f again with the next element in the traversal will also not evaluate a and will also return b, assuming that f is a pure function. By induction we can apply the same logic to every remaining element and safely terminate the traversal early.
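
The claim can be checked informally with a small experiment. The following is a minimal sketch rather than the code in this PR: `LazyFoldCheck` and its standalone `lazyFoldLeft` function are hypothetical stand-ins for the enrichment method (assuming Scala 2.13 or later for `IterableOnce`), written to follow the two termination conditions described above.

```scala
object LazyFoldCheck {
  // Hypothetical standalone stand-in for the enrichment method in this PR.
  def lazyFoldLeft[A, B](as: IterableOnce[A])(z: B)(op: (B, => A) => B): B = {
    val it = as.iterator
    var result = z
    var finished = false
    while (!finished && it.hasNext) {
      var evaluated = false
      val elem = it.next()
      def next: A = { evaluated = true; elem }
      val acc = op(result, next)
      // Stop only when op ignored the element and returned the accumulator unchanged.
      finished = !evaluated && acc == result
      result = acc
    }
    result
  }

  val xs = List(1, 2, 3, 4, 5, 6)

  // The strict and lazy folds agree on a finite collection.
  val viaStrict: Boolean = xs.foldLeft(false)((b, a) => b || a > 3)
  val viaLazy: Boolean = lazyFoldLeft(xs)(false)((b, a) => b || a > 3)

  // Count how many elements the lazy fold pulls from the underlying iterator.
  var pulled = 0
  val counted: Iterator[Int] = xs.iterator.map { x => pulled += 1; x }
  val viaCounted: Boolean = lazyFoldLeft(counted)(false)((b, a) => b || a > 3)

  // The lazy fold also terminates on an infinite source.
  val onInfinite: Boolean = lazyFoldLeft(Iterator.from(1))(false)((b, a) => b || a > 3)
}
```

In this sketch the fold pulls five of the six elements: four are inspected before the accumulator becomes true, and one more is pulled (but never evaluated) to detect that the operator no longer needs its second argument.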

My benchmarking also indicates that the performance cost of using a lazy left fold instead of a strict one is relatively low. Essentially the only additional work in traversing each element is setting and getting a boolean value.

So it seems like this could be a very useful method allowing us to implement a wide variety of methods in a relatively efficient way. I added some basic tests but didn't see the infrastructure for property-based testing or benchmarks. I'm happy to add those if that is helpful.

@julienrf
Collaborator

julienrf commented May 1, 2019

Thanks a lot for the detailed description!

> My benchmarking also indicates that the performance cost of using a lazy left fold instead of a strict one is relatively low. Essentially the only additional work in traversing each element is setting and getting a boolean value.

How does it compare with the cost of using Option or Either, that you mention in the description?

@adamgfraser
Contributor Author

Here are some benchmarking results. I did two benchmarks. The first looks at adding a list of numbers from 1 to 50,000. The strict left fold is the fastest as we would expect. The lazy left fold is slightly faster than using Option to signal early termination. The lazy right fold using Either is dramatically slower, though I think this is a result of the fact that we have to build up a chain of functions and then evaluate them rather than anything having to do with Either itself.

[info] Benchmark                                         (size)  Mode  Cnt        Score        Error  Units
[info] lazyFoldLeft.StrictBenchmark.sumViaFoldLeft        50000  avgt    8   275836.344 ±  24046.468  ns/op
[info] lazyFoldLeft.StrictBenchmark.sumViaFoldSomeLeft    50000  avgt    8   573510.116 ±  23423.670  ns/op
[info] lazyFoldLeft.StrictBenchmark.sumViaLazyFoldLeft    50000  avgt    8   484497.481 ±  17715.001  ns/op
[info] lazyFoldLeft.StrictBenchmark.sumViaLazyFoldRight   50000  avgt    8  1256060.618 ± 480373.378  ns/op

The second benchmark looks at determining whether there is a number greater than 50,000 in a list of numbers from 1 to 100,000. The specialized exists function is the fastest. The lazy left fold and the fold using Option have similar performance, with the fold using Option being very slightly faster in this case. The lazy right fold is again the slowest by far.

[info] Benchmark                                          (size)  Mode  Cnt        Score        Error  Units
[info] lazyFoldLeft.LazyBenchmark.exists                  100000  avgt    8   203743.419 ±   2319.463  ns/op
[info] lazyFoldLeft.LazyBenchmark.existsViaFoldSomeLeft   100000  avgt    8   525195.407 ±  51276.018  ns/op
[info] lazyFoldLeft.LazyBenchmark.existsViaLazyFoldLeft   100000  avgt    8   550320.841 ±  13896.251  ns/op
[info] lazyFoldLeft.LazyBenchmark.existsViaLazyFoldRight  100000  avgt    8  1232583.315 ± 253110.981  ns/op

So overall I think we can say that the lazy left fold is roughly as fast as using Option to signal early termination: slightly faster in the sum benchmark and slightly slower in the exists benchmark.

@joshlemer
Contributor

@adamgfraser this is cool, do you think this would render foldSomeLeft obsolete? Is there a reason to keep both?

@adamgfraser
Contributor Author

Thanks!

From a technical perspective, we can write any expression using foldSomeLeft in terms of lazyFoldLeft, which we can show by implementing foldSomeLeft in terms of lazyFoldLeft.

def foldSomeLeft[A, B](as: Iterable[A])(z: B)(op: (B, A) => Option[B]): B =
  as.lazyFoldLeft((z, false)) { (b, a) =>
    val (acc, finished) = b
    if (finished) (acc, true)
    else op(acc, a) match {
      case Some(v) => (v, false)
      case None => (acc, true)
    }
  }._1
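
As a rough illustration of how this encoding behaves, the sketch below pairs it with a hypothetical standalone `lazyFoldLeft` function (a stand-in for the enrichment method, assuming Scala 2.13 or later) and a sample operator that stops at the first element greater than 3. The object name and the sample data are assumptions made for the example.

```scala
object FoldSomeLeftSketch {
  // Hypothetical standalone stand-in for the enrichment method in this PR.
  def lazyFoldLeft[A, B](as: IterableOnce[A])(z: B)(op: (B, => A) => B): B = {
    val it = as.iterator
    var result = z
    var finished = false
    while (!finished && it.hasNext) {
      var evaluated = false
      val elem = it.next()
      def next: A = { evaluated = true; elem }
      val acc = op(result, next)
      finished = !evaluated && acc == result
      result = acc
    }
    result
  }

  // Same encoding as above: the Boolean flag records that op returned None,
  // after which the operator stops evaluating elements and the fold ends.
  def foldSomeLeft[A, B](as: IterableOnce[A])(z: B)(op: (B, A) => Option[B]): B = {
    val (result, _) = lazyFoldLeft(as)((z, false)) { (b, a) =>
      val (acc, finished) = b
      if (finished) (acc, true)
      else op(acc, a) match {
        case Some(v) => (v, false)
        case None    => (acc, true)
      }
    }
    result
  }

  // Sum elements until the first one greater than 3.
  def sumUpToThree(as: IterableOnce[Int]): Int =
    foldSomeLeft(as)(0)((acc, a) => if (a > 3) None else Some(acc + a))

  val finite: Int = sumUpToThree(List(1, 2, 3, 4, 5))
  val infinite: Int = sumUpToThree(Iterator.from(1))
}
```

Note that the tuple allocated on every step is itself a form of boxing, so this encoding demonstrates expressiveness rather than a performance advantage.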

So then it comes down to a more subjective question of what kind of API we want to provide. lazyFoldLeft and foldSomeLeft are two ways of doing the same thing. I find lazyFoldLeft quite elegant to use and like the way it terminates the fold early automatically if the operator doesn't need to evaluate more elements. However, some people might like the explicitness of using Option to signal early termination. So we could keep both. But then there is the argument that there is value in providing one way to do things and not having unnecessary methods.

What do you think?

@julienrf
Collaborator

julienrf commented May 1, 2019

Thanks again for your benchmarks and detailed explanations.

> lazyFoldLeft and foldSomeLeft are two ways of doing the same thing. I find lazyFoldLeft quite elegant to use and like the way it terminates the fold early automatically if the operator doesn't need to evaluate more elements. However, some people might like the explicitness of using Option to signal early termination. So we could keep both. But then there is the argument that there is value in providing one way to do things and not having unnecessary methods.

I have a slight preference to keep just lazyFoldLeft. What do others think?

@joshlemer
Contributor

Well your example with exists is pretty nice, but the use case is so similar to foldSomeLeft (though I dislike the name) that it seems weird to have both.

If we take your exists example:

def exists[A](as: Iterable[A])(f: A => Boolean): Boolean =
  as.foldSomeLeft(false)((b, a) => if (b) None else Some(f(a)))

Maybe this example won't be so bad if we use Option.when/Option.unless?

def exists[A](as: Iterable[A])(f: A => Boolean): Boolean =
  as.foldSomeLeft(false)((b, a) => Option.unless(b)(f(a)))

Though on the other hand it does seem nice, for completeness, to have lazyFoldLeft since we already have lazyFoldRight.

@adamgfraser
Contributor Author

@julienrf That makes sense to me. Would you like me to update the PR accordingly?

@joshlemer I think the advantage of not having to use Option will be greater in more complex use cases (e.g. when the binary operation itself is using Option to indicate an optional value so you have two layers of nested Option values indicating different things). It may also integrate better with other code. If you have defined a binary operation separately and want to use it in a fold you can just drop it in with lazyFoldLeft whereas you have to specify the termination conditions yourself if you use foldSomeLeft.
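
A small sketch of the nested Option case, under the assumption of Scala 2.13 or later: the accumulator is itself an Option and the binary operation is the standard library's `orElse`, which takes its argument by name. The object `NestedOptionSketch`, the standalone `lazyFoldLeft` stand-in, and the `firstSome` helper are hypothetical names used only for illustration.

```scala
object NestedOptionSketch {
  // Hypothetical standalone stand-in for the enrichment method in this PR.
  def lazyFoldLeft[A, B](as: IterableOnce[A])(z: B)(op: (B, => A) => B): B = {
    val it = as.iterator
    var result = z
    var finished = false
    while (!finished && it.hasNext) {
      var evaluated = false
      val elem = it.next()
      def next: A = { evaluated = true; elem }
      val acc = op(result, next)
      finished = !evaluated && acc == result
      result = acc
    }
    result
  }

  // Returns the first defined result of `f`. `orElse` evaluates its argument
  // only when the accumulator is empty, so the fold stops on its own once a
  // value has been found.
  def firstSome[A, B](as: IterableOnce[A])(f: A => Option[B]): Option[B] =
    lazyFoldLeft(as)(Option.empty[B])((b, a) => b.orElse(f(a)))

  val firstMultipleOfSeven: Option[Int] =
    firstSome(Iterator.from(1))(n => if (n % 7 == 0) Some(n * 10) else None)
}
```

Expressing the same fold with an Option-based termination signal would need an operator returning `Option[Option[B]]`, where the outer layer means "keep going" and the inner layer means "value found".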

@adamgfraser
Contributor Author

Is there additional work to do on this? Should we merge it or close it?

Collaborator

@julienrf julienrf left a comment

Hey @adamgfraser, sorry for the delay! I wanted to see if more debate would happen :)

val elem = `this`.next()
def getNext = { nextEvaluated = true; elem }
val acc = op(result, getNext)
finished = !nextEvaluated && acc == result
Collaborator

It is not clear to me why we have the && acc == result part.

This makes the following never terminate, although no elements are evaluated!

Iterator.from(0).lazyFoldLeft(true)((b, _) => !b)

Contributor Author

We need it to deal with folds such as:

def length(as: Iterable[Int]): Int =
  as.lazyFoldLeft(0)((b, _) => b + 1)

In this case we aren't evaluating the list element but we need to continue the fold because the accumulator is changing. Otherwise we wouldn't correctly compute the length of the list.

In the example you gave, I think non-termination is the correct behavior. Every iteration the result alternates between true and false so it continues forever. Note that these are the same results we would get if we used a naive implementation of foldRight:

def foldRight[A, B](as: Iterable[A])(z: => B)(f: (A, => B) => B): B =
  if (as.isEmpty) z
  else f(as.head, foldRight(as.tail)(z)(f))

This would correctly compute the length of a list and not terminate on your example (technically it would fail with a stack overflow error).
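
The role of the `acc == result` check can be seen by toggling it. The sketch below is not the code under review: it is a hypothetical standalone variant (assuming Scala 2.13 or later) whose `requireUnchanged` parameter is invented for this comparison and does not exist in the PR.

```scala
object TerminationRuleSketch {
  // Hypothetical variant of the fold; `requireUnchanged` switches the
  // `acc == result` part of the termination condition on or off.
  def lazyFoldLeft[A, B](as: IterableOnce[A], requireUnchanged: Boolean)(z: B)(op: (B, => A) => B): B = {
    val it = as.iterator
    var result = z
    var finished = false
    while (!finished && it.hasNext) {
      var evaluated = false
      val elem = it.next()
      def next: A = { evaluated = true; elem }
      val acc = op(result, next)
      finished = !evaluated && (!requireUnchanged || acc == result)
      result = acc
    }
    result
  }

  def length(as: Iterable[Int], requireUnchanged: Boolean): Int =
    lazyFoldLeft(as, requireUnchanged)(0)((b, _) => b + 1)

  val xs = List(10, 20, 30, 40)

  // With the check, every element is counted even though none is evaluated.
  val withCheck: Int = length(xs, requireUnchanged = true)

  // Without the check, the fold stops after the first unevaluated element.
  val withoutCheck: Int = length(xs, requireUnchanged = false)

  // On a finite collection the alternating fold visits every element and
  // agrees with the strict foldLeft.
  val alternating: Boolean =
    lazyFoldLeft(List(1, 2, 3), requireUnchanged = true)(true)((b, _) => !b)
}
```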

Collaborator

Thanks for the explanation, that makes sense to me!

@julienrf julienrf merged commit 9ccba2c into scala:master Jul 30, 2019
@adamgfraser adamgfraser deleted the lazyFoldLeft branch July 30, 2019 13:57
@adamgfraser
Contributor Author

@julienrf Thanks! Great working with you on this! Are there any plans for adding more functionality to this library? Would be happy to contribute more in the future.

@julienrf
Collaborator

Thank you @adamgfraser for contributing!

There is no specific roadmap; the goal of this project is to be a hub for good collection-related utilities. Feel free to help get open PRs merged, to submit new PRs closing existing issues, or to suggest new ideas.

3 participants