Add lazyFoldLeft #18
Thanks a lot for the detailed description!
How does it compare with the cost of using
Here are some benchmarking results. I ran two benchmarks. The first looks at summing a list of the numbers from 1 to 50,000. The strict left fold is the fastest, as we would expect. The lazy left fold is slightly faster than using
The second benchmark looks at determining whether there is a number greater than 50,000 in a list of the numbers from 1 to 100,000. The specialized
So overall I think we can say that the lazy left fold is as fast as using
@adamgfraser this is cool, do you think this would render
Thanks! From a technical perspective, we can write any expression using
So then it comes down to a more subjective question of what kind of API we want to provide. What do you think?
Thanks again for your benchmarks and detailed explanations.
I have a slight preference to keep just
Well, your example with
If we take your `exists` example:
Maybe this example won't be so bad if we use
Though on the other hand it does seem nice, for completeness, to have
@julienrf That makes sense to me. Would you like me to update the PR accordingly? @joshlemer I think the advantage of not having to use
Is there additional work to do on this? Should we merge it or close it?
Hey @adamgfraser, sorry for the delay! I wanted to see if more debate would happen :)
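```scala
val elem = `this`.next()
// Forcing the by-name argument sets the flag, so the loop can tell whether `op` used the element
def getNext = { nextEvaluated = true; elem }
val acc = op(result, getNext)
// Terminate only if the element was never forced and the accumulator did not change
finished = !nextEvaluated && acc == result
```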
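The lines under review are the heart of the traversal loop. A minimal self-contained sketch of the whole loop might look like the following, assuming a standalone function over an `Iterator` instead of the PR's implicit enrichment; the object name `LazyFoldSketch` and the surrounding variables are illustrative, not the PR's exact code.

```scala
object LazyFoldSketch {
  // Hypothetical standalone version of the enrichment method described in this PR.
  def lazyFoldLeft[A, B](it: Iterator[A])(z: B)(op: (B, => A) => B): B = {
    var result = z
    var finished = false
    while (!finished && it.hasNext) {
      var nextEvaluated = false
      val elem = it.next()
      // Forcing the by-name argument records that `op` looked at the element.
      def getNext: A = { nextEvaluated = true; elem }
      val acc = op(result, getNext)
      // Stop only if `op` ignored the element and left the accumulator unchanged.
      finished = !nextEvaluated && acc == result
      result = acc
    }
    result
  }
}
```

With `(b, a) => b || a > 5` on `Iterator.from(1)`, the loop stops one element after the first match, because `||` no longer forces `a` once `b` is `true`.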
It is not clear to me why we have the `&& acc == result` part.
This makes the following never terminate, although no elements are evaluated!
```scala
Iterator.from(0).lazyFoldLeft(true)((b, _) => !b)
```
We need it to deal with folds such as:
```scala
def length(as: Iterable[Int]): Int =
  as.lazyFoldLeft(0)((b, _) => b + 1)
```
In this case we aren't evaluating the list element but we need to continue the fold because the accumulator is changing. Otherwise we wouldn't correctly compute the length of the list.
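To see why the `acc == result` check matters for `length`, the sketch below runs the same loop with and without it. The `checkAcc` flag and the `LengthSketch` object are hypothetical additions for comparison only; without the check, `(b, _) => b + 1` never forces the element, so the traversal would stop after the first one.

```scala
object LengthSketch {
  // Hypothetical helper: `checkAcc = true` mirrors the PR's termination rule,
  // `checkAcc = false` drops the `acc == result` part of the condition.
  def lazyFoldLeft[A, B](it: Iterator[A], checkAcc: Boolean)(z: B)(op: (B, => A) => B): B = {
    var result = z
    var finished = false
    while (!finished && it.hasNext) {
      var nextEvaluated = false
      val elem = it.next()
      def getNext: A = { nextEvaluated = true; elem }
      val acc = op(result, getNext)
      finished = !nextEvaluated && (!checkAcc || acc == result)
      result = acc
    }
    result
  }

  // The `length` fold from the comment: the element is ignored but the accumulator changes.
  def length[A](as: Iterable[A], checkAcc: Boolean): Int =
    lazyFoldLeft(as.iterator, checkAcc)(0)((b, _) => b + 1)
}
```

With the check, `length(List(1, 2, 3, 4), checkAcc = true)` counts all four elements; without it, the result is `1`.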
In the example you gave, I think non-termination is the correct behavior. Every iteration the result alternates between `true` and `false`, so it continues forever. Note that these are the same results we would get if we used a naive implementation of `foldRight`:
```scala
def foldRight[A, B](as: Iterable[A])(z: => B)(f: (A, => B) => B): B =
  if (as.isEmpty) z
  else f(as.head, foldRight(as.tail)(z)(f))
```
This would correctly compute the length of a list and not terminate on your example (technically it would fail with a stack overflow error).
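For comparison, here is that naive `foldRight` wrapped in a runnable object; the object name `NaiveFoldRight` and the `length`/`exists` helpers are illustrative additions. Because the recursive call is passed by name, `exists` stops at the first match even on an infinite `LazyList`, while `length` forces every recursive call.

```scala
object NaiveFoldRight {
  // The naive right fold from the comment: `z` and the recursive call are passed by name.
  def foldRight[A, B](as: Iterable[A])(z: => B)(f: (A, => B) => B): B =
    if (as.isEmpty) z
    else f(as.head, foldRight(as.tail)(z)(f))

  // Forces the whole recursion: one nested call per element.
  def length[A](as: Iterable[A]): Int =
    foldRight(as)(0)((_, n) => n + 1)

  // `||` only forces `rest` when `p(a)` is false, so the recursion can stop early.
  def exists[A](as: Iterable[A])(p: A => Boolean): Boolean =
    foldRight(as)(false)((a, rest) => p(a) || rest)
}
```

Since `length` nests one call per element, a large enough collection overflows the stack, which is the stack-safety problem a lazy left fold avoids.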
Thanks for the explanation, that makes sense to me!
@julienrf Thanks! Great working with you on this! Are there any plans for adding more functionality to this library? I would be happy to contribute more in the future.
Thank you @adamgfraser for contributing! There is no specific roadmap; the goal of this project is to be a hub for good collection-related utilities. Feel free to help get open PRs merged, to submit new PRs closing existing issues, or to suggest new ideas.
This PR adds a new `lazyFoldLeft` implicit enrichment method to Scala standard collections, as discussed here. `lazyFoldLeft` is like `foldLeft`, but the combination function `op` is non-strict in its second parameter. If `op(b, a)` chooses not to evaluate `a` and returns `b`, the traversal terminates early.
This method addresses a tension that currently exists when using folds in Scala. Lazy right folds are in many ways preferable as a building block for specific folds because they support efficient implementations of methods like `exists` that allow early termination. They also allow traversals of infinite collections. However, right folds are not tail recursive and thus are not stack safe for large collections, a property we usually want to guarantee.
This problem has been addressed in various ways, including variations of folds that let the caller explicitly signal termination (e.g. `foldSomeLeft` or `lazyFoldRight` in this library) or evaluation of the right fold in the context of a stack-safe monad (e.g. `foldRight` in Cats with `Eval`). These solutions are less than ideal for two reasons.
First, they can require more work from the caller to explicitly specify early termination conditions when those conditions should be implied by the operation itself. For example, when implementing `exists` with `foldSomeLeft`, we could use `None` to explicitly signal termination. But the early-terminating nature of the fold is really inherent in the `||` operator, which can be seen much more clearly without the `Option`.
Second, these solutions require some additional boxing, in the form of `Option`, `Either`, or `Eval`, which negatively impacts performance and reduces the practical utility of these higher-order functions in implementing more specific collection operations.
This implementation of `lazyFoldLeft` has a couple of attractive properties.
In terms of soundness, we can show that for any pure function `f: (B, => A) => B`, if `as.foldLeft(z)(f)` terminates, then `as.lazyFoldLeft(z)(f)` terminates with the same value. The argument is that `lazyFoldLeft` only terminates early if `op(b, a)` chooses not to evaluate `a` and returns `b`. If, when traversing a given element, `f` did not evaluate `a`, then for the given `b`, `f` is effectively a function of type `B => B`. The second termination condition is that `op(b, a)` returns `b`, so `b` is a fixed point of that function. Calling `f` again with the next element in the traversal will therefore also not evaluate `a` and will also return `b`, assuming that `f` is a pure function. By induction, the same logic applies to every remaining element, so the traversal can safely terminate early.
My benchmarking also indicates that the performance cost of using a lazy left fold instead of a strict one is relatively low. Essentially the only additional work in traversing each element is setting and getting a boolean value.
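To illustrate the `exists` comparison from this description, the sketch below implements `exists` both ways. The Option-based fold is a hypothetical stand-in modelled on this library's `foldSomeLeft`; its semantics here (returning `None` stops the traversal and keeps the current accumulator) are an assumption, as are the object and method names.

```scala
object ExistsSketch {
  // Hypothetical Option-based fold modelled on `foldSomeLeft`:
  // returning None stops the traversal and keeps the current accumulator (assumed semantics).
  def foldSomeLeft[A, B](it: Iterator[A])(z: B)(op: (B, A) => Option[B]): B = {
    var result = z
    var done = false
    while (!done && it.hasNext) {
      op(result, it.next()) match {
        case Some(b) => result = b
        case None    => done = true
      }
    }
    result
  }

  // Same termination rule as the reviewed snippet: stop when the element is not forced
  // and the accumulator is unchanged.
  def lazyFoldLeft[A, B](it: Iterator[A])(z: B)(op: (B, => A) => B): B = {
    var result = z
    var finished = false
    while (!finished && it.hasNext) {
      var nextEvaluated = false
      val elem = it.next()
      def getNext: A = { nextEvaluated = true; elem }
      val acc = op(result, getNext)
      finished = !nextEvaluated && acc == result
      result = acc
    }
    result
  }

  // Termination is spelled out by hand with None once a match has been seen.
  def existsWithOption[A](it: Iterator[A])(p: A => Boolean): Boolean =
    foldSomeLeft(it)(false)((found, a) => if (found) None else Some(p(a)))

  // Termination falls out of `||` short-circuiting: once `found` is true, `a` is never forced.
  def existsLazy[A](it: Iterator[A])(p: A => Boolean): Boolean =
    lazyFoldLeft(it)(false)((found, a) => found || p(a))
}
```

The `if (found) None` branch in the first version restates the short-circuit that `||` already performs in the second, and the lazy version avoids wrapping every intermediate result in `Some`.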
So it seems like this could be a very useful method allowing us to implement a wide variety of methods in a relatively efficient way. I added some basic tests but didn't see the infrastructure for property-based testing or benchmarks. I'm happy to add those if that is helpful.
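As a rough sketch of both points, the following defines a couple of further methods on top of a standalone lazy left fold and adds a hand-rolled, property-style check of the soundness claim using `scala.util.Random`. The object and method names are hypothetical, and the random check is only a stand-in for a real property-based testing library.

```scala
import scala.util.Random

object LazyFoldMethods {
  // Standalone sketch with the PR's termination rule.
  def lazyFoldLeft[A, B](it: Iterator[A])(z: B)(op: (B, => A) => B): B = {
    var result = z
    var finished = false
    while (!finished && it.hasNext) {
      var nextEvaluated = false
      val elem = it.next()
      def getNext: A = { nextEvaluated = true; elem }
      val acc = op(result, getNext)
      finished = !nextEvaluated && acc == result
      result = acc
    }
    result
  }

  // Stops one element after the first failure, via `&&` short-circuiting.
  def forall[A](it: Iterator[A])(p: A => Boolean): Boolean =
    lazyFoldLeft(it)(true)((ok, a) => ok && p(a))

  // Once a match is stored, the accumulator is returned unchanged without forcing `a`.
  def find[A](it: Iterator[A])(p: A => Boolean): Option[A] =
    lazyFoldLeft(it)(Option.empty[A]) { (found, a) =>
      if (found.isDefined) found
      else { val x = a; if (p(x)) Some(x) else None }
    }

  // For these pure folds, the lazy fold should agree with the strict `foldLeft`.
  def agreesWithFoldLeft(xs: List[Int]): Boolean = {
    val sumOk = lazyFoldLeft(xs.iterator)(0)((b, a) => b + a) == xs.foldLeft(0)(_ + _)
    val anyOk = lazyFoldLeft(xs.iterator)(false)((b, a) => b || a > 7) ==
      xs.foldLeft(false)((b, a) => b || a > 7)
    // Ignores the element but changes the accumulator, so it must not stop early.
    val parityOk = lazyFoldLeft(xs.iterator)(true)((b, _) => !b) == xs.foldLeft(true)((b, _) => !b)
    // Stops early once the running total reaches 10.
    val clampOk = lazyFoldLeft(xs.iterator)(0)((b, a) => if (b >= 10) b else b + a) ==
      xs.foldLeft(0)((b, a) => if (b >= 10) b else b + a)
    sumOk && anyOk && parityOk && clampOk
  }

  // Runs the comparison on random finite lists generated from a fixed seed.
  def randomCheck(seed: Long, trials: Int): Boolean = {
    val rng = new Random(seed)
    (1 to trials).forall(_ => agreesWithFoldLeft(List.fill(rng.nextInt(20))(rng.nextInt(10))))
  }
}
```

For example, `LazyFoldMethods.find(Iterator.from(1))(_ > 3)` returns `Some(4)` without traversing the rest of the infinite iterator.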