Skip to content
This repository was archived by the owner on Dec 22, 2021. It is now read-only.

Unify strict and lazy building #114

Closed
wants to merge 4 commits into from

Conversation

szeiger
Copy link
Contributor

@szeiger szeiger commented Jun 16, 2017

This is an extension of #97 which provides Builders in BuildFrom and
FromSpecificIterable for all collection types because we need them
for always-strict operations like groupBy and we can always get them
anyway.

This greatly simplifies the design of BuildFrom by getting rid of half
of the factory types.

szeiger added 4 commits June 12, 2017 16:18
`BuildFrom` is like `FromSpecificIterable` but with an extra `From`
argument, like in the final version of scala#45. `FromSpecificIterable`
existed conceptually in that version as `BuildFrom[Any, …]` but didn’t
have a separate type.

This new version has separate abstractions for buildable (strict)
collection types in the form of `StrictBuildFrom` and
`FromSpecificIterableWithBuilder`. Since we can get a surrogate builder
(through one of the new `Builder.from` methods) for any lazy collection
and we can restrict code to work only with strict collections via the
`Buildable` trait, this is not strictly necessary. The only reason for
separating the `Builder` abstractions is to avoid exposing them in
`FromSpecificIterable`. Even though everything can be built in a strict
way, these abstractions sit on top of the lazy ones, not below them.
There is no way to get a BuildFrom purely by type in
this design, so we need to call parseCollection
explicitly with a companion object.
This is an extension of scala#97 which provides Builders in BuildFrom and
FromSpecificIterable for all collection types because we need them
for always-strict operations like groupBy and we can always get them
anyway.

This greatly simplifies the design of BuildFrom by getting rid of half
of the factory types.
@szeiger szeiger requested review from julienrf and odersky June 16, 2017 12:08
Copy link
Contributor

@julienrf julienrf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On one hand, introducing the newBuilder method in all factories is tempting because it makes factories a lot simpler! But on the other hand I’d rather not have them in View and LazyList factories.

In summary, we will have to decide on

  • whether we want to keep or not the public builders in all the factories (including View and LazyList)
  • in comparison with Implicit builders on top of #110 #112, do you think that would be doable to have generic tests use implicit instances?
  • also in comparison with Implicit builders on top of #110 #112: in this PR we have one type of implicit builder: BuildFrom, which takes a From parameter and which has a fromSpecificIterable method. Whereas in Implicit builders on top of #110 #112 we have CanBuild which has no From type parameter and no fromSpecificIterable method. Then, the idea in Implicit builders on top of #110 #112 is to derive BuildFrom and BuildTo from this CanBuild. But if in most cases we find it useful to have the From parameter maybe it makes sense to just have BuildFrom, as in this PR.

Do other people have thoughts?


def sortedIterableFactory = SortedSet

def sortedMapFactory: SortedMapFactory[CC]

protected[this] def sortedMapFromIterable[K2, V2](it: collection.Iterable[(K2, V2)])(implicit ordering: Ordering[K2]): CC[K2, V2]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn’t this be implemented?

protected[this] def sortedMapFromIterable[K2, V2](it: collection.Iterable[(K2, V2)])(implicit ordering: Ordering[K2]): CC[K2, V2] = sortedMapFactory.sortedFromIterable(it)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we can delegate this methods now the same way it is already done for fromSpecificIterable.


trait IterableFactoryWithBuilder[+CC[_]] extends IterableFactory[CC] {
def newBuilder[A](): Builder[A, CC[A]]
def newBuilder[A](): Builder[A, CC[A]] = new ArrayBuffer[A]().mapResult(fromIterable _)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I’m not sure this default implementation will be useful. Will it be useful elsewhere than for ArrayBuffer and ImmutableArray?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This default is used for all collection types that cannot provide a better builder. This includes all lazy collections.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of lazy collections, what about using an ImmutableArray instead of an ArrayBuffer to avoid the extra space?

Copy link
Contributor Author

@szeiger szeiger Jun 16, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't know the size ahead of time so you can either copy everything again or live with some wasted space.

Copy link
Contributor

@julienrf julienrf Jun 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I mean that even after an ArrayBuffer has been completely built there might still be some unused extra space. If you build an ImmutableArray (or just call toArray on the ArrayBuffer, which is almost exactly the same thing), then you trim the end of the buffer. (But yes, that means one additional copy)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any precedent for this in other collection methods? AFAIK trimming to the current size is always an explicit operation. The default is to avoid copying at the expense of some unused space.

@@ -9,7 +9,7 @@ trait SortedSet[A] extends Set[A] with SortedSetOps[A, SortedSet, SortedSet[A]]

trait SortedSetOps[A, +CC[X], +C <: SortedSet[A]]
extends SetOps[A, Set, C]
with SortedOps[A, C] {
with SortedOps[A, C, CC] {

protected[this] def sortedFromIterable[B: Ordering](it: Iterable[B]): CC[B]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be possible to implement this by delegating to sortedIterableFactory.


@Test
def genericTest: Unit = {
assert(Parse.parseCollection[Int, immutable.List[Int]](immutable.List).parse("1|2|3").contains(1 :: 2 :: 3 :: immutable.Nil))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It’s unfortunate that the FromSpecificIterable can not be available implicitly. That’s a requirement to do generic typeclass derivation.

/**
* Builds a collection of type `C` from elements of type `A`
* @tparam A Type of elements (e.g. `Int`, `Boolean`, etc.)
* @tparam C Type of collection (e.g. `List[Int]`, `TreeMap[Int, String]`, etc.)
*/
trait FromSpecificIterable[-A, +C] extends Any {
trait FromSpecificIterable[-A, +C] extends Any with BuildFrom[Any, A, C] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where do we rely on the fact that FromSpecificIterable is a BuildFrom?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whenever you pass a factory object instead of an implicit BuildFrom. All factories provide an implicit conversion to FromSpecificIterable.

@julienrf
Copy link
Contributor

Also, we have to figure out the issue with the dotty compilation in this PR.

@szeiger
Copy link
Contributor Author

szeiger commented Jun 16, 2017

@julienrf As discussed previously we want strict methods like groupBy on Iterable, so providing builders for everything is unavoidable. Building without a source collection is also supported here via FromSpecificIterable. This is unchanged from #85. The only difference is that FromSpecificIterable also provides the right Builder. This is now used in GenericTest instead of BuildFrom.

@julienrf
Copy link
Contributor

As discussed previously we want strict methods like groupBy on Iterable, so providing builders for everything is unavoidable.

Well, this builder can be protected and used by operation implementations only (as in #110 and #111)

@odersky
Copy link
Contributor

odersky commented Jun 16, 2017

I'd like to review this but can only do it after PLDI. So it will be probably a week until you have the review. Thanks for holding that long...

* @tparam A Type of elements (e.g. `Int`, `Boolean`, etc.)
* @tparam C Type of collection (e.g. `List[Int]`, `TreeMap[Int, String]`, etc.)
*/
trait BuildFrom[-From, -A, +C] extends Any {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does making the C a type parameter instead of a type member do for us?

Copy link
Contributor Author

@szeiger szeiger Jun 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would prefer a type member (and I did use one in my original implementation in #45) but FromSpecificIterable in Martin's simplified version uses a type parameter so I used the same here. I think we should use type members for both. Unlike CanBuildFrom, lookup for BuildFrom and FromSpecificIterable is never determined by this type. It is strictly an "out" parameter.

}

/** Build the source collection type from a SortedMapOps */
implicit def buildFromSortedMapOps[CC[K, V] <: SortedMap[K, V] with SortedMapOps[K, V, CC, _], A, B, E : Ordering, F]: BuildFrom[CC[A, B], (E, F), CC[E, F]] = new BuildFrom[CC[A, B], (E, F), CC[E, F]] {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Most people won't have to look at this, but it's still pretty imposing with seven (?!) separate type parameters (five "real", two parameterizing CC).

@Ichoran
Copy link
Contributor

Ichoran commented Jun 17, 2017

I'm also quite comfortable with this approach. It feels a little more flexible but a little more imposing than Julien's #112. I can't easily tell what one could accomplish here that one couldn't there, but overall the type member strategy, despite requiring a little more magic, seems overall a bit friendlier.

Both are pretty good, though, for what the tests cover. Do either of you have a clear idea of what the most important differences are for user-level code? I wasn't able to easily distinguish them.

Also, I think there's a principle underlying builders that should simplify matters: the only reason you care about the From collection type is when you want it to be the same as the To type (or as close as possible). Otherwise, you just want to get the elements and you don't really care what From is (beyond any constraints you put on the type of the collection itself, but the method etc parameters will see to that). So you might care about From, A or A, To, but not about From, A, To.

This suggests to me that we probably don't need an explicit From, A, To set of types like in CBF, but can do with type members as in #112. But that's a hunch based on the logical structure; are there counterexamples?

@julienrf
Copy link
Contributor

Do either of you have a clear idea of what the most important differences are for user-level code?

Both solutions seem to be equivalent. The only difference I noted is that in #112, we can get an implicit Parse[List[…]] instance (see here), whereas in this PR we have to explicitly provide the factory (see here). But I think it might just be a choice from @szeiger and that it should be possible to get implicit instances (wdyt, @szeiger?).

Also, in #112 I keep differentiating factories that have builders from those that don’t. I’d rather not expose a LazyList.newBuilder() method nor a View.newBuilder() method.

@szeiger
Copy link
Contributor Author

szeiger commented Jun 26, 2017

It should be possible to add an implicit method to each factory type that provides a BuildFrom[Any, ...]. This would give you type-driven lookups. This may require also moving the implicits from the BuildFrom companion object to get the priorities right when the source type is known.

@szeiger
Copy link
Contributor Author

szeiger commented Jun 26, 2017

One important difference is that in #112 AFAICT all Builders are constructed by expected target type, not driven by the source collection, whereas here the default implicits fall back to the source collection. This allows you to do transformations independent of the static type: The static type of the source determines the level of detail of see of the static type of the target but the implementation class is always the same. This fixes one of the consistency problem of the current collections' CanBuildFrom (where the type-driven lookup is similar to #112).

@szeiger
Copy link
Contributor Author

szeiger commented Jun 26, 2017

Also note that the source-driven transformation is consistent with the new way of performing operations without CanBuildFrom. In the current collections it works in some cases but not in others because there are no advanced builder types for maps and sorted collections.

@julienrf
Copy link
Contributor

in #112 AFAICT all Builders are constructed by expected target type, not driven by the source collection

They are if you ask for an implicit BuildFrom.

But if you ask for an implicit BuildTo then that’s the opposite: you directly ask for a builder for your target collection.

The static type of the source determines the level of detail of see of the static type of the target but the implementation class is always the same.

Indeed this feature is not supported by my PR: if the static type of the source is Seq then you’ll get a default Seq implementation as a result of the transformation, whatever the runtime type of the source collection was.

@szeiger
Copy link
Contributor Author

szeiger commented Jun 27, 2017

I've been playing around with type-driven building. I run into the problem that I already encountered previously that BuildFrom for sorted maps and sorted sets are ambiguous. This is the reason why this PR has a BuildFromLowPriority but this solution doesn't work for implicits in the factories. I believe the solution is a SortedIterableOps in parallel to SortedMapOps so we get a clear split between unary and binary sorted collections.

This still leaves the problem of distinguishing between the static and dynamic BuildFrom instances. With From being contravariant, a BuildFrom[Any, ...] is preferred but we want it the other way around.

@Ichoran
Copy link
Contributor

Ichoran commented Jun 28, 2017

With regard to the actual type, not the statically visible type, being the collection: I think that's important behavior, probably important enough to favor this approach. But if it can be made to work with the simplified type parameters (pushing To to a type member, and using Aux under a friendly name if needed), that would be even better.

Why do the implicits in the factories not obey priority? Or is it just that you need to create a low priority thing for them each time which is awkward?

@szeiger
Copy link
Contributor Author

szeiger commented Jun 28, 2017

For sorted maps you get a BuildFrom[..., ..., TreeSet[...]] and a BuildFrom[..., ..., TreeMap[...]], neither of which is more specific than the other. I'll squash and rebase this PR and I will also try to fix this issue by restricting buildFromSortedOps to a new SortedIterableOps type so it doesn't apply to SortedMapOps anymore.

@szeiger
Copy link
Contributor Author

szeiger commented Jun 28, 2017

OK, restricting to SortedSetOps instead of SortedOps doesn't solve the problem. There are still ambiguities between either one of buildFromMapOps / buildFromSortedMapOps and buildFromIterableOps. This used to work in #45. It is not clear to me why it doesn't work anymore with the simplified design. Both Scala and Dotty give the same results and switching back to a type To type member doesn't make a difference, either. I'll keep the BuildFromLowPriority for now.

@julienrf julienrf mentioned this pull request Jun 29, 2017
julienrf added a commit that referenced this pull request Jun 29, 2017
@julienrf julienrf closed this Jun 29, 2017
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants