Order x by y #442

Fil · 2021-06-23T20:58:15Z

Specify {xsort: true} in a mark to sort the ordinal x domain by the maximum value of that mark's y (or y2) channel. The default reducer is max; "sum" is also available, which sorts bars across facets without having to create a "fake" non-faceted mark.
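To make the idea concrete, here is a minimal sketch in plain JavaScript of ordering an ordinal domain by the maximum y value seen for each x. It is not the code in this PR: the helper name `sortDomainByMax` is hypothetical, and ascending order is an assumption made only for the illustration.

```javascript
// Hypothetical helper, not Plot's implementation: order the distinct x values
// by the maximum of the y values that share each x.
function sortDomainByMax(X, Y) {
  const maxByX = new Map();
  X.forEach((x, i) => {
    const current = maxByX.get(x);
    if (current === undefined || Y[i] > current) maxByX.set(x, Y[i]);
  });
  // Assumption: ascending order; a reversed scale would flip it.
  return Array.from(maxByX.keys()).sort((a, b) => maxByX.get(a) - maxByX.get(b));
}

// Made-up channel values for illustration.
const xChannel = ["b", "a", "c", "a"];
const yChannel = [4, 1, 2, 9];
const sortedDomain = sortDomainByMax(xChannel, yChannel); // maxima: b = 4, a = 9, c = 2
```

With these values the domain comes out as c, b, a, because the largest y for "a" is 9 even though its first y is the smallest.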

There doesn't seem to be a need for a complex DSL: reversing the order can be done in the scale options (y: {reverse: true}).

Note that this approach does not take any of the filters into account: for example, a dot mark that doesn't make it to the screen because its radius is negative will still be used to set the domain's order.

In a previous iteration I had made it possible to pass in a generic function reducer(I, accessor); it would be easy enough to add back if there is a demonstrated need.

I believe this covers the most common cases (and the rest can be done by explicitly setting y.domain).

(One thing that is not covered is the ordering of facets: only x and y are targeted.)

closes #388

Build and examples: https://observablehq.com/@fil/order-x-by-y-442

Fil · 2021-06-28T10:35:20Z

Some feedback:

name could be sortX / sortY instead of xsort, ysort.

This would be more in line with Plot's conventions. And would be easier to remember as "sort X" [by Y] / "sort Y" [by X].

why is the xsort option in the 2nd parameter and not the first?

The logic here is that it’s a free-floating option, i.e. it is not computed during the group operation, but in the scale set-up that happens after all the channels have been computed. However, because of the syntax it’s easy to confuse this reducer with the group’s reducers. To fix that, we could make an exception in the group’s reducers: if any is called “xsort”, we pass it up directly?

enjalot · 2021-06-30T15:55:11Z

name could be sortX / sortY instead of xsort, ysort.

I do think this would be better, matching the rest of Plot's API naming conventions

why is the xsort option in the 2nd parameter and not the first?

I'm not sure I follow the internal logic, but is there any case where you would use sortX without a group?
The idea that you could do sortY: "sum" feels very much like it should be in the left parameter to the group. As a user, I've come to see the left parameter as the "output" of the group operator.
If there is a use case where it makes sense as free-floating (on the right?) then I wouldn't mind as much, but if it's tied to the group, I think it should be next to the other group outputs.

Fil · 2021-06-30T16:19:51Z

You can use this without a group, see the "confusion matrix" example in https://observablehq.com/@fil/order-x-by-y-442

What makes point 2 difficult is that the reducer applies to the y/y2 channels across the facets, not to the data values across the group (which is what the group reducer does):

"sum" and "max" are one and the same if there is only one mark for each x.
"sum" in a group output reduces the data that is being grouped; "sum" in sortX does a sum of the y/y2 channel values for all marks sharing the same x.

I've added a second penguins example at the bottom of https://observablehq.com/@fil/order-x-by-y-442, that also shows how it can work without grouping, and the difference between "sum" and "max"; I think this could be expanded to "count" and "min", and maybe more (or the same list of reducers as the group reducers).
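The distinction between "sum" and "max" can be sketched in plain JavaScript. This is only an illustration under stated assumptions: `reduceByX` is a hypothetical helper, and the mark values are made up rather than taken from the penguins notebook.

```javascript
// Hypothetical illustration, not Plot's implementation.
// Each entry stands for one rendered mark: its x value and its y channel value.
function reduceByX(marks, reducer) {
  const groups = new Map();
  for (const {x, y} of marks) {
    if (!groups.has(x)) groups.set(x, []);
    groups.get(x).push(y);
  }
  return new Map(Array.from(groups, ([x, ys]) => [x, reducer(ys)]));
}

const sum = (ys) => ys.reduce((a, b) => a + b, 0);
const max = (ys) => Math.max(...ys);

// One mark per x: "sum" and "max" give the same values.
const single = [{x: "A", y: 2}, {x: "B", y: 5}];
const singleSum = reduceByX(single, sum);
const singleMax = reduceByX(single, max);

// Two facets each contribute a mark for "A": the reducers now disagree.
const faceted = [
  {x: "A", y: 4, facet: 1},
  {x: "A", y: 3, facet: 2},
  {x: "B", y: 5, facet: 1}
];
const facetedSum = reduceByX(faceted, sum); // A: 7, B: 5
const facetedMax = reduceByX(faceted, max); // A: 4, B: 5
```

In the faceted case "A" ranks above "B" by sum but below it by max, which is why the choice of reducer matters once several marks share an x.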

enjalot · 2021-06-30T18:07:02Z

The new examples helped me understand the position of the arguments.
I would agree that adding "count", "min", and other common reducers would be good.

enjalot · 2021-07-14T17:59:52Z

trying out the latest changes on RIAA revenue data (usually used to test stacked bar charts)
https://observablehq.com/d/565c5c35ae67b22c

feels like it works nicely.
gave me a thought about sorting the facet dimensions in a similar way... is that really different/difficult?

Fil · 2021-07-14T21:11:29Z

The ideas for sorting the facet domain(s) are captured in #414

mbostock · 2021-08-01T15:51:23Z

I’d like to try a different tack with this—I think we can offer a more general solution here for ordering ordinal domains, not just x → y and y → x. For example, if you use the group transform to generate dots of varying radius and then lay them out along the x-dimension, you might want to sort the x-domain so the dots are drawn in descending order; or if you have a scatterplot with an ordinal fill color, you might want to allocate a categorical domain so that the most frequent color is assigned first, then the next-most frequent, and so on, to maximize discriminability.

I need to play with this, but my initial thought is that we should have a way to denote that a mark’s channel should be used (in conjunction with a reducer) to derive the order of an ordinal scale’s domain. For example, take this bar chart:

Plot.plot({
  marks: [
    Plot.barY(alphabet, {x: "letter", y: "frequency"})
  ]
})

If we want the x-dimension to be sorted by descending frequency (y), perhaps you could say:

Plot.plot({
  marks: [
    Plot.barY(alphabet, {x: "letter", y: {value: "frequency", sort: "x"}})
  ]
})

Or perhaps more explicitly, because you might want to drive more than one ordinal dimension with the same channel, and you might want to specify a reducer other than median:

Plot.plot({
  marks: [
    Plot.barY(alphabet, {x: "letter", y: {value: "frequency", sort: {x: "median"}}})
  ]
})

Putting aside the exact syntax, I think we want to express a relationship from a mark channel (above, a Plot.barY’s y channel) to an ordinal scale (above, x), in conjunction with a reducer (above median) and possibly a comparator or at least the ability to reverse the natural order (say "-median"). By doing so we avoid repeating how the order is derived (which data to use, which column or accessor, etc.) even when the data is transformed (e.g., through binning or grouping).
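A rough sketch of that relationship in plain JavaScript may help, with the syntax left open as the comment says. Everything here is an assumption for illustration: the `orderDomain` helper, its string spec with a leading "-" for reversal, and the sample values are hypothetical and not part of Plot.

```javascript
// Median of a list of numbers (mean of the two middle values for even lengths).
function median(values) {
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = sorted.length >> 1;
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Hypothetical helper: derive an ordinal domain from the values bound to the
// scale (scaleValues) and the channel that drives the order (channelValues).
// A leading "-" in the spec reverses the natural (ascending) order.
function orderDomain(scaleValues, channelValues, spec = "median") {
  const reverse = spec.startsWith("-");
  const name = reverse ? spec.slice(1) : spec;
  const reducers = {
    median,
    min: (v) => Math.min(...v),
    max: (v) => Math.max(...v)
  };
  const reducer = reducers[name];
  if (!reducer) throw new Error(`unknown reducer: ${name}`);
  const groups = new Map();
  scaleValues.forEach((key, i) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(channelValues[i]);
  });
  const domain = Array.from(groups.keys()).sort(
    (a, b) => reducer(groups.get(a)) - reducer(groups.get(b))
  );
  return reverse ? domain.reverse() : domain;
}

// Made-up values: medians are a = 2, b = 15, c = 5.
const letters = ["a", "b", "a", "b", "c"];
const values = [1, 10, 3, 20, 5];
const ascending = orderDomain(letters, values, "median");
const descending = orderDomain(letters, values, "-median");
```

The point of the sketch is that the caller names only the reducer and the direction; which data and which column feed the order are already known from the mark's channels.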

mbostock · 2021-08-01T16:01:59Z

Oh, another reason I think it should be a specific channel (as opposed to, say, indicating that the x-domain should be ordered based on all associated y-values for each x-value), is in something like this timeline:

Plot.plot({
  marks: [
    Plot.barX(civilizations, {x1: "start", x2: "end", y: "civilization"})
  ]
})

Here you might want to order y by the (median) start of each civilization (x1) or the (median) end (x2). Both are valid orderings that address different questions. It’s therefore nice to have specificity.

It also occurs to me that you could have an invisible dummy mark solely for the purposes of expressing this order. For example above, what if you want to order the civilizations by total duration? There’s not currently a mark that has such a channel. Or yet another option would be some syntax at the scale definition, but we’d need to specify the data source and support transforms and faceting, too, which makes it tricky.
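For the duration case, one workaround available without any new syntax is to compute the order by hand and set the scale's domain explicitly. The sketch below assumes rows with `name`, `start` and `end` fields; the helper `domainByDuration` and the sample rows are made up for illustration and are not the civilizations dataset.

```javascript
// Made-up rows for illustration only.
const rows = [
  {name: "A", start: 100, end: 400},
  {name: "B", start: 50, end: 150},
  {name: "C", start: 200, end: 900}
];

// Hypothetical helper: total duration per key, longest first.
function domainByDuration(data, {key = "name", start = "start", end = "end"} = {}) {
  const total = new Map();
  for (const d of data) {
    total.set(d[key], (total.get(d[key]) ?? 0) + (d[end] - d[start]));
  }
  return Array.from(total.keys()).sort((a, b) => total.get(b) - total.get(a));
}

const yDomain = domainByDuration(rows); // durations: C = 700, A = 300, B = 100
```

The resulting array could then be passed as the y scale's domain option, at the cost of repeating which data and which columns define the order, which is the repetition the proposal is trying to avoid.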

mbostock

This is very close to what I want. The only thing missing is that I want to generalize the syntax so that this can also be used to drive the order of ordinal color domains. For example when you have a scatterplot, you often want the most prominent dots to get the first color from a categorical domain, the next-most prominent dots to get the second color, and so on to maximize discriminability. And this will be useful for the forthcoming color legend where you often want the swatches to be in order of descending frequency (or sum).

A bonus feature, if I can figure out the syntax for it, is for the sort options to express a descending sort order. Yes, you can do the same thing by adding the reverse option to the corresponding scale, but the primary advantage of this PR is that you can quickly tack on something to a mark definition to drive the ordinal domain order. The shorthand is nice to replace d3.groupSort or equivalent, but it’ll be even nicer if a single option can express a descending order, too.

I’m going to think about this now and hopefully push some commits.

mbostock · 2021-08-18T19:59:36Z

Another possibility is a separate order option, rather than trying to incorporate the order specification into the channel value definition. I suppose it’s a little more verbose since you have to repeat the channel name twice.

Plot.barY(alphabet, {x: "letter", y: "frequency", order: {x: "y"}})

But, it also means that you can reference a derived channel which is computed by a transform. For example if you take advantage of the implicit stack transform, you can reference y1 or y2 as desired.

The above shorthand would also allow a longer form where you can specify the desired reducer for ordering, and potentially reverse and limit (for top N) options, too.

Plot.barY(alphabet, {x: "letter", y: "frequency", order: {x: {value: "y", reduce: "median"}}})
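As a sketch of how the shorthand and the longer form could relate, the following plain JavaScript normalizes both into one shape and then applies reverse and limit to an already-sorted domain. The helper names, the "median" default, and the choice to reverse before limiting are all assumptions for illustration, not the behavior of this PR.

```javascript
// Hypothetical normalizer for the proposed option; none of this is Plot's code.
// Accepts {x: "y"} (shorthand) or {x: {value: "y", reduce, reverse, limit}}.
function normalizeOrder(order, defaultReduce = "median") {
  const normalized = {};
  for (const [scale, spec] of Object.entries(order)) {
    const {value, reduce = defaultReduce, reverse = false, limit = Infinity} =
      typeof spec === "string" ? {value: spec} : spec;
    normalized[scale] = {value, reduce, reverse, limit};
  }
  return normalized;
}

// Apply reverse, then limit, to an already-sorted domain.
// Assumption: reversing first, so that limit keeps the top N of the final order.
function finishDomain(sortedDomain, {reverse, limit}) {
  const domain = reverse ? sortedDomain.slice().reverse() : sortedDomain.slice();
  return domain.slice(0, limit);
}

const shortForm = normalizeOrder({x: "y"});
const longForm = normalizeOrder({x: {value: "y", reduce: "sum", reverse: true, limit: 2}});
const top2 = finishDomain(["a", "b", "c", "d"], longForm.x); // reversed, then first two
```

Keeping the options in a separate object is what lets `value` name a derived channel such as y1 or y2, since the name is resolved after transforms have run.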

Fil · 2021-08-19T10:24:28Z

A few suggestions in #511

* throw the correct error if a mistaken sort option is given as {x: "-y", limit: 10}
* first ladies example plot (demonstrates "group names by first tenure_start") based on Toph Tucker’s https://observablehq.com/@tophtucker/first-ladies-of-the-united-states
* document sort x by y
* add limit: [lo, hi]

Fil · 2021-08-19T15:50:00Z

lgtm!

Fil requested review from mbostock and enjalot June 23, 2021 20:58

This was referenced Jun 23, 2021

sort groups #414

Closed

Sort a scale’s domain according to a channel, or other scale? #388

Closed

Fil mentioned this pull request Jul 21, 2021

sort bins #334

Closed

Fil force-pushed the fil/order-x-by-y branch from f169f15 to 5e90f95 Compare July 23, 2021 15:50

sortX, sortY (rebased)

e60860b

mbostock force-pushed the fil/order-x-by-y branch from 5e90f95 to e60860b Compare August 18, 2021 16:00

stricter reduce interpretation

2a3b22f

mbostock reviewed Aug 18, 2021

View reviewed changes

prepare channel sorting in mark.initialize

ee8c3bc

mbostock force-pushed the fil/order-x-by-y branch from 08ccb3d to ee8c3bc Compare August 18, 2021 16:58

mbostock added 4 commits August 18, 2021 10:29

allow channels to imply domains

8854b46

remove unused variable

5952c34

use group reducers

f6efa66

use groupSort

935a7d2

mbostock added 5 commits August 18, 2021 13:57

mark order option

6be6992

remove unused order option

5da9023

order options

a106941

fix reverse order

2ff7a1d

order ↦ sort

69fdfa6

mbostock added 4 commits August 18, 2021 22:00

isOptions

5a6eb02

reverse shorthand

b694ed7

alias sort channels

db9df56

shorten

2c949f9

mbostock and others added 2 commits August 19, 2021 08:38

better error for unknown scale

32daf47

mbostock added 12 commits August 19, 2021 09:27

update README

83a9786

update README

9cd103d

update README

d827808

update README

12735a5

update README

d874a36

update README

1d1fa77

update README

19f262a

update README

577231d

order facet channels, too

3f2d250

update README

37f94e8

add googleTrendsRidgeline test

cfe1d9f

min- and max-index

f7e8ebb

Fil mentioned this pull request Aug 19, 2021

facet wrap #332

Closed

3 tasks

negative limit; iterable test

f4eb688

Fil mentioned this pull request Aug 19, 2021

A transform to consolidate ordinal values outside the top n into an “other” category, perhaps in conjunction with the group transform. #144

Open

mbostock added 5 commits August 19, 2021 10:54

update README

6ef29b7

coerce to string once

ea94018

coerce to string

6d131bc

replace ±channel with shared sort options

e25b0da

update README

ca6c455

mbostock merged commit 6bdd1ac into main Aug 19, 2021

mbostock deleted the fil/order-x-by-y branch August 19, 2021 19:52
