Order x by y #442
Some feedback:
This would be more in line with Plot's conventions. And would be easier to remember as "sort X" [by Y] / "sort Y" [by X].
The logic here is that it’s a free-floating option, i.e. it is not computed during the group operation but in the scale set-up that happens after all the channels have been computed. However, because of the syntax, it’s easy to confuse this reducer with the group’s reducers. To fix that, we could make an exception in the group’s reducers: if any is called “xsort”, we pass it up directly?
I do think this would be better, matching the rest of Plot's API naming conventions
You can use this without a group; see the "confusion matrix" example in https://observablehq.com/@fil/order-x-by-y-442. What makes point 2 difficult is that the reducer applies to the y/y2 channels across the facets, not to the data values across the group (which is what the group reducer does).
I've added a second penguins example at the bottom of https://observablehq.com/@fil/order-x-by-y-442 that also shows how it can work without grouping, and the difference between "sum" and "max". I think this could be expanded to "count" and "min", and maybe more (or the same list of reducers as the group reducers).
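To make the "sum" versus "max" distinction concrete, here is a minimal plain-JavaScript sketch. It is not Plot's implementation: the `bars` array stands in for y channel values already computed per facet, the `sortDomainByChannel` helper is hypothetical, and ascending order is an assumption.

```javascript
// Hypothetical channel values: one bar per (facet, x) pair, as a group
// transform might have produced them before scale set-up.
const bars = [
  {facet: "F1", x: "a", y: 5}, {facet: "F2", x: "a", y: 1},
  {facet: "F1", x: "b", y: 3}, {facet: "F2", x: "b", y: 4},
  {facet: "F1", x: "c", y: 2}, {facet: "F2", x: "c", y: 2}
];

// Reducers applied to all y values sharing an x value, across facets.
const reducers = {
  sum: (values) => values.reduce((a, b) => a + b, 0),
  max: (values) => Math.max(...values)
};

// Collect the y values per x, reduce them, and sort x ascending by the result.
function sortDomainByChannel(bars, reduce) {
  const byX = new Map();
  for (const {x, y} of bars) {
    if (!byX.has(x)) byX.set(x, []);
    byX.get(x).push(y);
  }
  return Array.from(byX, ([x, ys]) => [x, reducers[reduce](ys)])
    .sort((a, b) => a[1] - b[1])
    .map(([x]) => x);
}

console.log(sortDomainByChannel(bars, "max")); // [ 'c', 'b', 'a' ]
console.log(sortDomainByChannel(bars, "sum")); // [ 'c', 'a', 'b' ]
```

With "max", a tall bar in a single facet decides the position; with "sum", the total across facets does, which is why the two orders differ for "a" and "b" here.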
The new examples helped me understand the position of the arguments.
Trying out the latest changes on RIAA revenue data (usually used to test stacked bar charts) feels like it works nicely.
The ideas for sorting the facet domain(s) are captured in #414.
I’d like to try a different tack with this. I think we can offer a more general solution here for ordering ordinal domains, not just x → y and y → x. For example, if you use the group transform to generate dots of varying radius and then lay them out along the x-dimension, you might want to sort the x-domain so the dots are drawn in descending order; or if you have a scatterplot with an ordinal fill color, you might want to allocate a categorical domain so that the most frequent color is assigned first, then the next-most frequent, and so on, to maximize discriminability.

I need to play with this, but my initial thought is that we should have a way to denote that a mark’s channel should be used (in conjunction with a reducer) to derive the order of an ordinal scale’s domain. For example, take this bar chart:

Plot.plot({
  marks: [
    Plot.barY(alphabet, {x: "letter", y: "frequency"})
  ]
})

If we want the x-dimension to be sorted by descending frequency (y), perhaps you could say:

Plot.plot({
  marks: [
    Plot.barY(alphabet, {x: "letter", y: {value: "frequency", sort: "x"}})
  ]
})

Or perhaps more explicitly, because you might want to drive more than one ordinal dimension with the same channel, and you might want to specify a different reducer other than median:

Plot.plot({
  marks: [
    Plot.barY(alphabet, {x: "letter", y: {value: "frequency", sort: {x: "median"}}})
  ]
})

Putting aside the exact syntax, I think we want to express a relationship from a mark channel (above, a Plot.barY’s y channel) to an ordinal scale (above, x), in conjunction with a reducer (above median) and possibly a comparator or at least the ability to reverse the natural order (say
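The channel-to-scale relationship described above can be sketched in plain JavaScript. This is only an illustration under stated assumptions: `deriveDomain` and its `reduce` and `reverse` options are hypothetical names, and the sample channel values are made up.

```javascript
// Median of a list of numbers.
function median(values) {
  const sorted = [...values].sort((a, b) => a - b);
  const mid = sorted.length >> 1;
  return sorted.length % 2 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}

// Derive an ordinal domain from two parallel channels: `keys` feeds the
// ordinal scale, `values` is the channel that drives the order.
function deriveDomain(keys, values, {reduce = median, reverse = false} = {}) {
  const groups = new Map();
  keys.forEach((key, i) => {
    if (!groups.has(key)) groups.set(key, []);
    groups.get(key).push(values[i]);
  });
  const domain = [...groups.keys()].sort(
    (a, b) => reduce(groups.get(a)) - reduce(groups.get(b))
  );
  return reverse ? domain.reverse() : domain;
}

// Made-up channel values: medians are A = 7, B = 2, C = 4.5.
const xChannel = ["A", "B", "A", "C", "B", "C"];
const yChannel = [8, 1, 6, 4, 3, 5];

console.log(deriveDomain(xChannel, yChannel));                  // [ 'B', 'C', 'A' ]
console.log(deriveDomain(xChannel, yChannel, {reverse: true})); // [ 'A', 'C', 'B' ]
```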
Oh, another reason I think it should be a specific channel (as opposed to, say, indicating that the x-domain should be ordered based on all associated y-values for each x-value), is in something like this timeline:

Plot.plot({
  marks: [
    Plot.barX(civilizations, {x1: "start", x2: "end", y: "civilization"})
  ]
})

Here you might want to order y by the (median) start of each civilization (x1) or the (median) end (x2). Both are valid orderings that address different questions. It’s therefore nice to have specificity.

It also occurs to me that you could have an invisible dummy mark solely for the purposes of expressing this order. For example above, what if you want to order the civilizations by total duration? There’s not currently a mark that has such a channel. Or yet another option would be some syntax at the scale definition, but we’d need to specify the data source and support transforms and faceting, too, which makes it tricky.
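As a quick illustration of why the choice of channel matters, the sketch below orders a made-up timeline three ways. The data values and the `orderCivilizations` helper are hypothetical; the duration ordering corresponds to the "dummy channel" idea, since no existing channel carries it.

```javascript
// Made-up timeline data (years, negative for BCE).
const civilizations = [
  {civilization: "Alpha", start: -3000, end: -500},
  {civilization: "Beta", start: -2000, end: 300},
  {civilization: "Gamma", start: -2500, end: -1000}
];

// Order the y domain by any derived value, ascending.
function orderCivilizations(data, value) {
  return [...data]
    .sort((a, b) => value(a) - value(b))
    .map((d) => d.civilization);
}

const byStart = orderCivilizations(civilizations, (d) => d.start);
const byEnd = orderCivilizations(civilizations, (d) => d.end);
const byDuration = orderCivilizations(civilizations, (d) => d.end - d.start);

console.log(byStart);    // [ 'Alpha', 'Gamma', 'Beta' ]
console.log(byEnd);      // [ 'Gamma', 'Alpha', 'Beta' ]
console.log(byDuration); // [ 'Gamma', 'Beta', 'Alpha' ]
```

All three orders differ on this sample, which is the specificity argument in miniature.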
5e90f95 to e60860b
This is very close to what I want. The only thing missing is that I want to generalize the syntax so that this can also be used to drive the order of ordinal color domains. For example when you have a scatterplot, you often want the most prominent dots to get the first color from a categorical domain, the next-most prominent dots to get the second color, and so on to maximize discriminability. And this will be useful for the forthcoming color legend where you often want the swatches to be in order of descending frequency (or sum).
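For the color case, a rough sketch of "most frequent category gets the first color" might look like the following. The category values, the three-color `palette`, and the `colorDomainByFrequency` helper are all illustrative assumptions, not part of this PR.

```javascript
// Made-up ordinal fill values, one per dot.
const species = ["Adelie", "Gentoo", "Adelie", "Chinstrap", "Gentoo", "Adelie"];

// Illustrative palette; the first color is meant for the most frequent category.
const palette = ["steelblue", "orange", "green"];

// Count occurrences and return the categories in descending frequency.
function colorDomainByFrequency(values) {
  const counts = new Map();
  for (const v of values) counts.set(v, (counts.get(v) ?? 0) + 1);
  return [...counts.keys()].sort((a, b) => counts.get(b) - counts.get(a));
}

const colorDomain = colorDomainByFrequency(species);
const colorOf = new Map(colorDomain.map((d, i) => [d, palette[i % palette.length]]));

console.log(colorDomain);           // [ 'Adelie', 'Gentoo', 'Chinstrap' ]
console.log(colorOf.get("Adelie")); // steelblue
```

The same ordered domain could drive legend swatches in descending frequency.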
A bonus feature, if I can figure out the syntax for it, is for the sort options to express a descending sort order. Yes, you can do the same thing by adding the reverse option to the corresponding scale, but the primary advantage of this PR is that you can quickly tack on something to a mark definition to drive the ordinal domain order. The shorthand is nice to replace d3.groupSort or equivalent, but it’ll be even nicer if a single option can express a descending order, too.
I’m going to think about this now and hopefully push some commits.
08ccb3d to ee8c3bc
Another possibility is a separate order option, rather than trying to incorporate the order specification into the channel value definition. I suppose it’s a little more verbose since you have to repeat the channel name twice.

Plot.barY(alphabet, {x: "letter", y: "frequency", order: {x: "y"}})

But, it also means that you can reference a derived channel which is computed by a transform. For example if you take advantage of the implicit stack transform, you can reference y1 or y2 as desired. The above shorthand would also allow a longer form where you can specify the desired reducer for ordering, and potentially reverse and limit (for top N) options, too.

Plot.barY(alphabet, {x: "letter", y: "frequency", order: {x: {value: "y", reduce: "median"}}})
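One way to picture how the shorthand and the longer form could relate is a small normalization step. The sketch below is hypothetical: `normalizeOrder` is not a Plot function, and the defaults (median reducer, no reverse, no limit) are assumptions drawn from the discussion above.

```javascript
// Expand {x: "y"} into {x: {value: "y", reduce: "median", reverse: false, limit: undefined}}.
function normalizeOrder(order) {
  const normalized = {};
  for (const [scale, spec] of Object.entries(order)) {
    const {value, reduce = "median", reverse = false, limit} =
      typeof spec === "string" ? {value: spec} : spec;
    if (typeof value !== "string") {
      throw new Error(`order.${scale} must name a channel`);
    }
    normalized[scale] = {value, reduce, reverse, limit};
  }
  return normalized;
}

const shortForm = normalizeOrder({x: "y"});
const longForm = normalizeOrder({x: {value: "y2", reduce: "sum", reverse: true}});

console.log(shortForm.x.value, shortForm.x.reduce); // y median
console.log(longForm.x.value, longForm.x.reverse);  // y2 true
```

Because the channel is referenced by name, a derived channel such as y2 from a stack transform fits the same shape.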
A few suggestions in #511
* throw the correct error if a mistaken sort option is given as {x: "-y", limit: 10}
* first ladies example plot (demonstrates "group names by first tenure_start") based on Toph Tucker’s https://observablehq.com/@tophtucker/first-ladies-of-the-united-states
* document sort x by y
* add limit: [lo, hi]
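For the limit: [lo, hi] item, a plausible reading is that the limit slices the already-sorted domain. The sketch below assumes that a positive number keeps the first n values, a negative number keeps the last n, and an array behaves like Array.prototype.slice; `applyLimit` is a hypothetical helper, not the code in this PR.

```javascript
// Truncate a sorted ordinal domain according to a limit option.
function applyLimit(domain, limit) {
  if (limit === undefined) return domain;
  const [lo, hi] = Array.isArray(limit)
    ? limit
    : limit < 0
    ? [limit, Infinity] // last |limit| values
    : [0, limit];       // first limit values
  return domain.slice(lo, hi);
}

const sortedDomain = ["a", "b", "c", "d"];

console.log(applyLimit(sortedDomain, 2));      // [ 'a', 'b' ]
console.log(applyLimit(sortedDomain, -1));     // [ 'd' ]
console.log(applyLimit(sortedDomain, [1, 3])); // [ 'b', 'c' ]
```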
lgtm!
Specify {xsort: true} in a mark to sort the ordinal x domain according to the maximum value of the y (or y2) channel of that mark. The default reducer is "max"; "sum" is also available to sort bars across facets without having to create a "fake" non-faceted mark.
There doesn't seem to be a need for a complex DSL: reversing the order can be done in the scale options (y: {reverse: true}).
Note that this approach does not take into account any of the filters: for example, a dot mark that doesn't make it to the screen because its radius is negative will still be used to set the domain's order.
In a previous iteration I had made it possible to pass in a generic function reducer(I, accessor); it is easy enough to add back if there is a demonstrated need.
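For reference, a generic reducer of the shape reducer(I, accessor) could look like the sketch below. The interpretation is an assumption: I is taken to be the array of indices sharing one ordinal value, and accessor maps an index to the channel value. `rangeReducer` and `sortByIndexReducer` are hypothetical names.

```javascript
// A custom reducer: the spread (max - min) of the values at indices I.
function rangeReducer(I, accessor) {
  let min = Infinity;
  let max = -Infinity;
  for (const i of I) {
    const v = accessor(i);
    if (v < min) min = v;
    if (v > max) max = v;
  }
  return max - min;
}

// Group indices by ordinal value X[i], then sort the distinct values
// ascending by reducer(I, accessor) computed over the Y channel.
function sortByIndexReducer(X, Y, reducer) {
  const index = new Map();
  X.forEach((x, i) => {
    if (!index.has(x)) index.set(x, []);
    index.get(x).push(i);
  });
  const accessor = (i) => Y[i];
  return [...index.keys()].sort(
    (a, b) => reducer(index.get(a), accessor) - reducer(index.get(b), accessor)
  );
}

// Made-up channels: ranges are a = 8, b = 1, c = 0.
const X = ["a", "b", "a", "b", "c"];
const Y = [1, 5, 9, 6, 4];

console.log(sortByIndexReducer(X, Y, rangeReducer)); // [ 'c', 'b', 'a' ]
```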
I believe this covers the most common cases (and the rest can be done by explicitly setting y.domain).
(One thing that is not covered is the ordering of facets: only x and y are targeted.)
closes #388
Build and examples: https://observablehq.com/@fil/order-x-by-y-442