Sorting crates by top recently downloaded #857
I'm confused as to why this needs to join at all? Could this do what you need?
query = query.order(crate_downloads.select(sum(downloads))
    .filter(date.gt(now - 90.days()))
    .filter(crate_id.eq(sql::<Integer>("crates.id")))) |
I thought I needed the join to associate |
I think it also needs to join because we want to show the recent download count in the UI. Is it possible to get the crate's ID, crate's name (and other metadata) as well as the # of downloads in the last 90 days, sorted by the number of downloads, without a join? |
One potential idea I tried in parallel was to introduce a view which can then be joined to by Diesel. I have an example repo available. This might work, but is unfortunate because I couldn't figure out how to get Diesel to infer the schema for the view, which means that the Rust representation could go stale. It doesn't matter in this case, but a view would also be impossible to "parameterize", which is something I've seen in other cases of joining to a subselect. |
You don't need to join in order to do that. You can just order by a subselect like I wrote above. The query plan generated is nearly identical |
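As a rough illustration of what ordering by a correlated subselect means, here is a minimal plain-Rust model that does not use Diesel at all. The `Krate` and `Download` structs, the integer day numbers, and the function names are all hypothetical stand-ins, and the descending order is an assumption (the query above does not state a direction). The sum is computed per crate and used only as a sort key, which is why no join is needed for the ordering itself.

```rust
// Plain-Rust model of "order by a correlated subselect"; no Diesel involved.
// `Krate`, `Download` and the day-number dates are hypothetical stand-ins.

#[derive(Debug, Clone, PartialEq)]
struct Krate {
    id: i32,
    name: String,
}

#[derive(Debug, Clone)]
struct Download {
    crate_id: i32,
    day: i64, // days since an arbitrary epoch
    downloads: i64,
}

/// Models the subselect for a single crate: the sum of downloads in the
/// last 90 days. Unlike SQL's SUM, this returns 0 (not NULL) for no rows.
fn recent_sum(krate_id: i32, downloads: &[Download], today: i64) -> i64 {
    downloads
        .iter()
        .filter(|d| d.crate_id == krate_id && d.day > today - 90)
        .map(|d| d.downloads)
        .sum()
}

/// Orders crates by the subselect's value, highest first (an assumption).
/// The sum is only a sort key, so it is not part of the returned rows.
fn order_by_recent(mut krates: Vec<Krate>, downloads: &[Download], today: i64) -> Vec<Krate> {
    krates.sort_by_key(|k| std::cmp::Reverse(recent_sum(k.id, downloads, today)));
    krates
}
```

Note that the returned rows carry no download count, so showing the 90-day figure in the UI would still require selecting it somewhere.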
Ah I see where I misunderstood now |
Still, it shouldn't need to join onto a subselect -- I'd think this would work fine:
let recent_downloads = sql::<BigInt>("SUM(crate_downloads.downloads)");
crates.inner_join(crate_downloads::table)
    .group_by(crates::id)
    .filter(crate_downloads::date.gt(now - 90.days()))
    .select((crates::all_columns, recent_downloads))
    .order(recent_downloads);
That still doesn't solve the issue mentioned here though. It sounds like you just need to completely branch this case off, which you'd have to do regardless of Diesel. The type that you're returning from the query would be different, and the response you return would therefore be different as well. It'd have to be something like:
if sort_by_recent_downloads {
    // perform the full query, including execution, returning the result
} else {
    // what the method does now
} |
Okay, thank you. I'm confused about your statement of the type that we're returning from the query being different, could you explain that a bit more? The function currently works returning only the top recently downloaded results, it's only when I try to incorporate it into the main query where I start getting the errors. Do you mean that the query we are trying to construct is a different type from the queries constructed in the other if/else blocks and thus is not able to type check correctly, but that the overall return type of the function is still consistent? |
I was referring to this comment above:
If we are to do that, we'd be selecting additional data in this case that isn't present in any of the other branches, which means the query is now different from the rest of the function. |
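To make the point about differing branch types concrete, here is a minimal plain-Rust sketch (no Diesel, no database). Everything in it is hypothetical: the `Krate` and `ListedCrate` structs, the `load_with_recent` and `load_plain` stand-ins for the two queries, and the idea of carrying the extra count as an `Option<i64>`. It only shows that each branch can execute its own differently typed query and then map its rows into one shared shape, which is one possible way to keep the function's overall return type consistent.

```rust
// Hypothetical row types; the real crates.io structs are not shown in this thread.
#[derive(Debug, Clone, PartialEq)]
struct Krate {
    id: i32,
    name: String,
    downloads: i64, // all-time downloads
}

/// Shared shape returned by both branches (an assumption for illustration).
#[derive(Debug, PartialEq)]
struct ListedCrate {
    krate: Krate,
    recent_downloads: Option<i64>,
}

/// Stand-in for the query that also selects the 90-day sum.
/// Its row type is `(Krate, i64)`, which differs from the plain query's.
fn load_with_recent(krates: &[Krate], recent: &[(i32, i64)]) -> Vec<(Krate, i64)> {
    let mut rows: Vec<(Krate, i64)> = krates
        .iter()
        .map(|k| {
            let n = recent
                .iter()
                .find(|(id, _)| *id == k.id)
                .map(|(_, n)| *n)
                .unwrap_or(0);
            (k.clone(), n)
        })
        .collect();
    rows.sort_by_key(|(_, n)| std::cmp::Reverse(*n));
    rows
}

/// Stand-in for what the method does now: rows are just `Krate`.
fn load_plain(krates: &[Krate]) -> Vec<Krate> {
    let mut rows = krates.to_vec();
    rows.sort_by_key(|k| std::cmp::Reverse(k.downloads));
    rows
}

/// Each branch runs its own query to completion, then converts to the shared type.
fn list(sort: &str, krates: &[Krate], recent: &[(i32, i64)]) -> Vec<ListedCrate> {
    if sort == "recent_downloads" {
        load_with_recent(krates, recent)
            .into_iter()
            .map(|(krate, n)| ListedCrate { krate, recent_downloads: Some(n) })
            .collect()
    } else {
        load_plain(krates)
            .into_iter()
            .map(|krate| ListedCrate { krate, recent_downloads: None })
            .collect()
    }
}
```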
@sgrif the data types don't have to be different though-- we could either:
sooo are there remaining problems? |
Would likely be the simplest solution in the short term. Anything else will either require some new features in Diesel, or a pretty deep refactoring of this function. |
I'm going to open a PR to at least add the join methods onto boxed queries. It won't fully allow this, since the type would still change from
let data = if sort == "recent_downloads" {
    query.inner_join(crate_downloads::table)
        .select(all_columns_but_with_sum_where_total_downloads_is)
        .group_by(crates::id)
        .filter(crate_downloads::date.gt(now - 90.days()))
        .paginate(limit, offset)
        .load(&*conn)
} else {
    query.paginate(limit, offset).load(&*conn)
}; |
diesel-rs/diesel#1016 would be needed as well. |
Thanks, that would likely be helpful. I tried the query you wrote in the previous comment:
translated in our case to:
and received the following error:
I recall getting this error before, and think that this is why we decided that a subquery was necessary. Do you have any insight into if this could be caused by a different problem? |
Never mind, that PR has a bug |
Ohhhhhh, I thought it was the join that doesn't work with into_boxed |
It's both. Well, you can join before boxing, but |
@natboehm Can you give this a try using #852 as a base branch and then adding this to Cargo.toml, to see if the query I mentioned above works for this use case with those two changes?
[replace]
"diesel:0.14.0" = { git = "https://github.com/diesel-rs/diesel.git", branch = "sg-crates-io-brainstorming" } |
Sure, when trying to run the backend server I received this error message:
|
/headdesk Fixed... (We have little test coverage for the feature since it's not "officially" supported) |
Awesome, it seems to be working, the |
Yeah, I'll merge them once CI is green. @carols10cents What do we want to do for the crates.io dependency? I'm not planning a release for a few months. Are we comfortable pointing at git? |
From running the tests we've found that this query doesn't work for crates with no downloads. In order to support crates with no downloads, we need to change the
Essentially this is what we need to generate. |
You'd have to reach into our internals in order to do that currently.
use diesel::query_source::joins::LeftOuter;
query.join(crate_downloads::table, LeftOuter, crates::id.eq(crate_downloads::crate_id).and(crate_downloads::date.gt(now - 90.days()))) |
I'll have an actual API for this some time this weekend. The way to do this will be:
query.left_join(crate_downloads::table.on(crates::id.eq(crate_downloads::crate_id).and(crate_downloads::date.gt(now - 90.days())))) |
Sounds good, thanks! The query works with Diesel 0.14.1. Some of the rows with zero downloads end up null, and it looks like you added an update to deal with that issue. |
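The reason crates with no downloads disappear under the inner join, and come back as null under a left join with the date condition in the join clause, can be modelled in plain Rust. This is only a sketch of the join semantics, not Diesel code: the `Krate` and `Download` structs, the integer day numbers, and both function names are hypothetical. `None` stands in for SQL NULL, and `unwrap_or(0)` plays the role of a COALESCE to zero.

```rust
// Plain-Rust model of the two join strategies; no Diesel involved.
// All types and names here are hypothetical stand-ins.

#[derive(Debug, Clone, PartialEq)]
struct Krate {
    id: i32,
    name: String,
}

#[derive(Debug, Clone)]
struct Download {
    crate_id: i32,
    day: i64, // days since an arbitrary epoch
    downloads: i64,
}

/// Download counts for one crate that fall inside the 90-day window.
fn recent_rows(krate: &Krate, downloads: &[Download], today: i64) -> Vec<i64> {
    downloads
        .iter()
        .filter(|d| d.crate_id == krate.id && d.day > today - 90)
        .map(|d| d.downloads)
        .collect()
}

/// Models INNER JOIN + WHERE date > cutoff + GROUP BY crate:
/// a crate with no matching download rows produces no output row at all.
fn inner_join_recent(krates: &[Krate], downloads: &[Download], today: i64) -> Vec<(Krate, i64)> {
    krates
        .iter()
        .filter_map(|k| {
            let rows = recent_rows(k, downloads, today);
            if rows.is_empty() {
                None
            } else {
                Some((k.clone(), rows.iter().sum::<i64>()))
            }
        })
        .collect()
}

/// Models LEFT JOIN with the date condition in the ON clause + GROUP BY crate:
/// every crate is kept, and the sum is NULL (`None`) when nothing matched.
fn left_join_recent(
    krates: &[Krate],
    downloads: &[Download],
    today: i64,
) -> Vec<(Krate, Option<i64>)> {
    krates
        .iter()
        .map(|k| {
            let rows = recent_rows(k, downloads, today);
            let sum = if rows.is_empty() {
                None
            } else {
                Some(rows.iter().sum::<i64>())
            };
            (k.clone(), sum)
        })
        .collect()
}
```

In this model, a crate whose only downloads are older than 90 days behaves the same as a crate with none: it is dropped by `inner_join_recent` and kept with `None` by `left_join_recent`.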
API added in diesel-rs/diesel#1026 |
I'm currently working on issue #702, sort crates by number of downloads in the past 90 days. In the function index of the file src/krate.rs, I'm having trouble converting the SQL query to Diesel and incorporating it into the already formed query variable statement.
Even when the inner_join is placed above the into_boxed function, as @sgrif recommended in my previous pull request #817, a different error persists in which the types are not matching when executing the query. My latest attempt places the inner_join after into_boxed, as I could not figure out how to keep the logic of the query while calling inner_join before into_boxed.
The query written in src/recent_downloads_query.sql currently works when inserted in straight SQL with Diesel. This is done in lines 912 - 916 of src/krate.rs. Currently when served locally, this branch is able to serve only a list of crates sorted by number of downloads in the past 90 days. All other functionality within fn index is commented out. I did this to show that the query does successfully get the top crates downloaded in the past 90 days. I can't seem to translate it correctly to Diesel, or to incorporate it so that it works correctly with the initialization of query and the other ways in which query is used.
In terms of functionality, when on the search page we want to be able to both show the list of crates sorted by top recently downloaded and list their respective recent download count instead of the all-time download count. We want this in addition to the current state of sorting crates by all-time downloads.
I'd appreciate some help translating my query into the Diesel code, as well as some serious guidance in incorporating it into the variable query in the function index in src/krate.rs.
My latest attempt at the Diesel query currently resides in krate.rs, lines 820 - 822. The working SQL query is in src/recent_downloads_query.sql.
@sgrif @carols10cents