Do unconditional redirects for downloads when the db is broken #3564
We had a few database-related outages in recent weeks, which impacted people downloading crates with Cargo. Download requests are the majority of the traffic we serve, and they're also the most critical ones. Even if the rest of crates.io stops working, downloads must continue to be operational to reduce the impact on our users.
As a preface, the downloads endpoint used the database for two reasons: counting the downloads (which can be skipped during outages) and ensuring the crate name is canonicalized. For example, if someone tries to download Foo-bar 1.0.0 by calling /api/v1/crates/foo_Bar/1.0.0/download, the crates.io application canonicalizes the name and redirects the user to https://static.crates.io/crates/Foo-bar/Foo-bar-1.0.0.crate. If we were not to perform canonicalization, the user would be redirected to https://static.crates.io/crates/foo_Bar/foo_Bar-1.0.0.crate, which would result in a 404.
While analyzing the outages and after investigating the traffic patterns of multiple weeks, we identified that the vast majority of downloads (including all downloads from Cargo) already send the canonical name to crates.io, and only a couple of third-party tools send crate names that are not canonicalized.
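To make the canonicalization example concrete, here is a minimal sketch of the two steps involved. It is not the crates.io implementation: `static_download_url`, `normalize`, and `canonicalize` are hypothetical names, the in-memory list of known crates stands in for the database lookup, and the matching rule (case-insensitive, with `-` and `_` treated as equivalent) is an assumption inferred from the `foo_Bar` to `Foo-bar` example.

```rust
/// Hypothetical helper: builds the static download URL in the shape quoted
/// in the example (`/crates/<name>/<name>-<version>.crate`).
fn static_download_url(name: &str, version: &str) -> String {
    format!(
        "https://static.crates.io/crates/{0}/{0}-{1}.crate",
        name, version
    )
}

/// Assumption: two names refer to the same crate if they match after
/// lowercasing and treating `-` and `_` as the same character.
fn normalize(name: &str) -> String {
    name.chars()
        .map(|c| if c == '-' { '_' } else { c.to_ascii_lowercase() })
        .collect()
}

/// Hypothetical stand-in for the database lookup: returns the canonical
/// spelling of `requested` if a matching crate is known.
fn canonicalize<'a>(known: &[&'a str], requested: &str) -> Option<&'a str> {
    let wanted = normalize(requested);
    known.iter().copied().find(|k| normalize(k) == wanted)
}
```

With `known = ["Foo-bar"]`, a request for `foo_Bar` resolves to `Foo-bar`, so the redirect points at an object that exists. Skipping `canonicalize` and passing `foo_Bar` straight to `static_download_url` produces the second URL from the example, which is the one that would 404.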
This PR changes the downloads endpoint to do unconditional redirects without canonicalizing the crate names during a full database outage. During normal operations the canonicalization will still be performed. The change will allow Cargo builds to continue functioning even without a database, and the very small percentage of requests that use a non-canonical name will get a 404 instead of a 500, which in my view is an acceptable tradeoff.
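The fallback described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code in this PR: `LookupError`, `download_redirect`, and the `lookup` closure are hypothetical, and the closure stands in for the database-backed canonicalization.

```rust
/// Hypothetical error type for the canonicalization lookup.
#[derive(Debug, PartialEq)]
enum LookupError {
    /// The database answered, but no such crate exists.
    NotFound,
    /// The database could not be reached at all (full outage).
    DatabaseUnavailable,
}

/// Returns the URL to redirect a download request to.
///
/// `lookup` is a hypothetical stand-in for the database query that maps a
/// requested crate name to its canonical spelling.
fn download_redirect<F>(
    requested: &str,
    version: &str,
    lookup: F,
) -> Result<String, LookupError>
where
    F: FnOnce(&str) -> Result<String, LookupError>,
{
    let name = match lookup(requested) {
        // Normal operation: use the canonical name from the database.
        Ok(canonical) => canonical,
        // Full outage: redirect unconditionally with the name as received.
        Err(LookupError::DatabaseUnavailable) => requested.to_string(),
        // Any other failure is reported as before.
        Err(err) => return Err(err),
    };
    Ok(format!(
        "https://static.crates.io/crates/{0}/{0}-{1}.crate",
        name, version
    ))
}
```

In this sketch a client that already sends the canonical name gets the same redirect whether or not the database is reachable, while a non-canonical name during an outage yields a URL that the static host would answer with a 404.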
This PR also adds two metrics:
cratesio_instance_downloads_non_canonical_crate_name_total: how many download requests we received with a non-canonical crate name. This will allow us to revisit whether the tradeoff is still worth it, and it's way easier to query than the previous method we used to extract the data.
cratesio_instance_downloads_unconditional_redirects_total: how many unconditional redirects we performed. We'll want to set up an alert that pages us as soon as the metric is > 0.
Finally, this PR implements full tests for the changes introduced here (tests are actually the bulk of the diff). The test uses the "real" database pool instead of the dummy one used for the rest of the tests, acting on a fresh schema. Creating the fresh schema takes a second or two, so we're only doing it in this test to avoid slowing down the test suite. Also, instead of connecting directly to PostgreSQL, the database pool connects through ChaosProxy, a simple proxy implemented in our codebase that allows us to break or restore the connection at will.
Part of #3541
r? @jtgeibel