database: Change `num_no_build` to be `NOT NULL` #9767

Turbo87 · 2024-10-27T15:07:28Z

This PR was extracted from #9756, because it needs to be deploy separately from the other migrations in that PR.

The backfill SQL script in #9766 has been run on the staging and production databases, so this should be good to go. SELECT * FROM versions WHERE num_no_build IS NULL returns zero rows.

codecov · 2024-10-27T15:17:46Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 88.81%. Comparing base (084d46d) to head (f12378b).
Report is 11 commits behind head on main.

Additional details and impacted files

@@           Coverage Diff           @@
##             main    #9767   +/-   ##
=======================================
  Coverage   88.81%   88.81%           
=======================================
  Files         289      289           
  Lines       29810    29820   +10     
=======================================
+ Hits        26475    26485   +10     
  Misses       3335     3335

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

Turbo87 · 2024-10-27T15:56:48Z

src/models/version.rs

+    #[builder(default = strip_build_metadata(num))]
+    num_no_build: &'a str,


argh, I screwed this up... this part of the diff should've been in the previous PR instead. every new version is currently inserted with num_no_build = NULL, which makes the set not null migration in this PR fail 🤦‍♂️

update versions set num_no_build = split_part(num, '+', 1) where num_no_build IS NULL;

I will run ⬆️ right before deploying, which should be enough for this to deploy cleanly...

alright, that worked. all good now! :)

eth3lbert · 2024-11-27T07:41:27Z

When I tried to import data using scripts/import-database-dump.sh today, I encountered an error, as shown below:

...
    \copy "versions" ("bin_names", "checksum", "crate_id", "crate_size", "created_at", "downloads", "edition", "features", "has_lib", "id", "license", "links", "num", "published_by", "rust_version", "
updated_at", "yanked") FROM 'data/versions.csv' WITH CSV HEADER
ERROR:  null value in column "num_no_build" of relation "versions" violates not-null constraint
...

I believe this error occurred because the "num_no_build" column is set to NOT_NULL and we also exclude it from our dump data. This then violates the NOT_NULL constraint.

Turbo87 · 2024-11-27T07:53:08Z

ugh... unfortunately that makes a lot of sense 😞

we might have to rethink #9786 🤔

@dtolnay any thoughts on this?

eth3lbert · 2024-11-27T08:06:00Z

Another idea would be to modify the script template, changing the DDL to allow NULL values before copying. We could also include #9767 (comment) in the template to set the value after copying and then set the column back to NOT_NULL after updating.

Turbo87 · 2024-11-27T08:18:57Z

that basic version of the backfill won't help for the full dataset though, because it would cause uniqueness issues on the corresponding index for some of the older versions where we still allowed for duplicates :-/

dtolnay · 2024-11-28T05:14:46Z

If we have an UPDATE command that can do a 100% faithful backfill cheaply, is there a place we can put that? Somewhere in import.sql? If not, let's revert #9786.

This worked for me on today's db-dump:

CREATE TABLE public.versions ( ..., num_no_build VARCHAR, ... );

\copy versions (bin_names, checksum, crate_id, crate_size, created_at, downloads, edition, features, has_lib, id, license, links, num, published_by, rust_version, updated_at, yanked) FROM 'versions.csv' WITH CSV HEADER

UPDATE versions
SET num_no_build = CASE WHEN deduplicate.count = 1 THEN deduplicate.nnb ELSE versions.num END
FROM
  (SELECT crate_id, split_part(num, '+', 1) nnb, count(*)
   FROM versions
   GROUP BY crate_id, nnb) deduplicate
WHERE versions.crate_id = deduplicate.crate_id
  AND split_part(versions.num, '+', 1) = deduplicate.nnb;

ALTER TABLE versions ALTER COLUMN num_no_build SET NOT NULL;

-- example crate with duplicates:
SELECT crate_id, num, num_no_build
FROM versions
WHERE crate_id = 1643
ORDER BY num;

Turbo87 · 2024-11-28T09:34:30Z

If we have an UPDATE command that can do a 100% faithful backfill cheaply

crates.io/migrations/2024-10-24-133507_add-num-no-build-column/up.sql

Lines 8 to 29 in 8b73b26

    
           -- update versions 
        
           --     set num_no_build = split_part(num, '+', 1); 
        
           -- 
        
           -- with duplicates as ( 
        
           --     -- find all versions that have the same `crate_id` and `num_no_build` 
        
           --     select crate_id, num_no_build, array_agg(num ORDER BY id) as nums 
        
           --     from versions 
        
           --     group by crate_id, num_no_build 
        
           --     having count(*) > 1 
        
           -- ), 
        
           -- duplicates_to_update as ( 
        
           --     -- for each group of duplicates, update all versions except the one that 
        
           --     -- doesn't have "build metadata", or the first one that was published if 
        
           --     -- all versions have "build metadata" 
        
           --     select crate_id, num_no_build, unnest(case when array_position(nums, num_no_build) IS NOT NULL then array_remove(nums, num_no_build) else nums[2:] end) as num 
        
           --     from duplicates 
        
           -- ) 
        
           -- update versions 
        
           --     set num_no_build = duplicates_to_update.num 
        
           --     from duplicates_to_update 
        
           --     where versions.crate_id = duplicates_to_update.crate_id 
        
           --     and versions.num = duplicates_to_update.num;

is the script that I used on production when we backfilled the data. I'm not sure if I would call it "cheap", but it worked well enough :D

Somewhere in import.sql?

hmm, let me see if I can adjust our scripts to do this

database: Change num_no_build to be NOT NULL

f12378b

Turbo87 added C-internal 🔧 Category: Nonessential work that would make the codebase more consistent or clear A-backend ⚙️ labels Oct 27, 2024

Turbo87 enabled auto-merge October 27, 2024 15:08

Turbo87 merged commit 622c786 into rust-lang:main Oct 27, 2024
8 checks passed

Turbo87 deleted the num_no_build2 branch October 27, 2024 15:17

Turbo87 commented Oct 27, 2024

View reviewed changes

This was referenced Nov 28, 2024

database_dump: Fix num_no_build import issue #10097

Closed

Revert "Exclude num_no_build from versions.csv db-dump (#9786)" #10107

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

database: Change `num_no_build` to be `NOT NULL` #9767

database: Change `num_no_build` to be `NOT NULL` #9767

Uh oh!

Turbo87 commented Oct 27, 2024

Uh oh!

Uh oh!

codecov bot commented Oct 27, 2024

Uh oh!

Turbo87 Oct 27, 2024

Uh oh!

Turbo87 Oct 27, 2024

Uh oh!

eth3lbert commented Nov 27, 2024

Uh oh!

Turbo87 commented Nov 27, 2024

Uh oh!

eth3lbert commented Nov 27, 2024

Uh oh!

Turbo87 commented Nov 27, 2024

Uh oh!

dtolnay commented Nov 28, 2024

Uh oh!

Turbo87 commented Nov 28, 2024 •

edited

Loading

Uh oh!

Uh oh!

		#[builder(default = strip_build_metadata(num))]
		num_no_build: &'a str,

database: Change num_no_build to be NOT NULL #9767

database: Change num_no_build to be NOT NULL #9767

Uh oh!

Conversation

Turbo87 commented Oct 27, 2024

Uh oh!

Uh oh!

codecov bot commented Oct 27, 2024

Codecov Report

Uh oh!

Turbo87 Oct 27, 2024

Choose a reason for hiding this comment

Uh oh!

Turbo87 Oct 27, 2024

Choose a reason for hiding this comment

Uh oh!

eth3lbert commented Nov 27, 2024

Uh oh!

Turbo87 commented Nov 27, 2024

Uh oh!

eth3lbert commented Nov 27, 2024

Uh oh!

Turbo87 commented Nov 27, 2024

Uh oh!

dtolnay commented Nov 28, 2024

Uh oh!

Turbo87 commented Nov 28, 2024 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

database: Change `num_no_build` to be `NOT NULL` #9767

database: Change `num_no_build` to be `NOT NULL` #9767

Turbo87 commented Nov 28, 2024 •

edited

Loading