-
Notifications
You must be signed in to change notification settings - Fork 1
Add script to validate upstream references in PR branch commits #22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: mainline
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull Request Overview
Adds a new Python script to validate upstream Linux kernel commit references in a PR branch by checking each referenced SHA exists in the mainline branch and detecting any Fixes:
tags.
- Introduces
check_kernel_commits.py
with functions to list PR commits, extract upstream commit references, verify existence, and find fixes. - Provides both plain and GitHub-comment-friendly (“pretty”) output modes.
- Handles missing references and summarizes findings or success at the end.
Comments suppressed due to low confidence (3)
check_kernel_commits.py:9
- Core CLI logic and helper functions currently lack any accompanying tests. Consider adding unit tests for
run_git
,hash_exists_in_mainline
,find_fixes_in_mainline
, and the main argument-parsing flow to prevent regressions.
def run_git(repo, args):
check_kernel_commits.py:127
- [nitpick] Using hard-coded space counts for
subsequent_indent
can be brittle. Consider deriving the indent from the length of the status tag ('[NOTFOUND] '
) or defining a constant to improve readability and reduce magic values.
wrap_paragraph(header, width=80, initial_indent='', subsequent_indent=' ')
check_kernel_commits.py:43
- The
git cat-file -e {upstream_ref}^{hash_}
syntax is invalid for checking if a commit exists on a branch. Consider usinggit merge-base --is-ancestor {hash_} {upstream_ref}
orgit rev-list {upstream_ref} | grep -q {hash_}
to verify ancestry.
['git', '-C', repo, 'cat-file', '-e', f'{upstream_ref}^{hash_}'],
19700ef
to
2198c68
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is Great maybe just some cleanups
for refname, refval in [('upstream reference', upstream_ref), | ||
('PR branch', args.pr_branch), | ||
('base branch', args.base_branch)]: | ||
if not ref_exists(args.repo, refval): | ||
missing_refs.append((refname, refval)) | ||
if missing_refs: | ||
for refname, refval in missing_refs: | ||
print(f"ERROR: The {refname} '{refval}' does not exist in the given repo.") | ||
print("Please fetch or create the required references before running this script.") | ||
sys.exit(1) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not in favor of this.
parser.add_argument
includes a required=True
functional argument to fail out early. This seems like an easy way to mess something up in the future.
See Reference: https://github.com/ctrliq/kernel-src-tree-tools/blob/mainline/rolling-release-update.py#L85
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The 'repo', 'pr_branch', and 'base_branch' arguments are defined as positional arguments (they aren't optional with - or --). So if you don't have three arguments you'll get an error right away. Once we have those arguments this code is checking to make sure that those references actually exist in the git repo we are operating on (along with the implied reference origin/kernel-mainline).
check_kernel_commits.py
Outdated
parser.add_argument("repo", help="Path to the git repo") | ||
parser.add_argument("pr_branch", help="Name of the PR branch") | ||
parser.add_argument("base_branch", help="Name of the base branch") | ||
parser.add_argument("--pretty", action='store_true', help="Output in Markdown, suitable for GitHub PR comments") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is pretty the correct term? I guess when I see --pretty
I expect a TUI pretty ... but since its MD for the GITHUB actions maybe this is ok.
I think maybe --markdown
or --automation
or something
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like --markdown
upstream_hashes = re.findall(r'^commit\s+([0-9a-fA-F]{12,40})', msg, re.MULTILINE) | ||
for uhash in upstream_hashes: | ||
short_uhash = uhash[:12] | ||
exists = hash_exists_in_mainline(args.repo, upstream_ref, uhash) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe just a simple comment that for this we're checking that the PR commit
exists upstream.
It took me a quick minute for me to realize what was happening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will add
check_kernel_commits.py
Outdated
"""Return commits in upstream_ref that have Fixes: <first 12 chars of hash_> in their message.""" | ||
short_hash = hash_[:12] | ||
output = run_git(repo, [ | ||
'log', upstream_ref, '--grep', f'Fixes: {short_hash}', '--format=%h %s (%an)' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can do multiple greps lets do both upper and lower case because not everyone is consistent
'--grep', f'fixes: {short_hash}'
Are the upstream fixes
consistently 12 or are do we sometimes see 8's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Checking for upper and lower sounds good. Not sure about 8s. I'll look.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Whipped up this quick oneliner. Didn't validate it a whole lot. But it does look like there are ~1600 fixes references with a hash <12.
brett@iconium ~/linux % git log | grep -E -i " Fixes: [a-fA-F0-9]{1,40} \(" | awk '{print $2}' | xargs -I@ sh -c 'echo -n @ | wc -m' | sort -n | uniq -c
10 6
282 7
236 8
220 9
464 10
419 11
105987 12
5335 13
1206 14
273 15
2202 16
61 17
46 18
35 19
30 20
14 21
7 22
7 23
7 24
6 25
1 26
2 27
1 29
1 31
4 32
1 38
102 40
check_kernel_commits.py
Outdated
|
||
def hash_exists_in_mainline(repo, upstream_ref, hash_): | ||
"""Check if a commit hash exists in the upstream reference.""" | ||
result = subprocess.run( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not using run_git
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will change
c74c6b5
to
c87fea4
Compare
This script scans each commit in a PR branch for upstream Linux kernel commit references and validates those references. If a commit references an upstream commit, the script checks that the referenced commit exists in mainline and reports if it has been marked with a Fixes: tag in the upstream kernel. Usage: python3 check_kernel_commits.py <repo_path> <pr_branch> <base_branch> [--markdown] By default, the script outputs results for terminal display. Use the --markdown flag to format output for GitHub PR comments.
c87fea4
to
1144d0b
Compare
So this might be a problem with vibe coding, or maybe more specifically with me vibe coding. The git cat-file method for verifying upstream commits exist was not actually working in what I checked in. Not sure how I got to that point because the script was definitely able to find bogus commits at one point. In the current version that function is now using git merge-base --is-ancestor which seems to work and is perhaps easier to reason about by the casual observer anyway. 🤷♂️ |
This script scans each commit in a PR branch for upstream Linux kernel commit references and validates those references. If a commit references an upstream commit, the script checks that the referenced commit exists in mainline and reports if it has been marked with a Fixes: tag in the upstream kernel.
Usage:
python3 check_kernel_commits.py <repo_path> <pr_branch> <base_branch> [--markdown]
By default, the script outputs results for terminal display. Use the --markdown flag to format output for GitHub PR comments.
This kernel-src-tree PR ctrliq/kernel-src-tree#323 includes a github action that uses this script to comment on the PR. There is a lot of messy history in the PR due to testing, but the interesting thing to look at is the github action ctrliq/kernel-src-tree@b82196f and the comment that the github actions bot made flagging the bogus upstream commit reference and available upstream fixes