Is there any way to get last commit of a certain file? #588

Open
kvzn opened this issue Jul 9, 2020 · 6 comments
@kvzn

kvzn commented Jul 9, 2020

I'm looking for something like the command git log --follow FILENAME. A revwalk might work, but it would require a lot of computation. Is there another way? Thank you!

@alexcrichton
Member

This might be best to ask libgit2 itself, since this library only wraps libgit2. I'm not personally familiar with an API to do this, but I don't have encyclopedic knowledge of the API.

@extrawurst
Contributor

@kevinzheng I was looking for something similar, and it turns out revwalk is the way to go. It's how TortoiseGit does it as well: https://github.com/TortoiseGit/TortoiseGit/blob/master/src/TortoiseShell/GITPropertyPage.cpp#L369

Here is a good reference issue in libgit2: libgit2/libgit2#495

@extrawurst
Contributor

extrawurst commented Jul 9, 2020

@kevinzheng Although I am kind of intrigued to benchmark this approach against using the blame functions (https://libgit2.org/libgit2/#HEAD/type/git_blame). Each blame_hunk contains the commit and a git_signature, which in turn contains a git_time.

@kvzn
Author

kvzn commented Jul 15, 2020

@extrawurst @alexcrichton would you please take a look at my implementation? It seems to work, but I haven't tested the performance, and I don't know how to handle commits with multiple parents. Thank you!

#[derive(Debug, Deserialize, Serialize, PartialEq, Clone)]
pub struct Commit {
    pub commit_id: String,
    pub message: String,
    pub time: NaiveDateTime,
    pub author: Signature,
    pub committer: Signature,
}
use git2::{DiffOptions, Oid, Repository};

pub fn last_commit_of_file_or_dir(
    repo: &Repository,
    file_path: &str,
    from_commit_id: Option<&str>,
) -> Result<crate::beans::Commit, AppError> {
    let mut revwalk = repo.revwalk()?;
    revwalk.set_sorting(git2::Sort::TIME)?;

    match from_commit_id {
        Some(from_cid) => match Oid::from_str(from_cid) {
            Ok(oid) => revwalk.push(oid)?,
            Err(e) => return Err(AppError::Git2Error(e)),
        },
        None => revwalk.push_head()?,
    }

    for oid in revwalk {
        let cmt = repo.find_commit(oid?)?;
        let tree = cmt.tree()?;

        // Diff against the first parent only; a root commit has no parent and
        // is diffed against the empty tree (None).
        // TODO: merge commits (multiple parents) are not handled.
        let old_tree = if cmt.parent_count() > 0 {
            Some(cmt.parent(0)?.tree()?)
        } else {
            None
        };

        let mut opts = DiffOptions::new();
        let diff = repo.diff_tree_to_tree(old_tree.as_ref(), Some(&tree), Some(&mut opts))?;

        // Path::starts_with compares whole components, so it matches both the
        // file itself and any entry under a directory of that name.
        let contains = diff.deltas().any(|dd| {
            dd.new_file()
                .path()
                .map_or(false, |p| p.starts_with(file_path))
        });

        if contains {
            return git2_commit_to_our_commit(&cmt);
        }
    }
    return Err(AppError::CommandError(format!(
        "Failed to get last commit of file {}!",
        &file_path
    )));
}
fn git2_commit_to_our_commit(commit: &git2::Commit) -> Result<crate::beans::Commit, AppError> {
    let message = commit.message().unwrap_or("").to_string();

    let author = crate::beans::Signature {
        user_id: None,
        name: commit.author().name().unwrap_or("".as_ref()).to_string(),
        email: commit.author().email().unwrap_or("".as_ref()).to_string(),
    };

    let committer = crate::beans::Signature {
        user_id: None,
        name: commit.committer().name().unwrap_or("".as_ref()).to_string(),
        email: commit
            .committer()
            .email()
            .unwrap_or("".as_ref())
            .to_string(),
    };

    let time = git2_time_to_chrono_time(commit.time());

    Ok(crate::beans::Commit {
        commit_id: commit.id().to_string(),
        message,
        time,
        committer,
        author,
    })
}
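
On the open question of commits with multiple parents: by default, git log's history simplification treats a merge as touching a path only when the path differs from every parent. Below is a minimal std-only sketch of that rule, not git2 code. Trees are modeled as plain maps from path to blob id, and Snapshot, snapshot, and touches_path are hypothetical names introduced for illustration. With git2, the equivalent check would presumably mean one diff_tree_to_tree call per parent.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a commit's tree: path -> blob id.
type Snapshot = HashMap<String, u32>;

/// Convenience constructor for the model trees used in this sketch.
fn snapshot(entries: &[(&str, u32)]) -> Snapshot {
    entries.iter().map(|(p, id)| (p.to_string(), *id)).collect()
}

/// Returns true if `path` in `commit` differs from *every* parent.
/// A root commit (no parents) touches the path if it contains it.
fn touches_path(commit: &Snapshot, parents: &[&Snapshot], path: &str) -> bool {
    if parents.is_empty() {
        return commit.contains_key(path);
    }
    parents.iter().all(|p| p.get(path) != commit.get(path))
}
```

Under this rule, a clean merge that simply takes one side's version of a file does not count as the last commit for that file, while a merge that resolves a conflict with new content does.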

@Shnatsel
Member

Shnatsel commented Jun 6, 2021

It appears that this is a widely requested feature - nearly every language wrapper has a feature request for it - e.g. libgit2/pygit2#231. However, it's not implemented in git2 - here's the upstream feature request: libgit2/libgit2#495.

Someone has contributed a custom implementation for the C# bindings, although I haven't looked at it in detail: libgit2/libgit2sharp#963

I've rolled my own implementation, but it reports different timestamps compared to git log for half of the files in the repo I care about.

For ease of testing I list the timestamps for all the files that ever existed in the repository, rather than attempting to filter further. Here's my code:

// Copyright 2021 Google, inc.
// SPDX-License-identifier: Apache-2.0

use std::{collections::HashMap, path::PathBuf};
use git2::{Commit, Repository, Tree, Error};

fn main() -> Result<(), Error> {
    let mut mtimes: HashMap<PathBuf, i64> = HashMap::new();
    let repo = Repository::open(".")?;
    let mut revwalk = repo.revwalk()?;
    revwalk.set_sorting(git2::Sort::TIME)?;
    revwalk.push_head()?;
    let mut newer_commit: Option<Commit> = None;
    let mut newer_commit_tree: Option<Tree> = None;
    for commit_id in revwalk {
        let commit_id = commit_id?;
        let commit = repo.find_commit(commit_id)?;
        if commit.parent_count() > 1 {
            // ignore merge commits because they touch lots of files
            // without any of them being actually modified
            continue;
        }
        let tree = commit.tree()?;
        // check if this is not the very first commit, then we have nothing to diff
        if let Some(newer_commit_tree) = newer_commit_tree {
            let diff= repo.diff_tree_to_tree(Some(&tree), Some(&newer_commit_tree), None)?;
            for delta in diff.deltas() {
                let file_path = delta.new_file().path().unwrap();
                let file_mod_time = newer_commit.as_ref().unwrap().time();
                let unix_time = file_mod_time.seconds();
                mtimes.entry(file_path.to_owned()).or_insert(unix_time);
            }
        }
        newer_commit = Some(commit);
        newer_commit_tree = Some(tree);
    }
    for (path, time) in mtimes.iter() {
        println!("{:?}: {}", path, time);
    }
    Ok(())    
}

Here's a (slower) reference BASH implementation using git log that outputs the data in the same format for ease of comparison:

#!/bin/bash
git ls-files | while IFS= read -r FILENAME; do
    TIME=$( git log -1 --format="%ct" -- "$FILENAME" )
    echo "\"${FILENAME#./}\": $TIME"
done
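
For comparison from Rust without git2, the same query can be made by shelling out to the git CLI. This is a minimal sketch, assuming git is on PATH. The function names are made up for illustration, and the only git invocation used is the git log -1 --format=%ct -- FILENAME command from the script above.

```rust
use std::process::Command;

/// Builds the arguments for `git log -1 --format=%ct -- <path>`.
fn last_commit_time_args(path: &str) -> Vec<String> {
    ["log", "-1", "--format=%ct", "--", path]
        .iter()
        .map(|s| s.to_string())
        .collect()
}

/// Parses the single Unix timestamp printed by the command above.
/// Returns None when the output is empty or not a number.
fn parse_commit_time(stdout: &str) -> Option<i64> {
    stdout.trim().parse().ok()
}

/// Runs git inside `repo_dir` and returns the committer time of the
/// last commit that touched `path`, if any.
fn last_commit_time(repo_dir: &str, path: &str) -> std::io::Result<Option<i64>> {
    let out = Command::new("git")
        .current_dir(repo_dir)
        .args(last_commit_time_args(path))
        .output()?;
    Ok(parse_commit_time(&String::from_utf8_lossy(&out.stdout)))
}
```

Like the script, this spawns one git process per file, so it is mainly useful as a reference to compare against.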

The BASH version aligns with the output of git whatchanged --pretty='%ct', but my git2-based implementation does not: it tends to report newer dates than those in git whatchanged.

Fixes I've attempted:

  • I've tried using %at (author time, not committer time) in the BASH version, which made a slight difference for the worse
  • I've tried filtering out merge commits, but that didn't seem to make any difference.

¯\_(ツ)_/¯

Edit: Ah, that's probably because I'm walking the commit log chronologically using git2::Sort::TIME. If I instead walk them by parent links, it should work better.
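
To illustrate why this would produce newer dates, here is a small std-only model (an assumption-laden sketch, not git2 code; all type and function names are hypothetical). When two branches interleave in time, diffing a commit against its neighbour in the time-sorted walk compares unrelated trees and attributes the other branch's changes to the wrong commit, whereas diffing against the real parent does not.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical stand-in for a tree: path -> blob id.
type Snapshot = HashMap<String, u32>;

struct Commit {
    time: i64,
    parent: Option<usize>, // index into the commit list; None for the root
    tree: Snapshot,
}

fn tree(entries: &[(&str, u32)]) -> Snapshot {
    entries.iter().map(|(p, id)| (p.to_string(), *id)).collect()
}

/// Paths whose blob id differs between the two trees (added, removed, or modified).
fn changed_paths(old: &Snapshot, new: &Snapshot) -> BTreeSet<String> {
    old.keys()
        .chain(new.keys())
        .filter(|p| old.get(*p) != new.get(*p))
        .cloned()
        .collect()
}

/// Diffs each commit against the next-older commit in time order.
fn mtimes_by_walk_neighbour(commits: &[Commit]) -> HashMap<String, i64> {
    let mut order: Vec<&Commit> = commits.iter().collect();
    order.sort_by_key(|c| std::cmp::Reverse(c.time)); // newest first
    let mut mtimes = HashMap::new();
    for pair in order.windows(2) {
        let (newer, older) = (pair[0], pair[1]);
        for path in changed_paths(&older.tree, &newer.tree) {
            mtimes.entry(path).or_insert(newer.time);
        }
    }
    mtimes
}

/// Diffs each commit against its real parent and keeps the newest time per path.
fn mtimes_by_parent(commits: &[Commit]) -> HashMap<String, i64> {
    let mut mtimes: HashMap<String, i64> = HashMap::new();
    for c in commits {
        if let Some(p) = c.parent {
            for path in changed_paths(&commits[p].tree, &c.tree) {
                let t = mtimes.entry(path).or_insert(c.time);
                *t = (*t).max(c.time);
            }
        }
    }
    mtimes
}
```

For example, take a root commit at time 100 and two children of it: one changes a at time 200, the other changes b at time 300. The neighbour-based walk reports a as modified at 300, because the two siblings differ in both files. The parent-based version reports 200.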

@Shnatsel
Member

Shnatsel commented Jun 8, 2021

Okay, this works:

// Copyright 2021 Google, inc.
// SPDX-License-identifier: Apache-2.0

use std::{cmp::max, collections::HashMap, path::PathBuf};
use git2::{Repository, Error};

fn main() -> Result<(), Error> {
    let mut mtimes: HashMap<PathBuf, i64> = HashMap::new();
    let repo = Repository::open(".")?;
    let mut revwalk = repo.revwalk()?;
    revwalk.set_sorting(git2::Sort::TIME)?;
    revwalk.push_head()?;
    for commit_id in revwalk {
        let commit_id = commit_id?;
        let commit = repo.find_commit(commit_id)?;
        // Ignore merge commits (2+ parents) because that's what 'git whatchanged' does.
        // Ignore commit with 0 parents (initial commit) because there's nothing to diff against
        if commit.parent_count() == 1 {
            let prev_commit = commit.parent(0)?;
            let tree = commit.tree()?;
            let prev_tree = prev_commit.tree()?;
            let diff = repo.diff_tree_to_tree(Some(&prev_tree), Some(&tree), None)?;
            for delta in diff.deltas() {
                let file_path = delta.new_file().path().unwrap();
                let file_mod_time = commit.time();
                let unix_time = file_mod_time.seconds();
                // Keep the newest time seen, so the result does not depend on walk order.
                mtimes
                    .entry(file_path.to_owned())
                    .and_modify(|t| *t = max(*t, unix_time))
                    .or_insert(unix_time);
            }
            }
        }
    }
    for (path, time) in mtimes.iter() {
        println!("{:?}: {}", path, time);
    }
    Ok(())    
}

An MIT/Apache-licensed version can be found here.

Edit: however, it looks like this code will miss files touched only in the initial commit. A solution can be found here.
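
The linked solution is not reproduced here. One possible approach, sketched under assumptions, is to treat a commit without a parent as a diff against an empty tree, which is what the earlier implementation in this thread does by passing None as the old tree to diff_tree_to_tree. The std-only model below shows the idea; Snapshot and record_mtimes are hypothetical names, and trees are plain maps from path to blob id rather than git2 objects.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for a tree: path -> blob id.
type Snapshot = HashMap<String, u32>;

/// Records `time` for every path that differs between `parent` and `tree`.
/// A missing parent (root commit) is treated as an empty tree, so every
/// file in the initial commit counts as changed.
fn record_mtimes(
    mtimes: &mut HashMap<String, i64>,
    parent: Option<&Snapshot>,
    tree: &Snapshot,
    time: i64,
) {
    let empty = Snapshot::new();
    let old = parent.unwrap_or(&empty);
    for path in tree.keys().chain(old.keys()) {
        if old.get(path) != tree.get(path) {
            mtimes
                .entry(path.clone())
                .and_modify(|t| *t = (*t).max(time))
                .or_insert(time);
        }
    }
}
```

With this, a file that was added in the root commit and never modified again still gets the root commit's timestamp instead of being absent from the map.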
