I have some raw open-source code for a modified version of the Linux kernel. Ideally I would have a patch so that I could apply it to a newer version of the kernel, but instead I just have the source code, so I'm trying to create that patch myself.
What I'm finding is that when I create a patch and apply it to the newer kernel, I end up reverting a lot of changes. Is there a feature in git that can tell if local changes are reverting previous commits? Or is there some other tool that can find the commit with the least amount of changes (even if it's time consuming and has to be run on my own machine)?
I've been manually narrowing down what commit the source branched from, but it's very time consuming. I found a branch that fairly closely matched and now am trying to figure out what the latest commit was before changes were made to it.
I'll check out a commit A, copy the changed files, do a log on any file that has a lot of stuff taken out to find out if those exact changes were added from commit B, then checkout the commit before commit B, etc, etc...
EDIT: Since this is all pertaining to open source code, I don't see any reason why I can't share links to it here.
The source code released by LGE can be found here. Search for LS970 under Mobile.
The different branches of the MSM kernel can be found here. So far the ics_strawberry head seems to be the closest. It's one of the few that has a chromeos folder, which seems like it would be an odd thing to add specifically for a cell phone that wouldn't be running Chrome OS.
Unfortunately, you cannot use git bisect here because you cannot tell at any commit whether it was bad or good.
Your goal is to find the commit that best matches your target source. I think the best metric for "matching best" is the size (in lines) of the unified diff/patch.
With this in mind, you can write a script according to the following pseudo-code (sorry, weird mix of shell, Perl and C):
min_diff_size = 10000000000
best_commit = none
git branch tmp original_branch # branch to scan
git checkout tmp
for (;;) {
    diff_size = `diff -burN -x.git my_git_subtree my_src_subtree | wc -l`;
    if (diff_size < min_diff_size) {
        min_diff_size = diff_size;
        best_commit = `git log --oneline -1`;
    }
    git reset --hard HEAD~; # rewind back by 1 commit
    if (git reset did not work) break;
}
git checkout original_branch
git branch -d tmp
print "best commit $best_commit, diff size $min_diff_size"
You may also want to cycle through the kernel branches to find the best-matching branch.
This will probably be slow (it could take hours), but it will find the best-matching commit.
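For anyone who would rather start from something runnable, here is a minimal Python sketch of the same linear scan. It is an illustration under assumptions, not a tested tool: the function names (`list_commits`, `diff_size`, `best_match`) are made up for this example, it uses a detached `git checkout` of each commit instead of resetting a temporary branch, and it expects the repository's working tree to be clean.

```python
import subprocess
from typing import Callable, Iterable, List, Optional, Tuple


def list_commits(repo: str, branch: str) -> List[str]:
    """Return commit hashes on `branch`, newest first, first parents only
    (the same chain that repeatedly stepping to HEAD~ would walk)."""
    out = subprocess.run(
        ["git", "-C", repo, "rev-list", "--first-parent", branch],
        check=True, capture_output=True, text=True,
    ).stdout
    return out.split()


def diff_size(repo: str, commit: str, src: str) -> int:
    """Check out `commit` in `repo` and count the lines that
    `diff -burN` prints when comparing the checkout against `src`."""
    subprocess.run(
        ["git", "-C", repo, "checkout", "--quiet", "--detach", commit],
        check=True,
    )
    # diff exits with status 1 when the trees differ, so no check=True here.
    result = subprocess.run(
        ["diff", "-burN", "-x", ".git", repo, src],
        capture_output=True,
    )
    return result.stdout.count(b"\n")


def best_match(
    commits: Iterable[str], metric: Callable[[str], int]
) -> Tuple[Optional[str], Optional[int]]:
    """Return (commit, size) for the commit with the smallest metric.
    On ties the earliest commit in iteration order wins."""
    best_commit, best_size = None, None
    for commit in commits:
        size = metric(commit)
        if best_size is None or size < best_size:
            best_commit, best_size = commit, size
    return best_commit, best_size
```

Usage would look something like `best_match(list_commits(repo, "ics_strawberry"), lambda c: diff_size(repo, c, src))`, where `repo` and `src` are the paths to your kernel clone and the vendor source tree. Keeping the metric as a separate callable makes it easy to swap in a cheaper measurement later.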
I guess this could be sped up massively by doing a binary search. Define a range and start in the middle. Now get your metric on 1/4 and 3/4 and repeat the process on [0;1/2] or [1/2;1] depending on where the measurement is lower.
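A rough Python sketch of that idea, with one change that should be stated plainly: instead of keeping only half of the range after each comparison, it keeps three quarters (a standard ternary-style search). Halving can throw away the true minimum when the diff size falls slowly on one side and rises steeply on the other. The sketch assumes the diff size strictly decreases towards the branch point and strictly increases after it; if the real history is noisier, it may settle on a local minimum. `quarter_search` and its parameters are names invented for this example.

```python
from functools import lru_cache
from typing import Callable


def quarter_search(n_commits: int, diff_size: Callable[[int], int]) -> int:
    """Return the index in [0, n_commits) with the smallest diff_size,
    assuming diff_size falls to a single minimum and then rises."""
    if n_commits <= 0:
        raise ValueError("need at least one commit")
    # Each measurement is expensive (a full tree diff), so cache it.
    cached = lru_cache(maxsize=None)(diff_size)
    lo, hi = 0, n_commits - 1
    while hi - lo > 3:
        span = hi - lo
        q1 = lo + span // 4        # measurement at 1/4 of the range
        q3 = lo + (3 * span) // 4  # measurement at 3/4 of the range
        if cached(q1) <= cached(q3):
            hi = q3  # the minimum cannot lie to the right of q3
        else:
            lo = q1  # the minimum cannot lie to the left of q1
    # At most four candidates remain, so just measure them all.
    return min(range(lo, hi + 1), key=cached)
```

Here the index would be a position in the list of commits on the branch, and `diff_size` would check out that commit and measure the diff against the target source. The number of measurements grows with the logarithm of the history length rather than linearly.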
I ended up using a combination of mvp's pseudo-code and Chronial's suggestion to write a Perl script that runs much faster than I would have imagined. I've posted the code on GitHub as git-diff-search.
Awesome! Thanks for converting this idea into something usable! This is how open-source is supposed to work! :)
I just used git-bisect today and realized that git bisect run could be used with a much simpler version of this script (possibly using an environment variable to keep track of min_diff_size). I may tweak this before the next time I go to use it.
git bisect helps you find a breaking change somewhere between two commits in a branch. In other words, it can only work if, for any commit, you can say "this is a bad commit" or "this is still a good commit". With our problem, you can't really say at any point whether a commit is bad or good; it is neither. It is only good once you have reached the absolute minimum, but you will not know that until you get there.