Extreme memory usage when running it on the linux kernel repo with cliff.toml from this project #1

Byron · 2021-08-12T23:37:01Z

Describe the bug
When running it on https://github.com/torvalds/linux with the cliff.toml from this repository, the git-cliff process will take a lot of time and consume more and more memory. I had to stop it at 12GB.

To Reproduce
Steps to reproduce the behavior:

git clone https://github.com/torvalds/linux
cp cliff.toml ./linux/
cd linux
git cliff

Expected behavior
A log is produced in reasonable time.

System (please complete the following information):

Ran f1b495d on macOS with 8GB of RAM and an M1.


orhun · 2021-08-13T11:52:48Z

Thanks for reporting this.

It turns out this high memory usage happens at the following line:

git-cliff/git-cliff/src/lib.rs

Line 100 in 2b8b4d3

let commits = repository.commits(commit_range)?;

Which calls this function:

git-cliff/git-cliff-core/src/repo.rs

Lines 39 to 51 in 2b8b4d3

pub fn commits(&self, range: Option<String>) -> Result<Vec<Commit>> {
    let mut revwalk = self.inner.revwalk()?;
    revwalk.set_sorting(Sort::TIME | Sort::TOPOLOGICAL)?;
    if let Some(range) = range {
        revwalk.push_range(&range)?;
    } else {
        revwalk.push_head()?;
    }
    Ok(revwalk
        .filter_map(|id| id.ok())
        .filter_map(|id| self.inner.find_commit(id).ok())
        .collect())
}

In conclusion I'd say this is most likely caused by git2.

So this issue basically boils down to:

use git2::{Commit, Repository, Sort};
use std::env;

fn main() {
    let repo_path = env::var("LINUX_KERNEL_REPO").expect("repo path is not specified");
    let repo = Repository::open(repo_path).expect("cannot open repo");
    let mut revwalk = repo.revwalk().unwrap();
    revwalk.set_sorting(Sort::TIME | Sort::TOPOLOGICAL).unwrap();
    revwalk.push_head().unwrap();
    let commits: Vec<Commit> = revwalk
        .filter_map(|id| id.ok())
        .filter_map(|id| repo.find_commit(id).ok())
        .collect();
    println!("{}", commits.len());
}

To reproduce:

cargo new --bin repro && cd repro/
# add `git2 = "0.13.21"` to [dependencies] in Cargo.toml
# save the code above as src/main.rs
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git linux
LINUX_KERNEL_REPO="$(pwd)/linux" cargo run

I think you should also report this to git2, there is not much I can do here.
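
As a side note on the pattern in `commits` above: it collects every commit into a `Vec` before anything else happens, so all of them are alive at once. The sketch below contrasts that with a streaming pass, using only the standard library. `FakeCommit`, `walk`, and both counting functions are hypothetical stand-ins invented for illustration; they are not git2 or git-cliff APIs, and nothing in this thread establishes that streaming is what resolves the issue. It only helps when the downstream step can work on one commit at a time.

```rust
// Hypothetical stand-in for a parsed commit; this is NOT git2's `Commit`.
struct FakeCommit {
    summary: String,
}

// Hypothetical stand-in for a revision walk: lazily yields `n` commits.
fn walk(n: usize) -> impl Iterator<Item = FakeCommit> {
    (0..n).map(|i| FakeCommit {
        summary: format!("fix: change number {}", i),
    })
}

// Eager: all commits are held in memory at once, so peak usage grows with `n`.
fn count_fixes_eager(n: usize) -> usize {
    let all: Vec<FakeCommit> = walk(n).collect();
    all.iter().filter(|c| c.summary.starts_with("fix")).count()
}

// Streaming: each commit is dropped as soon as it has been inspected.
fn count_fixes_streaming(n: usize) -> usize {
    walk(n).filter(|c| c.summary.starts_with("fix")).count()
}

fn main() {
    let n = 100_000;
    // Both strategies must agree on the result; only peak memory differs.
    assert_eq!(count_fixes_eager(n), count_fixes_streaming(n));
    println!("{}", count_fixes_streaming(n));
}
```

A changelog generator eventually needs to group commits, so a fully streaming design may not be possible; the sketch only shows where the memory cost of an eager `collect()` comes from.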

Byron · 2021-08-13T13:11:25Z

Thanks for investigating this.

It's interesting that running the above I see this:

➜  git-cliff-core git:(main) LINUX_KERNEL_REPO=~/dev/github.com/torvalds/linux/.git /usr/bin/time -lp cargo run --release --example reproduce
    Finished release [optimized] target(s) in 0.09s
     Running `/Users/byron/dev/github.com/orhun/git-cliff/target/release/examples/reproduce`
1015172
real 18.35
user 17.55
sys 0.61
          2273345536  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              199192  page reclaims
                 645  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   1  signals received
                 589  voluntary context switches
                1596  involuntary context switches
        176305556133  instructions retired
         57509393673  cycles elapsed
          1513012480  peak memory footprint

Maybe the real memory explosion happens elsewhere when processing more than a million commits.
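
To put the figures above in perspective, here is a small conversion sketch. It assumes that `time -l` on macOS reports "maximum resident set size" and "peak memory footprint" in bytes; that unit is an assumption, not something stated in the output. `bytes_to_gib` is a helper name made up for this example.

```rust
// Convert a byte count to gibibytes (1 GiB = 2^30 bytes).
fn bytes_to_gib(bytes: u64) -> f64 {
    bytes as f64 / (1u64 << 30) as f64
}

fn main() {
    // Values copied from the `time -lp` output above (assumed to be bytes).
    let max_rss: u64 = 2_273_345_536;
    let peak_footprint: u64 = 1_513_012_480;

    println!("maximum resident set size: {:.2} GiB", bytes_to_gib(max_rss));
    println!("peak memory footprint:     {:.2} GiB", bytes_to_gib(peak_footprint));
}
```

Under that assumption the reproduction peaks at roughly 2.1 GiB resident, well below the 12GB at which the original run was stopped.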

orhun · 2021-08-13T13:35:27Z

> It's interesting that running the above I see this:

Ah, I just got a similar result. It took longer for me, though, probably because of my lower-spec machine.

> Maybe the real memory explosion happens elsewhere when processing more than a million commits.

I'm re-investigating this issue. 👍🏼

alerque · 2021-08-13T15:34:29Z

Sadly libgit2 is missing some significant optimizations that the git CLI tooling has. I've run into resource issues like this on much smaller repos than the Linux kernel where the CLI tooling flies right along and the equivalent calls to the library sink the ship.

orhun · 2021-08-13T20:48:05Z

I pushed f859747 and it should improve performance dramatically. In fact, I was able to generate a changelog from the Linux kernel repository this time:

$ cargo run --release -- -r ~/gh/linux/ -c cliff.toml -o LINUX_CHANGELOG

results in:

# Changelog
All notable changes to this project will be documented in this file.

## [unreleased]

### ALSA

- Pcm: Fix mmap breakage without explicit buffer setup
- Hda/realtek: fix mute/micmute LEDs for HP ProBook 650 G8 Notebook PC

### MAINTAINERS

- Update Vineet's email address
- Fix Microchip CAN BUS Analyzer Tool entry typo
- Switch to my OMP email for Renesas Ethernet drivers

### Security

- Igmp: fix data-race in igmp_ifc_timer_expire()

[...]

Can you try it out to see if it's any better?

Byron · 2021-08-14T01:51:28Z

Fantastic, the fix is probably one of the most effective one-line changes I have ever seen!

Here is the tail of my cliff run on the Linux kernel:

real 31.97
user 25.32
sys 3.68
          2934489088  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
             1154355  page reclaims
                  59  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
               25376  voluntary context switches
               16458  involuntary context switches
            32094006  instructions retired
            20068389  cycles elapsed
             2786048  peak memory footprint

I think that's quite alright :).

In case you are interested in being even faster, here is another tool to estimate the hours it would take to implement the commits of a repository.

➜  linux git:(master) ✗ /usr/bin/time -lp gix tools estimate-hours
 9:49:55 Traverse commit graph done 1.0M commits in 7.55s (134.5k commits/s)
total hours: 979612.44
total 8h days: 122451.55
total commits = 1015172
total authors: 28234
total unique authors: 21359 (24.35% duplication)
 9:49:56                  find Extracted and organized data from 1015172 commits in 807.375125ms (1257373 commits/s)
real 8.45
user 8.21
sys 1.13
          1743454208  maximum resident set size
                   0  average shared memory size
                   0  average unshared data size
                   0  average unshared stack size
              117714  page reclaims
               11193  page faults
                   0  swaps
                   0  block input operations
                   0  block output operations
                   0  messages sent
                   0  messages received
                   0  signals received
                   3  voluntary context switches
                9337  involuntary context switches
         54347066220  instructions retired
         28594401548  cycles elapsed
           976183360  peak memory footprint

Byron added the bug label ("Something isn't working") Aug 12, 2021

Byron assigned orhun Aug 12, 2021

Byron closed this as completed Aug 14, 2021

lukehsiao mentioned this issue Jan 20, 2024

Context is incorrect when using --unreleased and --tag #457

Closed
