Subsequent (2nd/3rd/etc.) indexing taking time #3049
Comments
I have a branch that retired that merge handling (while accommodating the existence of deleted objects in an index for a time so they don't appear in search results). The branch doesn't work anymore after the recent Lucene upgrades, and I haven't focused on fixing it. I'm a little surprised that for your tests with just Mongo the timings do not change at all on subsequent runs. Can you see if the logs show a lot of entries on subsequent runs? Normally they would have relatively few entries, with just a longer-than-expected …
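A quick way to get that count is to check the size of the most recent indexer log, for example (a sketch; /opengrok/log/opengrok0.0.log is an assumed default location, adjust to wherever your logs are written):

    # count the entries in the latest indexer log (path is an assumption)
    wc -l /opengrok/log/opengrok0.0.log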
Thank you for your response. Yes, I see 5000 entries in the log files. I am attaching the last one, which was from the no-"git pull" case. It also ran for about 10 minutes.
Mongo in OpenGrok is afflicted by #2986. The log shows thousands of lines of tag processing.
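A rough way to see how much of the log is tag processing (a sketch; the filter and the log path are assumptions, since the exact message wording may differ):

    # count log lines that mention tag processing (filter is approximate)
    grep -ci 'tag' /opengrok/log/opengrok0.0.log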
I downloaded Mongo; it only has 790 tags, so that means only 790 calls to "git log" to get info about tags on the initial index run. Any incremental run will do about 290, which is due to a different bug that I am also addressing in #2986. I confirmed this with strace on the initial and 2nd index. The different bug is that for any version with multiple tags, the current OpenGrok version only records the first one. Once I submit the PR, that will be fixed along with the intended fix of removing the one "git log" per tag.

That being said, in the 1.3.6 Docker image, the initial index for the Mongo source took about 10 minutes and subsequent indexes took 50 seconds, so if you are seeing thousands of tags, perhaps you mean ctags processing and not git tag processing?

UPDATE: I was only looking at unique tags that were being passed to "git log" commands, which is the 790 I referred to. Looking back at my trace, there seem to be 3 calls per tag, but that seems to be a function of the threading, as one strace line shows "unfinished"; apparently one of them is the initial exec and another is the resumption. I am not 100 percent sure on this.

UPDATE 2: Using the Docker 1.3.9 image, the results were the same: about 10 minutes for the initial index and 46 seconds for a subsequent index.
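For anyone who wants to reproduce those counts, something along these lines should work (a sketch; the trace file path and the grep pattern for strace's argv formatting are assumptions):

    # number of tags in the mongo clone (~790 at the time of writing)
    git -C /opt/pisces/workspace/temp1/src/mongo tag | wc -l
    # run an incremental index under strace, recording every exec'd command
    strace -f -e trace=execve -o /tmp/idx.trace ./idx.sh temp1
    # count how many of those execs were "git log" invocations
    grep -c '"git", "log"' /tmp/idx.trace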
I believe closed issue #3067 could be a cause of slowness here.
Hi OpenGrok dev team,
I think this is a question rather than a bug report.
I'm expecting the subsequent (2nd, 3rd, etc.) indexing time to be much shorter
than the initial (1st) indexing time for our git repository branches.
However, the subsequent indexing time is about half of the initial indexing time,
or worse (longer), for all of our git repository branches. For example, our most
popular (and huge) branch takes about 42 hours for initial indexing and the
subsequent indexing takes about 20 hours. I was expecting it to be much shorter.
Is this expected and normal? I hope it's not, and that you can spot a clue in my
configuration/settings for improving it.
Here is my environment:
H/W (vm): CPU Xeon 3.2 GHz 8 core, 96GB RAM
S/W: RHEL 7
tomcat-9.0.13, jdk-11.0.2, opengrok-1.3.8
Universal Ctags downloaded and built as of 02/13/2020
git version 2.19.1
/etc/security/limits.conf: soft/hard nofile set to 65536
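For reference, the corresponding limits.conf entries look roughly like this (the account name "tomcat" is an assumption; use whichever user runs Tomcat and the indexer):

    tomcat  soft  nofile  65536
    tomcat  hard  nofile  65536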
I did a sample run as follows using the MongoDB source code and my sample scripts.
(Please see my attached sample scripts.)
steps:
1st run:
run "./prep.sh temp1" to setup workspace (temp1) and deploy initial war
cd /opt/pisces/workspace/temp1/src
git clone https://github.com/mongodb/mongo
run "./idx.sh temp1" for indexing
==> It took about 10 minutes.
2nd/3rd/etc. run:
cd /opt/pisces/workspace/temp1/src/mongo
git pull
run "./idx.sh temp1" for indexing
==> It also took about 10 minutes.
(I also omitted the "git pull" step in the above, with the same result.)
(A rough sketch of what a wrapper like idx.sh might invoke is shown after the attached scripts below.)
idx.sh.txt
prep.sh.txt
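Since the attachments are not inlined here, this is a minimal sketch of what a wrapper like idx.sh might invoke, assuming the opengrok-indexer tool from opengrok-tools and the workspace layout from the steps above; the paths, heap size, and option choices are assumptions, not the contents of the attached script:

    #!/bin/sh
    # idx.sh <workspace-name>: index the sources under /opt/pisces/workspace/<name>/src
    WS=/opt/pisces/workspace/$1
    # -a points at opengrok.jar and -J passes JVM options; after "--" come indexer
    # options: -s source root, -d data root, -H history, -P projects, -S detect
    # repositories, -G tag history with git tags, -W write the webapp configuration.
    opengrok-indexer -J=-Xmx16g -a "$WS/dist/lib/opengrok.jar" -- \
        -s "$WS/src" -d "$WS/data" -H -P -S -G \
        -W "$WS/etc/configuration.xml"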