generate historycache for directories #1704
Do you use history cache? If you run the
git log returns instantly. I don't know of any history cache setting; I am using OpenGrok with stock settings. There is a historycache directory under the /var/opengrok/data folder. 1.7G ./index
That's strange. Could you inspect/instrument what the systems (both client and server) are doing during those 15 seconds? E.g. is it the client that is CPU loaded, or is the server performing some heavy I/O? Also, what OpenGrok version are you running? Since #1049, even if the history for a given directory/file is very long, the output is paginated, so it should not take too long to display/render.
I found the culprit. As you pointed out, git is the source of the delay. When I dumped the git log to a txt file it took 9 seconds. I failed to spot this at first because git pipes the log directly to less, which displays instantly; I thought the process finished first and displayed afterwards.

time /opt/git/bin/git log --abbrev-commit --abbrev=8 --name-only --pretty=fuller --date=iso8601-strict > /home/ethem/test.txt
real 0m8.994s

Instead of a full dump, the log dump can be optimized by using the --skip option and closing the pipe once the required data is acquired, i.e.:

real 0m0.014s
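The --skip pagination idea can be sketched against a throwaway repository; the repo and the per_page/page variables below are illustrative only, not anything OpenGrok ships:

```shell
# Build a small throwaway repository (illustrative only).
tmp=$(mktemp -d) && cd "$tmp" && git init -q
for i in 1 2 3 4 5; do
  git -c user.name=t -c user.email=t@example.com \
      commit -q --allow-empty -m "commit $i"
done

# Fetch only page 2 of the history, 2 entries per page:
# --skip drops the newest (page-1)*per_page commits and -n limits the
# output, so git stops as soon as the requested slice is produced.
per_page=2; page=2
git log -n "$per_page" --skip $(( (page - 1) * per_page )) --oneline
# prints "commit 3" and "commit 2" (history is listed newest-first)
```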
As I am new to git, I understand now that this is a git problem. After repacking the repository with git repack -a, the duration decreased from 8 seconds to 2.5 seconds. 2.5 seconds is still not responsive enough to navigate between history log pages. The log output itself is only a few megabytes (in my instance 3.2 MB), so the git log output could be cached and OpenGrok could use that file.
Well, if history cache is used, running
Do you run the indexer with the -H option?
I tried with:

sudo OPENGROK_GENERATE_HISTORY=on /opt/opengrok-1.1-rc8/bin/OpenGrok index /home/ethem/og/src

I couldn't find any compressed xml files under /var/opengrok/data/historycache. Am I missing something?
So what are these 788M in the
If you have a project called
It seems that history cache generation failed for some reason. Do the indexer logs contain anything of interest?
There are gz files for each code file in our projects. While browsing a code file's history I didn't have any issue, so I think those files are caches for single files. I had problems browsing the history of an entire repository. URL (takes time): URL (no problems at the moment):
Aha! :-) The per-directory history is not cached, so
The other option is to create the history cache for directories on demand. Thus only the first display will take a long time, and subsequent displays (leveraging the incremental history generation using the
Another idea would be to store the history cache at least for the top-level directory of a given repository, since it is available anyway, i.e. change
The reason why history is not cached for directories is given in
So if a directory cache is implemented, that would mean traversing the directory hierarchy all the way up from the changed file and invalidating all directory cache entries, or devising a better solution.
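A minimal sketch of that invalidation walk, assuming a hypothetical layout where each directory's cache is a "dir.gz" file under a cache root; the CACHE_ROOT layout and .gz naming here are illustrative, not actual OpenGrok internals:

```shell
# Hypothetical cache layout for illustration: one <dir>.gz per directory.
CACHE_ROOT=$(mktemp -d)
mkdir -p "$CACHE_ROOT/src/main"
touch "$CACHE_ROOT/src/main.gz" "$CACHE_ROOT/src.gz"

# When a file changes, walk up from its directory to the repository root
# and drop every ancestor's per-directory cache entry.
invalidate_dir_caches() {
  d=$(dirname "$1")
  while [ "$d" != "." ] && [ "$d" != "/" ]; do
    rm -f "$CACHE_ROOT/$d.gz"
    d=$(dirname "$d")
  done
}

invalidate_dir_caches "src/main/Foo.java"
# src/main.gz and src.gz are now gone; per-file caches are untouched.
```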
The latest revision hash can be parsed from git log. After parsing the first record, close the command's output pipe, because we don't need the rest of the records, which gives a performance boost. After getting the latest revision hash, it can be compared with the revision hash associated with the history cache; if they are not equal, the cache can be invalidated and a new cache generated. I tried this for subdirectories and git log works.
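git can do the "first record only" part natively: git log -1 (or git rev-parse HEAD) stops after a single commit, so no pipe-closing trick is needed. A sketch of the compare-and-invalidate step, where the .cached_rev file name is a made-up stand-in for wherever the cache would record its revision:

```shell
# Throwaway repo for illustration.
tmp=$(mktemp -d) && cd "$tmp" && git init -q
git -c user.name=t -c user.email=t@example.com \
    commit -q --allow-empty -m "first"

# Latest revision hash without reading the whole log (-1 stops after
# the first record, which is the performance boost described above).
latest=$(git log -1 --format=%H)

# Compare against the hash recorded alongside the cache (.cached_rev is
# a hypothetical name) and regenerate only when they differ.
cached=$(cat .cached_rev 2>/dev/null || true)
if [ "$latest" != "$cached" ]; then
  # ... regenerate the history cache here ...
  echo "$latest" > .cached_rev
fi
```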
The latest changeset is easy to acquire via
Anyhow, there are (at least) 2 different ways to approach this:
The first option has the advantage that it might be fast for the first couple of history pages, but it will get progressively worse (assuming the history is not cached for the session). Also, for each page,
The advantage of the second option is that once cached and valid, the history fetch will be quick. However, the first request will always be slow. Also, if the repository changes often and reindexing is done often too, the cache will be mostly invalid, saving no time.
getSearchMaxItems can be acquired with:

git rev-list --count --all $subdir

Edit: for the git root directory, omitting $subdir is better in performance terms:

$ time git rev-list --count --all
real 0m0.041s
real 0m0.686s (with $subdir)

Performance won't degrade with the skip option:

git log -n $history_per_page --skip $(( ($page - 1) * $history_per_page )) $subdir

(I tried "0" for the first page and it works.)
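Putting the count and the per-page fetch together; the throwaway repo and the variable names here are illustrative:

```shell
# Throwaway repo with 7 commits (illustrative only).
tmp=$(mktemp -d) && cd "$tmp" && git init -q
for i in 1 2 3 4 5 6 7; do
  git -c user.name=t -c user.email=t@example.com \
      commit -q --allow-empty -m "c$i"
done

history_per_page=3
total=$(git rev-list --count --all)            # total commits: 7
# Ceiling division gives the page count: (7 + 3 - 1) / 3 = 3 pages.
pages=$(( (total + history_per_page - 1) / history_per_page ))

# The last page holds the single remaining commit.
git log -n "$history_per_page" \
        --skip $(( (pages - 1) * history_per_page )) --oneline
# prints only "c1"
```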
Well, it should work not only for git; ideally also for other SCMs that support per-directory history retrieval.
I have one particularly active project with 9265 commits, spanning 371 pages. It takes about 15 seconds to load each page. The project's git repository size is 890MB.
I have another project with a total of 2717 commits, which takes 1-2 seconds to load, which is far better and an acceptable duration. This project is about 130MB.
I have another project with a total of 6955 commits. It is 93MB and takes 1-2 seconds to load each page.
It seems that as the size of the git repository increases, the load time increases. Is it possible to optimize this?
Running VirtualBox on an SSD disk with an Intel(R) Core(TM) i7-6700 CPU @ 3.40GHz and 64GB RAM. Allocated 4 cores with 100% threshold and 16GB of RAM.