Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

History generation respects ignored files #1386

Closed
wants to merge 5 commits into from

Conversation

tulinkry
Copy link
Contributor

@tulinkry tulinkry commented Feb 6, 2017

fixes #1351
fixes #1315
hopefully fixes #1297

@tulinkry tulinkry changed the title History generation respect ignored files History generation respects ignored files Feb 6, 2017
@vladak
Copy link
Member

vladak commented Feb 6, 2017

Any significant performance degradations ?

@vladak
Copy link
Member

vladak commented Feb 7, 2017

Not directly relevant however it would be nice to have some statistics about ignored/accepted file/dir counts (especially in the light of recent bug in IgnoredNames #1383) as it might provide quick sanity check at the end of indexing.

}

if (file.isDirectory()) {
// always acceptFile directories so that their files can be examined
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

acceptFile ?

Copy link
Contributor Author

@tulinkry tulinkry Feb 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Refactor typo. Fixed.

RuntimeEnvironment env = RuntimeEnvironment.getInstance();
if (!env.getIncludedNames().isEmpty()
&& // the filter should not affect directory names
(!(file.isDirectory() || env.getIncludedNames().match(file)))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is a bit hard to read, maybe expand this to && ? Also, the comment should say why - I assume this is because of the directory check down below.

Copy link
Contributor Author

@tulinkry tulinkry Feb 7, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I expanded this. The question why is for @trondn because I'm quite missing the purpose of the included names here.

File f1 = parent.getCanonicalFile();
File f2 = file.getCanonicalFile();
if (f2.equals(f1)) {
LOGGER.log(Level.INFO, "Skipping links to itself...: {0} {1}",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these messages should be self-explanatory (i.e. make it clear what are the strings)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added more info

String srcRoot = env.getSourceRootPath();

if (path.startsWith(srcRoot)) {
if (env.hasProjects()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could do the project != null check here to avoid the substring operation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really here, because then if the project == null then the return value is true which is not the original intention. I moved the code around to avoid the substring operation.

if (absolutePath.startsWith(allowedSymlink)) {
String allowedTarget = new File(allowedSymlink).getCanonicalPath();
if (canonicalPath.startsWith(allowedTarget)
&& absolutePath.substring(allowedSymlink.length()).equals(canonicalPath.substring(allowedTarget.length()))) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd be nice to have the logic explained in a comment

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a comment

@tulinkry
Copy link
Contributor Author

tulinkry commented Feb 7, 2017

I filed #1388 for the statistics addon which would be a non-trivial change in this PR.

@tulinkry
Copy link
Contributor Author

tulinkry commented Feb 7, 2017

Performance:

Of course that the performance must go down due to this because we do more work on the file system checking the files. I tried to improve it so it does not that many operations but still some are neccessary to check it. A future improvement is to collect all files which are accepted once and use it later in the index phase (this is history phase) - now it's done twice (once per phase).

With this in mind when I index the OpenGrok source code there is no significant slow down when compared to the master branch without this PR.

@tulinkry
Copy link
Contributor Author

tulinkry commented Feb 8, 2017

Possibly fixes also #984

@tulinkry
Copy link
Contributor Author

@vladak I made the changes; can you review please?

@vladak
Copy link
Member

vladak commented Feb 14, 2017

It'd probably be prudent to address this in JDBC history cache however given there are some plans to EOF it (#1271) it's not probably necessary assuming the EOL will happen sooner than later.

*
* @param history boolean to set
*/
public void hasHistory(boolean history) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it have to be named like that ? hasHistory implicates to return boolean.

? test
/* this isn't a project's root */
: test.getParentFile();
if (!AcceptHelper.accept(project, parent, test)) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is where the inverse map gets constructed so if there are lots of changesets that touch given file, accept() check will be performed for that file for each changeset. This could be quite taxing on large repos. I'd very much prefer to run the accept() check only after the map is constructed.

import org.opensolaris.opengrok.logger.LoggerFactory;

/**
*
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pls add some comment about where the class is used

public static boolean accept(Project project, File parent, File file) {
try {
File f1 = parent.getCanonicalFile();
File f2 = file.getCanonicalFile();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the numbered naming here is a bit hard to follow

}

/**
* Check if I should accept this file into the database
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it'd nice to have some more info about the function, e.g. mention how exactly project is relevant

}

/**
* Check if this file should be accepted.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same here, state how parent is used in the method

@vladak vladak removed this from the 1.0 milestone Mar 22, 2017
@vladak vladak added this to the 1.1 milestone Mar 31, 2017
@vladak
Copy link
Member

vladak commented Aug 9, 2017

Also, HistoryGuru#addRepositories() should be addressed too - the traversal scans for the repositories inside SCM metadata directories which slows down the process unnecessarily and incurs more I/O.

@vladak
Copy link
Member

vladak commented Aug 28, 2018

Any plan to refresh this ?

@tarzanek tarzanek modified the milestones: 1.1, 1.2 Jan 10, 2019
@ahornace ahornace removed this from the 1.2 milestone Aug 5, 2019
@vladak
Copy link
Member

vladak commented Sep 15, 2020

This was superseded by #3137.

@vladak vladak closed this Sep 15, 2020
@tulinkry tulinkry deleted the history-ignore branch October 9, 2020 09:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants