Last indexed time in API? #3423

Closed
danielburrell opened this issue Feb 16, 2021 · 8 comments

@danielburrell

The last indexed time is displayed at the bottom of the page and indicates, to an extent, whether the indexing process is working as expected.

Is this available via the API / as JSON? I looked at the API docs and couldn't find anything.

@vladak
Member

vladak commented Feb 16, 2021

Indeed, there is no API call to get the last indexed time. Adding one is not a problem; however, there are multiple facets to this. The time of the last reindex is normally "stored" as the last-modified time of the timestamp file under the data root. In the per-project workflow the projects are indexed individually, so each has its own index time (though this workflow is not finished and one usually needs to work around #1670). Thinking of this, each project should have its own timestamp.

Another problem to consider is the indexer/webapp separation. Normally, at the end of indexing, the indexer "pings" the web app to set the timestamp and/or mark the projects as indexed (otherwise they will not be presented by the application). It is an open question which data the web app should present in the API call. For the time being, I think it should just stat(2) the timestamp file and return the value.

That said, adding an API call that retrieves the last-modified time of the timestamp file and presents it as JSON is not a problem in general; however, it does not address all the use cases.
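
The stat(2)-based idea above can be approximated from a shell inside the container. This is a minimal sketch, not the endpoint that was actually added: the function name index_time_json, the default data root /opengrok/data and the "indextime" JSON field are assumptions, and it relies on GNU coreutils stat (the %Y format prints the modification time in seconds since the epoch).

```shell
#!/bin/sh
# Sketch only: report the last-modified time of the indexer's "timestamp"
# file as a small JSON object, i.e. stat(2) the file and return the value.
# ASSUMPTIONS: function name, default data root and JSON field name are
# illustrative; requires GNU coreutils stat.

index_time_json() {
    data_root="${1:-/opengrok/data}"
    ts_file="$data_root/timestamp"
    if [ ! -f "$ts_file" ]; then
        # No timestamp file yet: no indexer run has marked the index dirty.
        echo '{"indextime": null}'
        return 1
    fi
    # %Y = time of last data modification, seconds since the epoch
    mtime=$(stat -c %Y "$ts_file")
    printf '{"indextime": %s}\n' "$mtime"
}

# Example: index_time_json /opengrok/data
```

Returning a non-zero status when the file is missing lets the same function double as a readiness-style check, while still printing valid JSON.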

vladak pushed a commit to vladak/OpenGrok that referenced this issue Feb 16, 2021
vladak self-assigned this Feb 16, 2021
vladak closed this as completed in c23e1b8 Feb 17, 2021
@danielburrell
Author

danielburrell commented Feb 23, 2021

Is the modified time of /opengrok/data/timestamp still updated if no changes to the index are found?
I created a health check in Kubernetes that checks whether the file's timestamp has been altered in the last hour. The idea is that, since the indexer runs every 10 minutes, a timestamp more than an hour out of date means the indexer has failed and the container should be restarted. However, I ended up killing what I think was a healthy process. If there are no code changes, and so nothing further to index, does the timestamp stop updating?

@vladak
Member

vladak commented Feb 24, 2021

> Is the modified time of /opengrok/data/timestamp still updated if no changes to the index are found?

It is not; see the isDirty() check:

if (!isInterrupted() && isDirty()) {
    if (env.isOptimizeDatabase()) {
        optimize();
    }
    env.setIndexTimestamp();
}

@vladak
Member

vladak commented Feb 24, 2021

As a workaround, whenever you run the indexer, touch a file under the source root that was already indexed. This should help, at least until #3077 is implemented.

@vladak
Member

vladak commented Feb 25, 2021

So, if I understand the use case correctly, you would like a way to tell when the last indexer run finished, correct? I really should have asked that question before starting work on the indextime API endpoint.

@danielburrell
Author

danielburrell commented Feb 25, 2021

I had already done what you suggested, which is to touch a file at the start of every indexing session. The use case I'm trying to solve is deployment to Kubernetes. Specifically, I have written liveness and readiness checks, each a command that is run in the container periodically: if the command returns exit code 0, all is well; otherwise the service is considered unavailable or unhealthy (depending on the check).

I modified the index.sh script to run touch /var/opengrokalive on line 11.

This allowed me to use stat /opengrok/data/timestamp for the readiness check (OpenGrok is considered ready to serve once it has completed at least one indexing run).

For the liveness check I wrote an isIndexing.sh script to ensure progress is being made and the indexer hasn't broken or failed:

#!/bin/sh
# Exit 1 (unhealthy) if /var/opengrokalive, touched by index.sh, is missing or a day old or older.
indexTime=$(stat -c %Y /var/opengrokalive) || exit 1
now=$(date +%s)
if [ $((now - indexTime)) -ge 86400 ]; then
    exit 1
fi
exit 0

The one-day threshold is there because one of our projects is huge and actually takes a whole day to index.

All this is necessary because indexing has stopped in the past and we were unsure why. With Kubernetes, the idea is to restart the container automatically based on these checks.

Hope this provides some context.

I should say I actually prefer the 'command in container' approach to using the HTTP API, because the API's authentication requirements present a challenge for my particular deployment.

@vladak
Member

vladak commented Feb 25, 2021

You need to touch a file in one of the projects under the source root. The indexer will then pick it up and update the related document in the index, making the index dirty. This in turn causes the index timestamp file to be touched.
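
A minimal sketch of this workaround as a shell function that an index.sh-style wrapper could call before starting the indexer. The function name touch_indexed_marker and the example paths /opengrok/src and myproject/README are hypothetical; the marker must be an existing file that has already been indexed. Unlike touching /var/opengrokalive, a change under the source root makes the next run dirty, so the timestamp file is refreshed even when nothing else changed.

```shell
#!/bin/sh
# Sketch of the workaround: refresh the mtime of an already-indexed file so
# the next indexer run updates its document, the index becomes dirty, and
# the timestamp file under the data root gets touched.
# HYPOTHETICAL: the function name and the example paths below.

touch_indexed_marker() {
    src_root="$1"   # e.g. /opengrok/src
    marker="$2"     # path relative to src_root, e.g. myproject/README
    # Reject absolute paths and "..": the file must stay under the source
    # root (touching e.g. /var/opengrokalive does not make the index dirty).
    case "$marker" in
        /* | .. | ../* | */.. | */../*)
            echo "marker must be a relative path under the source root" >&2
            return 2 ;;
    esac
    # Only touch an existing file; touch would otherwise create a new one.
    if [ ! -f "$src_root/$marker" ]; then
        echo "not found under source root: $marker" >&2
        return 1
    fi
    touch "$src_root/$marker"
}

# Example, near the top of an index.sh-style wrapper:
#   touch_indexed_marker /opengrok/src myproject/README || exit 1
```

Refusing to create the marker keeps the workaround from silently adding stray files to a project.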

@vladak
Member

vladak commented Feb 25, 2021

As for the API and authentication, you can use a Bearer token; see https://github.com/oracle/opengrok/wiki/Web-services#authenticationauthorization
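
A hedged sketch of such a call with curl. The endpoint path /api/v1/system/indextime, the default base URL http://localhost:8080/source and the OPENGROK_TOKEN environment variable are assumptions for illustration only; check the web services documentation linked above for the actual endpoint and for how tokens are configured. The function does not parse the response, since its format is not described in this thread.

```shell
#!/bin/sh
# Sketch: fetch the last index time over the REST API using a Bearer token.
# ASSUMPTIONS: the /api/v1/system/indextime path, the default base URL and
# the OPENGROK_TOKEN environment variable are illustrative, not confirmed.

fetch_index_time() {
    base_url="${1:-http://localhost:8080/source}"
    if [ -z "${OPENGROK_TOKEN:-}" ]; then
        echo "OPENGROK_TOKEN is not set" >&2
        return 2
    fi
    # --fail: exit non-zero on HTTP errors such as 401/403, so the exit
    # status can drive a probe directly; --max-time keeps it from hanging.
    curl --silent --show-error --fail --max-time 10 \
        -H "Authorization: Bearer $OPENGROK_TOKEN" \
        "$base_url/api/v1/system/indextime"
}

# Example:
#   export OPENGROK_TOKEN=...
#   fetch_index_time https://opengrok.example.com/source
```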

vladak pushed a commit that referenced this issue Apr 9, 2021