Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support last modified header #859

Merged
merged 11 commits into from
Apr 16, 2019
Merged

Conversation

kinow
Copy link
Collaborator

@kinow kinow commented Mar 8, 2019

Fix for #574 .

This pull request adds the support to the HTTP 304 Not Modified header in Skosmos.

Changes are mainly in WebController, for the method that handles displaying the concept information. The logic is as follow:

  • Skip this if config.ttl has skosmos:useModifiedDate = false. Otherwise continue.
  • Call new method getModifiedDate to get the the modified date of the current Concept. It searches for the Concept's dc:modified literal value.
  • If the Concept's dc:modified is not found, then it still tries to get the dc:modified literal value of the main concept scheme (via Vocabulary::getDefaultConceptScheme(), which should return skosmos:mainConceptScheme or the URI of the last concept scheme).
  • Call new method getGitModifiedDate, which executes git log -1 --date=iso --pretty=format:%cd and stores in the local cache for an interval READ_MODIFIED_CONFIG_TTL of 10 minutes.
  • Call new method getConfigModifiedDate, which uses PHP's filemtime on the config.ttl local file, retrieving the modification date and also storing in the local cache for the same interval READ_MODIFIED_CONFIG_TTL of 10 minutes. NB: file modification dates depends on file system, operating system, and other settings.
  • If at least one of the three values (Concept modified date, git modified date, configuration modified date) was found, we use the most recent one.
  • The most recent modified date is then sent in the Last-Modified header.
  • If the user (probably its browser) sent a If-Modified-Since value in the HTTP Request headers, it means it understands HTTP 304. So we will compare the modified since from the browser with our most recent modified date. If our most recent modified date is not greater than the browser modified since date, then we end the request-response transaction with a HTTP/1.0 304 Not Modified header.

Unit tests included use mocking library to reach almost 100% coverage of the changes. Performed local tests using the Dockerfile in local environment, and two browsers with incognito mode and closing/opening the browsers to force cleaning the cache.

This pull request covers the If-Modified-Since header. @osma, this pull request does not cover If-None-Match. I think this could be done later (involves ETAG's, covers HEAD requests too) if necessary.

Cheers
Bruno

@kinow
Copy link
Collaborator Author

kinow commented Mar 8, 2019

Travis is happy. Ready for review again I believe 🎉

@osma osma added this to the 2.2 milestone Mar 15, 2019
@osma
Copy link
Member

osma commented Mar 15, 2019

Thanks a lot! A couple of comments on the logic:

Call new method getConfigModifiedDate, which uses PHP's filemtime on the config.ttl local file, retrieving the modification date and also storing in the local cache for the same interval READ_MODIFIED_CONFIG_TTL of 10 minutes. NB: file modification dates depends on file system, operating system, and other settings.

GlobalConfig.initializeConfig() already does a filemtime() call on config.ttl on every request (if APC cache is in use). I wonder if the same value could be used here, for example by storing the result in a field within GlobalConfig which is then exposed by a method such as GlobalConfig.getModifiedDate() which is simply called by WebController instead of using its own getConfigModifiedDate method. In any case, filemtime() is such a cheap call AFAIK that storing the result in a cache seems a bit overkill.

If our most recent modified date is greater than the browser modified since date, then we end the request-response transaction with a HTTP/1.0 304 Not Modified header.

I think this should be not greater instead of greater? Anyways the code looks correct, no real issue here.

@kinow
Copy link
Collaborator Author

kinow commented Mar 20, 2019

GlobalConfig.initializeConfig() already does a filemtime() call on config.ttl on every request (if APC cache is in use). I wonder if the same value could be used here, for example by storing the result in a field within GlobalConfig which is then exposed by a method such as GlobalConfig.getModifiedDate() which is simply called by WebController instead of using its own getConfigModifiedDate method. In any case, filemtime() is such a cheap call AFAIK that storing the result in a cache seems a bit overkill.

Good idea! As it's using the cache, I think it should be safe to simply add that value to the cache in GlobalConfig. That way, when the request comes in WebController, it will do as before and first search the cache. WDYT?

Just updated the branch to have that behaviour, so take a look at the latest commit here to see if what I did makes sense.

If our most recent modified date is greater than the browser modified since date, then we end the request-response transaction with a HTTP/1.0 304 Not Modified header.

I think this should be not greater instead of greater? Anyways the code looks correct, no real issue here.

I got confused there for a minute, so had a look at Symfony's code to make sure I was not inverting the logic.

Indeed a typo in the description, editing the issue description 👍 sorry for the confusion @osma

@osma
Copy link
Member

osma commented Mar 29, 2019

Good idea! As it's using the cache, I think it should be safe to simply add that value to the cache in GlobalConfig. That way, when the request comes in WebController, it will do as before and first search the cache. WDYT?

Hmm, no, I don't think this makes sense. Right now, when APC is enabled, GlobalConfig checks the timestamp of config.ttl and uses that as (part of) the key for looking up a previously parsed version of the file in the APC cache. The idea here is that config.ttl can be updated any time, but doing so will change the timestamp and thus implicitly invalidate the old version in the APC cache and force re-parsing.

Since we already need to check the timestamp on every request, passing it via APC to WebController is just overcomplicating things. I think GlobalConfig should simply check the config.ttl timestamp once on every request (as it already does, pretty much) and store in a field inside the class and expose it via a method such as getModifiedDate(), as I suggested above.

@kinow
Copy link
Collaborator Author

kinow commented Mar 29, 2019

Oh, yup, good point. If the cache was disabled we simply would never get that value re-used. Just re-read your previous comment, and you suggested a method to be created, but I hadn't understood why. Now it makes sense 👍 updating it in the next days. Thanks @osma !

@kinow kinow force-pushed the support-last-modified-header branch from 6e827e7 to b849502 Compare March 31, 2019 06:48
@kinow
Copy link
Collaborator Author

kinow commented Mar 31, 2019

Dropped last commit, then added a new one that

  • always calls filemtime (so that it works with or without the APC cache enabled)
  • stores the return value in a property of GlobalConfig
  • only uses that value if it is not a bool (filemtime returns int|bool; bool for when errors occur)
  • removes the method from WebController that called filemtime
  • keeps the method to get the return value of filemtime. But instead of calling it, it simply calls GlobalConfig#getConfigModifiedTime and, if not a bool, then returns the DateTime

Rest of the workflow is the same. Unit test updated and passing on Travis, but another thorough review more than welcome (link to compare force-pushed commit above ^)

Cheers
Bruno

@osma
Copy link
Member

osma commented Apr 12, 2019

Sorry for the delay in reviewing, once again...

I think this is very close to be ready for merging, just a couple of probably minor observations:

  1. The constant READ_MODIFIED_CONFIG_TTL is now only used to store the timestamp from git, so it could be renamed something like GIT_MODIFIED_TTL. Also the comment above it should be amended, now it mentions other things than git timestamps.
  2. I'm getting some warnings like this in the Apache error log for some concepts (e.g. YSO concept http://www.yso.fi/onto/yso/p27625 ):
Fri Apr 12 17:59:27.330110 2019] [:error] [pid 23351] [client 127.0.0.1:46138] PHP Notice:  Object of class DateTime could not be converted to int in /var/www/html/Skosmos/controller/Controller.php on line 141, referer: http://localhost/Skosmos/yso/fi/page/p27675
[Fri Apr 12 17:59:27.330133 2019] [:error] [pid 23351] [client 127.0.0.1:46138] PHP Stack trace:, referer: http://localhost/Skosmos/yso/fi/page/p27675
[Fri Apr 12 17:59:27.330146 2019] [:error] [pid 23351] [client 127.0.0.1:46138] PHP   1. {main}() /var/www/html/Skosmos/index.php:0, referer: http://localhost/Skosmos/yso/fi/page/p27675
[Fri Apr 12 17:59:27.330155 2019] [:error] [pid 23351] [client 127.0.0.1:46138] PHP   2. WebController->invokeVocabularyConcept() /var/www/html/Skosmos/index.php:87, referer: http://localhost/Skosmos/yso/fi/page/p27675
[Fri Apr 12 17:59:27.330162 2019] [:error] [pid 23351] [client 127.0.0.1:46138] PHP   3. Controller->sendNotModifiedHeader() /var/www/html/Skosmos/controller/WebController.php:235, referer: http://localhost/Skosmos/yso/fi/page/p27675

@kinow
Copy link
Collaborator Author

kinow commented Apr 13, 2019

Hi @osma !

The constant READ_MODIFIED_CONFIG_TTL is now only used to store the timestamp from git, so it could be renamed something like GIT_MODIFIED_TTL. Also the comment above it should be amended, now it mentions other things than git timestamps.

Missed that one. Done!

I'm getting some warnings like this in the Apache error log for some concepts (e.g. YSO concept http://www.yso.fi/onto/yso/p27625 ):

I couldn't find this warning in the Docker images, even though they are running with Apache httpd and Apache 7.0. Might be some configuration that I am missing (tried the error_reporting(E_ALL) but nothing).

However, when I opened the line reported in the warning in PyCharm, I noticed it had a warning. The PHP docstrings were stating that the function should return DateTime|null but it was returning int|null. Fixed by using DateTime::createFromFormat to parse the server string.

Screenshot from 2019-04-13 21-07-20

Thanks
Bruno

@osma
Copy link
Member

osma commented Apr 16, 2019

Great work @kinow! The warning is now gone. I will merge this to the master branch. If anything new comes up we can open new issues/PRs.

@osma osma merged commit bfb3001 into NatLibFi:master Apr 16, 2019
@kinow
Copy link
Collaborator Author

kinow commented Apr 16, 2019

Brilliant @osma! Let me know how it goes. If anything happens I'll try to amend my changes and fix whatever may be broken (hopefully nothing!)

@osma
Copy link
Member

osma commented Apr 16, 2019

I've now enabled this for most vocabularies in http://dev.finto.fi . It seems to work fine so far. Concept pages that normally take 200-300ms to generate (or sometimes much more if they have LCSH and/or Wikidata mappings) now return a 304 response within 50ms on subsequent requests. I'm not seeing any new warnings in the Apache error log.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants