-
Notifications
You must be signed in to change notification settings - Fork 67
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Demo server unavailable #782
Comments
Now it does not react to any requests any more. |
Hi @smuellerDD , I just ran a quick test on the Demo environment and I was able to communicate without any issues. Are you still experiencing trouble? |
I am seeing some error messages in the external web server logs though, so investigating those.... stand by please. |
Okay please feel free to give it another try. |
It is working. Thank you |
I need to reopen it. GCM does not go through. I get
When submitting responses for vsID 146268, 146269 For another GCM vsID I am waiting now for some time without getting any answer: vsid 146068 |
Some debug information: The server stops responding after uploading the results to the large endpoint:
There is no progress in the communication from that point on. |
@smuellerDD internally those vector sets show as pass, is it showing something else externally, or did you eventually get through? |
The external endpoint does not tell me that there is a verdict. It allows me to upload the data via largefile support. Then the IP connection stays open without the server doing anything. I just tested it again. After 3 minutes, my client times out on the connection and retries the largefile URL and then gets the authorization failure. Some logging - note the time stamp:
|
Now it works again. |
I see that behavior again with new uploads of test results |
@smuellerDD can you tell me a vs id that failed, and the time you experienced the failure after having reopened the issue from #782 (comment) ? |
Am Donnerstag, 13. Februar 2020, 19:43:40 CET schrieb Russ Hammett:
Hi Russ,
@smuellerDD can you tell me a vs id that failed, and the time you
experienced the failure after having reopened the issue from
#782 (comment) ?
vsID 146848
The time is European time at noon.
```
ACVPProxy (12:16:45) Debug - HTTP operation: Setting private key with type PEM
* Trying 52.204.191.193:7443...
* TCP_NODELAY set
* Connected to demo.acvts.nist.gov (52.204.191.193) port 7443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /home/sm/hacking/sources/cavs/acvp/acvp-keys/digitcert_bundle.pem
CApath: none
* SSL connection using TLSv1.2 / AES128-SHA
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: C=US; ST=Maryland; L=Gaithersburg; O=National Institute of
Standards and Technology; OU=ITL; CN=demo.acvts.nist.gov
* start date: May 23 00:00:00 2019 GMT
* expire date: Jun 20 12:00:00 2020 GMT
* subjectAltName: host "demo.acvts.nist.gov" matched cert's
"demo.acvts.nist.gov"
* issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
* SSL certificate verify ok.
POST /acvp/v1/upload/
NH0STdlL8c5mf534s4lxuwmPz3bVgDCsGZNHAPg2dY0RmYCI9ySRak6wd3A0ToSK HTTP/1.1
Host: demo.acvts.nist.gov:7443
User-Agent: ACVPProxy/1.2.5
Accept: */*
Content-Type: application/json
Authorization: Bearer foo
Content-Length: 91388465
Expect: 100-continue
* Mark bundle as not supporting multiuse
< HTTP/1.1 100 Continue
< Content-Length: 0
* We are completely uploaded and fine
* Empty reply from server
* Connection #0 to host demo.acvts.nist.gov left intact
ACVPProxy (12:20:16) Warning - HTTP operation: Curl HTTP operation failed with
code 52 (Server returned nothing (no headers, no data))
* Connection 0 seems to be dead!
* Closing connection 0
* Trying 52.204.191.193:7443...
* TCP_NODELAY set
* Connected to demo.acvts.nist.gov (52.204.191.193) port 7443 (#1)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: /home/sm/hacking/sources/cavs/acvp/acvp-keys/digitcert_bundle.pem
CApath: none
* SSL re-using session ID
* SSL connection using TLSv1.2 / AES128-SHA
* ALPN, server did not agree to a protocol
* Server certificate:
* subject: C=US; ST=Maryland; L=Gaithersburg; O=National Institute of
Standards and Technology; OU=ITL; CN=demo.acvts.nist.gov
* start date: May 23 00:00:00 2019 GMT
* expire date: Jun 20 12:00:00 2020 GMT
* subjectAltName: host "demo.acvts.nist.gov" matched cert's
"demo.acvts.nist.gov"
* issuer: C=US; O=DigiCert Inc; CN=DigiCert SHA2 Secure Server CA
* SSL certificate verify ok.
POST /acvp/v1/upload/
NH0STdlL8c5mf534s4lxuwmPz3bVgDCsGZNHAPg2dY0RmYCI9ySRak6wd3A0ToSK HTTP/1.1
Host: demo.acvts.nist.gov:7443
User-Agent: ACVPProxy/1.2.5
Accept: */*
Content-Type: application/json
Authorization: Bearer foo
Content-Length: 91388465
Expect: 100-continue
* Mark bundle as not supporting multiuse
< HTTP/1.1 100 Continue
< Content-Length: 0
* Mark bundle as not supporting multiuse
< HTTP/1.1 401 Unauthorized
< Expires: Thu, 01 Jan 1970 00:00:00 GMT
< Connection: keep-alive
< Cache-Control: no-cache, no-store, must-revalidate
< Strict-Transport-Security: max-age=31536000; includeSubDomains
< Pragma: no-cache
< Content-Type: application/json;charset=UTF-8
< Content-Length: 105
< Date: Thu, 13 Feb 2020 11:20:18 GMT
* HTTP error before end of send, stop sending
<
* Closing connection 1
ACVPProxy (12:20:27) Warning - HTTP operation: Unable to HTTP POST data for
URL https://demo.acvts.nist.gov:7443/acvp/v1/upload/
NH0STdlL8c5mf534s4lxuwmPz3bVgDCsGZNHAPg2dY0RmYCI9ySRak6wd3A0ToSK: 401
ACVPProxy (12:20:27) Verbose: Process following server response: {
"acvVersion" : "1.0",
"error" : "The current user is not authorized for the requested action."
}
```
Ciao
Stephan
|
@smuellerDD unfortunately i'm not seeing any reasoning behind the first attempt dropping, I do see why the second attempt failed, but that's seemingly due to the portion of the process in which the first attempt failed. can you provide me with the file that you're attempting to submit as your answers for the problem vs? I can try to add some additional logging for local testing to dig into why we're running into this. |
Just FYI: The issue is still there and I cannot upload large responses. |
Sorry @smuellerDD I forgot to follow up with you on the last response from #782 (comment) I unfortunately didn't find anything in our logs regarding the above that would point to an issue (though there obviously is something going on). The issue from earlier parts of the opened issue seem to have eventually made it though right? I have not figured out why it's occurring some of the time, but it seems to be somehow related to the size of the file. Can you give me the info on the most recent failure? environment, datetime, and vsId? |
Am Montag, 24. Februar 2020, 16:34:27 CET schrieb Russ Hammett:
Hi Russ,
Sorry @smuellerDD I forgot to follow up with you on the last response from
#782 (comment)
I unfortunately didn't find anything in our logs regarding the above that
would point to an issue (though there obviously is something going on).
The issue from earlier parts of the opened issue seem to have eventually
made it though right? I have not figured out why it's occurring some of the
time, but it seems to be somehow related to the size of the file.
The issue seems to be there all the time on the demo server. On prod it works.
Are you saying that you successfully uploaded a response using the large file
support?
Can you give me the info on the most recent failure? environment, datetime,
and vsId?
The final error (i.e. re-trial where I get a 401) occurred today:
15:32:18: 73537 with vsID 150674
15:32:56: 73537 with vsID 150673
European time.
Ciao
Stephan
|
@smuellerDD both 150674 and 150673 are showing "passed" on our end, meaning that they got the entire way through validating your responses being posted from the large endpoint (i assume these responses were large endpoint posts). The logging I have available to me is somewhat limited, so I have to try to piece together the information, but for the vector set 150674, I'm assuming it's failing on the ACVPProxy in manner that no response is ever returned from the server when POSTing to the large endpoint? If that is the case, it appears that the POST was in fact successful, as it made it all the way through our system, which fully passed the vector set (snip): {
"vsId": 150674,
"disposition": "passed",
"tests": [
{
"tcId": 1,
"result": "passed"
},
{
"tcId": 2,
"result": "passed"
},
{
"tcId": 3,
"result": "passed"
}]
} That would seem to narrow it down to me a bit. I would guess that the ACVPProxy still thinks it has not successfully submit the large POST, since it did not receive a response, which makes sense. Are you able to short circuit your logic to confirm that the pass verdict is actually making it out to the public? This gives me something to work with at least, something seems to be going wrong in between the POST occurring and the response being returned from the ACVP server to your client. At least in these two instances, it appears that we fully processed both the mentioned vector sets, we just never gave you the response from the large POST stating we accepted your answers. Would you agree this is what's occurring? |
Am Montag, 24. Februar 2020, 18:32:23 CET schrieb Russ Hammett:
Hi Russ,
@smuellerDD both 150674 and 150673 are showing "passed" on our end, meaning
that they got the entire way through validating your responses being posted
from the large endpoint (i assume these responses were large endpoint
posts).
That is the same as in previous sessions.
The logging I have available to me is somewhat limited, so I have to try to
piece together the information, but for the vector set 150674, I'm assuming
it's failing on the ACVPProxy in a similar manner that no response is ever
returned from the server?
Precisely - after the upload of the POST request, the server returns nothing
and my client times out.
If that is the case, it appears that the POST
was in fact successful, and made it all the way through our system, which
fully passed the vector set (snip):
```json
{
"vsId": 150674,
"disposition": "passed",
"tests": [
{
"tcId": 1,
"result": "passed"
},
{
"tcId": 2,
"result": "passed"
},
{
"tcId": 3,
"result": "passed"
}]
}
```
That sounds exactly as before.
Everything after that point, i.e. what my code does is irrelevant. The key
issue is that the ACVP server does not reply the first POST.
That would seem to narrow it down to me a bit. I would *guess* that the
ACVPProxy still thinks it has not successfully submit the large POST, since
it did not receive a response, which makes sense. Are you able to short
circuit your logic to confirm that the pass verdict is actually making it
out to the public?
Yes, when I fetch the verdict, I get the pass. So, forget the 2nd upload
request on my side - my code has 3 retries in case of an error (like a
timeout).
This gives me something to work with at least, something seems to be going
wrong in between the POST occurring and the response being returned from
the ACVP server to your client. At least in these two instances, it
appears that we fully processed both the mentioned vector sets, we just
never gave you the response from the large POST stating we accepted your
answers.
Would you agree this is what's occurring?
Precisely.
As far as why it's only happening on demo, that's something I need to look
into. I'm assuming you've done a similarly large POST on prod for shake
testing?
Yes - search for libgcrypt and there you will find large SHAKE responses.
Ciao
Stephan
|
thanks @smuellerDD that helps, I think I'm now on the same page as you, sorry for the delay ;) Can you speak at all to the success / failure % of the large endpoint on demo? it is 100%, or something less? |
Am Montag, 24. Februar 2020, 19:11:42 CET schrieb Russ Hammett:
Hi Russ,
thanks @smuellerDD that helps, I think I'm now on the same page as you,
sorry for the delay ;)
Can you speak at all to the success / failure % of the large endpoint on
demo? it is 100%, or something less?
I think all large endpoint submissions fail in the same way.
Ciao
Stephan
|
@smuellerDD we haven't found any differences in code or configuration between the demo and prod environments that would shed some light on this issue. I'm hoping I'll be able to spend some cycles bringing up a version of your client to attempt to reproduce the issue in some of our internal environments, but I have not yet had the time. One thing I did notice from our access logs is this:
I confirmed this with a few separate tokens, but it seemed like your first required always results in a 401, the second with a 200 for that same token. I'm planning on placing a bit more logging around the large endpoint feature that will hopefully give me a bit of insight into what's happening; but it's a very strange situation nonetheless. I really don't see a reason for the connection to just drop without writing a response. And looking at the above log entry, at least with this and several others I checked, we are sending a 200 after an initial 401. From your curl commands it looks to have a slightly separate flow from what I've been seeing: yours:
our access log:
does your client ever have a third attempt? I would assume that without doing another post to the |
Am Freitag, 28. Februar 2020, 15:47:47 CET schrieb Russ Hammett:
Hi Russ,
@smuellerDD we haven't found any differences in code or configuration
between the demo and prod environments that would shed some light on this
issue.
I'm hoping I'll be able to spend some cycles bringing up a version of your
client to attempt to reproduce the issue in some of our internal
environments, but I have not yet had the time.
One thing I did notice from our access logs is this:
```
10.200.1.24 - - [24/Feb/2020:09:23:08 -0500] "0.503 sec"
"POST /acvp/v1/upload/[token] HTTP/1.1" 401 105 "-"
"ACVPProxy/1.2.5" 10.200.1.24 - - [24/Feb/2020:09:47:38 -0500]
"1666.292 sec" "POST /acvp/v1/upload/[token] HTTP/1.1"
200 - "-" "ACVPProxy/1.2.5" ```
I confirmed this with a few separate tokens, but it seemed like your first
required always results in a 401, the second with a 200 for that same
token.
I'm planning on placing a bit more logging around the large endpoint feature
that will hopefully give me a bit of insight into what's happening; but
it's a very strange situation nonetheless. I really don't see a reason for
the connection to just drop without writing a response. And looking at the
above log entry, at least with this and several others I checked, we are
sending a 200 after an initial 401.
From your curl commands it looks to have a slightly separate flow from what
I've been seeing:
yours:
* attempt #1 - ACVP server sends no response
* attempt #2 - ACVP server states unauthorized (seeming to indicate attempt
#1 was successful, though for whatever reason a response was not sent in
that first attempt)
The first attempt times out. And this is the core issue.
Then I have a loop that in case of any connection error my client tries again.
As the large endpoint is open for only one access operation, the second
connection naturally fails with some form of error.
our access log:
* attempt #1 - return 401
The question is why is there a 401? And why do I not see it on the client
side.
* attempt #2 - return 200
Sure, the HTTP connection is successful and I get an ACVP protocol level error
transported over HTTP.
does your client ever have a third attempt?
It would after 10 seconds, but then I terminated it manually already.
I would assume that without
doing another post to the `large` endpoint you would just get another
unauthorized return.
That is my expectation as well.
Ciao
Stephan
|
Did you change anything? I now got a verdict for a GCM response that has 5.9 MB (vsID 166270). It took a couple of attempts, but I got a verdict in the end. |
only 5.9 MB? does that even warrant the use of the large endpoint? Nothing has changed as far as I'm aware on the demo environment, but @jarnold01 would need to confirm. |
Am Mittwoch, 18. März 2020, 14:37:44 CET schrieb Russ Hammett:
Hi Russ,
only 5.9 MB? does that even warrant the use of the large endpoint? Nothing
has changed as far as I'm aware on the demo environment, but @jarnold01
would need to confirm.
The server sets 4MB as limit.
Ciao
Stephan
|
Confirmed, 4MB is the current limit, and no changes have been made to the environment. |
I am closing this one - I have submitted a number of test results this week to the demo server without seeing that issue again. |
After some waiting time to authenticate, I got:
Trying again shows
The text was updated successfully, but these errors were encountered: