Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

java.sql.SQLRecoverableException: No more data to read from socket #20

Closed
jgebal opened this issue Jun 29, 2017 · 42 comments · Fixed by utPLSQL/utPLSQL-java-api#77
Closed
Labels

Comments

@jgebal
Copy link
Member

jgebal commented Jun 29, 2017

I get the following error when running tests using utplsql-cli:

java.sql.SQLRecoverableException: No more data to read from socket
	at oracle.jdbc.driver.T4CMAREngineNIO.prepareForReading(T4CMAREngineNIO.java:119)
	at oracle.jdbc.driver.T4CMAREngineNIO.unmarshalUB1(T4CMAREngineNIO.java:534)
	at oracle.jdbc.driver.T4CTTIfun.receive(T4CTTIfun.java:485)
	at oracle.jdbc.driver.T4CTTIfun.doRPC(T4CTTIfun.java:252)
	at oracle.jdbc.driver.T4C8Oall.doOALL(T4C8Oall.java:612)
	at oracle.jdbc.driver.T4CCallableStatement.doOall8(T4CCallableStatement.java:223)
	at oracle.jdbc.driver.T4CCallableStatement.doOall8(T4CCallableStatement.java:56)
	at oracle.jdbc.driver.T4CCallableStatement.executeForRows(T4CCallableStatement.java:907)
	at oracle.jdbc.driver.OracleStatement.doExecuteWithTimeout(OracleStatement.java:1119)
	at oracle.jdbc.driver.OraclePreparedStatement.executeInternal(OraclePreparedStatement.java:3780)
	at oracle.jdbc.driver.T4CCallableStatement.executeInternal(T4CCallableStatement.java:1300)
	at oracle.jdbc.driver.OraclePreparedStatement.execute(OraclePreparedStatement.java:3887)
	at oracle.jdbc.driver.OracleCallableStatement.execute(OracleCallableStatement.java:4230)
	at oracle.jdbc.driver.OraclePreparedStatementWrapper.execute(OraclePreparedStatementWrapper.java:1079)
	at io.github.utplsql.api.TestRunner.run(TestRunner.java:144)
	at io.github.utplsql.cli.RunCommand.lambda$run$0(RunCommand.java:141)
	at io.github.utplsql.cli.RunCommand$$Lambda$7/1647809929.run(Unknown Source)
	at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
	at java.util.concurrent.FutureTask.run(FutureTask.java:266)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
	at java.lang.Thread.run(Thread.java:745)

The trouble is, the utplsql-cli does not stop/throw.
It seems to be stuck, doing nothing.

This leads to two potential problems:

  1. Exceptions are not handled properly and the cli is doing nothing
  2. timeout for getting data from output buffer is set to a low (too low) value.
    Single test can run for quite some time.
@pesse
Copy link
Member

pesse commented May 17, 2018

@jgebal experienced this issue at any time since 3.0.4 version?

@c0dw
Copy link

c0dw commented Feb 14, 2019

We're using the most recent version 3.1.2 in a custom docker container (alpine based). About one in 25 calls (in a loop, each iteration using a freshly initialized container) get's blocked and only after waiting one full hour get's terminated eventually.

The utplsql command generates no output at all in these (rare) cases.

Furthermore, as already mentioned the exception/error exit code, due to the timeout seems to get lost on the way. Our call does get exit-code zero even after such a timeout. Therefore only the missing output (and the delay) can be used a indirect means of error handling...

Any progress avoiding this blockade and/or at least a proper error escalation in case it can't be avoided would be greatly appreciated.

@pesse
Copy link
Member

pesse commented Feb 14, 2019

Hi @c0dw ,

are you using Oracle 11.2?
We have identified similar issues with a 11.2 setup (see #98)

Samuel

@c0dw
Copy link

c0dw commented Feb 14, 2019

Hi @pesse
No, we're running the utplsql Framework (serverside) on Oracle 12 (Enterprise Edition) and from using 'lsof' I know the utplsql cli uses a version 12 jdbc driver on client side as well:

/mnt/workspace # lsof -p 24
...
/opt/jdk1.8.0_141/bin/java	/opt/utPLSQL-cli/lib/ojdbc8-12.2.0.1.jar
...

@pesse
Copy link
Member

pesse commented Feb 14, 2019

I'll soon have a new version of cli ready with logging possibilities - I would then aks you to try it with this new version and provide the logs, if possible

@c0dw
Copy link

c0dw commented Feb 14, 2019

Absolutely! I look forward to verify the new version, thanks!
BTW, in case you're still open to feature suggestions for the new version: IMHO it would also help if the timeout were not hardcoded to 60 minutes, but could be customized when invoking utplsql depending on the expected runtime of the test. Our unittests usually take no langer than 1 or 2 minutes.

@pesse
Copy link
Member

pesse commented Feb 14, 2019

That's a good point - want to make an issue for it?
It's actually not much work to include that.

@c0dw
Copy link

c0dw commented Feb 14, 2019

OK, done! -> configurable timeout #124
... and many thanks for your swift response!

@pesse
Copy link
Member

pesse commented Feb 14, 2019

Hey @c0dw , can you try to reproduce it with this version of cli?
https://bintray.com/utplsql/utPLSQL-cli/download_file?file_path=utPLSQL-cli-develop-201902142127.zip

use
utplsql run user/pw@connectstring -d
for debug output

@c0dw
Copy link

c0dw commented Feb 15, 2019

Hi @pesse
I built a new Dockerimage with the utplsqlcli version 201902142127 you provided and tried to reproduce the bug while using the debug flag. It did not happen. After 100 successful iterations I broke out of the loop. For reference I include the debug output of the last iteration here anyway. And FYI, at the end I listed the invoking loop

iteration100.txt

After that, I tried to reproduce the error again with the official release Version 3.1.2 of utplsqlcli. It also took significantly more iterations until the error occured: More than 70 iterations in a first pass and 56 iterations in the second attempt. As you can see iteration 56 took one hour (3603 seconds) to timeout and it produced no log (zero lines in srqoauth2_doc.log). In the shebang of the script I set the '-e' flag to cause immediate termination in case of an error, which obviously never made it up the callstack...

15.02.2019 12:42:29 - Iteration 50, duration 6 seconds, number of output-lines: 24 target/srqoauth2_doc.log
15.02.2019 12:42:37 - Iteration 51, duration 5 seconds, number of output-lines: 24 target/srqoauth2_doc.log
15.02.2019 12:42:46 - Iteration 52, duration 6 seconds, number of output-lines: 24 target/srqoauth2_doc.log
15.02.2019 12:42:54 - Iteration 53, duration 5 seconds, number of output-lines: 24 target/srqoauth2_doc.log
15.02.2019 12:43:03 - Iteration 54, duration 6 seconds, number of output-lines: 24 target/srqoauth2_doc.log
15.02.2019 12:43:11 - Iteration 55, duration 5 seconds, number of output-lines: 24 target/srqoauth2_doc.log
15.02.2019 13:43:17 - Iteration 56, duration 3603 seconds, number of output-lines: 0 target/srqoauth2_doc.log
15.02.2019 13:43:26 - Iteration 57, duration 6 seconds, number of output-lines: 24 target/srqoauth2_doc.log
15.02.2019 13:43:34 - Iteration 58, duration 5 seconds, number of output-lines: 24 target/srqoauth2_doc.log

I managed to enter the running Docker container and took a threaddump with jstack, if this is any help understanding the problem:

jstack_threaddump.txt

I will try again to reproduce the failure and collect debugging info using your working version 201902142127. Next time I will be more patient and wait for more than 100 iterations just to be sure, the error does indeed not occur anymore.

@pesse
Copy link
Member

pesse commented Feb 15, 2019

thanks a lot for your thorough analysis, @c0dw
We did some tweaks in the newer version, too, so it might be that the main problem is resolved.

@c0dw
Copy link

c0dw commented Feb 15, 2019

That's the least I can do!
Meanwhile I switched back to the utplsqlcli working version 201902142127 and started a new session expecting it to keep running smoothly all day and next night. (Note to self: Stop expecting!)

As a matter of fact, it happend already in Iteration 17, this time I can provide the full debug log and for comparison the one before (iteration16), where all was well:

Iteration16-16_DEBUG.txt

As you can see, the process is stuck while 'Initializing reporters'. I get one line of output every 30 seconds stating...
2019-02-15 14:25:23 [HikariPool-1 housekeeper] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Pool stats (total=3, active=3, idle=0, waiting=0)

...and here is the according jstack threaddump of the utplsqlcli again:
jstack_threaddump_utplsqlcli201902142127.txt

@c0dw
Copy link

c0dw commented Feb 19, 2019

Hi @pesse

Please let me know, if I can contribute anything else to resolve this blocking issue. Apparently it persists in the development snapshot 201902142127 you provided, as outlined in my last posting.

I'd also like to point out, we still face the additional problem, that the timeout does not trigger a proper error escalation. I still don't get an exitcode after a timeout. My java skills are slightly rusty - so correct me if I'm wrong, but from your code (lines 203-209) I understand you're using java.util.concurrent.ExecutorService.shutdown() which causes an orderly shutdown and after that in line 209 you return 'returnCode[0]' which I suppose is initialized to literally zero as well. Therefore an invoking process does not get notified, that an irregular termination (due to a timeout) occurred, which makes error handling cumbersome. IMHO a timeout should push an error/exception up the call-stack.
Update: I digged a little deeper and I think the solution is using the return Value of the awaitTermination Method to determine the returnValue of your own function, like ...

if (!executorService.awaitTermination(timeoutInMinutes, TimeUnit.MINUTES)) {
        returnCode[0] = 1; // or whatever exitcode you might assign a timeout - except zero!
}

According to the API doc awaitTermination() returns true if this executor terminated and false if the timeout elapsed before termination.

@pesse
Copy link
Member

pesse commented Feb 19, 2019

Hi @c0dw

Can you open a seperate issue for the timeout not causing an error? I agree that a timeout should throw an exception and be visible from the outside.

As for the other issue: The most problematic thing is that we were not able to reproduce the issue yet. What you could help with is to investigate (maybe together with your DBA) which SQL statement is executed (or was executed last) in the case of a block.
I suspect it's the same as in #98, which means that this problem is not limited to 11g.

I will probably add some more log messages, too and come back with a new version.

I'd also like to ask you for patience because utPLSQL is a 100% leisure time project, nobody gets anything paid for it and we do it in our free time on evenings besides our normal day jobs. Therefore complicated issues like this can be very challenging.

@c0dw
Copy link

c0dw commented Feb 19, 2019

Hi @pesse

Thank's again for your fast response!

Don't get me wrong, patience is not not a problem, we've got a workaround in place which reruns the affected tests. I certainly did not intend to push you, on the contrary I wish support for expensive commercial products was always as timely and substantial as your own, it's much appreciated!!!

I followed up on my remarks, because I currently can reproduce the issue in my loop and hope this helps fixing it. If I can find out which SQL statement hangs, I'll let you know here.

Regarding error escalation in the timeout scenario, I created a new issue as you suggested.

@pesse
Copy link
Member

pesse commented Feb 19, 2019

Thank you for your kind and encouraging words :)
It's great you can reproduce it.
Can you also reproduce it with

  • Only one Reporter configured
  • Different Reporter than JUNIT configured

This might help to narrow down the problem.

@jgebal
Copy link
Member Author

jgebal commented Feb 19, 2019

Could you also check to see if the same issue occurs with latest utPLSQL framework build from develop branch?
There were some slight changes.
@pesse, not sure if you still have dependency on getting reporters list in API?

One more thing to mote is that the problem could somehow relate to oracle fixed table stats.
Can you check that system statistics are gathered at the DB that is facing the issue?

It would be great to get a trace enabled for the session. But it would need to be done by cli/api to enable capturing of the whole flow from the first statement.
@pesse - Maybe Oracle trace could be part of debug mode?

@pesse
Copy link
Member

pesse commented Feb 19, 2019

not sure if you still have dependency on getting reporters list in API?

No, we don't. If it was the case we would also not see Reporter initialization messages.

Maybe Oracle trace could be part of debug mode?

Sounds reasonable, I just have to admit that this is something I have absolutely no experience with, so chances are high I need some time. Also not sure what privileges are necessary etc.

@c0dw
Copy link

c0dw commented Feb 19, 2019

A short update on a preliminary result:

This issue seems to be related to #98 indeed. During two blocked sessions I was able to query the oracle database and came up with this:

select S.USERNAME, s.sid, s.osuser, t.sql_id, sql_text
from v$sqltext_with_newlines t,V$SESSION s
where t.address =s.sql_address
and t.hash_value = s.sql_hash_value
and s.status = 'ACTIVE'
and s.username <> 'SYSTEM'
order by s.sid,t.piece
/

USERNAME    SID OSUSER   SQL_ID        SQL_TEXT
---------- ---------- ---------- ------------- ------------------------------------------------------------
UTS      18 root   860hywfqqq60j SELECT COLUMN_VALUE AS TEXT FROM TABLE((:B1 ).GET_LINES(:B2 , :B
UTS      18 root   860hywfqqq60j 3 ))
UTS     753 root   860hywfqqq60j SELECT COLUMN_VALUE AS TEXT FROM TABLE((:B1 ).GET_LINES(:B2 , :B
UTS     753 root   860hywfqqq60j 3 ))
SYS     756 oracle   cmqxmqv8kxsn5 select S.USERNAME, s.sid, s.osuser, t.sql_id, sql_text
                 from v$sq

SYS     756 oracle   cmqxmqv8kxsn5 ltext_with_newlines t,V$SESSION s
                 where t.address =s.sql_address

SYS     756 oracle   cmqxmqv8kxsn5
                 and t.hash_value = s.sql_hash_value
                 and s.status = 'ACTIVE'
                 and

SYS     756 oracle   cmqxmqv8kxsn5  s.username <> 'SYSTEM'
                 order by s.sid,t.piece


8 rows selected.

Only SID 18 and 753 are relevant. These sessions were the two blocked ones in a loop of more than 300 iterations. So, when it happend in SID 18, the loop continued after the timeout until it happend again in SID 753. Meanwhile not only the currently blocked session 753, but also the former one SID 18 remained 'ACTIVE' in the DB.

In Issue #98, you mentioned the exact same query-text

SQL> select SQL_FULLTEXT from V$SQL where SQL_ID='860hywfqqq60j';

SQL_FULLTEXT
--------------------------------------------------------------------------------
SELECT COLUMN_VALUE AS TEXT FROM TABLE((:B1 ).GET_LINES(:B2 , :B3 ))

SQL> r
  1  SELECT NAME,POSITION,DATATYPE_STRING,VALUE_STRING
  2* FROM v$sql_bind_capture WHERE sql_id='860hywfqqq60j'

NAME	    POSITION DATATYPE_STRING	  VALUE_STRI
---------- --------- -------------------- ----------
:B1		   1 ADT
:B2		   2 NUMBER
:B3		   3 NUMBER

@c0dw
Copy link

c0dw commented Feb 19, 2019

Currently I'm in the next blocked iteration:

SQL> select S.USERNAME, s.sid, s.osuser, t.sql_id, sql_text
from v$sqltext_with_newlines t,V$SESSION s
where t.address =s.sql_address
and s.username='UTS'
and t.hash_value = s.sql_hash_value
and s.status = 'ACTIVE'
and s.username <> 'SYSTEM'
order by s.sid,t.piece
/
  2    3    4    5    6    7    8    9
                                        USERNAME	  SID OSUSER	 SQL_ID        SQL_TEXT
---------- ---------- ---------- ------------- ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
UTS		   18 root	 860hywfqqq60j SELECT COLUMN_VALUE AS TEXT FROM TABLE((:B1 ).GET_LINES(:B2 , :B
UTS		   18 root	 860hywfqqq60j 3 ))
UTS		  753 root	 860hywfqqq60j SELECT COLUMN_VALUE AS TEXT FROM TABLE((:B1 ).GET_LINES(:B2 , :B
UTS		  753 root	 860hywfqqq60j 3 ))
UTS		  757 root	 860hywfqqq60j SELECT COLUMN_VALUE AS TEXT FROM TABLE((:B1 ).GET_LINES(:B2 , :B
UTS		  757 root	 860hywfqqq60j 3 ))

All the session waits
Concerning 'MACHINE':

  • a42417892205 is the docker container id of the oracle DB
  • 54b1be3b7c3f is the docker container curently running utplsqlcli (as in SID 263, 747 and 757).
  • the other ids probably refer to the two previous blocked iterations (each on creates a new container for the client, but all use the same DB container on the server-end)
col username format a5
col machine format a30
col program format a40
col state format a10
col last_call_et head 'Called|secs ago' format 999999
col seconds_in_wait head 'Waiting|for secs' format 999999
col event format a50

select sid, machine, username, program,
        decode(state, 'WAITING', 'Waiting',
                'Working') state,
last_call_et, seconds_in_wait, event
from v$session
/
												      Called  Waiting
       SID MACHINE			  USERN PROGRAM 				 STATE	    secs ago for secs EVENT
---------- ------------------------------ ----- ---------------------------------------- ---------- -------- -------- --------------------------------------------------
	 1 a42417892205 			oracle@a42417892205 (PSP0)		 Waiting      346687	    0 rdbms ipc message
	 2 a42417892205 			oracle@a42417892205 (TT01)		 Waiting      346562	   45 Data Guard: Timer
	 3 a42417892205 		  SYS	oracle@a42417892205 (OFSD)		 Waiting      346685	    3 OFS idle
	 4 a42417892205 			oracle@a42417892205 (SVCB)		 Waiting      346685	    1 wait for unread message on broadcast channel
	 5 a42417892205 			oracle@a42417892205 (LGWR)		 Waiting      346685	    2 rdbms ipc message
	 6 a42417892205 			oracle@a42417892205 (LG01)		 Waiting      346685	 3203 LGWR worker group idle
	 7 a42417892205 			oracle@a42417892205 (LREG)		 Waiting      346685	    2 lreg timer
	 8 a42417892205 			oracle@a42417892205 (MMON)		 Waiting      346685	    2 rdbms ipc message
	10 a42417892205 			oracle@a42417892205 (W000)		 Waiting	2099	    4 Space Manager: slave idle wait
	17 a42417892205 			oracle@a42417892205 (W002)		 Waiting	2929	    4 Space Manager: slave idle wait
	18 9623c4348966 		  UTS	JDBC Thin Client			 Waiting       11708	    0 PL/SQL lock timer
       247 a42417892205 			oracle@a42417892205 (VKTM)		 Waiting      346687   116342 VKTM Logical Idle Wait
       248 a42417892205 			oracle@a42417892205 (GEN1)		 Waiting      346685	    1 rdbms ipc message
       249 a42417892205 			oracle@a42417892205 (SCMN)		 Waiting      346685	    2 watchdog main loop
       250 a42417892205 			oracle@a42417892205 (PMAN)		 Waiting      346685	    0 pman timer
       251 a42417892205 			oracle@a42417892205 (CKPT)		 Waiting      346685	    0 rdbms ipc message
       252 a42417892205 			oracle@a42417892205 (SMCO)		 Waiting      346685	    1 rdbms ipc message
       255 a42417892205 			oracle@a42417892205 (TT02)		 Waiting      346562	    0 heartbeat redo informer
       259 a42417892205 			oracle@a42417892205 (W003)		 Waiting	2146	    4 Space Manager: slave idle wait
       260 a42417892205 			oracle@a42417892205 (QM02)		 Waiting      346551	    6 Streams AQ: qmn coordinator idle wait
       263 54b1be3b7c3f 		  UTS	JDBC Thin Client			 Waiting	2435	 2437 SQL*Net message from client
       493 a42417892205 			oracle@a42417892205 (PMON)		 Waiting      346687	    0 pmon timer
       494 a42417892205 			oracle@a42417892205 (GEN0)		 Waiting      346685	    2 rdbms ipc message
       495 a42417892205 			oracle@a42417892205 (SCMN)		 Waiting      346685	    0 watchdog main loop
       496 a42417892205 			oracle@a42417892205 (DBRM)		 Waiting      346685	    1 rdbms ipc message
       497 a42417892205 			oracle@a42417892205 (DIA0)		 Waiting      346685	    1 DIAG idle wait
       498 a42417892205 			oracle@a42417892205 (LG00)		 Waiting      346685	 3203 LGWR worker group idle
       499 a42417892205 			oracle@a42417892205 (RECO)		 Waiting      346685	  949 rdbms ipc message
       500 a42417892205 			oracle@a42417892205 (PXMN)		 Waiting      346685	    1 rdbms ipc message
       502 a42417892205 			oracle@a42417892205 (AQPC)		 Waiting      346561	    4 AQPC idle
       503 a42417892205 			oracle@a42417892205 (CJQ0)		 Waiting      346561	    9 rdbms ipc message
       511 a42417892205 			oracle@a42417892205 (W004)		 Waiting	5911	    4 Space Manager: slave idle wait
       739 a42417892205 			oracle@a42417892205 (TT00)		 Waiting      346562	   47 Data Guard: Gap Manager
       740 a42417892205 			oracle@a42417892205 (CLMN)		 Waiting      346687	    1 pmon timer
       741 a42417892205 			oracle@a42417892205 (MMAN)		 Waiting      346685	    1 rdbms ipc message
       742 a42417892205 			oracle@a42417892205 (DIAG)		 Waiting      346685	    0 DIAG idle wait
       743 a42417892205 			oracle@a42417892205 (VKRM)		 Waiting      346685   116342 VKRM Idle
       744 a42417892205 			oracle@a42417892205 (DBW0)		 Waiting      346685	    3 rdbms ipc message
       745 a42417892205 			oracle@a42417892205 (SMON)		 Waiting      346685	  145 smon timer
       747 54b1be3b7c3f 		  UTS	JDBC Thin Client			 Waiting	2435	 2437 SQL*Net message from client
       748 a42417892205 			oracle@a42417892205 (MMNL)		 Waiting      346685	    0 rdbms ipc message
       749 a42417892205 			oracle@a42417892205 (TMON)		 Waiting      346685	    7 rdbms ipc message
       750 a42417892205 			oracle@a42417892205 (Q001)		 Waiting       24845	    6 Streams AQ: qmn slave idle wait
       753 26d6ba0c9892 		  UTS	JDBC Thin Client			 Waiting	7354	    1 PL/SQL lock timer
       755 a42417892205 			oracle@a42417892205 (Q002)		 Waiting      346551	69055 Streams AQ: waiting for time management or cleanup
														       tasks

       756 a42417892205 		  SYS	sqlplus@a42417892205 (TNS V1-V3)	 Working	   0	    0 SQL*Net message to client
       757 54b1be3b7c3f 		  UTS	JDBC Thin Client			 Waiting	2435	    1 PL/SQL lock timer

47 rows selected.

History of wait events in a specific session

set lines 120 trimspool on
col event head "Waited for" format a30
col total_waits head "Total|Waits" format 999,999
col tw_ms head "Waited|for (ms)" format 999,999.99
col aw_ms head "Average|Wait (ms)" format 999,999.99
col mw_ms head "Max|Wait (ms)" format 999,999.99
select event, total_waits, time_waited*10 tw_ms,
       average_wait*10 aw_ms, max_wait*10 mw_ms
from v$session_event
where sid = 757
/
                                  Total      Waited     Average         Max
Waited for                        Waits    for (ms)   Wait (ms)   Wait (ms)
------------------------------ -------- ----------- ----------- -----------
Disk file operations I/O              1         .00         .10         .00
log file sync                         2         .00         .60         .00
SQL*Net message to client            32         .00         .00         .00
SQL*Net message from client          32      760.00       23.90      210.00
PL/SQL lock timer                 2,711 ###########    1,036.00    1,250.00
events in waitclass Other            31         .00         .00         .00

@pesse
Copy link
Member

pesse commented Feb 21, 2019

Hi @c0dw ,
thanks again for your thorough analysis.
Could you please do the following in a stuck scenario:

select output_id, is_finished, count(text)
  from ut_output_buffer_tmp
  group by rollup (output_id, is_finished);

and
select * from ut_output_buffer_info_tmp;

and provide it both together with the cli logfile?

@c0dw
Copy link

c0dw commented Feb 22, 2019

Hi @pesse

Unfortunatly now I'm also having a hard time to reproduce the error condition again. In the past few days it did not happen once. Currently I have a loop running more than 1000 iterations and the test performed flawlessly each time. It is not running against the same Docker container anymore, I re-created it several times in past days, but I also tried using the very same docker image of the oracle db container we used before. As a matter of course I'm not aware of any changes to the Database which would explain this.

Regardless, this seems to confirm our suspicion, that the root cause might be related to the state of the oracle DB, but until we face the next situation it is virtually impossible to deduce specifics.

BTW, I heard , that by coincidence one of the utPLSQL committers, @PhilippSalvisberg is currently on assignment in our company. I'll try to schedule a consulting session with him.

In the meantime I'll keep running my test loop and let you know as soon as the problem arises again.

@c0dw
Copy link

c0dw commented Feb 22, 2019

Hi @pesse
At last, I had a new blocking call after literally thousands of iterations. I'v been able to collect the output of the two queries you mentioned, but I don't have the debugging output (cli logfile) on the client side. As debug is just to noisy, if one has to wait that many iterations until something interesting happens, I limited the output to one line per iteration:

...
2019.02.22 15:58:45 - Iteration 78, duration 6 seconds, number of output-lines: 24 target/srqoauth2_doc.log
2019.02.22 15:58:50 - Iteration 79, duration 5 seconds, number of output-lines: 24 target/srqoauth2_doc.log
2019.02.22 15:58:55 - Iteration 80, duration 5 seconds, number of output-lines: 24 target/srqoauth2_doc.log
2019.02.22 15:59:01 - Iteration 81, duration 6 seconds, number of output-lines: 24 target/srqoauth2_doc.log
2019.02.22 15:59:06 - Iteration 82, duration 5 seconds, number of output-lines: 24 target/srqoauth2_doc.log
2019.02.22 15:59:11 - Iteration 83, duration 5 seconds, number of output-lines: 24 target/srqoauth2_doc.log
2019.02.22 15:59:16 - Iteration 84, duration 5 seconds, number of output-lines: 24 target/srqoauth2_doc.log
<BLOCK>

The query on ut_output_buffer_tmp returned empty, no rows!

SQL> select output_id, is_finished, count(text)
  2    from ut3.ut_output_buffer_tmp
  3    group by rollup (output_id, is_finished);

no rows selected

The second query (on ut_output_buffer_info_tmp) returned 6868 rows! The spool is attached here, but I doubt it is very enlightening, since it contains only hexadecimal output IDs:

ut_output_buffer_info_tmp.txt

@PhilippSalvisberg
Copy link
Member

@pesse: Yesterday, I met with @c0dw. He explained me the problem and I think I've got a pretty good understanding of the faulty behaviour. I plan to look at it next Wednesday. I'll keep you updated. Either via this issue or via Slack.

@pesse
Copy link
Member

pesse commented Feb 25, 2019

@c0dw Thanks!
What is a bit strange is that you got no output for the 1st query, which means the block is happening because the reporter tries to fetch rows which are not there.
The second spool unfortunately doesn't tell us anything without the log (or at least this part of the client log: Database-reporter initialized, Type: ?, ID: ?)
If you can repeat your loop and pipe only the following lines
<timestamp> [main] DEBUG org.utplsql.api.reporter.Reporter - Database-reporter initialized[...]
this could show us if the ID used for initialization does appear in ut_output_buffer_info_tmp.

@PhilippSalvisberg Thanks a lot!

@jgebal
Copy link
Member Author

jgebal commented Feb 25, 2019

The one thing that surprises me is that there is no main session hanging while reporters are waiting for the output.
It would be important to understand what is happening to the maun session.

  • When it dies
  • Why it dies
  • Why there is no error when it dies
  • Why cli keeps remaining sessions open when main session dies

@SebGt
Copy link

SebGt commented Feb 26, 2019

@pesse, please find below result of a run that was stuck with debug enabled and the result of above queries:

C:\Users\$devtools>V:\Tools\UnitTests\Jenkins\utPLSQL-cliD\bin\utplsql run test/test@dev_db.WORLD -d -f=ut_xunit_reporter -o=tpkg_result.xml -f=ut_coverage_html_reporter -o=tpkg_coverage.html -p=test.tpkg -include=dev.pkg
2019-02-26 11:12:09 [main] INFO  org.utplsql.cli.RunCommand - 
"##################### utPLCSL cli #####################
"#                                                     #
"#   utPLSL-cli 3.1.3-SNAPSHOT.local                   #
"#   utPLSQL-java-api 3.1.3-SNAPSHOT.322               #
"#   Java-Version: 1.8.0_191                           #
"#   ORACLE_HOME: \\devfile\Programs\ORACLE            #
"#   NLS_LANG: null                                    #
"#                                                     #
"#   Thanks for testing!                               #
"#                                                     #
"#######################################################
2019-02-26 11:12:09 [main] INFO  org.utplsql.cli.RunCommand -
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - HikariPool-1 - configuration:
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - allowPoolSuspension.............false
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - autoCommit......................true
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - catalog.........................none
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - connectionInitSql...............none
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - connectionTestQuery.............none
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - connectionTimeout...............30000
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - dataSource......................none
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - dataSourceClassName.............none
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - dataSourceJNDI..................none
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - dataSourceProperties............{password=<masked>}
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - driverClassName.................none
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - healthCheckProperties...........{}
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - healthCheckRegistry.............none
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - idleTimeout.....................600000
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - initializationFailFast..........true
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - initializationFailTimeout.......1
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - isolateInternalQueries..........false
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - jdbc4ConnectionTest.............false
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - jdbcUrl.........................jdbc:oracle:oci8:@dev_db.WORLD
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - leakDetectionThreshold..........0
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - maxLifetime.....................1800000
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - maximumPoolSize.................10
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - metricRegistry..................none
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - metricsTrackerFactory...........none
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - minimumIdle.....................10
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - password........................<masked>
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - poolName........................"HikariPool-1"
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - readOnly........................false
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - registerMbeans..................false
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - scheduledExecutor...............none
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - scheduledExecutorService........internal
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - schema..........................none
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - threadFactory...................internal
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - transactionIsolation............default
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - username........................"test"
2019-02-26 11:12:09 [main] DEBUG com.zaxxer.hikari.HikariConfig - validationTimeout...............5000
2019-02-26 11:12:09 [main] INFO  com.zaxxer.hikari.HikariDataSource - HikariPool-1 - Starting...
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - HikariPool-1 - configuration:
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - allowPoolSuspension.............false
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - autoCommit......................true
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - catalog.........................none
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - connectionInitSql...............none
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - connectionTestQuery.............none
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - connectionTimeout...............30000
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - dataSource......................none
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - dataSourceClassName.............none
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - dataSourceJNDI..................none
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - dataSourceProperties............{password=<masked>}
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - driverClassName.................none
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - healthCheckProperties...........{}
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - healthCheckRegistry.............none
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - idleTimeout.....................600000
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - initializationFailFast..........true
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - initializationFailTimeout.......1
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - isolateInternalQueries..........false
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - jdbc4ConnectionTest.............false
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - jdbcUrl.........................jdbc:oracle:thin:@dev_db.WORLD
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - leakDetectionThreshold..........0
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - maxLifetime.....................1800000
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - maximumPoolSize.................10
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - metricRegistry..................none
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - metricsTrackerFactory...........none
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - minimumIdle.....................10
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - password........................<masked>
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - poolName........................"HikariPool-1"
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - readOnly........................false
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - registerMbeans..................false
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - scheduledExecutor...............none
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - scheduledExecutorService........internal
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - schema..........................none
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - threadFactory...................internal
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - transactionIsolation............default
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - username........................"test"
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.HikariConfig - validationTimeout...............5000
2019-02-26 11:12:10 [main] INFO  com.zaxxer.hikari.HikariDataSource - HikariPool-1 - Starting...
2019-02-26 11:12:10 [main] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Added connection oracle.jdbc.driver.T4CConnection@56aac163
2019-02-26 11:12:10 [main] INFO  com.zaxxer.hikari.HikariDataSource - HikariPool-1 - Start completed.
2019-02-26 11:12:10 [main] INFO  o.u.c.d.TestedDataSourceProvider - Use connectstring jdbc:oracle:thin:****/****@dev_db.WORLD
2019-02-26 11:12:10 [HikariPool-1 housekeeper] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Pool stats (total=1, active=1, idle=0, waiting=0)
2019-02-26 11:12:11 [main] INFO  org.utplsql.cli.RunCommand - Successfully connected to database. UtPLSQL core: v3.1.3.2402
2019-02-26 11:12:11 [main] INFO  org.utplsql.cli.RunCommand - Oracle-Version: 11.2.0.4.0
2019-02-26 11:12:11 [HikariPool-1 connection adder] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Added connection oracle.jdbc.driver.T4CConnection@6fb15ace
2019-02-26 11:12:11 [HikariPool-1 connection adder] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Added connection oracle.jdbc.driver.T4CConnection@7a6adf8c
2019-02-26 11:12:11 [HikariPool-1 connection adder] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - After adding stats (total=3, active=1, idle=2, waiting=0)
2019-02-26 11:12:11 [main] DEBUG org.utplsql.api.reporter.Reporter - Database-reporter initialized, Type: UT_XUNIT_REPORTER, ID: 82CA4A5F5DB375E7E0540010E0766450
2019-02-26 11:12:11 [main] DEBUG org.utplsql.api.reporter.Reporter - Database-reporter initialized, Type: UT_COVERAGE_HTML_REPORTER, ID: 82CA4A5F5DB775E7E0540010E0766450
2019-02-26 11:12:11 [pool-1-thread-1] INFO  org.utplsql.cli.RunCommand - Running tests now.
2019-02-26 11:12:11 [pool-1-thread-1] INFO  org.utplsql.cli.RunCommand - --------------------------------------
2019-02-26 11:12:11 [pool-1-thread-1] INFO  org.utplsql.api.TestRunner - TestRunner initialized
2019-02-26 11:12:11 [pool-1-thread-1] INFO  org.utplsql.api.TestRunner - Running on utPLSQL v3.1.3.2402
2019-02-26 11:12:11 [pool-1-thread-1] INFO  org.utplsql.api.TestRunner - Initializing reporters
2019-02-26 11:12:40 [HikariPool-1 housekeeper] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Pool stats (total=3, active=3, idle=0, waiting=0)
2019-02-26 11:13:10 [HikariPool-1 housekeeper] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Pool stats (total=3, active=3, idle=0, waiting=0)
2019-02-26 11:13:41 [HikariPool-1 housekeeper] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Pool stats (total=3, active=3, idle=0, waiting=0)
2019-02-26 11:14:11 [HikariPool-1 housekeeper] DEBUG com.zaxxer.hikari.pool.HikariPool - HikariPool-1 - Pool stats (total=3, active=3, idle=0, waiting=0)

Query 1;

select output_id, is_finished, count(text)
  from ut_output_buffer_tmp
  group by rollup (output_id, is_finished);

Returns no row

Query 2:

select * from ut_output_buffer_info_tmp;

returns:

82C88E6682AD85D1E0540010E0766450	26-Feb-19 09:08:02
82C88E90B93B8710E0540010E0766450	26-Feb-19 09:08:05
82C88E90B93D8710E0540010E0766450	26-Feb-19 09:08:06
82C8BCE1B3A8DC58E0540010E0766450	26-Feb-19 09:21:02
82C8D5E9A68F943AE0540010E0766450	26-Feb-19 09:28:02
82C8FD405234750AE0540010E0766450	26-Feb-19 09:39:02
82C912B9265CE631E0540010E0766450	26-Feb-19 09:45:02
82C91649671B8A62E0540010E0766450	26-Feb-19 09:46:02
82C91D6EE6B4F04DE0540010E0766450	26-Feb-19 09:48:02
82C9282AD26DC6A6E0540010E0766450	26-Feb-19 09:51:02
82C944C9CC4F9322E0540010E0766450	26-Feb-19 09:59:02
82CA0D12908ECA7FE0540010E0766450	26-Feb-19 10:55:02
82CA3D5CE5FB6FE1E0540010E0766450	26-Feb-19 11:08:33
82CA3D5CE5FD6FE1E0540010E0766450	26-Feb-19 11:08:33
82CA3D5CE5FF6FE1E0540010E0766450	26-Feb-19 11:08:33
82CA3D5CE6016FE1E0540010E0766450	26-Feb-19 11:08:33
82CA3EC94EF674B6E0540010E0766450	26-Feb-19 11:08:56
82CA3EC94EF874B6E0540010E0766450	26-Feb-19 11:08:57
82CA3EC94EFA74B6E0540010E0766450	26-Feb-19 11:08:57
82CA3EC94EFC74B6E0540010E0766450	26-Feb-19 11:08:57
82CA40AA34028058E0540010E0766450	26-Feb-19 11:09:28
82CA40AA34048058E0540010E0766450	26-Feb-19 11:09:28
82CA40AA34068058E0540010E0766450	26-Feb-19 11:09:28
82CA40AA34088058E0540010E0766450	26-Feb-19 11:09:28
82CA41439C378497E0540010E0766450	26-Feb-19 11:09:38
82CA41439C398497E0540010E0766450	26-Feb-19 11:09:38
82CA41439C3B8497E0540010E0766450	26-Feb-19 11:09:38
82CA41439C3D8497E0540010E0766450	26-Feb-19 11:09:39
82CA41CADAD08A27E0540010E0766450	26-Feb-19 11:09:47
82CA41CADAD28A27E0540010E0766450	26-Feb-19 11:09:47
82CA41CADAD48A27E0540010E0766450	26-Feb-19 11:09:47
82CA41CADAD68A27E0540010E0766450	26-Feb-19 11:09:47
82CA426CE13E93DEE0540010E0766450	26-Feb-19 11:09:58
82CA426CE14093DEE0540010E0766450	26-Feb-19 11:09:58
82CA426CE14293DEE0540010E0766450	26-Feb-19 11:09:58
82CA426CE14493DEE0540010E0766450	26-Feb-19 11:09:58
82CA432E6638C7B7E0540010E0766450	26-Feb-19 11:10:10
82CA432E663AC7B7E0540010E0766450	26-Feb-19 11:10:10
82CA432E663CC7B7E0540010E0766450	26-Feb-19 11:10:10
82CA432E663EC7B7E0540010E0766450	26-Feb-19 11:10:11
82CA4471201A1E53E0540010E0766450	26-Feb-19 11:10:31
82CA4471201C1E53E0540010E0766450	26-Feb-19 11:10:32
82CA4471201E1E53E0540010E0766450	26-Feb-19 11:10:32
82CA447120201E53E0540010E0766450	26-Feb-19 11:10:32
82CA450A07532611E0540010E0766450	26-Feb-19 11:10:41
82CA450A07552611E0540010E0766450	26-Feb-19 11:10:42
82CA450A07572611E0540010E0766450	26-Feb-19 11:10:42
82CA450A07592611E0540010E0766450	26-Feb-19 11:10:42
82CA45B0C50A2D46E0540010E0766450	26-Feb-19 11:10:52
82CA45B0C50C2D46E0540010E0766450	26-Feb-19 11:10:53
82CA45B0C50E2D46E0540010E0766450	26-Feb-19 11:10:53
82CA45B0C5102D46E0540010E0766450	26-Feb-19 11:10:53
82CA48A7B0795D25E0540010E0766450	26-Feb-19 11:11:42
82CA48A7B07B5D25E0540010E0766450	26-Feb-19 11:11:42
82CA48A7B07D5D25E0540010E0766450	26-Feb-19 11:11:42
82CA48A7B07F5D25E0540010E0766450	26-Feb-19 11:11:42
82CA493AECB6606AE0540010E0766450	26-Feb-19 11:11:52
82CA493AECB8606AE0540010E0766450	26-Feb-19 11:11:52
82CA493AECBA606AE0540010E0766450	26-Feb-19 11:11:52
82CA493AECBC606AE0540010E0766450	26-Feb-19 11:11:52
82CA49C6FDA7627CE0540010E0766450	26-Feb-19 11:12:01
82CA49C6FDA9627CE0540010E0766450	26-Feb-19 11:12:01
82CA49C6FDAB627CE0540010E0766450	26-Feb-19 11:12:01
82CA49C6FDAD627CE0540010E0766450	26-Feb-19 11:12:01
82CA4A5F5DB375E7E0540010E0766450	26-Feb-19 11:12:11
82CA4A5F5DB575E7E0540010E0766450	26-Feb-19 11:12:11
82CA4A5F5DB775E7E0540010E0766450	26-Feb-19 11:12:11
82CA4A5F5DB975E7E0540010E0766450	26-Feb-19 11:12:11

@SebGt
Copy link

SebGt commented Feb 26, 2019

Using flashback for query 1, I found these records:

select output_id, is_finished, count(text)
 from ut_output_buffer_tmp
AS OF TIMESTAMP
  TO_TIMESTAMP('2019-02-26 11:12:09', 'YYYY-MM-DD HH:MI:SS')
 group by rollup (output_id, is_finished);
82CA49C6FDA7627CE0540010E0766450	0	57
82CA49C6FDA7627CE0540010E0766450		57
		57

Same at TO_TIMESTAMP('2019-02-26 11:12:10'
And table is empty at TO_TIMESTAMP('2019-02-26 11:12:11'

@pesse
Copy link
Member

pesse commented Feb 28, 2019

@PhilippSalvisberg found the cause for the problem and we discussed approaches.
The main issue seems to be somewhere in the Oracle driver in combination with Object Types. In fact, when facing the scenario of @c0dw, the java stack shows that the Thread which calls the TestRunner is in waiting state at

"pool-1-thread-1" #15 prio=5 os_prio=0 tid=0x00007f81505f8800 nid=0x32 waiting for monitor entry [0x00007f813c9f1000]
   java.lang.Thread.State: BLOCKED (on object monitor)
  at oracle.sql.STRUCT.getOracleAttributes(STRUCT.java:324)
  - waiting to lock <0x00000000ed8e6d58> (a oracle.jdbc.driver.T4CConnection)
  at oracle.jdbc.oracore.OracleTypeADT.pickle81rec(OracleTypeADT.java:1925)
  at oracle.jdbc.oracore.OracleTypeADT.pickle81(OracleTypeADT.java:1844)
  at oracle.jdbc.oracore.OracleTypeUPT.pickle81(OracleTypeUPT.java:202)
  at oracle.jdbc.oracore.OracleTypeADT.pickle81rec(OracleTypeADT.java:1990)
  at oracle.jdbc.oracore.OracleTypeADT.pickle81(OracleTypeADT.java:1844)
  at oracle.jdbc.oracore.OracleTypeUPT.pickle81(OracleTypeUPT.java:202)
  at oracle.jdbc.oracore.OracleTypeCOLLECTION.pickle81(OracleTypeCOLLECTION.java:294)
  at oracle.jdbc.oracore.OracleTypeADT.pickle81(OracleTypeADT.java:1825)
  at oracle.jdbc.oracore.OracleTypeADT.linearize(OracleTypeADT.java:1465)
  - locked <0x00000000ebf48eb0> (a oracle.jdbc.driver.T4CConnection)
  at oracle.sql.ArrayDescriptor.toBytes(ArrayDescriptor.java:654)
  at oracle.sql.ARRAY.toBytes(ARRAY.java:681)
  - locked <0x00000000ebf48eb0> (a oracle.jdbc.driver.T4CConnection)
  at oracle.jdbc.driver.OraclePreparedStatement.setArrayCritical(OraclePreparedStatement.java:5874)
  at oracle.jdbc.driver.OraclePreparedStatement.setARRAYInternal(OraclePreparedStatement.java:5837)
  - locked <0x00000000ebf48eb0> (a oracle.jdbc.driver.T4CConnection)
  at oracle.jdbc.driver.OraclePreparedStatement.setArrayInternal(OraclePreparedStatement.java:5804)
  at oracle.jdbc.driver.OracleCallableStatement.setArray(OracleCallableStatement.java:4280)
  at oracle.jdbc.driver.OraclePreparedStatementWrapper.setArray(OraclePreparedStatementWrapper.java:129)
  at com.zaxxer.hikari.pool.HikariProxyCallableStatement.setArray(HikariProxyCallableStatement.java)
  at org.utplsql.api.testRunner.AbstractTestRunnerStatement.createStatement(AbstractTestRunnerStatement.java:47)
  at org.utplsql.api.testRunner.ActualTestRunnerStatement.createStatement(ActualTestRunnerStatement.java:43)
  at org.utplsql.api.testRunner.AbstractTestRunnerStatement.<init>(AbstractTestRunnerStatement.java:33)
  at org.utplsql.api.testRunner.ActualTestRunnerStatement.<init>(ActualTestRunnerStatement.java:16)
  at org.utplsql.api.testRunner.TestRunnerStatementProvider.getCompatibleTestRunnerStatement(TestRunnerStatementProvider.java:36)
  at org.utplsql.api.compatibility.CompatibilityProxy.getTestRunnerStatement(CompatibilityProxy.java:154)
  at org.utplsql.api.TestRunner.run(TestRunner.java:148)
  at org.utplsql.cli.RunCommand.lambda$run$0(RunCommand.java:193)
  at org.utplsql.cli.RunCommand$$Lambda$28/1523457748.run(Unknown Source)
  at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
  at java.util.concurrent.FutureTask.run(FutureTask.java:266)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
  at java.lang.Thread.run(Thread.java:748)

This also explains why we have the reporter IDs in ut_output_buffer_info_tmp but no lines in ut_output_buffer_tmp for the ID: The reporter is correctly initialized, but TestRunner never started its work.
The good part is that the stuck situation appears in Statement creation, not execution. Therefore I'll try to implement a watchdog which retries to create the TestRunner (probably with a new connection) if the createStatement part needs more time as expected.
The watchdog will allow us to work around this problem and also give us the opportunity to further analyze the issue, maybe resulting in a SR at oracle.

@PhilippSalvisberg
Copy link
Member

@pesse, fyi I can reproduce the error on my local machine with this test case (running a single utPLSQL test):

package org.utplsql.cli;

import static org.junit.jupiter.api.Assertions.assertEquals;
import static org.junit.jupiter.api.Assertions.assertTrue;

import java.util.List;

import org.junit.jupiter.api.Test;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.utplsql.api.reporter.CoreReporters;

/**
 * Unit test for run command.
 */
class RunCommandIssue20Test {
	
    private static final Logger logger = LoggerFactory.getLogger(RunCommandIssue20Test.class);

    @Test
    void runLoop() {
        RunCommand runCmd = TestHelper.createRunCommand("plscope/plscope@//macphs:1521/odb.docker", "-p=test_etl.test_sal_of_dept", "-f=ut_documentation_reporter");
        List<ReporterOptions> reporterOptionsList = runCmd.getReporterOptionsList();
        ReporterOptions reporterOptions1 = reporterOptionsList.get(0);
        assertEquals(CoreReporters.UT_DOCUMENTATION_REPORTER.name(), reporterOptions1.getReporterName());
        assertTrue(reporterOptions1.outputToScreen());
        // Loop in same JVM, uses a lot of new connections without closing existing ones, this might lead to 
        //   "Could not establish connection to database. Reason: IO Error: Got minus one from a read call"
        // before hitting the hanger at oracle.jdbc.driver.OracleStruct.getOracleAttributes(OracleStruct.java:347)
        // You may increase processes and implicitly sessions by "alter system set processes=1024 scope=spfile;"
        for (int i=0; i <= 120; i++) {
        	logger.info("=======================");
        	logger.info("Loop number " + i);
        	logger.info("=======================");
        	int result = runCmd.run();
        	if (result != 0) {
        		logger.error("Got an error during run. Return Code was " + result + "." );
        		break;
        	}
        }
    }

}

Against a 12.2 I need to run the test 3 to 4 times until test run hangs. However, when I run it against 18.5 or 19.2 then issue happens always during the first run. So, if you are going to test it locally, it will help to use a newer version of the Oracle Database to reproduce the error (and check if the workaround solves the problem).

The thread dumps I've taken are consistent with the one mentioned above.

@pesse
Copy link
Member

pesse commented Mar 4, 2019

Awesome! Thank you very much, @PhilippSalvisberg!

@pesse
Copy link
Member

pesse commented Mar 14, 2019

@c0dw @SebGt Could you please both try the following develop-version of cli?
https://bintray.com/utplsql/utPLSQL-cli/download_file?file_path=utPLSQL-cli-develop-201903142211.zip
There should be no stuck anymore.
In case of the previously stuck-situation, there should be now a log-message like
WARNING: Caught Oracle stuck during creation of Runner-Statement. Retrying (1)

@pesse pesse reopened this Mar 14, 2019
@SebGt
Copy link

SebGt commented Mar 15, 2019

@pesse, thanks. I will test it today and I give my feedback asap.

@SebGt
Copy link

SebGt commented Mar 15, 2019

@pesse, All my tests (around 30 different set of tests) were all passing this morning.
Thank you very much to have fixed this issue.

I will see the next days if it is confirmed but I have no doubt

@pesse
Copy link
Member

pesse commented Mar 15, 2019

Great! Thanks for the feedback.
Did you see this line?
WARNING: Caught Oracle stuck during creation of Runner-Statement. Retrying (1)

@SebGt
Copy link

SebGt commented Mar 15, 2019

No but I did not check all outputs. I will if it can help you.

@pesse
Copy link
Member

pesse commented Mar 15, 2019

If you could check it would be great because it means the (suspected) Oracle bug still occurs but is caught reliably (which is a far better message than the Oracle bug we tried to catch didn't occur for whatever reason during x runs)

@SebGt
Copy link

SebGt commented Mar 15, 2019

@pesse, none of my executions (~20 Jenkins jobs) displayed a warning.
They are all like that :

Successfully connected to database. UtPLSQL core: v3.1.3.2402
Oracle-Version: 11.2.0.4.0
Running tests now.
--------------------------------------
TestRunner initialized
Running on utPLSQL v3.1.3.2402
Initializing reporters
Running tests
Running tests finished.
--------------------------------------

I will check next week.

@SebGt
Copy link

SebGt commented Mar 18, 2019

Hi @pesse, most of my tests ran without any issues today. Only one had the warning message twice time. Between the 2 runs with the warning I had other runs of the test without issues

First one:

##################### utPLSQL cli #####################
#                                                     #
#   utPLSQL-cli 3.1.3-SNAPSHOT.local                  #
#   utPLSQL-java-api 3.1.3-SNAPSHOT.390               #
#   Java-Version: 1.8.0_191                           #
#   ORACLE_HOME: \\Programs\ORACLE                    #
#   NLS_LANG: null                                    #
#                                                     #
#   Thanks for testing!                               #
#                                                     #
#######################################################

Use connection string jdbc:oracle:thin:****/****@dev_xxx.WORLD
Successfully connected to database. UtPLSQL core: v3.1.3.2402
Oracle-Version: 11.2.0.4.0
Running tests now.
--------------------------------------
TestRunner initialized
Running on utPLSQL v3.1.3.2402
Initializing reporters
Detected Oracle driver stuck during Statement initialization
HikariPool-1 - Connection oracle.jdbc.driver.T4CConnection@6c49835d marked as broken because of SQLSTATE(08003), ErrorCode(17008)
java.sql.SQLRecoverableException: Closed Connection
	at oracle.jdbc.driver.PhysicalConnection.clearWarnings(PhysicalConnection.java:2573)
	at com.zaxxer.hikari.pool.ProxyConnection.close(ProxyConnection.java:249)
	at org.utplsql.cli.ReporterManager.lambda$startReporterGatherers$0(ReporterManager.java:130)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
HikariPool-1 - Connection oracle.jdbc.driver.T4CConnection@7d523fbc marked as broken because of SQLSTATE(08003), ErrorCode(17008)
java.sql.SQLRecoverableException: Closed Connection
	at oracle.jdbc.driver.PhysicalConnection.clearWarnings(PhysicalConnection.java:2573)
	at com.zaxxer.hikari.pool.ProxyConnection.close(ProxyConnection.java:249)
	at org.utplsql.cli.ReporterManager.lambda$startReporterGatherers$0(ReporterManager.java:130)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
HikariPool-1 - Connection oracle.jdbc.driver.T4CConnection@5081ad32 marked as broken because of SQLSTATE(08003), ErrorCode(17008)
java.sql.SQLRecoverableException: Closed Connection
	at oracle.jdbc.driver.PhysicalConnection.prepareCall(PhysicalConnection.java:1708)
	at oracle.jdbc.driver.PhysicalConnection.prepareCall(PhysicalConnection.java:1677)
	at oracle.jdbc.oracore.OracleTypeADT.initMetadata11_2(OracleTypeADT.java:741)
	at oracle.jdbc.oracore.OracleTypeADT.initMetadata(OracleTypeADT.java:479)
	at oracle.jdbc.oracore.OracleTypeADT.init(OracleTypeADT.java:443)
	at oracle.sql.ArrayDescriptor.initPickler(ArrayDescriptor.java:1499)
	at oracle.sql.ArrayDescriptor.<init>(ArrayDescriptor.java:274)
	at oracle.sql.ArrayDescriptor.createDescriptor(ArrayDescriptor.java:127)
	at oracle.sql.ArrayDescriptor.createDescriptor(ArrayDescriptor.java:79)
	at oracle.jdbc.driver.OraclePreparedStatement.setNullCritical(OraclePreparedStatement.java:4359)
	at oracle.jdbc.driver.OraclePreparedStatement.setNullInternal(OraclePreparedStatement.java:4321)
	at oracle.jdbc.driver.OraclePreparedStatement.setNull(OraclePreparedStatement.java:4299)
	at oracle.jdbc.driver.OraclePreparedStatementWrapper.setNull(OraclePreparedStatementWrapper.java:1013)
	at com.zaxxer.hikari.pool.HikariProxyCallableStatement.setNull(HikariProxyCallableStatement.java)
	at org.utplsql.api.testRunner.AbstractTestRunnerStatement.createStatement(AbstractTestRunnerStatement.java:58)
	at org.utplsql.api.testRunner.ActualTestRunnerStatement.createStatement(ActualTestRunnerStatement.java:44)
	at org.utplsql.api.testRunner.AbstractTestRunnerStatement.<init>(AbstractTestRunnerStatement.java:33)
	at org.utplsql.api.testRunner.ActualTestRunnerStatement.<init>(ActualTestRunnerStatement.java:17)
	at org.utplsql.api.testRunner.TestRunnerStatementProvider.getCompatibleTestRunnerStatement(TestRunnerStatementProvider.java:44)
	at org.utplsql.api.compatibility.CompatibilityProxy.getTestRunnerStatement(CompatibilityProxy.java:157)
	at org.utplsql.api.TestRunner.lambda$initStatementWithTimeout$0(TestRunner.java:199)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
WARNING: Caught Oracle stuck during creation of Runner-Statement. Retrying (1)
##################### utPLSQL cli #####################
#                                                     #
#   utPLSQL-cli 3.1.3-SNAPSHOT.local                  #
#   utPLSQL-java-api 3.1.3-SNAPSHOT.390               #
#   Java-Version: 1.8.0_191                           #
#   ORACLE_HOME: \\Programs\ORACLE                    #
#   NLS_LANG: null                                    #
#                                                     #
#   Thanks for testing!                               #
#                                                     #
#######################################################

Use connection string jdbc:oracle:thin:****/****@dev_xxx.WORLD
Successfully connected to database. UtPLSQL core: v3.1.3.2402
Oracle-Version: 11.2.0.4.0
Running tests now.
--------------------------------------
TestRunner initialized
Running on utPLSQL v3.1.3.2402
Initializing reporters
Running tests
Running tests finished.
--------------------------------------
All tests done.

Second one:

##################### utPLSQL cli #####################
#                                                     #
#   utPLSQL-cli 3.1.3-SNAPSHOT.local                  #
#   utPLSQL-java-api 3.1.3-SNAPSHOT.390               #
#   Java-Version: 1.8.0_191                           #
#   ORACLE_HOME: \\Programs\ORACLE                    #
#   NLS_LANG: null                                    #
#                                                     #
#   Thanks for testing!                               #
#                                                     #
#######################################################

Use connection string jdbc:oracle:thin:****/****@dev_xxx.WORLD
Successfully connected to database. UtPLSQL core: v3.1.3.2402
Oracle-Version: 11.2.0.4.0
Running tests now.
--------------------------------------
TestRunner initialized
Running on utPLSQL v3.1.3.2402
Initializing reporters
Detected Oracle driver stuck during Statement initialization
HikariPool-1 - Connection oracle.jdbc.driver.T4CConnection@6c49835d marked as broken because of SQLSTATE(08003), ErrorCode(17008)
java.sql.SQLRecoverableException: Closed Connection
	at oracle.jdbc.driver.PhysicalConnection.clearWarnings(PhysicalConnection.java:2573)
	at com.zaxxer.hikari.pool.ProxyConnection.close(ProxyConnection.java:249)
	at org.utplsql.cli.ReporterManager.lambda$startReporterGatherers$0(ReporterManager.java:130)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
HikariPool-1 - Connection oracle.jdbc.driver.T4CConnection@2b370e4b marked as broken because of SQLSTATE(08003), ErrorCode(17008)
java.sql.SQLRecoverableException: Closed Connection
	at oracle.jdbc.driver.PhysicalConnection.clearWarnings(PhysicalConnection.java:2573)
	at com.zaxxer.hikari.pool.ProxyConnection.close(ProxyConnection.java:249)
	at org.utplsql.cli.ReporterManager.lambda$startReporterGatherers$0(ReporterManager.java:130)
	at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
	at java.util.concurrent.FutureTask.run(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
	at java.lang.Thread.run(Unknown Source)
WARNING: Caught Oracle stuck during creation of Runner-Statement. Retrying (1)
##################### utPLSQL cli #####################
#                                                     #
#   utPLSQL-cli 3.1.3-SNAPSHOT.local                  #
#   utPLSQL-java-api 3.1.3-SNAPSHOT.390               #
#   Java-Version: 1.8.0_191                           #
#   ORACLE_HOME: \\Programs\ORACLE                    #
#   NLS_LANG: null                                    #
#                                                     #
#   Thanks for testing!                               #
#                                                     #
#######################################################

Use connection string jdbc:oracle:thin:****/****@dev_xxx.WORLD
Successfully connected to database. UtPLSQL core: v3.1.3.2402
Oracle-Version: 11.2.0.4.0
Running tests now.
--------------------------------------
TestRunner initialized
Running on utPLSQL v3.1.3.2402
Initializing reporters
Running tests
Running tests finished.
--------------------------------------
All tests done.

Don't hesitate if you need more details.

@pesse
Copy link
Member

pesse commented Mar 18, 2019

Thank you very much @SebGt !
The log proves that the bug is caught and dealt with :)

@SebGt
Copy link

SebGt commented Mar 18, 2019

Thank you very much @SebGt !
The log proves that the bug is caught and dealt with :)

You're welcome.
I don't have stuck execution anymore.
Thank you for taking the time to solve this problem related to the 11g.

@pesse
Copy link
Member

pesse commented Mar 19, 2019

@c0dw and @PhilippSalvisberg did some extensive testing and analysis yesterday (on Oracle 12.2) which they shared with me afterward. The findings:

  • Watchdog catches the stuck situation reliably (appeared 5 times in 200 runs)
  • In one case the stuck situation happened two times in a row but was resolved in the 3rd try
  • The reported results were correct

The I/O exception caused by aborting the connection might be a bit irritating, so I'll try to only show it in debug-mode.
I think we can finally close that issue now 🍾
Thanks for all your help (also @SebGt and @jgebal) - I wouldn't have found the source without you!

@pesse pesse closed this as completed Mar 19, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants