[Improvement] The interval of container status check is too short #2270

Closed
mchades opened this issue Feb 20, 2024 · 1 comment
Labels
improvement Improvements on everything

Comments

@mchades
Contributor

mchades commented Feb 20, 2024

What would you like to be improved?

Some tests are failing because the container status check fails. See this example:

image

Corresponding key logs are as follows:

......
2024-02-20 00:23:38 INFO 8:543 - Container datastrato/gravitino-ci-hive:0.1.8 started in PT56.579618497S
2024-02-20 00:23:40 ERROR HiveContainer:78 - Command [bash /tmp/check-status.sh] exited with 1
2024-02-20 00:23:40 ERROR HiveContainer:79 - stderr: ++ hdfs dfsadmin -report
++ grep 'Live datanodes'
++ awk '{print $3}'
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

+ hdfs_ready=
+ [[ '' == (\1): ]]
+ echo 'HDFS is not ready'
+ exit 1

2024-02-20 00:23:40 ERROR HiveContainer:80 - stdout: HDFS is not ready

2024-02-20 00:23:46 ERROR HiveContainer:78 - Command [bash /tmp/check-status.sh] exited with 1
2024-02-20 00:23:46 ERROR HiveContainer:79 - stderr: ++ hdfs dfsadmin -report
++ grep 'Live datanodes'
++ awk '{print $3}'
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

+ hdfs_ready=
+ [[ '' == (\1): ]]
+ echo 'HDFS is not ready'
+ exit 1

2024-02-20 00:23:46 ERROR HiveContainer:80 - stdout: HDFS is not ready

2024-02-20 00:23:53 ERROR HiveContainer:78 - Command [bash /tmp/check-status.sh] exited with 1
2024-02-20 00:23:53 ERROR HiveContainer:79 - stderr: ++ hdfs dfsadmin -report
++ grep 'Live datanodes'
++ awk '{print $3}'
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

+ hdfs_ready=
+ [[ '' == (\1): ]]
+ echo 'HDFS is not ready'
+ exit 1

2024-02-20 00:23:53 ERROR HiveContainer:80 - stdout: HDFS is not ready

2024-02-20 00:23:59 ERROR HiveContainer:78 - Command [bash /tmp/check-status.sh] exited with 1
2024-02-20 00:23:59 ERROR HiveContainer:79 - stderr: ++ hdfs dfsadmin -report
++ grep 'Live datanodes'
++ awk '{print $3}'
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

+ hdfs_ready=
+ [[ '' == (\1): ]]
+ echo 'HDFS is not ready'
+ exit 1

2024-02-20 00:23:59 ERROR HiveContainer:80 - stdout: HDFS is not ready

2024-02-20 00:24:06 ERROR HiveContainer:78 - Command [bash /tmp/check-status.sh] exited with 1
2024-02-20 00:24:06 ERROR HiveContainer:79 - stderr: ++ hdfs dfsadmin -report
++ grep 'Live datanodes'
++ awk '{print $3}'
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=128m; support was removed in 8.0

+ hdfs_ready=
+ [[ '' == (\1): ]]
+ echo 'HDFS is not ready'
+ exit 1

2024-02-20 00:24:06 ERROR HiveContainer:80 - stdout: HDFS is not ready

2024-02-20 00:24:11 INFO HiveConf:187 - Found configuration file null
2024-02-20 00:24:11 WARN NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
2024-02-20 00:24:11 INFO metastore:405 - Trying to connect to metastore with URI thrift://172.17.0.3:9083
2024-02-20 00:24:11 INFO metastore:479 - Opened a connection to metastore, current connections: 1
......

The log shows that the Hive container has started, but every status check shown above runs before HDFS has finished initializing. Each check therefore fails, and the test ends with a "Precondition failed" error.
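To make the timing concrete, the spacing between the failed checks can be read off the log timestamps. The snippet below is only an illustration in Python (the project code itself is not shown in this issue); the timestamps are copied from the `Command [bash /tmp/check-status.sh] exited with 1` lines above.

```python
from datetime import datetime

# Timestamps of the five failed check-status.sh runs, copied from the log above.
failed_checks = [
    "2024-02-20 00:23:40",
    "2024-02-20 00:23:46",
    "2024-02-20 00:23:53",
    "2024-02-20 00:23:59",
    "2024-02-20 00:24:06",
]

times = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in failed_checks]

# Seconds between consecutive attempts.
gaps = [int((b - a).total_seconds()) for a, b in zip(times, times[1:])]

# Seconds between the first and the last attempt.
window = int((times[-1] - times[0]).total_seconds())

print(gaps)    # [6, 7, 6, 7]
print(window)  # 26
```

In this excerpt the attempts are about 6 to 7 seconds apart and span roughly 26 seconds, starting 2 seconds after the container was reported as started at 00:23:38. The log does not say how much of each gap is sleep time and how much is the `hdfs dfsadmin -report` call itself.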

How should we improve?

The retry interval should be increased so that HDFS has time to initialize before the retries are exhausted.
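As a rough sketch of the idea, here is a generic retry loop with a configurable interval. It is written in Python for brevity and is not the project's actual check: `wait_until_ready`, `retries`, `interval_s`, and the injected `sleep` parameter are all hypothetical names, and the numbers in the usage example are illustrative rather than proposed defaults.

```python
import time
from typing import Callable


def wait_until_ready(
    check: Callable[[], bool],
    retries: int,
    interval_s: float,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Call `check` up to `retries` times, pausing `interval_s` seconds between attempts.

    Returns True as soon as a check succeeds, False if every attempt fails.
    `sleep` is injectable so the loop can be exercised without real waiting.
    """
    for attempt in range(1, retries + 1):
        if check():
            return True
        if attempt < retries:
            # Only wait if another attempt is still coming.
            sleep(interval_s)
    return False


# Usage example with a simulated service that becomes ready on the 4th probe.
calls = {"n": 0}


def fake_check() -> bool:
    calls["n"] += 1
    return calls["n"] >= 4


slept = []  # records the requested pauses instead of actually sleeping
ok = wait_until_ready(fake_check, retries=5, interval_s=10.0, sleep=slept.append)
```

With a longer interval the same number of retries covers a longer startup window, which is the point of this request. Raising the retry count instead would widen the window in the same way.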

@mchades
Contributor Author

mchades commented Apr 11, 2024

It may be fixed by #2871.

@mchades mchades closed this as not planned on Apr 11, 2024