[#2226] dev(docker): Add a Hive with Kerberos mode Docker image #2364
Conversation
<property>
  <name>dfs.namenode.kerberos.principal</name>
  <value>hdfs/mockhost@HADOOPKRB</value>
Can this "mockhost" work?
HADOOPKRB = {
    kdc = tcp/localhost:88
    admin_server = localhost
}
What is the usage of this file?
It defines the address and port of the KDC for the HADOOPKRB realm. This is general practice.
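For context, a minimal sketch of a complete krb5.conf built around this realm stanza; the [libdefaults] section and the domain mapping here are assumptions for illustration, not necessarily what the image ships:

[libdefaults]
    default_realm = HADOOPKRB

[realms]
    HADOOPKRB = {
        kdc = tcp/localhost:88
        admin_server = localhost
    }

[domain_realm]
    localhost = HADOOPKRB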
I mean, what is the usage of this FILE? You have a similar file, krb5.conf.
As explained by @qqqttt123, krb5formac.conf is the configuration file used by the ITs to connect to the KDC installed in the Docker container, and it needs to be copied to the host machine. By contrast, krb5.conf is the configuration file used by the KDC server itself.
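To illustrate, a minimal sketch of how an IT on the host could pick up the copied file before talking to the Kerberized cluster; the paths, principal, and keytab below are assumptions for illustration, not the actual test code:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.security.UserGroupInformation;

public class KerberosItSketch {
    public static void main(String[] args) throws Exception {
        // Point the JVM's Kerberos client at the conf copied out of the Docker container.
        System.setProperty("java.security.krb5.conf", "/tmp/krb5formac.conf");

        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        UserGroupInformation.setConfiguration(conf);

        // Log in with a keytab exported from the KDC container (assumed principal/path).
        UserGroupInformation.loginUserFromKeytab("hdfs/mockhost@HADOOPKRB", "/tmp/hdfs.keytab");
    }
}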
Add more comments on this. Also in which scenario do you need to copy this conf to the host machine?
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
So here you configured YARN; that's why it must run on YARN. @mchades.
We don't have to set this; it will run locally. Can you try that?
Should we also modify the Docker image without Kerberos regarding this point, by the way? @jerryshao
Yes, I think so. Besides, can you check whether it works or not?
> So here you configured YARN; that's why it must run on YARN. @mchades. We don't have to set this; it will run locally. Can you try that?

You are right, I have already verified it locally. We just need to change the value to local, and then we can execute HQL in the Hive server without relying on YARN.
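That is, a sketch of the verified change in mapred-site.xml:

<property>
  <name>mapreduce.framework.name</name>
  <value>local</value>
</property>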
@jerryshao
All has been resolved. I have added yarn-site.xml back, as we need to configure Kerberos for YARN, or the following command will throw an exception (a sketch of the required properties follows the stack trace):

hive> insert into default.t1 values(1);

The exception stack is as follows:
WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20240521075212_df45dfa5-c63a-4168-b90f-5ee02f6bc716
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
java.io.IOException: Can't get Master Kerberos principal for use as renewer
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:166)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:414)
at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Job Submission failed with exception 'java.io.IOException(Can't get Master Kerberos principal for use as renewer)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Can't get Master Kerberos principal for use as renewer
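As mentioned above, a minimal sketch of the yarn-site.xml Kerberos properties that address this error; the property names are standard Hadoop, but the exact principals and keytab paths here are assumptions:

<property>
  <name>yarn.resourcemanager.principal</name>
  <value>yarn/mockhost@HADOOPKRB</value> <!-- a missing RM principal triggers "Can't get Master Kerberos principal for use as renewer" -->
</property>
<property>
  <name>yarn.resourcemanager.keytab</name>
  <value>/etc/security/keytabs/yarn.keytab</value> <!-- assumed path -->
</property>
<property>
  <name>yarn.nodemanager.principal</name>
  <value>yarn/mockhost@HADOOPKRB</value>
</property>
<property>
  <name>yarn.nodemanager.keytab</name>
  <value>/etc/security/keytabs/yarn.keytab</value> <!-- assumed path -->
</property>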
${HADOOP_HOME}/sbin/hadoop-daemon.sh start namenode

echo "Starting DataNode..."
${HADOOP_HOME}/sbin/start-secure-dns.sh
Is this the command to start DN?
Ping me when everything works and the code is polished/ready to review.
> Is this the command to start DN?

Yes, Hive with Kerberos enabled is different from Hive without Kerberos. If we use ${HADOOP_HOME}/sbin/hadoop-daemon.sh start datanode, the following error will occur:

JSVC_HOME is not set correctly so jsvc cannot be found. jsvc is required to run secure datanodes

That's why Qe Qi changed it. @jerryshao
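For reference, a sketch of the hadoop-env.sh settings a secure DataNode started via start-secure-dns.sh typically needs (Hadoop 2.x variable names; the jsvc path and user below are assumptions):

# jsvc is required so the secure DataNode can bind privileged ports as root
# and then drop to an unprivileged user.
export JSVC_HOME=/usr/local/bin            # assumed jsvc install location
export HADOOP_SECURE_DN_USER=hdfs          # assumed user the DataNode drops to
export HADOOP_SECURE_DN_PID_DIR=/var/run/hadoop
export HADOOP_SECURE_DN_LOG_DIR=/var/log/hadoop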
# This file is used to configure the Kerberos client on macOS. For example,
# the HDFS client needs to use this file to connect to the Kerberos server. Code as below:
# System.setProperty("java.security.krb5.conf", "/tmp/krb5.conf"); and the file '/tmp/krb5.conf'
# is the content of this file.
One remaining issue: can this be used on Linux? If so, you'd better change it to a better file name.
What changes were proposed in this pull request?
Add all the Docker image files required by Hive with Kerberos enabled.
Why are the changes needed?
We need to confirm that everything goes smoothly in a Kerberos-enabled Hive cluster.
Fix: #2226
Fix: #3408
Does this PR introduce any user-facing change?
N/A
How was this patch tested?
Tested locally.