[#2226] dev(docker): Add a Hive with Kerberos mode Docker image #2364

Merged
merged 21 commits into from
May 21, 2024

@qqqttt123 qqqttt123 commented Feb 27, 2024

What changes were proposed in this pull request?

Add all the Docker image files required by Hive with Kerberos enabled.

Why are the changes needed?

We need to confirm that everything goes smoothly in a Kerberos-enabled Hive cluster.

Fix: #2226
Fix: #3408

Does this PR introduce any user-facing change?

N/A

How was this patch tested?

Test locally.

@qqqttt123 qqqttt123 marked this pull request as draft February 27, 2024 11:21
@yuqi1129 yuqi1129 marked this pull request as ready for review May 20, 2024 08:26
@yuqi1129 yuqi1129 requested a review from jerryshao May 20, 2024 13:22

<property>
<name>dfs.namenode.kerberos.principal</name>
<value>hdfs/mockhost@HADOOPKRB</value>

Can this "mockhost" work?

"mockhost" is a placeholder that will be replaced with the real value.

HADOOPKRB = {
kdc = tcp/localhost:88
admin_server = localhost
}

What is this file used for?

It defines the KDC address and port for the HADOOPKRB realm. This is common practice.

@jerryshao jerryshao May 21, 2024

I mean, what is this "FILE" used for? You already have a similar file, "krb5.conf".

@yuqi1129 yuqi1129 May 21, 2024

As explained by @qqqttt123, krb5formac.conf is the client-side configuration that the ITs use to connect to the KDC running inside the Docker container, so it needs to be copied to the host machine. krb5.conf, by contrast, is the configuration file for the KDC server itself.
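
To make the client-side role concrete, here is a minimal sketch of how a test run on the host might point its Kerberos clients at such a file. The realm block is the one under review; the scratch directory, the [libdefaults] section, and the use of KRB5_CONFIG and JAVA_TOOL_OPTIONS are assumptions for illustration, not something this PR sets up.

```shell
# Sketch only: write a client-side krb5 config on the host and point
# Kerberos clients at it. The path and [libdefaults] section are assumptions.
conf_dir="$(mktemp -d)"              # hypothetical scratch location on the host
client_conf="${conf_dir}/krb5.conf"

cat > "${client_conf}" <<'EOF'
[libdefaults]
  default_realm = HADOOPKRB

[realms]
  HADOOPKRB = {
    kdc = tcp/localhost:88
    admin_server = localhost
  }
EOF

# MIT Kerberos command-line tools (kinit, klist) read KRB5_CONFIG.
export KRB5_CONFIG="${client_conf}"
# JVM-based clients read the java.security.krb5.conf system property.
export JAVA_TOOL_OPTIONS="-Djava.security.krb5.conf=${client_conf}"

echo "Kerberos client config: ${client_conf}"
```

Since the realm points at localhost:88, this presumably only works when the container publishes the KDC port to the host.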

Please add more comments on this. Also, in which scenario do you need to copy this conf to the host machine?

<configuration>
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>

So you configured yarn here; that's why it must run on yarn. @mchades

We don't have to set this; without it, the job will run locally. Can you try that?

By the way, should we also make this change in the Docker image without Kerberos? @jerryshao

Yes, I think so. Also, can you check whether it works?

> So you configured yarn here; that's why it must run on yarn. @mchades
>
> We don't have to set this; without it, the job will run locally. Can you try that?

You are right; I have already verified it locally. We just need to change the value to local, and then we can execute HQL in the Hive server without relying on yarn.
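
As a rough illustration of that one-value change, the sketch below flips mapreduce.framework.name from yarn to local in a scratch copy of mapred-site.xml. It assumes the <value> line directly follows the <name> line, as in the hunk above; the file locations are hypothetical, and an XML-aware tool would be more robust than sed.

```shell
# Sketch only: flip mapreduce.framework.name from yarn to local in a copy of
# mapred-site.xml. Assumes <value> sits on the line right after <name>;
# the file locations are hypothetical.
work_dir="$(mktemp -d)"
site_in="${work_dir}/mapred-site.xml"
site_out="${work_dir}/mapred-site.local.xml"

cat > "${site_in}" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
EOF

# On the line after the matching <name>, replace the yarn value with local.
sed '/<name>mapreduce\.framework\.name<\/name>/{
n
s#<value>yarn</value>#<value>local</value>#
}' "${site_in}" > "${site_out}"

cat "${site_out}"
```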

@yuqi1129 yuqi1129 May 21, 2024

@jerryshao
Everything has been resolved. I have added yarn-site.xml back because we need to configure Kerberos for yarn; otherwise the following command throws an exception:

Hive> insert into default.t1 values(1);

The stack trace is as follows:

WARNING: Hive-on-MR is deprecated in Hive 2 and may not be available in the future versions. Consider using a different execution engine (i.e. spark, tez) or using Hive 1.X releases.
Query ID = root_20240521075212_df45dfa5-c63a-4168-b90f-5ee02f6bc716
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
java.io.IOException: Can't get Master Kerberos principal for use as renewer
	at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:116)
	at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodesInternal(TokenCache.java:100)
	at org.apache.hadoop.mapreduce.security.TokenCache.obtainTokensForNamenodes(TokenCache.java:80)
	at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:166)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
	at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
	at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
	at java.security.AccessController.doPrivileged(Native Method)
	at javax.security.auth.Subject.doAs(Subject.java:422)
	at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
	at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
	at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
	at org.apache.hadoop.hive.ql.exec.mr.ExecDriver.execute(ExecDriver.java:414)
	at org.apache.hadoop.hive.ql.exec.mr.MapRedTask.execute(MapRedTask.java:151)
	at org.apache.hadoop.hive.ql.exec.Task.executeTask(Task.java:199)
	at org.apache.hadoop.hive.ql.exec.TaskRunner.runSequential(TaskRunner.java:100)
	at org.apache.hadoop.hive.ql.Driver.launchTask(Driver.java:2183)
	at org.apache.hadoop.hive.ql.Driver.execute(Driver.java:1839)
	at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1526)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1237)
	at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1227)
	at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:233)
	at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:184)
	at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:403)
	at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:821)
	at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:759)
	at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:686)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:498)
	at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Job Submission failed with exception 'java.io.IOException(Can't get Master Kerberos principal for use as renewer)'
FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask. Can't get Master Kerberos principal for use as renewer
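
For context, this error is raised at job submission when the client configuration has no master principal to name as the delegation-token renewer; in Hadoop 2.x that is, as far as I know, the yarn.resourcemanager.principal property in yarn-site.xml. A small sketch of a check for it follows. The sample files and the principal value yarn/mockhost@HADOOPKRB are hypothetical, and the check only looks for the property name, not for a valid value.

```shell
# Sketch only: report whether a yarn-site.xml declares a ResourceManager
# principal. The property name is the standard Hadoop one; the sample files
# and the principal value below are hypothetical.
has_rm_principal() {
  grep -q '<name>yarn\.resourcemanager\.principal</name>' "$1"
}

sample_dir="$(mktemp -d)"
cat > "${sample_dir}/yarn-site.xml" <<'EOF'
<configuration>
  <property>
    <name>yarn.resourcemanager.principal</name>
    <value>yarn/mockhost@HADOOPKRB</value>
  </property>
</configuration>
EOF
printf '<configuration>\n</configuration>\n' > "${sample_dir}/empty-site.xml"

for f in "${sample_dir}/yarn-site.xml" "${sample_dir}/empty-site.xml"; do
  if has_rm_principal "$f"; then
    echo "$(basename "$f"): principal configured"
  else
    echo "$(basename "$f"): principal missing"
  fi
done
```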

${HADOOP_HOME}/sbin/hadoop-daemon.sh start namenode

echo "Starting DataNode..."
${HADOOP_HOME}/sbin/start-secure-dns.sh

Is this the command to start DN?

Ping me when everything works and the code is polished and ready for review.

> Is this the command to start DN?

Yes. Hive with Kerberos enabled differs from Hive without it: if we use ${HADOOP_HOME}/sbin/hadoop-daemon.sh start datanode, the following error occurs:

JSVC_HOME is not set correctly so jsvc cannot be found. jsvc is required to run secure datanodes

That's why Qe Qi changed it.
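
If one start script had to serve both images, the choice described here could be sketched as below. KERBEROS_ENABLED and the /opt/hadoop fallback are hypothetical names introduced for illustration; the PR itself simply calls start-secure-dns.sh. The function only prints the command, so it is a dry run.

```shell
# Sketch only: pick the DataNode start command based on whether Kerberos is on.
# KERBEROS_ENABLED is a hypothetical flag, not something the PR defines.
datanode_start_cmd() {
  hadoop_home="${HADOOP_HOME:-/opt/hadoop}"   # fallback path is an assumption
  if [ "${KERBEROS_ENABLED:-false}" = "true" ]; then
    echo "${hadoop_home}/sbin/start-secure-dns.sh"
  else
    echo "${hadoop_home}/sbin/hadoop-daemon.sh start datanode"
  fi
}

echo "Starting DataNode..."
echo "would run: $(KERBEROS_ENABLED=true datanode_start_cmd)"
```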

@yuqi1129
@jerryshao
Do you have any further comments on it?

# This file is used to configure the Kerberos client on the Mac OS. For example,
# HDFS client need to use this file to connect to the Kerberos server. Code as below:
# System.setProperty("java.security.krb5.conf", "/tmp/krb5.conf"); and the file '/tmp/krb5.conf'
# is the content of this file.

One remaining issue: can this be used on Linux? If so, you'd better choose a better file name.

@yuqi1129 yuqi1129 merged commit c73ddfa into apache:main May 21, 2024
23 checks passed
github-actions bot pushed a commit that referenced this pull request May 21, 2024
### What changes were proposed in this pull request?

Add all the Docker image files required by Hive with Kerberos enabled.

### Why are the changes needed?

We need to confirm that everything goes smoothly in a Kerberos-enabled
Hive cluster.

Fix: #2226 

### Does this PR introduce _any_ user-facing change?

N/A

### How was this patch tested?

Test locally.

---------

Co-authored-by: Heng Qin
Co-authored-by: yuqi
diqiu50 pushed a commit to diqiu50/gravitino that referenced this pull request Jun 13, 2024
Successfully merging this pull request may close these issues.

Add KDC and Hive with security in docker container
[Improvement] Add a Kerberos CI image
4 participants