
Add ability for aws_glue_job to use Python 3 and glue version 1.0 #9524

Closed
robertaves opened this issue Jul 26, 2019 · 22 comments · Fixed by #10237
Labels
enhancement Requests to existing resources that expand the functionality or scope. service/glue Issues and PRs that pertain to the glue service.

Comments

@robertaves

Community Note

  • Please vote on this issue by adding a 👍 reaction to the original issue to help the community and maintainers prioritize this request
  • Please do not leave "+1" or "me too" comments, they generate extra noise for issue followers and do not help prioritize the request
  • If you are interested in working on this issue or have submitted a pull request, please leave a comment

Description

AWS Glue now supports running ETL jobs on Apache Spark 2.4.3 (with Python 3); Terraform support is needed.

New or Affected Resource(s)

aws_glue_job

Potential Terraform Configuration

resource "aws_glue_job" "aws_glue_job_foo" {
  glue_version = "1.0"
  name         = "job-name"
  description  = "job-desc"
  role_arn     = data.aws_iam_role.aws_glue_iam_role.arn
  max_capacity = 1
  max_retries  = 1
  connections  = [aws_glue_connection.connection.name]
  timeout      = 5

  command {
    name            = "pythonshell"
    script_location = "s3://bucket/script.py"
    python_version  = "3"
  }

  default_arguments = {    
    "--job-language" = "python"
    "--ENV"          = "env"
    "--ROLE_ARN"     = data.aws_iam_role.aws_glue_iam_role.arn
  }

  execution_property {
    max_concurrent_runs = 1
  }
}

References

@robertaves robertaves added the enhancement Requests to existing resources that expand the functionality or scope. label Jul 26, 2019
@github-actions github-actions bot added the needs-triage Waiting for first response or review from a maintainer. label Jul 26, 2019
@ewbankkit
Contributor

It looks like this is enabled via the Glue version for a job, added in AWS SDK v1.21.4.
Requires:

@bflad bflad added service/glue Issues and PRs that pertain to the glue service. and removed needs-triage Waiting for first response or review from a maintainer. labels Jul 31, 2019
@nywilken
Contributor

Related: #9409

@ezidio

ezidio commented Aug 7, 2019

+1

@ezidio

ezidio commented Aug 8, 2019

An alternative way to set the Python and Glue versions:

resource "aws_glue_job" "etl" {
  name     = "${var.job_name}"
  role_arn = "${var.iam_role_arn}"

  command {
    script_location = "s3://${var.bucket_name}/${aws_s3_bucket_object.script.key}"
  }

  default_arguments = {
    "--enable-metrics" = ""
    "--job-language" = "python"
    "--TempDir" = "s3://${var.bucket_name}/TEMP"
  }

  # Manually set python 3 and glue 1.0
  provisioner "local-exec" {
    command = "aws glue update-job --job-name ${var.job_name} --job-update 'Command={ScriptLocation=s3://${var.bucket_name}/${aws_s3_bucket_object.script.key},PythonVersion=3,Name=glueetl},GlueVersion=1.0,Role=${var.iam_role_arn},DefaultArguments={--enable-metrics=\"\",--job-language=python,--TempDir=\"s3://${var.bucket_name}/TEMP\"}'"
  }
}

@Vedant-R

Vedant-R commented Aug 21, 2019

Any idea when this change will get pushed?

The solution/workaround provided by @ezidio works exactly as expected.

But it would be good if this change were made in Terraform itself and released.

@jarosmpost

I will do the same as @ezidio suggests, but with job-language scala instead of python.
I also think it would be good if it worked without this workaround.

@whitney

whitney commented Sep 9, 2019

This may be more urgent given the announcement of Python 2's official sunsetting on Jan 1, 2020.

@ztane

ztane commented Sep 11, 2019

the "workaround" is a horrible PITA as all the arguments need to be flattened into one string... with nested escaping.

@ztane

ztane commented Sep 11, 2019

In fact there is no proper workaround because any modification will reset the job back to Python 2. The local-exec provisioner will not be rerun.

@Vedant-R

In fact there is no proper workaround because any modification will reset the job back to Python 2. The local-exec provisioner will not be rerun.

I think the script below should do the job for you. It uses a null_resource triggered by a timestamp.

resource "aws_glue_job" "etl" {
  name     = "${local.name}"
  role_arn = "${module.crawler_role.role_arn}"

  command {
    script_location = "s3://abc/abc.py"
  }

  default_arguments = {
    "--job-language" = "python"
    "--database"     = "${local.name}"
    "--s3bucket"    = "${var.bucket_name}"
  }
}

resource "null_resource" "cluster" {
  depends_on = ["aws_glue_job.etl"]

  triggers = {
    time = "${timestamp()}"
  }

  provisioner "local-exec" {
    command = "aws glue update-job --job-name ${local.name} --job-update 'Role=${module.crawler_role.role_arn}, Command={ScriptLocation=s3://abc/abc.py,PythonVersion=3,Name=glueetl}, DefaultArguments={--job-language=python,--database=${local.name},--s3bucket=<bucket-name>}, Connections={Connections=[${local.name}]}, GlueVersion=1.0'"
  }
}

@ztane

ztane commented Sep 12, 2019

@Vedant-R now it will always run, 40 times for the 40 jobs, even without any changes... :F

@Vedant-R

@Vedant-R now it will always run, 40 times for the 40 jobs, even without any changes... :F

Yes, but it solves the problem of the job getting reset to Python 2.
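
To avoid re-running on every apply, the trigger could instead be derived from the settings the update applies, so the provisioner fires only when they change. A sketch assuming the same locals and variables as the example above; the trigger keys are illustrative:

resource "null_resource" "glue_job_version_fix" {
  depends_on = ["aws_glue_job.etl"]

  # Re-run the update only when the settings it applies change,
  # instead of on every apply.
  triggers = {
    job_name        = "${local.name}"
    role_arn        = "${module.crawler_role.role_arn}"
    script_location = "s3://abc/abc.py"
    glue_version    = "1.0"
    python_version  = "3"
  }

  provisioner "local-exec" {
    command = "aws glue update-job --job-name ${local.name} --job-update 'Role=${module.crawler_role.role_arn}, Command={ScriptLocation=s3://abc/abc.py,PythonVersion=3,Name=glueetl}, DefaultArguments={--job-language=python,--database=${local.name},--s3bucket=${var.bucket_name}}, Connections={Connections=[${local.name}]}, GlueVersion=1.0'"
  }
}

The caveat @ztane raised still applies: if Terraform later updates the job for a reason not captured in the triggers, the Python/Glue version is reset again without the provisioner re-running, so any job setting Terraform manages should be included in the triggers.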

@g-sree

g-sree commented Sep 25, 2019

The python_version = "3" option is available in the latest provider, terraform-provider-aws v2.29.0; however, it did not modify the Glue (Spark) version on an existing job, so the job failed with the following error.

JobName:XXXXXXX and JobRunId:jr_XXXXXXXXX failed to execute with exception Unsupported pythonVersion 3 for given glueVersion 0.9

@vickymca2005

The python_version = "3" option is available in the latest provider, terraform-provider-aws v2.29.0; however, it did not modify the Glue (Spark) version on an existing job, so the job failed with the following error.

JobName:XXXXXXX and JobRunId:jr_XXXXXXXXX failed to execute with exception Unsupported pythonVersion 3 for given glueVersion 0.9

I am also getting the same issue. What am I missing?

@g-sree

g-sree commented Oct 2, 2019

The python_version = "3" option is available in the latest provider, terraform-provider-aws v2.29.0; however, it did not modify the Glue (Spark) version on an existing job, so the job failed with the following error.
JobName:XXXXXXX and JobRunId:jr_XXXXXXXXX failed to execute with exception Unsupported pythonVersion 3 for given glueVersion 0.9

I am also getting the same issue. What am I missing?

For the time being, I deployed with python_version 3 and then modified the job from the AWS console to set the Glue version to 1.0. This fixed it. However, it would be good to have a fix in the provider.

@RamaIndia

This issue can be worked around with a CloudFormation template. In a CloudFormation template we can declare the Glue version and Python version directly, which is straightforward and does not require updating the AWS provider.

{
  "Description": "AWS Glue Job",
  "Resources": {
    "GlueJob": {
      "Type": "AWS::Glue::Job",
      "Properties": {
        "Command": {
          "Name": "glueetl",
          "ScriptLocation": "${script_location}",
          "PythonVersion": "3"
        },
        "DefaultArguments": {
          "--job-language": "${job-language}",
          "--TempDir": "${TempDir}",
          "--extra-jars": "${extra-jars}"
        },
        "Name": "${Name}",
        "Role": "${role_arn}",
        "MaxCapacity": 10,
        "GlueVersion": "1.0"
      }
    }
  }
}

@ztane

ztane commented Oct 9, 2019

@g-sree all updates using Terraform will always reset the Python version...

@g-sree

g-sree commented Oct 9, 2019

updates using Terraform will always reset the Python version...

This is my Terraform declaration:

resource "aws_glue_job" "test_glue_job" {
  name     = "name"
  role_arn = "iam_role"

  command {
    script_location = "script"
    python_version  = 3
  }

  default_arguments = {
    ~~~~ truncated ~~~~
  }
}

I'm using Python 3. However, if you modify an existing job, only the Python version changes and not the Glue version, which should be 1.0; in that case the job fails on the next run. I manually updated the Glue version from 0.9 to 1.0 in the AWS console and never had a problem afterwards.

@mmell
Contributor

mmell commented Oct 24, 2019

A CloudFormation-based workaround: #8526 (comment)

resource "aws_cloudformation_stack" "network" {
  name = "${local.name}-glue-job"

  template_body = <<STACK
{
  "Resources" : {
    "MyJob": {
      "Type": "AWS::Glue::Job",
      "Properties": {
        "Command": {
          "Name": "glueetl",
          "ScriptLocation": "s3://${local.bucket_name}/jobs/${var.job}"
        },
        "ExecutionProperty": {
         "MaxConcurrentRuns": 2
        },
        "MaxRetries": 0,
        "Name": "${local.name}",
        "Role": "${var.role}"
      }
    }
  }
}
STACK
}
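
For this issue the template would presumably also need to set the Python and Glue versions, as in the earlier CloudFormation example; a sketch of just the relevant properties (the rest of the template unchanged):

        "Command": {
          "Name": "glueetl",
          "ScriptLocation": "s3://${local.bucket_name}/jobs/${var.job}",
          "PythonVersion": "3"
        },
        "GlueVersion": "1.0",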

@bflad bflad closed this as completed in cb8453f Oct 30, 2019
@bflad
Contributor

bflad commented Oct 30, 2019

Support for the new glue_version argument in the aws_glue_job resource has been merged and will release with version 2.34.0 of the Terraform AWS Provider, on Thursday. 👍
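
With that argument, the CLI and CloudFormation workarounds above should no longer be needed. A minimal sketch (names illustrative):

resource "aws_glue_job" "example" {
  name         = "job-name"
  role_arn     = data.aws_iam_role.aws_glue_iam_role.arn
  glue_version = "1.0"

  command {
    script_location = "s3://bucket/script.py"
    python_version  = "3"
  }
}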

@bflad
Contributor

bflad commented Oct 31, 2019

This has been released in version 2.34.0 of the Terraform AWS provider. Please see the Terraform documentation on provider versioning or reach out if you need any assistance upgrading.

@ghost

ghost commented Dec 4, 2019

I'm going to lock this issue because it has been closed for 30 days ⏳. This helps our maintainers find and focus on the active issues.

If you feel this issue should be reopened, we encourage creating a new issue linking back to this one for added context. Thanks!

@ghost ghost locked and limited conversation to collaborators Dec 4, 2019