OpenTelemetry: Quarkus GraalVM AWS lambda-rest- issue #26250

Closed
fazeem84 opened this issue Jun 21, 2022 · 31 comments
Assignees
Labels
area/amazon-lambda kind/bug Something isn't working

Comments

@fazeem84

fazeem84 commented Jun 21, 2022

Describe the bug

#25708, the pull request raised against this issue, solved the OpenTelemetry connection issue in my local environment, but when I deployed the application as a Lambda service, it seems communication between the AWS Lambda and the OpenTelemetry server (Elastic APM in my case) is not established.

Expected behavior

The AWS Lambda should be able to send application performance data to the OpenTelemetry server during Lambda execution.

Actual behavior

It seems communication between the AWS Lambda and the OpenTelemetry server (Elastic APM in my case) is not established, and no data is received on the server side.

How to Reproduce?

1. Create a Gradle project using the following build file:

plugins {
    id 'java'
    id 'io.quarkus' version '2.10.0.CR1'
}

group 'org.example'
version '1.0-SNAPSHOT'

repositories {
    mavenCentral()
}

dependencies {
    implementation enforcedPlatform("io.quarkus:quarkus-bom:2.10.0.CR1")
    testImplementation 'org.junit.jupiter:junit-jupiter-api:5.8.1'
    testRuntimeOnly 'org.junit.jupiter:junit-jupiter-engine:5.8.1'
    implementation 'io.quarkus:quarkus-resteasy-reactive'
    implementation 'io.quarkus:quarkus-opentelemetry-exporter-otlp'
    implementation 'io.quarkus:quarkus-arc'
    implementation 'io.quarkus:quarkus-rest-client'
    implementation 'io.quarkus:quarkus-amazon-lambda-rest'
}

test {
    useJUnitPlatform()
}
2. Create a resource class:
package org.acme.opentelemetry;

import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;
import org.jboss.logging.Logger;

@Path("/hello")
public class TracedResource {

    private static final Logger LOG = Logger.getLogger(TracedResource.class);

    @GET
    @Produces(MediaType.TEXT_PLAIN)
    public String hello() {
        LOG.info("hello");
        return "hello";
    }
}
3. In the application.properties file, add the following properties:
quarkus.opentelemetry.enabled=true
quarkus.opentelemetry.tracer.exporter.otlp.endpoint=END-POINT
quarkus.opentelemetry.tracer.exporter.otlp.headers=Header
4. Deploy it as an AWS Lambda (see the SAM template sketch after this list).
5. Invoke the Lambda by exposing it through AWS API Gateway.
6. Verify that the AWS Lambda sends data to the OpenTelemetry server.
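
For step 4, a minimal SAM template sketch is shown below. It assumes a JVM deployment of the function zip produced by the Quarkus build; the resource name, memory size, timeout, and CodeUri path are illustrative assumptions rather than values from the original report (the Quarkus Lambda extensions also generate ready-made sam.jvm.yaml/sam.native.yaml templates at build time).

AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  QuarkusRestFunction:                  # illustrative resource name
    Type: AWS::Serverless::Function
    Properties:
      # Handler from the Quarkus-generated JVM template; adjust for a native build.
      Handler: io.quarkus.amazon.lambda.runtime.QuarkusStreamHandler::handleRequest
      Runtime: java11
      CodeUri: build/function.zip       # assumed output path of the Gradle build
      MemorySize: 512
      Timeout: 30
      Events:
        HttpApiEvent:                   # exposes the function through API Gateway (step 5)
          Type: HttpApi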

Output of uname -a or ver

No response

Output of java -version

No response

GraalVM version (if different from Java)

No response

Quarkus version or git rev

No response

Build tool (ie. output of mvnw --version or gradlew --version)

No response

Additional information

No response

@fazeem84 fazeem84 added the kind/bug Something isn't working label Jun 21, 2022
@quarkus-bot

quarkus-bot bot commented Jun 21, 2022

/cc @matejvasek, @patriot1burke

@fazeem84 fazeem84 changed the title Quarkus GraalVM AWS lambda-rest-Open Telemetry issue Open Telemetry Quarkus GraalVM AWS lambda-rest- issue Jun 22, 2022
@fazeem84 fazeem84 changed the title Open Telemetry Quarkus GraalVM AWS lambda-rest- issue OpenTelemetry: Quarkus GraalVM AWS lambda-rest- issue Jun 22, 2022
@patriot1burke
Contributor

Is there any stack trace or warning? Is the telemetry server visible to the lambda? I don't know if you have to do any AWS config magic to allow a lambda to connect outside its process.

@RobDTech

+1

@fazeem84
Author

fazeem84 commented Sep 6, 2022

@brunobat Any update on this?
@patriot1burke
Please find the answers to your queries
Is there any stack trace or warning?
No exception stack trace is logged to CloudWatch.
Is the telemetry server visible to the lambda?
Yes, the Lambda is able to communicate with the telemetry server if we switch the runtime environment to Node.js or Java.

@RobDTech

RobDTech commented Oct 7, 2022

Hi, is this still being looked at please? @brunobat

@brunobat
Contributor

brunobat commented Oct 7, 2022

Not in the next couple of weeks, sorry @RobDTech

@RobDTech

RobDTech commented Nov 5, 2022

Hi @brunobat any update on this one please? Apologies for hassling, we have a number of teams requesting this at the moment and it has been months.

@brunobat
Contributor

brunobat commented Nov 7, 2022

Will pick it up next

@RobDTech

RobDTech commented Nov 7, 2022

That's fantastic, many thanks

@brunobat brunobat self-assigned this Nov 7, 2022
@rtim75

rtim75 commented Nov 8, 2022

I'm also experiencing this issue, but with another sample Lambda: the one from the Amazon Lambda guide. I added the OpenTelemetry dependency to it using the ./mvnw quarkus:add-extension -Dextensions='opentelemetry-otlp-exporter' command provided in the OpenTelemetry guide.

My application.properties:

quarkus.opentelemetry.enabled=true
quarkus.opentelemetry.tracer.exporter.otlp.endpoint=http://some-ip:4317
quarkus.log.level=DEBUG
quarkus.log.console.format=%d{HH:mm:ss} %-5p traceId=%X{traceId}, parentId=%X{parentId}, spanId=%X{spanId}, sampled=%X{sampled} [%c{2.}] (%t) %s%e%n

quarkus.lambda.handler=test

I added a couple of log messages to the Lambda to see whether any spans or traces get created. So far, nothing:

16:40:15 INFO  traceId=, parentId=, spanId=, sampled= [or.ac.TestLambda] (main) processing event
16:40:15 INFO  traceId=, parentId=, spanId=, sampled= [or.ac.ProcessingService] (main) Sending Ping Request to some-ip

and I don't see any traffic coming in on port 4317 of my agent. Whenever I run the code provided by @fazeem84 locally, I can see traffic coming in on that port as well as traces showing up in our monitoring system.

@brunobat
Contributor

brunobat commented Nov 8, 2022

@rtim75 @RobDTech @fazeem84 Have you guys tried this locally with sam local ... or deployed directly on amazon?

@fazeem84
Author

fazeem84 commented Nov 8, 2022

@brunobat We deployed directly to AWS.

@RobDTech

RobDTech commented Nov 8, 2022

Deployed directly

@brunobat
Contributor

brunobat commented Nov 14, 2022

Have you guys added this property to the resource under sam.*.yaml?
Tracing: Active
According to: https://docs.aws.amazon.com/lambda/latest/operatorguide/trace-requests.html
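
For context, in a SAM template that property sits under the function resource's Properties block; a minimal sketch (the resource name is illustrative):

Resources:
  QuarkusFunction:                # illustrative resource name
    Type: AWS::Serverless::Function
    Properties:
      Tracing: Active             # enables AWS X-Ray active tracing for this function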

@asimpson2004

Yes, I have tried enabling active tracing in X-Ray for the Lambda (I have tried this with the Quarkus libraries without native compilation, i.e. running on Java 11).

@brunobat
Contributor

brunobat commented Nov 14, 2022

Did you follow the AWS configuration instructions here?
https://aws.amazon.com/blogs/opensource/auto-instrumenting-a-python-application-with-an-aws-distro-for-opentelemetry-lambda-layer/
Except for the Python app part...

@asimpson2004

asimpson2004 commented Nov 14, 2022

I did not use the AWS ADOT layer as in that example; I followed this guide:
https://quarkus.io/guides/opentelemetry

where the instructions set up the same configuration as the AWS layer (I was comparing them whilst doing it).

I have attached some logs:

removed logs

@brunobat
Contributor

@asimpson2004, https://quarkus.io/guides/opentelemetry is a generic guide that assumes an OTLP collector will be available. It doesn't include the AWS-specific configuration that you need to perform on your AWS account in order to activate tracing on your Lambdas.

@asimpson2004

@brunobat Thanks. I also set up the OTLP collector and enabled active tracing in X-Ray. Looking through the guide you provided, I don't think anything was missed.

@asimpson2004

asimpson2004 commented Nov 21, 2022

@brunobat I managed to spend some time on this and figured out what the problem is. It is not Quarkus-specific; it is an OpenTelemetry/Lambda issue: the Lambda invocation ends and the runtime is paused before the BatchSpanProcessor has time to flush its contents to the collector, so nothing is sent to the collector.

The problem has been detailed here in point 2.1.

The above link describes the following solution:

The SDK side needs to call force_flush() at the end of Lambda instrumentation.

Is there any way to get hold of the SdkTracerProvider created by io.quarkus.opentelemetry.runtime.tracing.TracerRecorder so we can manually call forceFlush()?

Is there some other preferred way to force flush?
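
One possible route, sketched below under the assumption that the OpenTelemetry bean produced by the Quarkus extension is an OpenTelemetrySdk instance (which may not hold in every version), is to look the bean up through CDI and flush its tracer provider directly; the helper class name is hypothetical:

import io.opentelemetry.api.OpenTelemetry;
import io.opentelemetry.sdk.OpenTelemetrySdk;

import javax.enterprise.inject.spi.CDI;
import java.util.concurrent.TimeUnit;

public final class OtelFlush {          // hypothetical helper, not part of Quarkus

    private OtelFlush() {
    }

    // Blocks for up to timeoutMs while the SDK exports any pending spans.
    public static void flush(long timeoutMs) {
        OpenTelemetry otel = CDI.current().select(OpenTelemetry.class).get();
        if (otel instanceof OpenTelemetrySdk) {
            ((OpenTelemetrySdk) otel).getSdkTracerProvider()
                    .forceFlush()
                    .join(timeoutMs, TimeUnit.MILLISECONDS);
        }
    }
}

The interceptor posted further down in this thread takes a similar route via the LateBoundBatchSpanProcessor.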

@asimpson2004

Just to add, I've managed to prove that this is the issue by setting the following environment variables:

OTEL_BSP_MAX_QUEUE_SIZE = 0
OTEL_BSP_SCHEDULE_DELAY = 0

Setting these allows the BatchSpanProcessor to flush data occasionally before the lambda invocation ends but not 100% of the time.
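
For reference, a sketch of setting those variables on the function in a SAM template (resource name illustrative, values taken from the comment above):

Resources:
  QuarkusFunction:                  # illustrative resource name
    Type: AWS::Serverless::Function
    Properties:
      Environment:
        Variables:
          OTEL_BSP_MAX_QUEUE_SIZE: "0"
          OTEL_BSP_SCHEDULE_DELAY: "0"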

@brunobat
Contributor

We would need an implementation of a SimpleSpanProcessor. However, right now there is no way to wire that in Quarkus.
I also think this is a bug:
https://github.com/open-telemetry/opentelemetry-java/blob/60cc5f452a3609ce748441c6ad75cac471b35bfd/exporters/otlp/all/src/main/java/io/opentelemetry/exporter/otlp/trace/OtlpGrpcSpanExporter.java#L66

The spans might be lost because the thread doing the export might not have started (it's scheduled) before the service shuts down.
We should force existing spans out before shutdown... or at least have a configuration for it.
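
For reference, this is roughly what wiring a SimpleSpanProcessor looks like with the plain OpenTelemetry SDK (it exports each span as soon as it ends instead of batching). The Quarkus extension currently has no hook to substitute it for the LateBoundBatchSpanProcessor, so this is a sketch, not something that can be plugged into the extension today:

import io.opentelemetry.exporter.otlp.trace.OtlpGrpcSpanExporter;
import io.opentelemetry.sdk.trace.SdkTracerProvider;
import io.opentelemetry.sdk.trace.export.SimpleSpanProcessor;

public class SimpleProcessorWiring {

    // Builds a tracer provider that hands every finished span straight to the exporter.
    public static SdkTracerProvider build(String endpoint) {
        return SdkTracerProvider.builder()
                .addSpanProcessor(SimpleSpanProcessor.create(
                        OtlpGrpcSpanExporter.builder()
                                .setEndpoint(endpoint)   // e.g. "http://localhost:4317" (assumption)
                                .build()))
                .build();
    }
}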

@asimpson2004

I can see in the Quarkus TracerRecorder this is set:
// Register shutdown tasks
shutdownContext.addShutdownTask(() -> {
    tracerProvider.forceFlush();
    tracerProvider.shutdown();
});

However, I'm unsure if this is invoked when the lambda pauses

@asimpson2004

I have managed to get this working 100% of the time without setting the queue size or delay mentioned above.

Following the suggested force flush solution from here, I have created an interceptor that calls force flush on the LateBoundBatchSpanProcessor:

import io.opentelemetry.sdk.common.CompletableResultCode;
import io.quarkus.opentelemetry.exporter.otlp.runtime.LateBoundBatchSpanProcessor;
import org.eclipse.microprofile.config.inject.ConfigProperty;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;

import javax.enterprise.inject.Any;
import javax.enterprise.inject.spi.CDI;
import javax.interceptor.AroundInvoke;
import javax.interceptor.Interceptor;
import javax.interceptor.InvocationContext;

@OtelLambdaHandler
@Interceptor
public class OtelLambdaHandlerInterceptor {

    private static final Logger LOG = LoggerFactory.getLogger(OtelLambdaHandlerInterceptor.class);

    @ConfigProperty(name = "flushTimeoutMs")
    long flushTimeoutMs;

    @AroundInvoke
    public Object aroundInvoke(InvocationContext ic) throws Exception {

        // Look up the batch span processor wired in by the Quarkus OTLP exporter extension.
        LateBoundBatchSpanProcessor delayedProcessor = CDI.current().select(LateBoundBatchSpanProcessor.class, Any.Literal.INSTANCE).get();

        try {
            return ic.proceed();
        } finally {
            LOG.info("forcing flush");
            CompletableResultCode completableResultCode = delayedProcessor.forceFlush();

            long startTime = System.currentTimeMillis();

            // Busy-wait until the flush completes or the configured timeout elapses.
            while (!completableResultCode.isDone()) {
                if (System.currentTimeMillis() - startTime >= flushTimeoutMs) {
                    LOG.info("flush timed out after {} ms", flushTimeoutMs);
                    break;
                }
            }

            if(completableResultCode.isSuccess()){
                LOG.info("flush succeeded");
            } else {
                LOG.info("flush failed");
            }
        }
    }
}

@asimpson2004

@brunobat I agree the proper solution would be the ability to use SimpleSpanProcessor so that we don't need to call force flush :)

@brunobat
Contributor

brunobat commented Nov 23, 2022

Have you opened an issue on the OTel side? I think this missing flush should be reported as a bug.

@brunobat
Contributor

Added an issue on the Quarkus side to support SimpleSpanProcessor... and others, effectively.
We should probably close this issue. Do you agree, @asimpson2004?

@asimpson2004

@brunobat Agree this can be closed

I agree this does seem like a bug, since the spans are getting batched up and that is what the BatchSpanProcessor does; I've not opened a report.

@brunobat brunobat closed this as not planned Nov 23, 2022
@bunkerskyi

@asimpson2004 Could you please provide an extended example of how you use your interceptor for flushing, together with your Lambda setup?

I set up a Lambda from the Quarkus docs (https://quarkus.io/guides/amazon-lambda), added OTel for Quarkus (https://quarkus.io/guides/opentelemetry), and implemented this interceptor, but the Lambda still doesn't send any data.

@asimpson2004

asimpson2004 commented Nov 30, 2022

@KriBetko

Collector setup yaml:

receivers:
  otlp:
    protocols:
      http:
      grpc:

exporters:
  logging:
    loglevel: debug
  otlp/elastic:
    # APM server https endpoint without https:// but including the port.
     endpoint: "url-here:443"
    headers:
      # APM Server secret token
      Authorization: "Bearer token-here"

service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [logging, otlp/elastic]
    metrics:
      receivers: [otlp]
      exporters: [logging, otlp/elastic]

Environment vars:

QUARKUS_OPENTELEMETRY_TRACER_EXPORTER_OTLP_ENDPOINT  http://localhost:4317
QUARKUS_OPENTELEMETRY_TRACER_RESOURCE_ATTRIBUTES  deployment.environment=dev
OPENTELEMETRY_COLLECTOR_CONFIG_FILE  /var/task/otel-collector-config.yaml

Lambda Layer:
arn:aws:lambda:eu-west-1:901920570463:layer:aws-otel-collector-amd64-ver-0-62-1:1

Extra libraries build.gradle:

implementation 'io.quarkus:quarkus-opentelemetry-exporter-otlp'
implementation 'io.quarkus:quarkus-rest-client'
implementation 'io.opentelemetry:opentelemetry-extension-aws'
implementation 'org.jboss.slf4j:slf4j-jboss-logmanager'

Interceptor annotation:

import javax.interceptor.InterceptorBinding;
import java.lang.annotation.Retention;
import java.lang.annotation.Target;
import static java.lang.annotation.ElementType.*;
import static java.lang.annotation.RetentionPolicy.RUNTIME;

@InterceptorBinding
@Target({ METHOD, TYPE })
@Retention(RUNTIME)
public @interface OtelLambdaHandler {
}

Then annotate the Lambda handler with @OtelLambdaHandler, for example:
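
A hypothetical handler showing where the binding goes (the handler name, @Named value, and payload types are illustrative, not taken from the original project):

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import javax.inject.Named;

@Named("test")                     // matches quarkus.lambda.handler when several handlers exist
@OtelLambdaHandler                 // triggers OtelLambdaHandlerInterceptor around each invocation
public class TestLambda implements RequestHandler<String, String> {

    @Override
    public String handleRequest(String input, Context context) {
        return "ok";
    }
}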

You need to make sure that the YAML file is included in the root of the zip that is uploaded to the Lambda so that it is picked up via the OPENTELEMETRY_COLLECTOR_CONFIG_FILE variable above.

You should see the collector booting up in the Lambda logs, and at the end of the invocation you should see the collector log the spans.

@asimpson2004

The above also doesn't create any spans by itself; you will need to generate a span for anything to be sent, for example:
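
A minimal way to create one, assuming the Quarkus extension makes the OpenTelemetry Tracer injectable (the class and span names are illustrative):

import io.opentelemetry.api.trace.Span;
import io.opentelemetry.api.trace.Tracer;
import io.opentelemetry.context.Scope;

import javax.enterprise.context.ApplicationScoped;
import javax.inject.Inject;

@ApplicationScoped
public class PingService {          // illustrative class name

    @Inject
    Tracer tracer;                  // provided by the Quarkus OpenTelemetry extension

    public String ping() {
        Span span = tracer.spanBuilder("ping").startSpan();   // illustrative span name
        try (Scope scope = span.makeCurrent()) {
            return "pong";
        } finally {
            span.end();             // ending the span queues it for export
        }
    }
}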


7 participants