Skip to content

Observability Support for SQS #1369

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 3 commits into
base: main
Choose a base branch
from

Conversation

tomazfernandes
Copy link
Contributor

@tomazfernandes tomazfernandes commented Apr 6, 2025

This PR Introduces Micrometer Observability Support for Spring Cloud AWS SQS.

Features

  • Automatic span creation for sending and receiving messages
  • Automatic scope opening for non-async components
  • Support for manual scope opening for async components
  • Extensible / replaceable Observation Conventions for sending and receiving messages
  • Extension point for supporting custom Thread Local operations for non-async components

Batch sending and receiving of messages is not supported to avoid increasing this PR further.

Available Observations

Spring Cloud AWS SQS provides the following observations:

  1. spring.aws.sqs.template - Records SQS operations performed through the SqsTemplate
  2. spring.aws.sqs.listener - Records message processing through the SqsMessageListenerContainer

Both observations include the following common tags:

Low cardinality tags:

  • messaging.system: "sqs"
  • messaging.operation: "publish" (template) or "receive" (listener)
  • messaging.destination.name or messaging.source.name: The queue name
  • messaging.destination.kind or messaging.source.kind: "queue"

High cardinality tags:

  • messaging.message.id: The SQS message ID from AWS.

For FIFO queues, the following additional high cardinality tags are included:

  • messaging.message.message-group.id: The message group ID
  • messaging.message.message-deduplication.id: The message deduplication ID

Resolves #1367

@github-actions github-actions bot added component: sqs SQS integration related issue type: documentation Documentation or Samples related issue labels Apr 6, 2025
@MatejNedic
Copy link
Member

Awesome!!! Can't wait to take a look tomorrow 🥳🥳🥳🥳

@jkuipers
Copy link
Contributor

jkuipers commented Apr 9, 2025

I just installed a local build and tested it with my services where I removed my own extensions for trace propagations: after setting spring.cloud.aws.sqs.observation-enabled to true I see that my trace is correctly propagated to my receiving service and restored, so that it ends up in the logging and can be correlated.
Very nice! I see that my received SQS message has a traceparent MessageAttribute, as expected.

In my own code, which still uses Brave directly instead of the Micrometer APIs, I also ensured that baggage wouldn't be blindly propagated as there's a hard limit of 10 on the nr of MessageAttributes that you can have in one message: I haven't checked the code of this PR yet to to see if something similar is done (or needed) here, will try that as well.

@jkuipers
Copy link
Contributor

jkuipers commented Apr 9, 2025

I just configured the following property:

management.tracing.baggage.remote-fields=X1,X2,X3,X4,X5,X6,X7,X8,X9

I then sent an HTTP request to the service that triggers the sending of an SQS message in response, with headers X1 up to X9.
As I feared, this results in an exception:

io.awspring.cloud.sqs.operations.MessagingOperationFailedException: Message send operation failed for message 1cba106b-ba6c-f70f-b865-39376a5a0aa5 to endpoint marketing-system_mailRequests
	at io.awspring.cloud.sqs.operations.AbstractMessagingTemplate.lambda$sendAsync$11(AbstractMessagingTemplate.java:309)
...
Caused by: java.util.concurrent.CompletionException: software.amazon.awssdk.services.sqs.model.InvalidAttributeValueException: Number of message attributes [13] exceeds the allowed maximum [10]. (Service: Sqs, Status Code: 400, Request ID: null)

There is very limited space for message attributes in SQS (and application code might use some as well), so I believe that by default the framework should NOT respect the baggage remote-fields configuration: it should limit itself to the traceparent and baggage attributes. Those two plus the JavaType and contentType already take up four of the available slots.
A dedicated configuration option could be added to include certain baggage remote-fields as-is, but I think it should be opt-in and come with a warning that this may cause applications to fail when code adds its own message attributes as well.

@jkuipers
Copy link
Contributor

jkuipers commented Apr 9, 2025

I also verified that this solution ensures that the trace context is still active when you implement an ErrorHandler or MessageInterceptor: this is indeed the case, which is excellent! It would be quite problematic if error logging couldn't be correlated to the message processing. My own "hack" to have trace propagation suffers from this shortcoming, so my error logging had to be implemented elsewhere.
Nice job man, for sync single-message processing everything seems to work as expected!

@tomazfernandes
Copy link
Contributor Author

tomazfernandes commented Apr 10, 2025

Thanks a lot for looking into this @jkuipers!

There is very limited space for message attributes in SQS (and application code might use some as well), so I believe that by default the framework should NOT respect the baggage remote-fields configuration: it should limit itself to the traceparent and baggage attributes. Those two plus the JavaType and contentType already take up four of the available slots.
A dedicated configuration option could be added to include certain baggage remote-fields as-is, but I think it should be opt-in and come with a warning that this may cause applications to fail when code adds its own message attributes as well.

Good call, that makes sense to me. I think for the scope of this PR we can remove baggage remote-fields propagation in messages, then in a separate issue / PR we can discuss how to include them as opt-in.

I wonder if @jonatan-ivanov has an opinion on this.

Also, any suggestions on how to configure baggage propagation this way are welcome.

Thanks!

@tomazfernandes
Copy link
Contributor Author

Hey @jkuipers, to your suggestion, I've limited the propagated attributes to traceparent, tracestate and baggage.

If you can, please give it another go and let me know how that works for you.

Thanks!

@jkuipers
Copy link
Contributor

Hey @jkuipers, to your suggestion, I've limited the propagated attributes to traceparent, tracestate and baggage.

If you can, please give it another go and let me know how that works for you.

Thanks!

Looks good! This is similar to what I'm currently doing in my own code. I am only wondering if anyone might still configure their Spring Boot 3.x app to use b3 headers rather than the W3C Trace Context traceparent header; just in case you could whitelist b3 as a key as well. No one who wants to use SQS with trace propagation should configure Spring Boot to use the individual B3 headers that are called X-B3-<something> as that would use all available attributes very quickly, so that's not something that needs to supported I'd say.

@tomazfernandes
Copy link
Contributor Author

just in case you could whitelist b3 as a key as well

@jkuipers good call, we should support B3 as it's supported by Micrometer - just pushed that change.

No one who wants to use SQS with trace propagation should configure Spring Boot to use the individual B3 headers that are called X-B3- as that would use all available attributes very quickly, so that's not something that needs to supported I'd say.

Agreed, we can always add opt-in support later if users request, but for default behavior the current approach seems sensible and should work well for the majority of users.

Anything else you think is relevant? I'm aiming to merge this PR next week or so if there's nothing else.

Super thanks again for looking into this!

@jkuipers
Copy link
Contributor

jkuipers commented May 3, 2025

Anything else you think is relevant? I'm aiming to merge this PR next week or so if there's nothing else.

This covers everything that I thought of when building my own (simpler) trace support, so as far as I’m concerned this is ready to merge!

@MatejNedic
Copy link
Member

Code looks good to me. @tomazfernandes when you want merge :)

@tomazfernandes
Copy link
Contributor Author

Thanks a lot @jkuipers and @MatejNedic!

This was a tough one to pull off given the async nature of the integration, I'm happy with the result 🙌🏼

I'll keep the PR open for a few more days, and hopefully merge it at the end of the next week or so.

Then there are a few Observability and ThreadLocal related issues I'll open after that - more to come!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
component: sqs SQS integration related issue type: documentation Documentation or Samples related issue
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add Micrometer Observability Support for SQS
3 participants