Description
A couple years ago I opened an issue about how the cancellation signal was not being properly handled: spring-projects/spring-boot#18444
On it, we ended up implementing a solution where the cancellation signal would only be processed in case it was received without any onNext signal being emitted before:
@bclozel what's the current implementation behaviour when we are consuming the body as a Flux and we limit it with something like next() or take(n). I implemented something very similar on a custom ExchangeFilterFunction for logging and in those scenarios, I was logging false positives for timeouts as after receiving the requested elements a cancellation signal is sent upstream to stop the producer. I think the same might happen here with the current implementation.
The way I solved it was simply flagging somehow on the doOnNext() operations that we already received something, so if that happens I just ignore the cancellation signal assuming it's desired behaviour. Of course, there's the edge case where we are actually timing out on a Flux if a particular element takes too long, but I think it's an acceptable tradeoff given that we don't know the reason of the cancellation.
What do you think?
That was the way it was approached on Spring Boot 2.x:
@rubasace this feature is not about recording timeouts, but recording cancelled requests. This can happen for various operators like timeout or take, and in all cases we can't know whether this is the intended behavior or an error.
From a pure WebClient point of view, we're recording cancellations because the request did not complete and we cannot record tags with the response (status, execution time).
The workaround you're describing might work, but we'd still get "false negatives" if the client receives the response headers but no body, or only a partial body.
You're right that in this case we could still record metrics twice for the same request: once when receiving the response, and another time if a cancel event happens while reading the body.I think it's still a good compromise and I'll adapt these changes to only record cancellations if we did not receive any response at all.
With the new Observation API implementation, the onCancel signal is being processed without checking if the onNext signal has already being processed, creating the same problems I was describing on the old issue:
return responseMono.doOnNext(observationContext::setResponse)
.doOnError(observationContext::setError)
.doOnCancel(() -> {
observationContext.setAborted(true);
observation.stop();
})
.doOnTerminate(observation::stop);
I believe it would make sense to introduce a check on the onCancel()
as, right now, a WebClient call that receives a Flux and converts and calls a method like .next()
or take(n)
registers each call twice (one as SUCCESS and one as CLIENT_ERROR).
EDIT: sorry, missclicked when creating the issue, before adding any description or a proper title