Description
What version of Go are you using (go version
)?
$ go version go version go1.19.2 linux/amd64
Does this issue reproduce with the latest release?
Yes
What operating system and processor architecture are you using (go env
)?
go env
Output
$ go env GO111MODULE="" GOARCH="amd64" GOBIN="" GOCACHE="" GOENV="" GOEXE="" GOEXPERIMENT="" GOFLAGS="" GOHOSTARCH="amd64" GOHOSTOS="linux" GOINSECURE="" GOMODCACHE="" GONOPROXY="" GONOSUMDB="" GOOS="linux" GOPATH="" GOPRIVATE="" GOPROXY="" GOROOT="/usr/lib/go" GOSUMDB="sum.golang.org" GOTMPDIR="" GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64" GOVCS="" GOVERSION="go1.19.2" GCCGO="gccgo" GOAMD64="v1" AR="ar" CC="gcc" CXX="g++" CGO_ENABLED="1" GOMOD="/tmp/bug/go.mod" GOWORK="" CGO_CFLAGS="-g -O2" CGO_CPPFLAGS="" CGO_CXXFLAGS="-g -O2" CGO_FFLAGS="-g -O2" CGO_LDFLAGS="-g -O2" PKG_CONFIG="pkg-config" GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build3528461531=/tmp/go-build -gno-record-gcc-switches"
What did you do?
In a proprietary HTTP + gRPC reverse proxy, when issuing unary gRPC calls, I observe intermittent occurrences of the error:
rpc error: code = Internal desc = server closed the stream without sending trailers
The occurrence of this error is not deterministically reproducible, and affects only a small percentage of requests. Usually, a client retry of the RPC alleviates the problem.
See the investigation notes after the survey questions in this issue.
What did you expect to see?
I expect to see no occurrences of this error under regular operation.
What did you see instead?
I see this error affecting 1 - 5% of requests.
Context
I'm working with a proprietary HTTP reverse proxy with built-in support for gRPC over HTTP/2.
Example
The proxy is proprietary, but its core logic is demonstrated below.
package main
import (
"crypto/tls"
"net"
"net/http"
"net/http/httputil"
"golang.org/x/net/http2"
"golang.org/x/net/http2/h2c"
)
func main() {
tr := &http2.Transport{
AllowHTTP: true,
DialTLS: func(network string, addr string, cfg *tls.Config) (net.Conn, error) {
return net.Dial("tcp", "localhost:7001")
},
}
rp := &httputil.ReverseProxy{
Transport: tr,
FlushInterval: -1,
Director: func(req *http.Request) {
req.URL.Scheme = "http"
},
}
handler := h2c.NewHandler(rp, &http2.Server{})
srv := &http.Server{
Addr: "localhost:7000",
Handler: handler,
}
srv.ListenAndServe()
}
Symptom
Clients issuing gRPC calls through the proxy that return gRPC application-level errors intermittently (non-determinstically) observe errors from the grpc-go library server closed the stream without sending trailers
.
GODEBUG=http2debug=2
reveals that the issue manifests only when Go's http2.Server
writes a HEADERS
frame with flag END_HEADERS
followed by a zero-byte DATA
frame with flag END_STREAM
.
The issue does not manifest (i.e. the application-level error is propagated correctly) when Go's http2.Server
writes a HEADERS
frame with flags END_HEADERS | END_STREAM
.
Note that there are no trailers included in this message.
Example trace with no errors (RPC returns successfully)
2022/10/18 00:12:57 http2: Transport encoding header ":method" = "POST"
2022/10/18 00:12:57 http2: Transport encoding header ":path" = "/service/Method"
2022/10/18 00:12:57 http2: Transport encoding header ":scheme" = "http"
2022/10/18 00:12:57 http2: Transport encoding header "te" = "trailers"
2022/10/18 00:12:57 http2: Transport encoding header "grpc-timeout" = "998862u"
2022/10/18 00:12:57 http2: Transport encoding header "content-type" = "application/grpc"
2022/10/18 00:12:57 http2: Transport encoding header "user-agent" = "grpc-go/1.49.0"
2022/10/18 00:12:57 http2: Transport encoding header "x-forwarded-for" = "127.0.0.1"
2022/10/18 00:12:57 http2: Transport encoding header "accept-encoding" = "gzip"
2022/10/18 00:12:57 http2: Framer 0xc000366000: wrote HEADERS flags=END_HEADERS stream=7 len=17
2022/10/18 00:12:57 http2: Framer 0xc000366000: wrote DATA stream=7 len=12 data="\x00\x00\x00\x00\a\n\x05\b\x02\x12\x01a"
2022/10/18 00:12:57 http2: Framer 0xc000366000: wrote DATA flags=END_STREAM stream=7 len=0 data=""
2022/10/18 00:12:57 http2: Framer 0xc000366000: read WINDOW_UPDATE len=4 (conn) incr=12
2022/10/18 00:12:57 http2: Transport received WINDOW_UPDATE len=4 (conn) incr=12
2022/10/18 00:12:57 http2: Framer 0xc000366000: read PING len=8 ping="\x02\x04\x10\x10\t\x0e\a\a"
2022/10/18 00:12:57 http2: Transport received PING len=8 ping="\x02\x04\x10\x10\t\x0e\a\a"
2022/10/18 00:12:57 http2: Framer 0xc000366000: wrote PING flags=ACK len=8 ping="\x02\x04\x10\x10\t\x0e\a\a"
2022/10/18 00:12:57 http2: Framer 0xc000366000: read HEADERS flags=END_STREAM|END_HEADERS stream=7 len=4
2022/10/18 00:12:57 http2: decoded hpack field header field ":status" = "200"
2022/10/18 00:12:57 http2: decoded hpack field header field "content-type" = "application/grpc"
2022/10/18 00:12:57 http2: decoded hpack field header field "grpc-status" = "5"
2022/10/18 00:12:57 http2: decoded hpack field header field "grpc-message" = "open /tmp/a: no such file or directory"
2022/10/18 00:12:57 http2: Transport received HEADERS flags=END_STREAM|END_HEADERS stream=7 len=4
Example trace with error (internal error raised by grpc-go)
2022/10/18 00:12:55 http2: Transport encoding header ":method" = "POST"
2022/10/18 00:12:55 http2: Transport encoding header ":path" = "/service/Method"
2022/10/18 00:12:55 http2: Transport encoding header ":scheme" = "http"
2022/10/18 00:12:55 http2: Transport encoding header "grpc-timeout" = "998585u"
2022/10/18 00:12:55 http2: Transport encoding header "content-type" = "application/grpc"
2022/10/18 00:12:55 http2: Transport encoding header "user-agent" = "grpc-go/1.49.0"
2022/10/18 00:12:55 http2: Transport encoding header "te" = "trailers"
2022/10/18 00:12:55 http2: Transport encoding header "x-forwarded-for" = "127.0.0.1"
2022/10/18 00:12:55 http2: Transport encoding header "accept-encoding" = "gzip"
2022/10/18 00:12:55 http2: Framer 0xc000366000: wrote HEADERS flags=END_HEADERS stream=5 len=18
2022/10/18 00:12:55 http2: Framer 0xc000366000: wrote DATA stream=5 len=12 data="\x00\x00\x00\x00\a\n\x05\b\x02\x12\x01a"
2022/10/18 00:12:55 http2: Framer 0xc000366000: wrote DATA flags=END_STREAM stream=5 len=0 data=""
2022/10/18 00:12:55 http2: Framer 0xc000366000: read WINDOW_UPDATE len=4 (conn) incr=12
2022/10/18 00:12:55 http2: Transport received WINDOW_UPDATE len=4 (conn) incr=12
2022/10/18 00:12:55 http2: Framer 0xc000366000: read PING len=8 ping="\x02\x04\x10\x10\t\x0e\a\a"
2022/10/18 00:12:55 http2: Transport received PING len=8 ping="\x02\x04\x10\x10\t\x0e\a\a"
2022/10/18 00:12:55 http2: Framer 0xc000366000: wrote PING flags=ACK len=8 ping="\x02\x04\x10\x10\t\x0e\a\a"
2022/10/18 00:12:55 http2: Framer 0xc000366000: read HEADERS flags=END_STREAM|END_HEADERS stream=5 len=4
2022/10/18 00:12:55 http2: decoded hpack field header field ":status" = "200"
2022/10/18 00:12:55 http2: decoded hpack field header field "content-type" = "application/grpc"
2022/10/18 00:12:55 http2: decoded hpack field header field "grpc-status" = "5"
2022/10/18 00:12:55 http2: decoded hpack field header field "grpc-message" = "open /tmp/a: no such file or directory"
2022/10/18 00:12:55 http2: Transport received HEADERS flags=END_STREAM|END_HEADERS stream=5 len=4
2022/10/18 00:12:55 http2: server encoding header ":status" = "200"
2022/10/18 00:12:55 http2: server encoding header "content-type" = "application/grpc"
2022/10/18 00:12:55 http2: server encoding header "grpc-message" = "open /tmp/a: no such file or directory"
2022/10/18 00:12:55 http2: server encoding header "grpc-status" = "5"
2022/10/18 00:12:55 http2: server encoding header "date" = "Tue, 18 Oct 2022 07:12:55 GMT"
2022/10/18 00:12:55 http2: Framer 0xc0003900e0: wrote HEADERS flags=END_HEADERS stream=1 len=89
2022/10/18 00:12:55 http2: Framer 0xc0003900e0: wrote DATA flags=END_STREAM stream=1 len=0 data=""
RCA
I believe this is due to a race caused by concurrent http.Handler
execution in http2/server.go
.
In the case that handler execution completes before headers are written, rws.handlerDone
is true and Go includes END_STREAM
in the initial HEADERS
frame. In the case that handler execution is still in-progress when the first write occurs, the HEADERS
frame is written without END_STREAM
, and a subsequent write sends a zero-byte data frame with END_STREAM
, acting purely as a control message.
Ultimately this causes non-determinism where the specific scenario that unary gRPC methods that return errors quickly are disproportionately affected.
According to gRPC specification, END_STREAM
should be included in the last HEADERS
frame to indicate termination of the response. In grpc-go, encountering END_STREAM
in a data frame is an explicit error case. However, HTTP/2 protocol specification itself doesn't prohibit this.
Proposal
A similar (identical?) issue was identified in nghttp2 (see: nghttp2/nghttp2#588). The submitted fix was to include END_STREAM
in the HEADERS
payload if the body is empty and no trailers exist. I'm not sure if a similar approach is feasible in http2.Server
.