net: Resolver's lookup function behavior is oblivious to a custom Dial function #54244

Closed as not planned
@th0m

Description

What version of Go are you using (go version)?

$ go version
go version go1.18.2 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/vagrant/.cache/go-build"
GOENV="/home/vagrant/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/vagrant/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/vagrant/go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/local/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/local/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.18.2"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
GOWORK=""
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build1847147377=/tmp/go-build -gno-record-gcc-switches"

What did you do?

Create a Resolver and override its Dial function so that a fixed resolver IP is always used: 1.2.3.4:53.
Set resolv.conf to:

nameserver 1.1.1.1
nameserver 8.8.8.8
nameserver 8.8.4.4

Use it to resolve cloudflare.com NS.

Source code

package main

import (
	"context"
	"fmt"
	"net"
	"time"
)

func main() {
	r := &net.Resolver{
		// Force the pure-Go resolver so that the Dial hook below is used.
		PreferGo: true,
		// Ignore the address chosen by the resolver and always dial 1.2.3.4:53.
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			d := net.Dialer{
				Timeout: 3 * time.Second,
			}
			return d.DialContext(ctx, "udp", "1.2.3.4:53")
		},
	}
	_, err := r.LookupNS(context.Background(), "cloudflare.com")
	if err != nil {
		fmt.Println(err)
	}
}

I noticed that the lookup function loops over all the resolvers listed in resolv.conf even though the Dial function has been overridden.
As a result, the number of retries scales with the number of entries in resolv.conf, although none of those servers is ever contacted because the overridden Dial function ignores them.
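
The loop described above can be observed without tcpdump. The following is only a minimal sketch, assuming a Unix-like system where the pure-Go resolver is in use: the Dial hook never touches the network, it records the address the resolver asks for and then fails immediately. The helper name recordDialAddrs and its error text are hypothetical, and the name is given with a trailing dot so that the resolv.conf search list does not add further queries.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
	"sync"
)

// recordDialAddrs (hypothetical helper) runs an NS lookup with a Dial hook
// that records every address the resolver requests and then fails, so no
// packet is sent.
func recordDialAddrs(name string) ([]string, error) {
	var (
		mu    sync.Mutex
		addrs []string
	)
	r := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			mu.Lock()
			addrs = append(addrs, address)
			mu.Unlock()
			return nil, errors.New("dial disabled for this experiment")
		},
	}
	_, err := r.LookupNS(context.Background(), name)

	mu.Lock()
	defer mu.Unlock()
	return append([]string(nil), addrs...), err
}

func main() {
	addrs, err := recordDialAddrs("cloudflare.com.")
	fmt.Printf("Dial was called %d time(s)\n", len(addrs))
	for i, a := range addrs {
		fmt.Printf("  call %d: resolver asked for %s\n", i+1, a)
	}
	fmt.Println("lookup error:", err)
}
```

The addresses printed are the ones the resolver took from its own configuration, and the number of calls depends on the local resolv.conf, so no fixed output is shown here.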

What did you expect to see?

I would have expected the (r *Resolver) lookup function to not fetch resolv.conf configuration when the Dial function has been overridden with a custom server.

Also I would have expected the number of servers in resolv.conf to not influence the number of retries.

What did you see instead?

With 3 resolvers in resolv.conf, I see 6 queries in total (3 servers * 2 attempts):

21:25:00.403789 IP 10.0.2.15.38464 > 1.2.3.4.53: 16792+ NS? cloudflare.com. (32)
21:25:05.404578 IP 10.0.2.15.36711 > 1.2.3.4.53: 53187+ NS? cloudflare.com. (32)
21:25:10.410653 IP 10.0.2.15.48693 > 1.2.3.4.53: 10620+ NS? cloudflare.com. (32)
21:25:15.418829 IP 10.0.2.15.42359 > 1.2.3.4.53: 44619+ NS? cloudflare.com. (32)
21:25:20.425065 IP 10.0.2.15.39995 > 1.2.3.4.53: 9573+ NS? cloudflare.com. (32)
21:25:25.431093 IP 10.0.2.15.41456 > 1.2.3.4.53: 36792+ NS? cloudflare.com. (32)

With 2 resolvers in resolv.conf (I deleted the 1.1.1.1 entry), I see 4 queries in total (2 servers * 2 attempts):

21:26:22.121101 IP 10.0.2.15.48327 > 1.2.3.4.53: 57107+ NS? cloudflare.com. (32)
21:26:27.126828 IP 10.0.2.15.42868 > 1.2.3.4.53: 61507+ NS? cloudflare.com. (32)
21:26:32.132852 IP 10.0.2.15.39008 > 1.2.3.4.53: 2860+ NS? cloudflare.com. (32)
21:26:37.145549 IP 10.0.2.15.51778 > 1.2.3.4.53: 2160+ NS? cloudflare.com. (32)

In both cases the program completes with the following error, which names the wrong server (8.8.4.4:53 isn't actually used, see #43703):

lookup cloudflare.com on 8.8.4.4:53: read udp 10.0.2.15:41456->1.2.3.4:53: i/o timeout
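
The mismatch in the error above can also be checked programmatically. This is a hedged sketch rather than a description of the resolver internals: it assumes the lookup error can be unwrapped to a *net.DNSError via errors.As, and it compares that error's Server field with the last address the resolver passed to Dial. The helper name lookupWithFixedServer and the constant fixedServer are hypothetical, and the Dial hook only pretends to contact 1.2.3.4:53 so that the example needs no network access.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"net"
)

// fixedServer stands in for the server the custom Dial hook would really use.
const fixedServer = "1.2.3.4:53"

// lookupWithFixedServer (hypothetical helper) returns the server named in the
// resulting *net.DNSError and the last address the resolver handed to Dial.
func lookupWithFixedServer(name string) (reported, lastRequested string, err error) {
	var last string
	r := &net.Resolver{
		PreferGo: true,
		Dial: func(ctx context.Context, network, address string) (net.Conn, error) {
			last = address // what the resolver believes it is contacting
			// A real hook would dial fixedServer here; this one only fails.
			return nil, fmt.Errorf("pretend dial to %s failed", fixedServer)
		},
	}
	_, err = r.LookupNS(context.Background(), name)

	var dnsErr *net.DNSError
	if errors.As(err, &dnsErr) {
		reported = dnsErr.Server
	}
	return reported, last, err
}

func main() {
	reported, last, err := lookupWithFixedServer("cloudflare.com.")
	fmt.Println("error:", err)
	fmt.Println("server named in DNSError:", reported)
	fmt.Println("last address passed to Dial:", last)
	fmt.Println("server the hook would actually use:", fixedServer)
}
```

If the reported server matches the address passed to Dial rather than fixedServer, the error message reflects the resolver configuration and not the server the hook contacted.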

Feel free to let me know if you need any other information, thank you!
