x/net/idna: add pre-defined profile for non-transitional IDNA2008 #47510

Closed
@favonia

What version of Go are you using (go version)?

$ go version
go version go1.16.6 linux/amd64

Does this issue reproduce with the latest release?

Yes

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home//.cache/go-build"
GOENV="/home//.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home//go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home//go"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/usr/lib/go"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/go/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.16.6"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build***=/tmp/go-build -gno-record-gcc-switches"

What did you do?

https://play.golang.org/p/zSGcE1no8yN

package main

import (
	"fmt"

	"golang.org/x/net/idna"
)

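func main() {
	// Lookup.ToASCII applies the Lookup profile's mapping and validation
	// before converting the name to its ASCII form.
	eszett, err := idna.Lookup.ToASCII("ß")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("ß => %s\n", eszett)
}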

What did you expect to see?

I expected ß to be kept as ß rather than mapped to ss, as in Firefox and Safari.

ß => ß

I am currently using the idna package to develop DNS-related tools (https://github.com/favonia/cloudflare-ddns), and I would like a predefined profile that uses non-transitional IDNA2008 processing. Although the documentation warns that the Lookup profile may evolve over time, I understand that existing software may rely on its current behavior. Therefore, I propose adding a new profile named IDNA2008Lookup or NontransitionalLookup (or some better name; I am not a native English speaker), defined as follows:

IDNA2008Lookup = &Profile{options{
	transitional: false,
	useSTD3Rules: true,
	checkHyphens: true,
	checkJoiners: true,
	trie:         trie,
	fromPuny:     validateFromPunycode,
	mapping:      validateAndMap,
	bidirule:     bidirule.ValidString,
}}

Alternatively, the idna package could provide new methods for deriving profiles from existing ones. It would then be trivial to create such a profile from the existing Lookup profile (especially after the bugfix https://go-review.googlesource.com/c/text/+/317729/ is merged). Here is pseudocode demonstrating the idea:

func main() {
	eszett, _ := idna.Lookup.Derive(idna.Transitional(false)).ToASCII("ß")
	fmt.Printf("ß => %s", eszett)
}

What did you see instead?

ß is mapped to ss, as in Chrome/Chromium.

ß => ss

The Lookup profile currently implements what Unicode Technical Standard #46 (UTS #46) calls transitional processing. This is also the current behavior of Chrome/Chromium, though there is an open issue discussing a possible switch.

Related Issues

This is closely related to #46001, which proposed to change the default behavior of net/http. I like #46001 very much, but chose to propose something more conservative in case there's any objection to #46001. This proposal merely adds a new helpful definition to facilitate non-transitional processing. Whether #46001 should be accepted or in general whether the standard library and other libraries should change their behaviors is out of the scope of this issue, though those changes would probably benefit from this proposal. I strongly believe the Go language should make it easy to write software using non-transitional IDNA2008 processing even if the standard library sticks to the current behavior.

Further Justification

It is true that one can already define the desired profile by carefully combining the correct set of options. However, I find that construction very unintuitive and would strongly prefer a predefined profile that is ready to use.

Implementation Details and Auxiliary Changes

Introducing a new profile should be trivial. In fact, the code quoted above is almost the entire change (modulo documentation and testing). Relatedly, the comment "It is used by most browsers when resolving domain names." on Transitional is no longer accurate and, in my opinion, should also be changed.
