
runtime: async signals not reliably delivered to Go threads under TSAN #18717

Closed
@bcmills

Description

go version devel +f4be767501 Wed Jan 18 16:21:28 2017 -0500 linux/amd64

The fix for #17753 causes signal delivery in Go programs running under TSAN (ThreadSanitizer) to pass through the TSAN interceptors.

Unfortunately, it appears that the TSAN interceptors defer delivery of asynchronous signals until the next blocking libc call on the same thread. This interacts poorly with the Go runtime's use of direct `futex` syscalls to park idle threads: if the thread that receives the signal is one the runtime is about to park, that thread may never make another libc call, and the signal is never delivered.

A simple program illustrating this behavior:

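`src/tsansig/tsansig.go`:

```go
package main

import (
	"os"
	"os/signal"
	"syscall"
)

/*
#cgo CFLAGS: -g -fsanitize=thread
#cgo LDFLAGS: -g -fsanitize=thread
*/
import "C"

func main() {
	c := make(chan os.Signal, 1)
	signal.Notify(c, syscall.SIGUSR1)
	defer signal.Stop(c)

	// Send SIGUSR1 to this process. Under TSAN the signal may stay
	// queued on whichever thread received it.
	syscall.Kill(syscall.Getpid(), syscall.SIGUSR1)

	// Expected to return at once; under TSAN it blocks forever.
	<-c
}
```

```shell
$ go build tsansig
$ ./tsansig
```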

It should exit with status 0 almost immediately. Instead, it hangs forever.

One might expect that injecting a blocking libc call would help:

```go
package main

import (
	"os"
	"os/signal"
	"syscall"
	"time"
)

/*
#cgo CFLAGS: -g -fsanitize=thread
#cgo LDFLAGS: -g -fsanitize=thread

#include <stddef.h>
#include <time.h>
*/
import "C"

// nanosleep calls libc nanosleep through cgo, so each call goes through
// the TSAN interceptor. It retries with the remaining time on EINTR.
func nanosleep(d time.Duration) error {
	req := C.struct_timespec{
		tv_sec:  C.__time_t(d / time.Second),
		tv_nsec: C.__syscall_slong_t(d % time.Second),
	}
	var rem C.struct_timespec
	for {
		switch _, err := C.nanosleep(&req, &rem); err {
		case nil:
			return nil
		case syscall.EINTR:
			req = rem
			continue
		default:
			return err
		}
	}
}

func main() {
	c := make(chan os.Signal, 1)
	signal.Notify(c, syscall.SIGUSR1)
	defer signal.Stop(c)
	syscall.Kill(syscall.Getpid(), syscall.SIGUSR1)

	// Poll for the signal, making a blocking libc call between checks
	// in the hope that it flushes the deferred signal.
	for {
		select {
		case <-c:
			return
		default:
			if err := nanosleep(10 * time.Nanosecond); err != nil {
				panic(err)
			}
		}
	}
}
```

But that attempted workaround does not succeed either: the thread that receives the signal happens not to be the one the runtime uses for the subsequent cgo calls, so those libc calls never flush the deferred signal.
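To make the two failure modes described above concrete (the receiving thread is parked via `futex`, and the libc calls happen on a different thread), here is a toy model in Go. It is an illustration under assumptions, not TSAN's or the Go runtime's implementation: `simThread`, `receive`, `libcCall`, `futexPark`, and `recorder` are hypothetical names, and the model encodes only the per-thread deferral this issue describes.

```go
package main

import "fmt"

// simThread is a hypothetical toy model of one OS thread under TSAN. It
// encodes only the behavior described in this issue: an asynchronous signal
// is queued on the thread that received it and reaches the Go handler only
// when that same thread next goes through a libc interceptor.
type simThread struct {
	name    string
	pending []string
}

// receive queues sig on this thread instead of running the handler at once.
func (t *simThread) receive(sig string) {
	t.pending = append(t.pending, sig)
}

// libcCall models entering an intercepted libc function (such as nanosleep):
// only this thread's queued signals are flushed to handle.
func (t *simThread) libcCall(handle func(thread, sig string)) {
	for _, sig := range t.pending {
		handle(t.name, sig)
	}
	t.pending = nil
}

// futexPark models the runtime parking the thread with a raw futex syscall.
// That path bypasses the interceptors, so nothing is flushed.
func (t *simThread) futexPark() {}

// recorder collects the signals that actually reached the handler.
type recorder struct{ delivered []string }

func (r *recorder) handle(thread, sig string) {
	r.delivered = append(r.delivered, thread+":"+sig)
}

// parkedReceiver: the signal lands on M1, which the runtime then parks.
func parkedReceiver() []string {
	var r recorder
	m1 := &simThread{name: "M1"}
	m1.receive("SIGUSR1")
	m1.futexPark()
	return r.delivered
}

// libcOnOtherThread: as in the nanosleep workaround, libc calls do happen,
// but on M2, so M1's queued signal is never flushed.
func libcOnOtherThread() []string {
	var r recorder
	m1 := &simThread{name: "M1"}
	m2 := &simThread{name: "M2"}
	m1.receive("SIGUSR1")
	m1.futexPark()
	for i := 0; i < 3; i++ {
		m2.libcCall(r.handle)
	}
	return r.delivered
}

// libcOnReceiver: for contrast, a libc call on the receiving thread delivers it.
func libcOnReceiver() []string {
	var r recorder
	m1 := &simThread{name: "M1"}
	m1.receive("SIGUSR1")
	m1.libcCall(r.handle)
	return r.delivered
}

func main() {
	fmt.Println("parked receiver: ", parkedReceiver())
	fmt.Println("libc calls on M2:", libcOnOtherThread())
	fmt.Println("libc call on M1: ", libcOnReceiver())
}
```

In this model the first two scenarios deliver nothing, and only a libc call on the thread that actually received the signal delivers it, which matches the hang seen with both programs above.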

Impact on Go 1.8

This is, unfortunately, a significant regression from Go 1.7. In 1.7, Go programs without significant C signal handling exhibited correct Go signal behavior. In the current 1.8 RC, C signal behavior in the presence of the Go runtime is improved, but Go signal behavior is broken even for programs with no interesting C signal activity.
