Skip to content

pkg/cli/image/extract: disable pigz to prevent race condition #104

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Oct 8, 2019

Conversation

stbenjam
Copy link
Member

There is a race condition in the vendored version of docker code that
can cause image extraction to panic. When the pigz package is
installed, docker's DecompressStream code prefers to use it, however,
that code can return the io buffer to the pool while the command is
still writing to it. If the buffer is reused while that's happening,
oc will panic. There are many reports of this happening.

As the vendored docker comes from kubectl, and the version there (even
for k8s 1.16) is quite old, this disables using pigz at all by setting
MOBY_DISABLE_PIGZ environment variable.

fixes #58

@openshift-ci-robot openshift-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Sep 23, 2019
There is a race condition in the vendored version of docker code that
can cause image extraction to panic.  When the `pigz` package is
installed, docker's DecompressStream code prefers to use it, however,
that code can return the io buffer to the pool while the command is
still writing to it. If the buffer is reused while that's happening,
`oc` will panic.  There are many reports of this happening.

As the vendored docker comes from kubectl, and the version there (even
for k8s 1.16) is quite old, this disables using pigz at all by setting
MOBY_DISABLE_PIGZ environment variable.
Copy link
Contributor

@soltysh soltysh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm
/approve

@openshift-ci-robot openshift-ci-robot added the lgtm Indicates that a PR is ready to be merged. label Oct 7, 2019
@soltysh
Copy link
Contributor

soltysh commented Oct 7, 2019

/retest

@openshift-ci-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: soltysh, stbenjam

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-ci-robot openshift-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Oct 7, 2019
@stbenjam
Copy link
Member Author

stbenjam commented Oct 8, 2019

/test e2e-aws

@openshift-merge-robot openshift-merge-robot merged commit b8f5b21 into openshift:master Oct 8, 2019
@@ -483,6 +483,10 @@ func (o *Options) Run() error {
}

func layerByEntry(r io.Reader, options *archive.TarOptions, layerInfo LayerInfo, fn TarEntryFunc, allLayers bool, alreadySeen map[string]struct{}) (bool, error) {
// Prevents race condition present in vendored version of docker
// https://github.com/moby/moby/issues/39859
os.Setenv("MOBY_DISABLE_PIGZ", "true")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This may introduce new race conditions; changing the environment in a multi-threaded process can create undefined behavior; see:

rust-lang/rust#24741

Copy link
Member Author

@stbenjam stbenjam Oct 18, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The situation is certainly better with this fix that it was before this, the panics were unbearable. Would a mutex around this be sufficient to mitigate the risk or is that not sufficient? I think with pigz disabled we’re not shelling out for anything

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From a quick look, golang has a mutex around os.{G,S}etenv - so if there's no C threads running, it's generally OK.

But, it'd be safer to hoist this to early in main().

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, it's also more efficient, no need to keep calling it. The only reason I did it in layerByEntry was so it was close to the image extraction code, I was worried if I drop this in main it'll get forgotten and once we have a newer docker, we'll lose out on the performance improvements of using pigz (not that I've benchmarked it to know that it's actually better).

Anyway, I proposed #130

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. lgtm Indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

"panic: runtime error: slice bounds out of range" while "oc adm release extract"
5 participants