Regression: 3.12.y kernels are prone to filesystem dirty page count accounting failures #617

Closed
P33M opened this issue Jun 14, 2014 · 4 comments

Comments

@P33M
Contributor

P33M commented Jun 14, 2014

Following on from an email conversation where the issue was first found:

On 3.12.y kernels, it is possible to break the filesystem dirty page accounting (the count wraps around to 2^32-1 kB of dirty pages), which has the following effects:

  • Writes to the filesystem slow to a crawl
  • The ACT LED is almost permanently illuminated during IO
  • All processes waiting on IO spend much of their time asleep until the IO completes

This happens because a double decrement of the global count of dirty filesystem pages, which is not interrupt- or preempt-safe, wraps the counter to an extremely large value, far above the threshold at which the kernel starts flushing dirty filesystem pages out to disk. In effect, every file write operation immediately triggers a write of the affected page to the physical media backing the filesystem: SD card or USB device. For certain workloads the symptoms are much the same as "swap hell".
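
To illustrate the wraparound described above, here is a toy model in Python. It is only a sketch and not kernel code: the 32-bit mask assumes a 32-bit unsigned counter, and `decrement`, `over_threshold` and the threshold of 50,000 pages are made-up names and values for illustration.

```python
# Toy model of an unsigned page counter that is decremented once too often.
MASK_32 = 0xFFFFFFFF  # assumption: the counter behaves like a 32-bit unsigned long


def decrement(counter: int) -> int:
    """Subtract one page, wrapping the way 32-bit unsigned arithmetic does."""
    return (counter - 1) & MASK_32


def over_threshold(nr_dirty: int, dirty_thresh: int) -> bool:
    """Hypothetical stand-in for a 'too many dirty pages, flush now' check."""
    return nr_dirty > dirty_thresh


nr_dirty = 1                    # one dirty page outstanding
nr_dirty = decrement(nr_dirty)  # legitimate decrement when the page is cleaned
nr_dirty = decrement(nr_dirty)  # erroneous second decrement for the same page

print(nr_dirty)                          # 4294967295
print(over_threshold(nr_dirty, 50_000))  # True
```

Once the counter has wrapped, any threshold comparison of this kind stays true, which matches the behaviour where every write is flushed straight away.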

A bisect of the git tree between 3.11 and 3.12.21 narrowed the problem down to two commits (yet to be bisected further):

commit 36bc08cc01709b4a9bb563b35aa530241ddc63e3
Author: Gu Zheng <[email protected]>
Date:   Tue Jul 16 17:56:16 2013 +0800

    fs/aio: Add support to aio ring pages migration

    As the aio job will pin the ring pages, that will lead to mem migrated
    failed. In order to fix this problem we use an anon inode to manage the aio
    pages, and  setup the migratepage callback in the anon inode's address space
    that when mem migrating the aio ring pages will be moved to other mem node s

    Signed-off-by: Gu Zheng <[email protected]>
    Signed-off-by: Benjamin LaHaise <[email protected]>

commit 55708698c5f153f4e390175cdfc395333b2eafbd
Author: Gu Zheng <[email protected]>
Date:   Tue Jul 16 17:56:12 2013 +0800

    fs/anon_inode: Introduce a new lib function anon_inode_getfile_private()

    Introduce a new lib function anon_inode_getfile_private(), it creates a new
    instance by hooking it up to an anonymous inode, and a dentry that describe
    "class" of the file, similar to anon_inode_getfile(), but each file holds a
    single inode. Furthermore, anyone who wants to create a private anon file wi
    benefit from this change.

    Signed-off-by: Gu Zheng <[email protected]>
    Signed-off-by: Benjamin LaHaise <[email protected]>

Either of these could be the culprit that allows the page counter to wrap. Each needs to be tested in turn.

The canonical test is to stop and restart the mysqld service. If Dirty in /proc/meminfo (the nr_dirty counter, which /proc/vmstat reports in pages) turns into ~4 billion kB instead of a few hundred, then the bug exists.
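
The check above could be scripted roughly as follows. This is a sketch under assumptions: `meminfo_kb` and `dirty_count_wrapped` are hypothetical helper names, the sample text uses made-up numbers, and the test relies on the heuristic that dirty memory can never legitimately exceed total memory.

```python
import re


def meminfo_kb(text: str, field: str) -> int:
    """Return the value in kB of one field from /proc/meminfo-style text."""
    match = re.search(rf"^{re.escape(field)}:\s+(\d+)\s+kB", text, re.MULTILINE)
    if match is None:
        raise KeyError(field)
    return int(match.group(1))


def dirty_count_wrapped(text: str) -> bool:
    """Heuristic: a Dirty value larger than MemTotal means the count has wrapped."""
    return meminfo_kb(text, "Dirty") > meminfo_kb(text, "MemTotal")


# Made-up sample snapshots, not captured from a real system.
healthy = "MemTotal:  448776 kB\nDirty:  312 kB\n"
broken = "MemTotal:  448776 kB\nDirty:  4294967292 kB\n"

print(dirty_count_wrapped(healthy))  # False
print(dirty_count_wrapped(broken))   # True
```

On a live system the same function can be fed the contents of /proc/meminfo, read after restarting mysqld.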

@P33M
Contributor Author

P33M commented Jun 16, 2014

The first bad commit is 36bc08c.

The bug still exists in rpi-3.15.y branch.

@popcornmix
Collaborator

So, what is your recommendation? Revert that commit? Fix it?

@P33M
Contributor Author

P33M commented Jun 16, 2014

Investigating. There have been some changes to the code surrounding the patch between 3.12 and 3.15 - one fixing a race condition, which is suspicious.

popcornmix pushed a commit to raspberrypi/firmware that referenced this issue Jun 18, 2014
…tive

See: raspberrypi/linux#617

kernel: vchiq: Include SIGSTOP and SIGCONT in list of signals not-masked by vchiq to allow gdb to work
See: http://www.raspberrypi.org/forums/viewtopic.php?f=67&t=76377

firmware: gencmd: Add queries for malloc and reloc free memory
See: http://forum.stmlabs.com/showthread.php?tid=13737&pid=104854#pid104854

firmware: display: Add support for 32bpp palettes
See: #276

firmware: Alterations to support camera DRC.  (Dynamic range compression)
See: http://www.raspberrypi.org/forums/viewtopic.php?f=43&t=79622

userland: egl: Call khrn_init_options so env vars like V3D_DOUBLE_BUFFER are respected

userland: hello_fft: Add qasm source files
popcornmix pushed a commit to Hexxeh/rpi-firmware that referenced this issue Jun 18, 2014
neuschaefer pushed a commit to neuschaefer/raspi-binary-firmware that referenced this issue Feb 27, 2017
raspbian-autopush pushed a commit to raspbian-packages/linux that referenced this issue Mar 7, 2017
commit 52a6bd88d5e34f3227da24ed1eb882f296e9ee32
Author: popcornmix <[email protected]>
Date:   Wed Jun 18 13:42:01 2014 +0100

    vmstat: Workaround for issue where dirty page count goes negative
    
    See:
    raspberrypi/linux#617
    http://www.spinics.net/lists/linux-mm/msg72236.html


Gbp-Pq: Topic rpi
Gbp-Pq: Name rpi_1049_52a6bd88d5e34f3227da24ed1eb882f296e9ee32.patch
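
The thread does not show the workaround patch itself. One plausible way to make a counter that "goes negative" safe to read is to interpret it as signed and report negative values as zero; the sketch below assumes that approach and is not taken from the commit. `as_signed_32` and `read_page_counter` are hypothetical names.

```python
MASK_32 = 0xFFFFFFFF  # assumption: 32-bit counter


def as_signed_32(raw: int) -> int:
    """Reinterpret a 32-bit unsigned value as signed two's complement."""
    raw &= MASK_32
    return raw - (1 << 32) if raw & 0x80000000 else raw


def read_page_counter(raw: int) -> int:
    """Hypothetical reader: report a counter that has gone negative as zero."""
    return max(as_signed_32(raw), 0)


print(read_page_counter(0xFFFFFFFF))  # 0
print(read_page_counter(300))         # 300
```

Clamping of this kind only hides the symptom from readers of the counter; the double decrement behind it would still need a proper fix.
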
@P33M P33M closed this as completed May 4, 2017