OOM - Segmentation fault (not ulimit, not cgroups, not max-space, not exhausted RAM) #4474

Closed as not planned
@riverego

Node.js Version

v22.7.0 & previous

NPM Version

v10.8.2 & previous

Operating System

Linux ip-10-8-1-229 6.1.0-23-cloud-amd64 #1 SMP PREEMPT_DYNAMIC Debian 6.1.99-1 (2024-07-15) x86_64 GNU/Linux

Subsystem

Other

Description

The code works as expected on my own computer: it crashes when the max-old-space-size limit is reached, around 32G.

But on cloud VMs (from Outscale) it always runs out of memory around 20G.

The problem happens on all the images I have tested: Debian 12, Debian 11 and Ubuntu 20 (Outscale's out-of-the-box images), with the same result on VMs with 128 GB and 64 GB of RAM and on all tested Node.js versions (22, 20 and 16).

$ cat /proc/<pid>/limits
Max cpu time              unlimited            unlimited            seconds
Max file size             unlimited            unlimited            bytes
Max data size             unlimited            unlimited            bytes
Max stack size            8388608              unlimited            bytes
Max core file size        0                    unlimited            bytes
Max resident set          unlimited            unlimited            bytes
Max processes             257180               257180               processes
Max open files            1048576              1048576              files
Max locked memory         unlimited            unlimited            bytes
Max address space         unlimited            unlimited            bytes
Max file locks            unlimited            unlimited            locks
Max pending signals       257180               257180               signals
Max msgqueue size         819200               819200               bytes
Max nice priority         0                    0
Max realtime priority     0                    0
Max realtime timeout      unlimited            unlimited            us

I checked ulimits and cgroups (and even when cgroups kill a process through the OOM reaper, that doesn't produce a segfault), and found nothing.
I tried setting a fixed value of 50G in the ulimits, in case "unlimited" was hiding a low default value, and the result was the same.

I tried /proc/sys/vm/overcommit_memory with the values 0, 1 and 2, and the result was the same.
I tried recompiling Node.js on the VM. Same result.
I exhausted ChatGPT's ideas.
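
For reference, the checks above can also be collected from inside a Node.js process, which rules out a difference between the shell's limits and the ones the process actually gets. The following is only a minimal sketch: `parseLimits` and `readIfPresent` are hypothetical helper names, and the `/sys/fs/cgroup/memory.max` path is an assumption (cgroup v2 mounted at the usual location, process in a cgroup that exposes that file); on other setups it simply prints `null`.

```javascript
const fs = require('node:fs');

// Parse the text of /proc/<pid>/limits into { name: { soft, hard, units } }.
function parseLimits(text) {
  const limits = {};
  for (const line of text.split('\n')) {
    // Columns are separated by runs of 2+ spaces; limit names contain single spaces.
    const cols = line.trim().split(/\s{2,}/);
    if (cols.length < 3 || cols[0] === 'Limit') continue; // skip header and blank lines
    const [name, soft, hard, units = ''] = cols;
    limits[name] = { soft, hard, units };
  }
  return limits;
}

// Return the trimmed file content, or null if the file is missing or unreadable.
function readIfPresent(path) {
  try {
    return fs.readFileSync(path, 'utf8').trim();
  } catch {
    return null;
  }
}

const limitsText = readIfPresent('/proc/self/limits');
if (limitsText !== null) {
  const limits = parseLimits(limitsText);
  console.log('Max address space:', limits['Max address space']);
  console.log('Max data size:', limits['Max data size']);
}
console.log('overcommit_memory:', readIfPresent('/proc/sys/vm/overcommit_memory'));
// Assumption: cgroup v2 layout; the file does not exist in the root cgroup.
console.log('cgroup memory.max:', readIfPresent('/sys/fs/cgroup/memory.max'));
```

Running this with the same flags as the failing script shows the limits as seen by that exact process.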

I thought this might be a host limit applied to processes, so I tried this:

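#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
        size_t oneG = (size_t)1024 * 1048576;  /* 1 GiB */
        size_t maxMem = 17 * oneG;             /* first size tested is 18 GiB */
        void *memPointer = NULL;

        /* Grow the allocation 1 GiB at a time until malloc fails. */
        do {
                if (memPointer != NULL) {
                        printf("Max Tested Memory = %zu\n", maxMem);
                        memset(memPointer, 0, maxMem);  /* touch every page */
                        free(memPointer);
                }
                maxMem += oneG;
                memPointer = malloc(maxMem);
        } while (memPointer != NULL);
        maxMem -= oneG;
        printf("Max Usable Memory approx = %zu\n", maxMem);

        /* Hold the largest successful size for 30 s so it can be observed. */
        memPointer = malloc(maxMem);
        if (memPointer == NULL) {
                perror("malloc");
                return 1;
        }
        memset(memPointer, 1, maxMem);
        sleep(30);
        free(memPointer);

        return 0;
}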

But this can reach the VM's RAM limit (64G or 128G) without any problem.
The same goes for the stress command:

stress -m 1 --vm-bytes 32G --vm-keep

So I'm running out of ideas. I can't figure out what makes Node.js run out of memory around 20G on these VMs.

I hope someone here has a clue about what is happening.

Thank you.

Minimal Reproduction

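// 1000-character suffix appended to every index
const fill = new Array(1000).fill('o').join('')
const bufs = []
let i = 0
while (true) {
  ++i
  // Keep every batch of ~10M strings reachable so the heap can only grow
  bufs.push(Array.from({ length: 10 * 1024 * 1024 }, (_, j) => j + fill))
  // console.log(i)
}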

The code just has to reach the OOM point.
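
To confirm that the `--max-old-space-size` flag is really applied on the VM, the heap limit V8 reports can be printed before the loop starts. This is a minimal sketch using `v8.getHeapStatistics()` and `process.memoryUsage()`; `heapLimitMB` and `toMB` are hypothetical helper names, and the reported limit is typically somewhat higher than the flag value because it covers more than the old space.

```javascript
const v8 = require('node:v8');

// Convert a byte count to mebibytes.
function toMB(bytes) {
  return bytes / (1024 * 1024);
}

// Heap size limit as seen by V8 for this process, in MB.
function heapLimitMB() {
  return toMB(v8.getHeapStatistics().heap_size_limit);
}

console.log(`heap_size_limit: ${heapLimitMB().toFixed(0)} MB`);

// Snapshot of current usage; calling this inside the loop shows how far
// the process gets before it dies.
const { rss, heapTotal, heapUsed } = process.memoryUsage();
console.log(
  `rss=${toMB(rss).toFixed(0)} MB heapTotal=${toMB(heapTotal).toFixed(0)} MB heapUsed=${toMB(heapUsed).toFixed(0)} MB`
);
```

If the printed limit is close to 32000 MB on both machines, the flag itself is not what differs between them.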

Output

$ node --max-old-space-size=32000 --trace-gc index.js
[...traces]
[12808:0x6f27120]   146468 ms: Scavenge 19279.2 (19571.3) -> 19263.9 (19571.3) MB, 50.10 / 0.00 ms  (average mu = 0.831, current mu = 0.831) allocation failure;
[12808:0x6f27120]   146787 ms: Scavenge 19317.6 (19610.3) -> 19302.1 (19610.5) MB, 35.85 / 0.00 ms  (average mu = 0.831, current mu = 0.831) allocation failure;
Segmentation fault
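
To compare the crash point across VMs and Node.js versions, the heap figures can be pulled out of the last `--trace-gc` line before the segfault. The sketch below assumes the line layout shown in the output above; `parseTraceGcLine` and its field names are hypothetical, and other V8 versions may format these lines differently.

```javascript
// Matches e.g. "[pid:0xaddr]  146787 ms: Scavenge 19317.6 (19610.3) -> 19302.1 (19610.5) MB, ..."
const TRACE_GC_RE =
  /^\[(\d+):0x[0-9a-f]+\]\s+(\d+) ms: (.+?) (\d+(?:\.\d+)?) \((\d+(?:\.\d+)?)\) -> (\d+(?:\.\d+)?) \((\d+(?:\.\d+)?)\) MB/;

// Returns the parsed fields, or null for lines that are not GC trace lines.
function parseTraceGcLine(line) {
  const m = TRACE_GC_RE.exec(line);
  if (m === null) return null;
  return {
    pid: Number(m[1]),
    timeMs: Number(m[2]),
    type: m[3],                    // e.g. "Scavenge"
    usedBeforeMB: Number(m[4]),    // object size before the collection
    totalBeforeMB: Number(m[5]),   // figure in parentheses before the collection
    usedAfterMB: Number(m[6]),
    totalAfterMB: Number(m[7]),
  };
}

const lastLine =
  '[12808:0x6f27120]   146787 ms: Scavenge 19317.6 (19610.3) -> 19302.1 (19610.5) MB, 35.85 / 0.00 ms  (average mu = 0.831, current mu = 0.831) allocation failure;';
console.log(parseTraceGcLine(lastLine));
```

Applied to the last line of each failing run, this makes it easy to see whether the crash always happens at the same heap size.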

Before You Submit

  • I have looked for issues that already exist before submitting this
  • My issue follows the guidelines in the README file, and follows the 'How to ask a good question' guide at https://stackoverflow.com/help/how-to-ask
