CMA not working with Kernels 3.12.y #520

Closed
msperl opened this issue Feb 3, 2014 · 9 comments

@msperl
Contributor

msperl commented Feb 3, 2014

Hi!

CMA settings do not work with kernels 3.12 and later. This is related to #503, but the scope is much broader than a fixed memory split, so I am opening a separate ticket; the two may or may not share a cause.
To summarise, booting with the following config.txt:

gpu_mem_256=112
gpu_mem_512=368
cma_lwm=16
cma_hwm=32
cma_offline_start=16

and kernel parameters coherent_pool=6M smsc95xx.turbo_mode=N ...

works perfectly fine on a 3.11 kernel (8f768c5), but on 3.12 (a93bfa0) and 3.13 (6928683) kernels it does not: some memory allocations fail, which in turn makes the network fail.

Seems as if this is mostly related to a change in the early memory management of coherent dma memory regions...

Here are the "offending" lines from dmesg (unfortunately they do not make it into /var/log/messages; why is an open question, and it hinders debugging a bit):

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.13.0+ (root@raspberrypi) (gcc version 4.6.3 (Deb$
[    0.000000] CPU: ARMv6-compatible processor [410fb767] revision 7 (ARMv7), c$
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing instru$
[    0.000000] Machine: BCM2708
[    0.000000] early_vc_cma_mem(0/0x14c00000@0xa000000)
[    0.000000]  -> initial 0, size 14c00000, base a000000<3>[    0.000000] vc_cma: dma_declare_contiguous(14c00000,a000000) failed
[    0.000000] Memory policy: Data cache writeback
[    0.000000] On node 0 totalpages: 121856
[    0.000000] free_area_init_node: node 0, pgdat c05f86b4, node_mem_map c06a60$
[    0.000000]   Normal zone: 984 pages used for memmap
[    0.000000]   Normal zone: 0 pages reserved
[    0.000000]   Normal zone: 121856 pages, LIFO batch:31
[    0.000000] pcpu-alloc: s0 r0 d32768 u32768 alloc=1*32768
[    0.000000] pcpu-alloc: [0] 0

So there are two issues:

First, a cosmetic one: a newline is missing from the format string of "-> initial 0, size 14c00000, base a000000" (in 3.12, in file drivers/char/broadcom/vc_cma/vc_cma.c on line 178).

Second, the failed dma_declare_contiguous(14c00000,a000000) call.
This is probably related to upstream kernel changes in either commit f825c73 or a254738, and probably to config options having moved to different locations.

Martin

P.S.: All kernels were compiled with the default config shipped with that specific kernel (make bcmrpi_defconfig).
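
To make the numbers in the failing call easier to read, here is a minimal sketch that decodes the early_vc_cma_mem argument. It assumes the string has the form initial/size@base, which is inferred only from the "-> initial 0, size 14c00000, base a000000" log line; parse_vc_cma_mem is a hypothetical helper, not part of the kernel or the firmware.

```python
def parse_vc_cma_mem(spec):
    """Split 'initial/size@base' into three integers (all in bytes).

    The field order is an assumption taken from the dmesg line
    '-> initial 0, size 14c00000, base a000000'.
    """
    initial_str, rest = spec.split("/", 1)
    size_str, base_str = rest.split("@", 1)
    # int(x, 0) accepts both plain decimal and 0x-prefixed hex.
    return int(initial_str, 0), int(size_str, 0), int(base_str, 0)


MIB = 1024 * 1024
initial, size, base = parse_vc_cma_mem("0/0x14c00000@0xa000000")
end = base + size  # first address past the requested region

print(f"size = {size // MIB} MiB, base = {base // MIB} MiB, end = {end:#x}")
```

So the rejected request is for a 332 MiB region starting 160 MiB into physical memory.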

@msperl
Contributor Author

msperl commented Feb 3, 2014

Note that this is possibly related to the new CONFIG_DMA_CMA config option, which is not enabled in the defconfigs...

Recompiling the 3.12 kernel with this option enabled to see if it solves the issue...

Martin
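
A quick way to check whether a given build has the option is to scan its .config. The following is a minimal sketch; missing_options and the sample text are hypothetical and only illustrate the check. The sample is not the real bcmrpi_defconfig output.

```python
def missing_options(config_text, required=("CONFIG_CMA", "CONFIG_DMA_CMA")):
    """Return the required options that are not enabled in a kernel .config."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # Disabled options appear as comments: '# CONFIG_FOO is not set'.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        if value in ("y", "m"):
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]


# Hypothetical excerpt of a generated .config, for illustration only.
sample = """\
CONFIG_CMA=y
# CONFIG_DMA_CMA is not set
"""
print(missing_options(sample))
```

On a real tree the text would come from the .config produced by make bcmrpi_defconfig.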

@msperl
Contributor Author

msperl commented Feb 3, 2014

Well - now with CONFIG_DMA_CMA configured the original error is gone!

BUT - now there are issues allocating some DMA-memory for the USB and SD card driver...

[    1.463468] dwc_otg: version 3.00a 10-AUG-2012 (platform bus)
[    1.669390] Core Release: 2.80a
[    1.672662] Setting default values for core params
[    1.677508] Finished setting default values for core params
[    1.883177] Using Buffer DMA mode
[    1.886523] Periodic Transfer Interrupt Enhancement - disabled
[    1.892400] Multiprocessor Interrupt Enhancement - disabled
[    1.897992] OTG VER PARAM: 0, OTG VER FLAG: 0
[    1.902393] Dedicated Tx FIFOs mode
[    1.905926] ERROR::pcd_init:1209: dwc_otg_pcd_init failed
[    1.905926] 
[    1.912866] ERROR::dwc_otg_driver_probe:949: pcd_init failed
[    1.912866] 
[    1.920082] dwc_otg: probe of bcm2708_usb failed with error -12
...
[    1.962289] sdhci: Secure Digital Host Controller Interface driver
[    1.968495] sdhci: Copyright(c) Pierre Ossman
[    1.972985] sdhci: Enable low-latency mode
[    1.977158] bcm2708_sdhci bcm2708_sdhci.0: cannot allocate DMA CBs
[    1.983439] bcm2708_sdhci bcm2708_sdhci.0: probe failed, err -12
[    1.989487] bcm2708_sdhci: probe of bcm2708_sdhci.0 failed with error -12
[    1.996397] sdhci-pltfm: SDHCI platform and OF driver helper
...
[    2.053730] Waiting for root device /dev/mmcblk0p2...

The full boot sequence now looks like this:

[    0.000000] Booting Linux on physical CPU 0x0
[    0.000000] Initializing cgroup subsys cpu
[    0.000000] Initializing cgroup subsys cpuacct
[    0.000000] Linux version 3.12.9+ (root@raspberrypi) (gcc version 4.6.3 (Debian 4.6.3-14+rpi1) ) #22 PREEMPT Mon Feb 3 14:42:57 UTC 2014
[    0.000000] CPU: ARMv6-compatible processor [410fb767] revision 7 (ARMv7), cr=00c5387d
[    0.000000] CPU: PIPT / VIPT nonaliasing data cache, VIPT nonaliasing instruction cache
[    0.000000] Machine: BCM2708
[    0.000000] early_vc_cma_mem(0/0x14c00000@0xa000000)
[    0.000000]  -> initial 0, size 14c00000, base a000000[    0.000000] cma: CMA: reserved 332 MiB at 0a000000
[    0.000000] cma: CMA: reserved 16 MiB at 08000000
[    0.000000] Memory policy: ECC disabled, Data cache writeback
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.  Total pages: 120872
[    0.000000] Kernel command line: dma.dmachans=0x7f35 bcm2708_fb.fbwidth=656 bcm2708_fb.fbheight=416 bcm2708.boardrev=0xf bcm2708.serial=0xXXXXX smsc95xx.macaddr=XXXXX sdhci-bcm2708.emmc_clock_freq=250000000 vc-cma-mem=0/0x14c00000@0xa000000 mem=0x9000000@0x0 mem=0x14c00000@0xa000000 vc_mem.mem_base=0x1ec00000 vc_mem.mem_size=0x20000000  coherent_pool=16M  smsc95xx.turbo_mode=N dwc_otg.lpm_enable=0 console=ttyAMA0,115200 kgdboc=ttyAMA0,115200 console=tty1 root=/dev/mmcblk0p2 rootfstype=ext4 elevator=deadline rootwait
[    0.000000] PID hash table entries: 2048 (order: 1, 8192 bytes)
[    0.000000] Dentry cache hash table entries: 65536 (order: 6, 262144 bytes)
[    0.000000] Inode-cache hash table entries: 32768 (order: 5, 131072 bytes)
[    0.000000] Memory: 119344K/487424K available (4335K kernel code, 228K rwdata, 1340K rodata, 143K init, 681K bss, 368080K reserved)
[    0.000000] Virtual kernel memory layout:
[    0.000000]     vector  : 0xffff0000 - 0xffff1000   (   4 kB)
[    0.000000]     fixmap  : 0xfff00000 - 0xfffe0000   ( 896 kB)
[    0.000000]     vmalloc : 0xdf000000 - 0xff000000   ( 512 MB)
[    0.000000]     lowmem  : 0xc0000000 - 0xdec00000   ( 492 MB)
[    0.000000]     modules : 0xbf000000 - 0xc0000000   (  16 MB)
[    0.000000]       .text : 0xc0008000 - 0xc0592f90   (5676 kB)
[    0.000000]       .init : 0xc0593000 - 0xc05b6c84   ( 144 kB)
[    0.000000]       .data : 0xc05b8000 - 0xc05f12a0   ( 229 kB)
[    0.000000]        .bss : 0xc05f12ac - 0xc069b8b0   ( 682 kB)
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] Preemptible hierarchical RCU implementation.
[    0.000000] NR_IRQS:330
[    0.000000] sched_clock: 32 bits at 1000kHz, resolution 1000ns, wraps every 4294967ms
[    0.000000] Switching to timer-based delay loop
[    0.000000] Console: colour dummy device 80x30
[    0.000000] console [tty1] enabled
[    0.001378] Calibrating delay loop (skipped), value calculated using timer frequency.. 2.00 BogoMIPS (lpj=10000)
[    0.001443] pid_max: default: 32768 minimum: 301
[    0.001908] Mount-cache hash table entries: 512
[    0.002698] Initializing cgroup subsys devices
[    0.002768] Initializing cgroup subsys freezer
[    0.002804] Initializing cgroup subsys blkio
[    0.002948] CPU: Testing write buffer coherency: ok
[    0.003425] Setting up static identity map for 0xc041df38 - 0xc041df94
[    0.005348] devtmpfs: initialized
[    0.017286] VFP support v0.3: implementor 41 architecture 1 part 20 variant b rev 5
[    0.056936] NET: Registered protocol family 16
[    0.077113] DMA: preallocated 16384 KiB pool for atomic coherent allocations
[    0.077813] cpuidle: using governor ladder
[    0.077865] cpuidle: using governor menu
[    0.078521] bcm2708.uart_clock = 0
[    0.080166] hw-breakpoint: found 6 breakpoint and 1 watchpoint registers.
[    0.080231] hw-breakpoint: maximum watchpoint size is 4 bytes.
[    0.080270] mailbox: Broadcom VideoCore Mailbox driver
[    0.080434] bcm2708_vcio: mailbox at f200b880
[    0.080552] bcm_power: Broadcom power driver
[    0.080593] bcm_power_open() -> 0
[    0.080620] bcm_power_request(0, 8)
[    0.581343] bcm_mailbox_read -> 00000080, 0
[    0.581386] bcm_power_request -> 0
[    0.581603] Serial: AMBA PL011 UART driver
[    0.581775] dev:f1: ttyAMA0 at MMIO 0x20201000 (irq = 83, base_baud = 0) is a PL011 rev3
[    0.957728] console [ttyAMA0] enabled
[    0.982342] bio: create slab <bio-0> at 0
[    0.987572] SCSI subsystem initialized
[    0.991791] usbcore: registered new interface driver usbfs
[    0.997383] usbcore: registered new interface driver hub
[    1.002999] usbcore: registered new device driver usb
[    1.009617] Switched to clocksource stc
[    1.013942] FS-Cache: Loaded
[    1.017117] CacheFiles: Loaded
[    1.032882] NET: Registered protocol family 2
[    1.038353] TCP established hash table entries: 4096 (order: 3, 32768 bytes)
[    1.045777] TCP bind hash table entries: 4096 (order: 2, 16384 bytes)
[    1.052350] TCP: Hash tables configured (established 4096 bind 4096)
[    1.058806] TCP: reno registered
[    1.062110] UDP hash table entries: 256 (order: 0, 4096 bytes)
[    1.067988] UDP-Lite hash table entries: 256 (order: 0, 4096 bytes)
[    1.074726] NET: Registered protocol family 1
[    1.079720] RPC: Registered named UNIX socket transport module.
[    1.085687] RPC: Registered udp transport module.
[    1.090485] RPC: Registered tcp transport module.
[    1.095210] RPC: Registered tcp NFSv4.1 backchannel transport module.
[    1.102717] bcm2708_dma: DMA manager at f2007000
[    1.107496] bcm2708_gpio: bcm2708_gpio_probe c05c5d90
[    1.113033] vc-mem: phys_addr:0x00000000 mem_base=0x1ec00000 mem_size:0x20000000(512 MiB)
[    1.122556] audit: initializing netlink socket (disabled)
[    1.128067] type=2000 audit(0.950:1): initialized
[    1.298376] VFS: Disk quotas dquot_6.5.2
[    1.302732] Dquot-cache hash table entries: 1024 (order 0, 4096 bytes)
[    1.311531] FS-Cache: Netfs 'nfs' registered for caching
[    1.318379] NFS: Registering the id_resolver key type
[    1.323667] Key type id_resolver registered
[    1.327878] Key type id_legacy registered
[    1.332659] msgmni has been set to 929
[    1.338535] Block layer SCSI generic (bsg) driver version 0.4 loaded (major 252)
[    1.346408] io scheduler noop registered
[    1.350495] io scheduler deadline registered (default)
[    1.355996] io scheduler cfq registered
[    1.360189] bcm2708_fb bcm2708_fb: cannot allocate DMA CBs
[    1.365718] bcm2708_fb bcm2708_fb: probe failed, err -12
[    1.371168] bcm2708_fb: probe of bcm2708_fb failed with error -12
[    1.377657] uart-pl011 dev:f1: no DMA platform data
[    1.382665] kgdb: Registered I/O driver kgdboc.
[    1.387878] vc-cma: Videocore CMA driver
[    1.391964] vc-cma: vc_cma_base      = 0x0a000000
[    1.396694] vc-cma: vc_cma_size      = 0x14c00000 (332 MiB)
[    1.402358] vc-cma: vc_cma_initial   = 0x00000000 (0 MiB)
[    1.416781] brd: module loaded
[    1.425039] loop: module loaded
[    1.428480] vchiq: vchiq_init_state: slot_zero = 0xc8000000, is_master = 0
[    1.436249] vchiq_get_state: g_state.remote->initialised != 1 (0)
[    1.445431] Loading iSCSI transport class v2.0-870.
[    1.451303] usbcore: registered new interface driver ax88179_178a
[    1.457530] usbcore: registered new interface driver smsc95xx
[    1.463468] dwc_otg: version 3.00a 10-AUG-2012 (platform bus)
[    1.669390] Core Release: 2.80a
[    1.672662] Setting default values for core params
[    1.677508] Finished setting default values for core params
[    1.883177] Using Buffer DMA mode
[    1.886523] Periodic Transfer Interrupt Enhancement - disabled
[    1.892400] Multiprocessor Interrupt Enhancement - disabled
[    1.897992] OTG VER PARAM: 0, OTG VER FLAG: 0
[    1.902393] Dedicated Tx FIFOs mode
[    1.905926] ERROR::pcd_init:1209: dwc_otg_pcd_init failed
[    1.905926] 
[    1.912866] ERROR::dwc_otg_driver_probe:949: pcd_init failed
[    1.912866] 
[    1.920082] dwc_otg: probe of bcm2708_usb failed with error -12
[    1.926485] usbcore: registered new interface driver usb-storage
[    1.932975] mousedev: PS/2 mouse device common for all mice
[    1.938681] usbcore: registered new interface driver xpad
[    1.944796] bcm2835-cpufreq: min=700000 max=700000 cur=700000
[    1.950765] bcm2835-cpufreq: switching to governor powersave
[    1.956454] bcm2835-cpufreq: switching to governor powersave
[    1.962289] sdhci: Secure Digital Host Controller Interface driver
[    1.968495] sdhci: Copyright(c) Pierre Ossman
[    1.972985] sdhci: Enable low-latency mode
[    1.977158] bcm2708_sdhci bcm2708_sdhci.0: cannot allocate DMA CBs
[    1.983439] bcm2708_sdhci bcm2708_sdhci.0: probe failed, err -12
[    1.989487] bcm2708_sdhci: probe of bcm2708_sdhci.0 failed with error -12
[    1.996397] sdhci-pltfm: SDHCI platform and OF driver helper
[    2.002194] ledtrig-cpu: registered to indicate activity on CPUs
[    2.008341] hidraw: raw HID events driver (C) Jiri Kosina
[    2.014052] usbcore: registered new interface driver usbhid
[    2.019701] usbhid: USB HID core driver
[    2.024071] TCP: cubic registered
[    2.027416] Initializing XFRM netlink socket
[    2.031783] NET: Registered protocol family 17
[    2.036437] Key type dns_resolver registered
[    2.042098] registered taskstats version 1
[    2.046738] drivers/rtc/hctosys.c: unable to open rtc device (rtc0)
[    2.053730] Waiting for root device /dev/mmcblk0p2...

So there is still an issue somewhere...
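
All three failing drivers in the log above (bcm2708_fb, dwc_otg and bcm2708_sdhci) report error -12. A minimal sketch for pulling these out of a dmesg dump and naming the error code follows; failed_probes is a hypothetical helper, and the name lookup relies on the host's errno table, where 12 is ENOMEM on Linux.

```python
import errno
import re

# Matches lines such as 'dwc_otg: probe of bcm2708_usb failed with error -12'.
PROBE_FAIL = re.compile(r"probe of (\S+) failed with error (-?\d+)")


def failed_probes(dmesg_text):
    """Return (device, errno name) pairs for every failed driver probe."""
    results = []
    for match in PROBE_FAIL.finditer(dmesg_text):
        device, code = match.group(1), int(match.group(2))
        # The kernel reports negative errno values; look up the positive one.
        results.append((device, errno.errorcode.get(abs(code), "unknown")))
    return results


log = """\
[    1.371168] bcm2708_fb: probe of bcm2708_fb failed with error -12
[    1.920082] dwc_otg: probe of bcm2708_usb failed with error -12
[    1.989487] bcm2708_sdhci: probe of bcm2708_sdhci.0 failed with error -12
"""
for device, name in failed_probes(log):
    print(device, name)
```

Every probe failure here is an out-of-memory condition, which fits the "cannot allocate DMA CBs" messages.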

@msperl
Contributor Author

msperl commented Feb 4, 2014

Found out that from now on you have to pass cma=16M in /boot/cmdline.txt for everything to work.
It still needs the CONFIG_DMA_CMA option, though.

So please add that to the defconfigs of the kernel,
and maybe add it as a dependency so that it gets included automatically.

@msperl
Contributor Author

msperl commented Feb 4, 2014

Actually, you can set coherent_pool=16M cma=64M,
but it only works if cma is larger than coherent_pool;
otherwise it fails.
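
The rule observed above (cma must be larger than coherent_pool) can be checked against a cmdline.txt before rebooting. This is a minimal sketch: parse_size is a simplified stand-in for the kernel's size parsing that only handles K, M and G suffixes, the cma=size@base form is not handled, and a missing parameter is treated as 0, which is an assumption and not what the kernel's defaults do.

```python
def parse_size(text):
    """Parse a size such as '16M' or '0x100000' into bytes (K/M/G only)."""
    suffixes = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}
    text = text.strip()
    factor = suffixes.get(text[-1:].upper(), 1)
    digits = text[:-1] if factor != 1 else text
    return int(digits, 0) * factor


def cmdline_sizes(cmdline, keys=("cma", "coherent_pool")):
    """Collect the byte sizes of the given key=value parameters."""
    found = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if sep and key in keys:
            found[key] = parse_size(value)
    return found


def cma_exceeds_pool(cmdline):
    """True if cma is strictly larger than coherent_pool (missing counts as 0)."""
    sizes = cmdline_sizes(cmdline)
    return sizes.get("cma", 0) > sizes.get("coherent_pool", 0)


print(cma_exceeds_pool("coherent_pool=16M cma=64M smsc95xx.turbo_mode=N"))
print(cma_exceeds_pool("coherent_pool=16M cma=16M"))
```

The first call reflects the working combination reported above; the second is an equal-size case that the observed rule would reject.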

@msperl
Contributor Author

msperl commented Feb 4, 2014

As far as I understand, the following config diff is required to make it work with 3.12 and 3.13:

diff --git a/drivers/char/broadcom/Kconfig b/drivers/char/broadcom/Kconfig
index f089943..4888d57 100644
--- a/drivers/char/broadcom/Kconfig
+++ b/drivers/char/broadcom/Kconfig
@@ -9,7 +9,7 @@ menuconfig BRCM_CHAR_DRIVERS

 config BCM_VC_CMA
        bool "Videocore CMA"
-       depends on CMA && BRCM_CHAR_DRIVERS && BCM2708_VCHIQ
+       depends on CMA && DMA_CMA && BRCM_CHAR_DRIVERS && BCM2708_VCHIQ
        default n
         help
           Helper for videocore CMA access.

@amtssp

amtssp commented Feb 4, 2014

I don't know if this is related.

But usually (kernels 3.10.y and 3.11.y) I'm able to increase the maximum sample rate over HDMI from 48kHz to 192kHz.
I usually make these changes in bcm2835-pcm.c before building the modules.
Old:
.formats = SNDRV_PCM_FMTBIT_U8 | SNDRV_PCM_FMTBIT_S16_LE,
.rates = SNDRV_PCM_RATE_CONTINUOUS | SNDRV_PCM_RATE_8000_48000,
.rate_min = 8000,
.rate_max = 48000,
.channels_min = 1,
.channels_max = 2,

Changed to:
.formats = SNDRV_PCM_FMTBIT_U8 | SNDRV_PCM_FMTBIT_S16_LE | SNDRV_PCM_FMTBIT_S32_LE,
.rates = SNDRV_PCM_RATE_CONTINUOUS | SNDRV_PCM_RATE_8000_192000,
.rate_min = 8000,
.rate_max = 192000,
.channels_min = 1,
.channels_max = 2,

However, with kernels 3.12.6 and 3.12.7 I'm unable to do this. The module builds without error, but the device "card 0: ALSA [bcm2835 ALSA], device 0: bcm2835 ALSA [bcm2835 ALSA]" is not present.

@msperl
Contributor Author

msperl commented Feb 5, 2014

OK - the patch above does not handle everything correctly: the camera stops working and I get messages like this when running rpi-still:

...
[   47.626792] vc_cma_alloc_chunks: chunk phys 62c0000, vc_cma a000000-1ebfffff - bad SPARSEMEM configuration?
[   47.626807] vc_cma_alloc_chunks: dev->cma_area =   (null)
[   47.626807] 
[   47.626820] vc_cma_alloc_chunks: ===============================
[   47.626835] vc_cma_alloc_chunks: dma_alloc_from_contiguous failed for 40000 bytes (alloc 0 of 10, 1328 free)
[   48.628138] vc_cma_alloc_chunks: ===============================
[   48.628178] vc_cma_alloc_chunks: chunk phys 6300000, vc_cma a000000-1ebfffff - bad SPARSEMEM configuration?
[   48.628193] vc_cma_alloc_chunks: dev->cma_area =   (null)
[   48.628193] 
[   48.628207] vc_cma_alloc_chunks: ===============================
[   48.628222] vc_cma_alloc_chunks: dma_alloc_from_contiguous failed for 40000 bytes (alloc 0 of 10, 1328 free)
[   49.629061] vc_cma_alloc_chunks: ===============================
[   49.629102] vc_cma_alloc_chunks: chunk phys 6340000, vc_cma a000000-1ebfffff - bad SPARSEMEM configuration?
[   49.629116] vc_cma_alloc_chunks: dev->cma_area =   (null)
[   49.629116] 
[   49.629130] vc_cma_alloc_chunks: ===============================
[   49.629145] vc_cma_alloc_chunks: dma_alloc_from_contiguous failed for 40000 bytes (alloc 0 of 10, 1328 free)
...

This may also be related to changes in the CMA infrastructure, because without CMA everything works as expected.
But as the message above suggests, SPARSEMEM may not be configured correctly.

It seems drivers/char/broadcom/vc_cma/vc_cma.c needs to be reviewed for necessary changes.

For now I will run without CMA.
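
The "bad SPARSEMEM configuration?" lines can be checked mechanically: each one prints the physical address of the chunk that was handed out and the range the vc_cma driver expected. A minimal sketch follows, with chunks_outside_region as a hypothetical helper and the printed range end treated as inclusive.

```python
import re

# Captures the chunk address and the expected vc_cma range (all hex, no 0x).
CHUNK = re.compile(r"chunk phys ([0-9a-f]+), vc_cma ([0-9a-f]+)-([0-9a-f]+)")


def chunks_outside_region(dmesg_text):
    """Return the chunk addresses that fall outside the logged vc_cma range."""
    outside = []
    for match in CHUNK.finditer(dmesg_text):
        phys, start, end = (int(group, 16) for group in match.groups())
        if not start <= phys <= end:
            outside.append(phys)
    return outside


log = """\
[   47.626792] vc_cma_alloc_chunks: chunk phys 62c0000, vc_cma a000000-1ebfffff - bad SPARSEMEM configuration?
[   48.628178] vc_cma_alloc_chunks: chunk phys 6300000, vc_cma a000000-1ebfffff - bad SPARSEMEM configuration?
[   49.629102] vc_cma_alloc_chunks: chunk phys 6340000, vc_cma a000000-1ebfffff - bad SPARSEMEM configuration?
"""
print([hex(phys) for phys in chunks_outside_region(log)])
```

All three chunks lie below 0xa000000, so the allocations were served from memory outside the region the driver reserved for itself.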

@japrogramer

I also have issues with CMA when it is enabled. Uname: http://codepad.org/vj0i85Hw dmesg: http://codepad.org/so1Tvc0j. The error happens when I start xbmc; the screen turns white and I have to kill it over SSH.

@maxnet
Contributor

maxnet commented Jun 8, 2014

[ 48.628178] vc_cma_alloc_chunks: chunk phys 6300000, vc_cma a000000-1ebfffff - bad SPARSEMEM configuration?

It seems that with 3.12.y two CMA regions are created:

  1. One by the main Linux CMA code
  2. One by the vc_cma code

As I understand the problem, when the VideoCore signals that it wants to borrow a little extra memory from the ARM side, the vc_cma code requests some CMA space from the kernel but is assigned a piece from region 1 while it expected one from region 2, and it complains about that with the error message above.

You can teach Linux to treat pool 2 as a private pool, always assigning the vc_cma code memory from there and all other modules memory from pool 1, by reverting this patch: 6e1f8bc

@msperl msperl closed this as completed Jun 25, 2015