System fails to allocate memory even though free memory is available

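I run Gentoo on my server and have just upgraded the kernel from 4.4.39 to 4.9.6, leaving the kernel configuration mostly unchanged. My syslog is now filling up with error reports like this one: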
[50547.483577] ksoftirqd/0: page allocation failure: order:0, mode:0x2280020(GFP_ATOMIC|__GFP_NOTRACK)
[50547.483605] CPU: 0 PID: 3 Comm: ksoftirqd/0 Not tainted 4.9.6-gentoo-r1 #2
[50547.483608] Hardware name:    /LakePort, BIOS 6.00 PG 02/20/2009
[50547.483613]  f5473bd0 c13e692e c17a9870 00000000 f5473c00 c10d03a7 c17a79dc 02280020
[50547.483626]  f5473c08 f5473c10 c17a9870 f5473be4 f5282d37 00000008 00000000 00000030
[50547.483638]  f5473cbc c10d0769 02280020 c17a9870 00000000 f5473c34 00000000 e5dca054
[50547.483652] Call Trace:
[50547.483670]  [<c13e692e>] dump_stack+0x47/0x69
[50547.483679]  [<c10d03a7>] warn_alloc+0xf7/0x120
[50547.483686]  [<c10d0769>] __alloc_pages_nodemask+0x329/0xb40
[50547.483697]  [<c1107114>] new_slab+0x2a4/0x460
[50547.483704]  [<c1108e62>] ___slab_alloc.constprop.81+0x392/0x540
[50547.483713]  [<c159fe11>] ? __build_skb+0x21/0x100
[50547.483721]  [<c1109027>] __slab_alloc.constprop.80+0x17/0x30
[50547.483727]  [<c11090c2>] kmem_cache_alloc+0x82/0xb0
[50547.483733]  [<c159fe11>] ? __build_skb+0x21/0x100
[50547.483738]  [<c159fe11>] __build_skb+0x21/0x100
[50547.483744]  [<c159ffda>] __netdev_alloc_skb+0x9a/0xe0
[50547.483751]  [<c1017774>] ? nommu_map_page+0x34/0x60
[50547.483771]  [<f81f64be>] e1000_alloc_rx_buffers+0x18e/0x1f0 [e1000e]
[50547.483788]  [<f81f3d54>] e1000_clean_rx_irq+0x244/0x3f0 [e1000e]
[50547.483804]  [<f81fa176>] e1000e_poll+0x96/0x2d0 [e1000e]
[50547.483810]  [<c11098f1>] ? kmem_cache_free_bulk+0x1c1/0x280
[50547.483817]  [<c15ad7ca>] net_rx_action+0x16a/0x270
[50547.483825]  [<c1043df7>] __do_softirq+0xb7/0x1a0
[50547.483832]  [<c169b108>] ? __schedule+0x138/0x510
[50547.483839]  [<c1043ef8>] run_ksoftirqd+0x18/0x40
[50547.483846]  [<c105c01c>] smpboot_thread_fn+0xfc/0x160
[50547.483851]  [<c105bf20>] ? sort_range+0x30/0x30
[50547.483857]  [<c1058ac3>] kthread+0xa3/0xc0
[50547.483863]  [<c1058a20>] ? kthread_park+0x50/0x50
[50547.483868]  [<c169ef43>] ret_from_fork+0x1b/0x28
[50547.483872] Mem-Info:
[50547.483887] active_anon:20896 inactive_anon:4650 isolated_anon:0
                active_file:120066 inactive_file:528731 isolated_file:115
                unevictable:1558 dirty:2365 writeback:0 unstable:0
                slab_reclaimable:135114 slab_unreclaimable:6440
                mapped:16650 shmem:7338 pagetables:452 bounce:0
                free:4552 free_pcp:30 free_cma:0
[50547.483899] Node 0 active_anon:83584kB inactive_anon:18600kB active_file:480264kB inactive_file:2114924kB unevictable:6232kB isolated(anon):0kB isolated(file):460kB mapped:66600kB dirty:9460kB writeback:0kB shmem:29352kB writeback_tmp:0kB unstable:0kB pages_scanned:29 all_unreclaimable? no
[50547.483911] DMA free:3356kB min:68kB low:84kB high:100kB active_anon:0kB inactive_anon:0kB active_file:3360kB inactive_file:0kB unevictable:0kB writepending:16kB present:15988kB managed:15912kB mlocked:0kB slab_reclaimable:8908kB slab_unreclaimable:184kB kernel_stack:8kB pagetables:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[50547.483913] lowmem_reserve[]: 0 834 3265 3265
[50547.483932] Normal free:1376kB min:3660kB low:4572kB high:5484kB active_anon:0kB inactive_anon:0kB active_file:227508kB inactive_file:96kB unevictable:0kB writepending:4104kB present:892920kB managed:855240kB mlocked:0kB slab_reclaimable:531548kB slab_unreclaimable:25576kB kernel_stack:1784kB pagetables:0kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB
[50547.483933] lowmem_reserve[]: 0 0 19444 19444
[50547.483951] HighMem free:13476kB min:512kB low:3176kB high:5840kB active_anon:83584kB inactive_anon:18600kB active_file:249396kB inactive_file:2114740kB unevictable:6232kB writepending:5340kB present:2488904kB managed:2488904kB mlocked:6232kB slab_reclaimable:0kB slab_unreclaimable:0kB kernel_stack:0kB pagetables:1808kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB
[50547.483952] lowmem_reserve[]: 0 0 0 0
[50547.483960] DMA: 17*4kB (UM) 15*8kB (U) 32*16kB (UE) 23*32kB (UME) 14*64kB (UME) 8*128kB (UM) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 3356kB
[50547.483989] Normal: 105*4kB (ME) 122*8kB (UM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 1396kB
[50547.484013] HighMem: 2049*4kB (UM) 111*8kB (UM) 25*16kB (UM) 12*32kB (M) 8*64kB (UM) 3*128kB (M) 3*256kB (UM) 4*512kB (M) 0*1024kB 0*2048kB 0*4096kB = 13580kB
[50547.484030] 657546 total pagecache pages
[50547.484030] 0 pages in swap cache
[50547.484030] Swap cache stats: add 0, delete 0, find 0/0
[50547.484030] Free swap  = 0kB
[50547.484030] Total swap = 0kB
[50547.484030] 849453 pages RAM
[50547.484030] 622226 pages HighMem/MovableOnly
[50547.484030] 9439 pages reserved

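If I understand correctly, the kernel is trying, and failing, to allocate a single 4 KB page (order:0), even though there is more than 16 MB of completely free memory and over 2 GB of disk cache that could easily be dropped.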
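Running cat /proc/buddyinfo shows that memory is badly fragmented, but fragmentation shouldn't matter when allocating a single page. It could be a symptom of an underlying problem, though. Any idea what's going on?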
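For reference, /proc/buddyinfo lists, for each zone, how many free blocks of each order (2**order contiguous pages) the buddy allocator holds. Below is a minimal Python sketch, assuming 4 KiB pages and the usual "Node N, zone NAME count0 count1 ..." layout, that totals the free memory per zone and reports the largest order with a free block. The sample lines are rebuilt from the buddy counts in the Mem-Info dump in the question, not captured from a real /proc/buddyinfo. With 105 free order-0 blocks in Normal, fragmentation by itself does not prevent a single-page allocation.

```python
# Minimal /proc/buddyinfo parser (sketch).
# Assumption: 4 KiB pages; each count after the zone name is the number
# of free blocks of order 0, 1, 2, ... (a block of order k = 2**k pages).

PAGE_KB = 4


def parse_buddyinfo(text: str) -> dict[tuple[int, str], list[int]]:
    """Map (node, zone) -> list of free-block counts indexed by order."""
    zones: dict[tuple[int, str], list[int]] = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        head, _, rest = line.partition("zone")   # "Node 0, " / " Normal 105 ..."
        node = int(head.split()[1].rstrip(","))
        fields = rest.split()
        zones[(node, fields[0])] = [int(n) for n in fields[1:]]
    return zones


def free_kb(counts: list[int]) -> int:
    """Total free memory in kB represented by the per-order counts."""
    return sum(n * (1 << order) * PAGE_KB for order, n in enumerate(counts))


def largest_free_order(counts: list[int]) -> int:
    """Highest order that still has at least one free block (-1 if none)."""
    return max((order for order, n in enumerate(counts) if n), default=-1)


# Sample rebuilt from the "DMA:", "Normal:" and "HighMem:" lines of the log.
sample = (
    "Node 0, zone      DMA     17     15     32     23     14      8      0      0      0      0      0\n"
    "Node 0, zone   Normal    105    122      0      0      0      0      0      0      0      0      0\n"
    "Node 0, zone  HighMem   2049    111     25     12      8      3      3      4      0      0      0\n"
)

for (node, zone), counts in parse_buddyinfo(sample).items():
    print(node, zone, free_kb(counts), "kB free,",
          counts[0], "order-0 blocks, largest order", largest_free_order(counts))
```

On a live system the same functions can be fed open("/proc/buddyinfo").read() instead of the sample string.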
1 Answer

Your problem shows up in this line:
[50547.483932] Normal free:1376kB min:3660kB low:4572kB high:5484kB active_anon:0kB inactive_anon:0kB active_file:227508kB inactive_file:96kB unevictable:0kB writepending:4104kB present:892920kB managed:855240kB mlocked:0kB slab_reclaimable:531548kB slab_unreclaimable:25576kB kernel_stack:1784kB pagetables:0kB bounce:0kB free_pcp:120kB local_pcp:120kB free_cma:0kB

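The two important values are free and min. Here the Normal zone has only 1376kB free, well below its min of 3660kB. Only the kernel is allowed to take a zone below min. When that happens, userspace essentially stalls until free memory climbs back above min, and if the OOM killer is enabled, it is free to start killing processes.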
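To make the free/min comparison concrete, here is a rough Python model of the order-0 watermark test in __zone_watermark_ok() (mm/page_alloc.c), based on my reading of 4.9-era sources and deliberately simplified: CMA pages and the high-atomic reserve are ignored. As I read it, GFP_ATOMIC sets __GFP_HIGH and __GFP_ATOMIC, which lower the effective min by a half and then by a further quarter. GFP_ATOMIC does not include __GFP_HIGHMEM, so only the Normal and DMA zones can serve this request and the free memory in HighMem does not help. Feeding in the values from the Mem-Info snapshot, both lowmem zones fail the check; DMA fails because of its lowmem_reserve of 834 pages.

```python
# Simplified model of the order-0 watermark check (sketch, not kernel code).
# All quantities are in 4 KiB pages. Assumption: CMA and the high-atomic
# reserve are ignored; only the ALLOC_HIGH / ALLOC_HARDER adjustments
# that a GFP_ATOMIC request triggers are modeled.

KB_PER_PAGE = 4


def kb_to_pages(kb: int) -> int:
    return kb // KB_PER_PAGE


def zone_watermark_ok(free_pages: int, min_wmark: int, lowmem_reserve: int,
                      alloc_high: bool = False, alloc_harder: bool = False) -> bool:
    """Return True if an order-0 allocation may take a page from this zone."""
    min_pages = min_wmark
    if alloc_high:      # __GFP_HIGH: may dip to half of min
        min_pages -= min_pages // 2
    if alloc_harder:    # __GFP_ATOMIC: a further quarter off the reduced min
        min_pages -= min_pages // 4
    # The zone must stay strictly above the adjusted min plus the pages it
    # keeps back from allocations whose preferred zone is higher up.
    return free_pages > min_pages + lowmem_reserve


# Normal is the preferred zone; its lowmem_reserve[] entry for Normal is 0,
# while DMA reserves 834 pages against Normal-class allocations.
normal_ok = zone_watermark_ok(kb_to_pages(1376), kb_to_pages(3660), 0,
                              alloc_high=True, alloc_harder=True)
dma_ok = zone_watermark_ok(kb_to_pages(3356), kb_to_pages(68), 834,
                           alloc_high=True, alloc_harder=True)
print("Normal:", normal_ok, "DMA:", dma_ok)
```

In this model the Normal zone's adjusted min works out to exactly 344 pages (1376kB), the same as its free count, so the request fails right at the boundary.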
You can control this behaviour with the sysctl parameter vm.min_free_kbytes.
See this article for a good explanation of the topic.
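As a rough illustration of how vm.min_free_kbytes turns into the per-zone min/low/high values printed in the log, here is a Python sketch based on my reading of __setup_per_zone_wmarks() in 4.9-era mm/page_alloc.c. It is simplified: only lowmem zones are modeled (HighMem uses a separate, clamped formula), and vm.watermark_scale_factor is assumed to be at its default of 10. The input 3733 is a guessed min_free_kbytes, not a value reported in the question; it is simply one that reproduces the logged DMA and Normal watermarks.

```python
# Sketch: derive lowmem zone watermarks (in 4 KiB pages) from
# vm.min_free_kbytes. Assumptions: highmem zones are not modeled and
# watermark_scale_factor defaults to 10 (i.e. 0.1% of the zone).

from typing import NamedTuple


class Watermarks(NamedTuple):
    min: int
    low: int
    high: int


def lowmem_watermarks(min_free_kbytes: int, managed: dict[str, int],
                      watermark_scale_factor: int = 10) -> dict[str, Watermarks]:
    pages_min = min_free_kbytes // 4            # kB -> 4 KiB pages
    lowmem_pages = sum(managed.values())
    result = {}
    for zone, zone_pages in managed.items():
        # Each lowmem zone gets a share of pages_min proportional to its size.
        wmin = pages_min * zone_pages // lowmem_pages
        # Step between min, low and high: a quarter of min, or a fraction of
        # the zone size set by watermark_scale_factor, whichever is larger.
        step = max(wmin // 4, zone_pages * watermark_scale_factor // 10000)
        result[zone] = Watermarks(wmin, wmin + step, wmin + 2 * step)
    return result


# Managed sizes from the log (DMA 15912kB, Normal 855240kB), in pages.
managed = {"DMA": 15912 // 4, "Normal": 855240 // 4}

# 3733 is an assumed value chosen to match the logged watermarks.
for zone, wm in lowmem_watermarks(3733, managed).items():
    print(zone, "min/low/high kB:", [pages * 4 for pages in wm])
```

Because every lowmem zone's min is a proportional share of vm.min_free_kbytes, raising it (for example with sysctl -w vm.min_free_kbytes=...) raises min in DMA and Normal together and makes kswapd start reclaiming earlier.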

Further investigation suggests that I've most likely hit this kernel bug: OOM but no swap used - Mark
For anyone else running into this: the bug appears to have been fixed somewhere between 4.9.12 and 4.9.18. - Mark