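The goal here is to keep the executable code of every running process in memory when Linux is under memory pressure.
On Linux, I can instantly (within 1 second) cause high memory pressure and trigger the OOM-killer with the following command:
stress --vm-bytes $(awk '/MemAvailable/{printf "%d\n", $2 + 4000;}' < /proc/meminfo)k --vm-keep -m 4 --timeout 10s
(from here), in a Qubes OS R4.0 Fedora 28 AppVM with 24000MB max RAM. EDIT4: Possibly relevant, and something I forgot to mention, is that I have no swap enabled (i.e. CONFIG_SWAP is not set).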
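To see how much memory that command actually asks for, the awk step can be pulled out on its own. The sketch below only prints the resulting command instead of running it; the function name `vm_bytes_kib` is made up for illustration. Keep in mind that `-m 4` starts four workers and `--vm-bytes` applies to each of them.

```shell
# Hypothetical helper: read meminfo-formatted text on stdin and print
# MemAvailable plus 4000 kB, the value passed to --vm-bytes above.
vm_bytes_kib() {
    awk '/MemAvailable/ { printf "%d\n", $2 + 4000 }'
}

# Print (do not run) the stress command that would be issued right now.
if [ -r /proc/meminfo ]; then
    target=$(vm_bytes_kib < /proc/meminfo)
    echo "stress --vm-bytes ${target}k --vm-keep -m 4 --timeout 10s"
fi
```

Printing the command first is a cheap way to sanity-check the size before deliberately triggering the OOM-killer.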
dmesg reports:
[ 867.746593] Mem-Info:
[ 867.746607] active_anon:1390927 inactive_anon:4670 isolated_anon:0
active_file:94 inactive_file:72 isolated_file:0
unevictable:13868 dirty:0 writeback:0 unstable:0
slab_reclaimable:5906 slab_unreclaimable:12919
mapped:1335 shmem:4805 pagetables:5126 bounce:0
free:40680 free_pcp:978 free_cma:0
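The interesting part is active_file:94 inactive_file:72. These Mem-Info counters are page counts (4 KiB per page here), not kilobytes, so together they come to well under 1 MiB, which is very low.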
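Since the Mem-Info counters in dmesg are page counts, a small conversion helps when reading them. This is a minimal sketch; the function name `pages_to_kib` is hypothetical, and the 4096-byte page size passed below is an assumption that matches typical x86-64 systems (omit it to use `getconf PAGESIZE`).

```shell
# Hypothetical helper: convert a page count to KiB.
# $1 = number of pages, $2 = page size in bytes (defaults to the system's).
pages_to_kib() {
    pages=$1
    page_size=${2:-$(getconf PAGESIZE)}
    echo $(( pages * page_size / 1024 ))
}

pages_to_kib 94 4096   # active_file from the report above   -> 376
pages_to_kib 72 4096   # inactive_file from the report above -> 288
```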
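The problem is that, during memory pressure, executable code is evicted and then re-read from disk, which causes disk thrashing and in turn a frozen OS. (In the case above, though, this lasts less than 1 second.) The relevant code in the kernel's mm/vmscan.c is: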
    if (page_referenced(page, 0, sc->target_mem_cgroup,
                        &vm_flags)) {
        nr_rotated += hpage_nr_pages(page);
        /*
         * Identify referenced, file-backed active pages and
         * give them one more trip around the active list. So
         * that executable code get better chances to stay in
         * memory under moderate memory pressure. Anon pages
         * are not likely to be evicted by use-once streaming
         * IO, plus JVM can create lots of anon VM_EXEC pages,
         * so we ignore them here.
         */
        if ((vm_flags & VM_EXEC) && page_is_file_cache(page)) {
            list_add(&page->lru, &l_active);
            continue;
        }
    }
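I think that if someone could point out how to change this code so that, instead of "give them one more trip around the active list", it does "give them infinite trips around the active list", then the job would be done. Or maybe there is some other way?
I can patch and test a custom kernel. I just don't know what to change in the code so that active executable code is always kept in memory (which, I believe, would in effect avoid the disk thrashing).
EDIT: Here is what I have working so far (applied on top of kernel 4.18.5):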
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 32699b2..7636498 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -208,7 +208,7 @@ enum lru_list {
#define for_each_lru(lru) for (lru = 0; lru < NR_LRU_LISTS; lru++)
-#define for_each_evictable_lru(lru) for (lru = 0; lru <= LRU_ACTIVE_FILE; lru++)
+#define for_each_evictable_lru(lru) for (lru = 0; lru <= LRU_INACTIVE_FILE; lru++)
static inline int is_file_lru(enum lru_list lru)
{
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 03822f8..1f3ffb5 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -2234,7 +2234,7 @@ static void get_scan_count(struct lruvec *lruvec, struct mem_cgroup *memcg,
anon = lruvec_lru_size(lruvec, LRU_ACTIVE_ANON, MAX_NR_ZONES) +
lruvec_lru_size(lruvec, LRU_INACTIVE_ANON, MAX_NR_ZONES);
- file = lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) +
+ file = //lruvec_lru_size(lruvec, LRU_ACTIVE_FILE, MAX_NR_ZONES) +
lruvec_lru_size(lruvec, LRU_INACTIVE_FILE, MAX_NR_ZONES);
spin_lock_irq(&pgdat->lru_lock);
@@ -2345,7 +2345,7 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
sc->priority == DEF_PRIORITY);
blk_start_plug(&plug);
- while (nr[LRU_INACTIVE_ANON] || nr[LRU_ACTIVE_FILE] ||
+ while (nr[LRU_INACTIVE_ANON] || //nr[LRU_ACTIVE_FILE] ||
nr[LRU_INACTIVE_FILE]) {
unsigned long nr_anon, nr_file, percentage;
unsigned long nr_scanned;
@@ -2372,7 +2372,8 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
* stop reclaiming one LRU and reduce the amount scanning
* proportional to the original scan target.
*/
- nr_file = nr[LRU_INACTIVE_FILE] + nr[LRU_ACTIVE_FILE];
+ nr_file = nr[LRU_INACTIVE_FILE] //+ nr[LRU_ACTIVE_FILE]
+ ;
nr_anon = nr[LRU_INACTIVE_ANON] + nr[LRU_ACTIVE_ANON];
/*
@@ -2391,7 +2392,8 @@ static void shrink_node_memcg(struct pglist_data *pgdat, struct mem_cgroup *memc
percentage = nr_anon * 100 / scan_target;
} else {
unsigned long scan_target = targets[LRU_INACTIVE_FILE] +
- targets[LRU_ACTIVE_FILE] + 1;
+ //targets[LRU_ACTIVE_FILE] +
+ 1;
lru = LRU_FILE;
percentage = nr_file * 100 / scan_target;
}
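The same patch can also be seen here on github, because in the code above the tabs were converted to spaces! (mirror1, mirror2)
I have tested the above patch (with 4000MB max RAM now, yes, 20G less than before!). Even a Firefox compilation, which was known to disk-thrash the OS into a permanent freeze, no longer causes one (the oom-killer kills the offending process(es) almost instantly). The stress command above now yields: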
[ 745.830511] Mem-Info:
[ 745.830521] active_anon:855546 inactive_anon:20453 isolated_anon:0
active_file:26925 inactive_file:76 isolated_file:0
unevictable:10652 dirty:0 writeback:0 unstable:0
slab_reclaimable:26975 slab_unreclaimable:13525
mapped:24238 shmem:20456 pagetables:4028 bounce:0
free:14935 free_pcp:177 free_cma:0
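That is active_file:26925 inactive_file:76, i.e. almost 27 thousand pages of active file (about 105 MiB at 4 KiB per page)... So, I don't know how good this is. Am I keeping all active files in memory instead of only the executable files? During the Firefox compilation I had around 500 megs of Active(file) (EDIT2: but that figure comes from cat /proc/meminfo|grep -F -- 'Active(file)', which shows a different value than the active_file: from dmesg above; /proc/meminfo reports kB while the dmesg counters are pages), which makes me doubt it was only exes/libs... Maybe someone can suggest how to keep ONLY the executable code (if that is not what is already happening)? Thoughts? EDIT3: with the above patch it seems perhaps necessary to (periodically?) run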
sudo sysctl vm.drop_caches=1
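to free some stale memory(?). If I call stress after a Firefox compilation, I get: active_file:142281 inactive_file:0 isolated_file:0
(142281 pages, about 556 MiB). If I then drop the file cache (another way: echo 1|sudo tee /proc/sys/vm/drop_caches) and run stress again, I get: active_file:22233 inactive_file:160 isolated_file:0
(22233 pages, about 87 MiB). I'm unsure what to make of this... Results without the above patch: here. Results with the above patch: here.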
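To compare /proc/meminfo with the dmesg counters directly, the kB figure can be converted into pages. This is only a sketch: `meminfo_pages` is a made-up name, and the 4096-byte page size in the example is an assumption (leave it out to use the system's page size).

```shell
# Hypothetical helper: read meminfo-formatted text on stdin and print one
# field converted from kB to pages.
# $1 = field name without the colon, $2 = page size in bytes (optional).
meminfo_pages() {
    field=$1
    page_size=${2:-$(getconf PAGESIZE)}
    awk -v want="$field:" -v ps="$page_size" \
        '$1 == want { printf "%d\n", $2 * 1024 / ps }'
}

# Current Active(file), in the same unit as active_file: in dmesg.
if [ -r /proc/meminfo ]; then
    meminfo_pages 'Active(file)' < /proc/meminfo
fi
```

For example, an Active(file) of 569124 kB corresponds to 142281 pages of 4096 bytes, which lines up with the roughly 500 megs seen during the Firefox compilation.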
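mlockall(). To keep multiple executables in memory, I would consider creating a small ramfs partition and copying the required executables to it. - gudok
The patch from EDIT does the job of keeping every active executable in RAM (so it seems), which almost entirely eliminates the disk thrashing, so I no longer run into a permanent OS freeze. Thanks for the earlyoom link! - user10239615
le9g.patch https://gist.github.com/constantoverride/84eba764f487049ed642eb2111a20830#gistcomment-2997383 - user11509478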