这似乎是一个完全有效的答案,但从理论上讲,似乎可以构建一个允许大部分数组被垃圾收集的数组垃圾收集器,同时留下一些仍在使用的小子数组。换句话说,如果有一个50,000元素的数组,其中只有一个小的100元素切片仍在使用,垃圾收集器可以将数组分成三个部分- 100元素切片之前的元素,100元素切片本身和100元素切片之后的元素- 然后回收这些部分的第一个和最后一个。
我的问题是是否有任何语言实现实际使用这种类型的垃圾收集器,或者它只存在于理论上。是否有人知道具有此类垃圾收集器的语言实现的示例?
Imagine we have some object layouts:
====================================
A: { int, object, double, object }
B: { object, object }
C: { int }
Memory Data:
============
Now we need to describe what is in memory.
The following list is composed of memory positions,
and the data contained in each position.
There are 2 kinds of notation I will use:
- Address: ROOT[ list-of-root-references ]
- Address: type-identifier { object-data }
Note that each object can span multiple memory
positions from the first one.
e.g. 90: B { 0, 0 } -- this reads as
"stored at address 90: type-identifier of B followed by raw data { 0, 0 }"
- A reference is represented by a number,
that point to the memory address of the pointed object.
1: ROOT[ 90, 20, 30 ]
20: A { 1236, 30, 0.5, 60 }
30: B { 60, 70 }
40: C { 1237 }
50: A { 1238, 20, 0.8, 50 } There is a self-reference here!
60: C { 1234 }
70: A { 1234, 0, 0.7, 80 }
80: C { 1235 }
90: B { 0, 0 } Note that 0 is a null reference!
The GC need to know the layout of each object.
Otherwise it wouldn't be abled to tell
what knid of information is stored in each memory position.
Running the GC:
===============
Garbage collecting steps, to clean the memory described above:
Step 1: Get ROOT references: 2, 3, 9 and mark as 'in-use'
Step 2: Get references from 2, 3 and 9: 3, 6, 7. Note that 0 is a null reference!
Step 3: Get references from 3, 6 and 7: 6, 7, 8, and mark them.
Step 4: Get references from 6, 7, 8, and mark them: only 8!
Step 5: Get references from 8... it has none! We are finished with marking objects.
Step 6: Delete unmarked objects.
This shows what happened in each step with each object stored in the memory.
Step -> 1 2 3 4 5
Memory
20 x
30 x x
40 DELETE
50 DELETE
60 x x
70 x x
80 x x
90 x
To represent the memory data I will introduce the following notation:
- Address: { object-data } -- this time object type is not at the begining!
- A reference is represented by a type-identifier and an address-number,
that point to the memory address of the pointed object,
in the following format: type/number...
e.g. A/20 -- this reads as: "at address 20, there is an object of type A"
Note that: 0/0 is a null reference,
but it still requires space to store the type.
The memory would look like this:
1: ROOT[ B/90, A/20, B/30 ]
20: { 1236, B/30, 0.5, C/60 }
30: { C/60, A/70 }
40: { 1237 }
50: { 1238, A/20, 0.8, A/50 }
60: { 1234 }
70: { 1234, 0/0, 0.7, C/80 }
80: { 1235 }
90: { 0/0, 0/0 }
The memory looks like the first sample:
*This is the same notation used at the begining of this answer.
1: ROOT[ 90, 20, 30 ]
20: A { 1236, 30, 0.5, 60 }
30: B { 60, 70 }
40: C { 1237 }
50: A { 1238, 20, 0.8, 50 }
60: C { 1234 }
70: A { 1234, 0, 0.7, 80 }
80: C { 1235 }
90: B { 0, 0 }
我们首先注意到的是,我们不能再将布局信息存储在已分配的内存附近。
想象一下一个带有共享内存的数组:
示例:
I'll introduce a new notation for arrays:
type-identifier < array-length >
1: ROOT[ 20, 31 ] -- pointer 31 is invalid, destination has no type-identifier.
20: INT<5> -- info about layout of the next memory data (spans by 10 memory units)
30: 2
31: 3 -- should have type-identifier, because someone
32: 5 is pointing here!! Invalid memory!!
33: 7
34: 11
1: ROOT[ INT<5>/30, INT<2>/31 ]
30: 2
31: 3
32: 5
33: 7
34: 11
也许我完全错了。但是我无法想象有比现在更好的方法。 现代GC甚至比这个更复杂... 我在这里只涵盖了基础知识, 我认为,现代GC正在通过其他更可靠的方式进行优化。
其他参考资料:
http://en.wikipedia.org/wiki/Garbage_collection_(computer_science)
http://www.memorymanagement.org/glossary/t.html
我实际上编写了一个虚拟机(HLVM),通过使用四字节引用来避免头文件,即元数据随引用一起传递,而不是在指向的对象中。从引用计数研究中我们知道,绝大多数对象的引用计数为1,因此每个引用头的潜在重复实际上比你想象的要便宜。没有头文件意味着HLVM对数组的内部表示与C兼容,因此互操作性更加容易和快速。同样,切片可以是数组中间的引用。适当的GC算法,例如标记区域,可以释放不再可达的堆分配块的部分并重新使用它们,同时保留可达的切片。
另一个解决方案是使用虚拟内存。您可以将逻辑页面映射到相同的物理页面上,GC可以收集不可达页面。然而,这是非常粗粒度的,因此更加专业化。
当然,制作一个“更智能”的垃圾收集器的危险在于,你可能会以某种方式削弱用户,要么是阻止代码工作,要么是为了过度热衷的垃圾收集器而使用笨拙的解决方法。
我认为所有关联数组的实现(PERL、PHP、javascript等)都必须这样做。你称之为“垃圾回收”,但这意味着特定的项目必须首先被取消设置(删除),以便垃圾收集器知道它们未使用。因此,这是正常的删除/移除/取消设置,这肯定不仅反映在关联数组中,而且还反映在特定语言实现所使用的数据结构中。否则,内存可能会被几乎空的数组耗尽...
O(n)
并没有太大意义),而且GC变得如此聪明也会带来一些性能和存储开销。 - user395760