To be honest, this is one of the most brutal articles I've read lately: there is a lot about death at a young age, about persecution from one area of memory to another and about a fierce struggle for performance. In general, welcome to the cut - there is a translation of Peter Marshall's excellent article on how garbage collection in V8 works today.
Over the past few years, the approach to garbage collection in V8 has changed a lot. As part of the Orinoco project, he went from a consistent approach stop-the-world
to parallel and competitive approaches with incremental folback.
Note: if you like to see the report rather than read the article, you can do this here
. If not, read on.
Any garbage collector has a set of tasks that need to be performed periodically:
- Find live/dead objects in memory.
- Reuse memory used by dead objects.
- Compact/defragment memory (optional).
, as well as reduced overall program performance.
Basic GC (full mark-compact)
The main GC collects trash from the entire heap.
garbage collection takes place in three steps: marking, recycling and compacting
Determining from what objects you can free up memory is a mandatory part of the work of the garbage collector. He considers the object alive based on information about its reachability. This means that any object referenced from the current execution environment should be stored in memory, and all inaccessible objects can be collected by the GC.
Disposal is a process during which areas of memory left from dead objects are put into a list called free-list. As soon as the marking process is completed, the GC finds such areas and adds them to the appropriate list. Free-lists differ from each other in what size of the memory area they are stored in, which allows you to quickly find the desired one. Subsequently, when we want to allocate a memory, we will look at one of the lists and find a section of suitable size.
Also, the main GC sometimes makes decisions about cleaning/compacting some pages of memory, based on its own heuristic estimates based on the degree of page fragmentation. You can think of compaction as an analogue of hard disk defragmentation on older PCs. We copy the surviving objects onto other pages that have not yet been compacted (here the free-list is just used). Thus, we can reuse the small scattered memory areas left over from dead objects.
One of the drawbacks of GC, which copies surviving objects, is that when many long-lived objects are created, you have to pay a high price for copying them. It is for this reason that only some highly fragmented memory pages undergo compaction, while for the rest, recycling is simply carried out, which does not require copying the surviving objects.
Generation Memory Device
The pile in V8 is broken down into areas called generations. There is a younger generation (which in turn is subdivided into a generation “nursery” and “intermediate” generation) and old generations. Created objects are placed in the "nursery." Subsequently, if they are experiencing the next garbage collection, they remain in the younger generation, but pass into the category of "intermediate". If they survive even after the next assembly, they are placed in the older generation.
A bunch of V8 is broken into generations. Objects are moved from the younger to the older generation if they survive the garbage collection
The heap organization in V8 relies on the above hypothesis. For example, at first glance it may seem illogical that GC is engaged in compaction/movement of objects that have survived garbage collection, because copying objects is a rather expensive operation in order to carry it out during garbage collection. But, based on the hypothesis of generations, we know that very few objects will survive this procedure. So, if you move only the surviving objects, anything that has not been moved can automatically be considered rubbish. This means that the price we pay for copying is proportional to the number of surviving objects, not all created ones.
Auxiliary GC (scavenger)
There are actually two garbage collectors in V8. The main (mark-compact) quite effectively collects garbage from the entire heap, while the auxiliary collects garbage only in young memory, because the hypothesis of generations tells us that the main efforts to collect garbage should be sent there.
The principle of operation of the auxiliary GC is such that the surviving objects always move to a new memory page. In V8, the young memory is divided into two halves. One is always free so that it is possible to move the surviving objects into it, and during assembly this initially empty area is called To-space. The area from which copying occurs is called From-space. In the worst case, every object can survive, and then you have to copy them all.
For this type of assembly, there is a separate set of pointers that reference old memory to young. And instead of scanning the entire heap, we use write barriers ( write barriers
) to maintain this set. So, combining this set with the stack and the global object, we get all the links in the young memory without having to scan all the objects from the old memory.
When copying objects from From-space to To-space, all surviving objects are placed in a continuous memory location. Thus, it is possible to get rid of fragmentation - gaps of memory left over from dead objects. After the transfer is complete, To-space becomes From-space, and vice versa. As soon as the GC finishes its work, the memory for new objects will be allocated starting from the first free address in From-space.
Scavenger transfers surviving objects to a new memory page
If you use only this strategy and do not move objects from the young memory, the memory will end pretty quickly. Therefore, objects that have survived two garbage collections are moved to the old memory.
The final step is to update the pointers to the objects that have been moved. Each copied object leaves its original address, leaving instead the forwarding address, which is necessary to find the original object in the future.
Scavenger transfers the "intermediate" objects to the old memory, and the objects from the "manger" to the new page
Thus, garbage collection in young memory consists of three steps: marking objects, copying them, updating pointers.
Most of the listed algorithms are described in various sources and are often used in execution environments with support for automatic garbage collection. But GC in V8 has come a long way before becoming a truly modern tool. One of the significant metrics describing its performance is how often and how long the main thread pauses while the garbage collector performs its functions. For the classic stop-the-world collectors, this time leaves its mark on the experience of using the page due to delays, poor rendering and an increase in response time.
Orinoco GC V8 logo
Orinoco is a codename for GC using modern techniques of parallel, incremental and competitive garbage collection. There are some terms that have a specific meaning in the context of GC, so let's first give their definitions.
Parallelism is when the main and auxiliary streams do approximately the same amount of work per unit time. This is still the stop-the-world approach, but the length of the pause in this case is divided by the number of threads involved in the work (minus the cost of synchronization).
Main and auxiliary streams work on the same task at the same time
Incrementality is when the main thread does a small amount of work intermittently. Instead of full garbage collection, small tasks are done on partial assembly.
As can be seen from the diagram, this approach does not reduce the total amount of work (and, as a rule, even increases it), but it distributes this work in time. Therefore, it is a good way to solve one of the main tasks - reducing the response time of the main stream.
Small areas of GC work in the main thread
In addition, there are also read/write races, since the auxiliary and main threads simultaneously read or modify the same objects.
GC State in V8
V8 distributes the work of garbage collection between auxiliary streams in young memory (scavenging). Each stream receives a set of pointers, following which, moves all living objects into To-space.
When moving objects in To-space, threads need to synchronize through atomic read/write/compare and swap operations to avoid a situation where, for example, another thread found the same object, but following a different path, and also tries to move it.
The thread that moved the object to To-space then returns and leaves the forwarding pointer so that other threads that find this object can proceed to the right address. For fast and non-synchronized memory allocation for surviving objects, threads use thread local buffers.
Parallel builds distribute work between multiple auxiliary threads and the main thread
The main GC in V8 starts with the labeling of objects. As soon as the heap reaches a certain limit (calculated dynamically), competitive markers begin their work. Each of the streams receives a set of pointers, and, passing on them, they mark each found object as achievable.
Primary GC uses competitive labeling, recycling, and parallel compaction and updating of pointers
The root set is scanned again to make sure all living objects are marked, and then memory compaction and pointer updating begin in several streams.
Not all pages in the old memory are compacted - those that do not will be scanned to the vacant memory areas (sweeping) for listing them (free-lists).
GC can send tasks (idle tasks) that can be performed “in free time” and which are parts of the work that in any case would have to be done. An environment like Chrome, where the engine is embedded, can have its own idea of what is considered free time. For example, in Chrome at a frequency of 60 frames per second, the browser has about 16.6 ms to render an animation frame.
If the animation work is completed earlier, in free time, before the next frame arrives, Chrome can perform some of the tasks received from GC.
GC uses the free time of the main stream to do pre-cleaning
Learn more from our Idle-time GC publication
GC in V8 has come a long way since its inception. Adding parallel, incremental and competitive techniques to it took several years, but it paid off, allowing us to do most of the work in the background.
Everything related to the pauses of the main thread, the response time and page load has significantly improved, which allows the animation, scrolling and user interaction on the page to be much smoother. A parallel collector reduced the total processing time of a young memory by 20–50%, depending on the load.
Idle-time GC allows you to reduce the size of the used heap for Gmail by 45%. Competitive labeling and disposal (sweeping) can reduce the duration of GC pauses in heavy WebGL games by up to 50%.
However, the work is not over. Reducing pauses remains an important task to simplify the lives of web users, and we are looking for the possibility of using more advanced techniques to achieve the goal.
On top of that, Blink (a renderer in Chrome) is also equipped with a collector (Oilpan), and we are working on improving the interaction between the two GCs and also on using Orinoco techniques in Oilpan.