Optimization of programs under the Garbage Collector

Optimization of programs under the Garbage Collector


Not so long ago, an excellent article appeared on Habré Build optimization garbage in a high-loaded .NET service . This article is very interesting because the authors, having armed themselves with the theory, have made the previously impossible: they optimized their application using the knowledge of the work of GC. And if previously we had no idea how this GC works, now it is presented to us on a platter by the efforts of Konrad Kokos in his book Pro .NET Memory Management . What conclusions did I draw for myself? Let's make a list of problem areas and think about how to solve them.


At a recent CLRium # 5 workshop: Garbage Collector, we talked about GC all day. However, I decided to publish one report with a text transcript. This is a report on the conclusions regarding application optimization.



Reduce cross-generational connectivity


Problem


To optimize garbage collection speed, the GC collects the younger generation whenever possible. But to do this, he also needs information about links from older generations (in this case, they are an additional root): the card table.


At the same time, one link from the older to the younger generation forces us to cover the area with a card table:


  • 4 bytes overlaps 4 Kb or max. 320 objects - for x86 architecture
  • 8 bytes overlaps 8 KB or max. 320 objects - for x64 architecture

Ie. GC, checking the card table, meeting a non-zero value in it, is forced to check a maximum of 320 objects for the presence of outgoing links into our generation.


Therefore, sparse links to the younger generation will make GC more time consuming


Solution


  • Place objects with connections to the younger generation - close by;
  • If traffic of objects of zero generation is supposed, use pulling. Those. make a pool of objects (there will be no new ones: there will be no zero generation objects). And further, warming up the pool with two consecutive GCs so that its contents are guaranteed to fail into the second generation, thereby avoiding references to the younger generation and having zeros in the card table;
  • Avoid links to the younger generation;

Avoid strong connectivity


Problem


As follows from the algorithms for the phase of compressing objects in SOH:


  • To compress a heap, you need to go around the tree and check all the links correcting them with new values ​​
  • At the same time, links from the card table affect entire groups of objects

Therefore, the general strong connectivity of objects can lead to subsidence at GC.


Solution


  • Place strong-connected objects side by side, in the same generation
  • Avoid unnecessary links in general (for example, instead of duplicating this- & gt; handle links, you should use the existing this- & gt; Service- & gt; handle)
  • Avoid code with hidden connectivity.For example, closures

Monitor the use of segments


Problem


During intensive work, a situation may arise when the allocation of new objects leads to delays: the allocation of new segments under the heap and their further decomposition when cleaning garbage


Solution


  • Using PerfMon/Sysinternal Utilities, check the selection points of new segments and their decommitting and freeing
  • If it’s about LOH, which has heavy traffic buffers, use ArrayPool
  • If it’s about SOH, make sure that objects of the same lifetime stand out side by side, ensuring that the Sweep triggers instead of Collect
  • SOH: use object pools

Do not allocate memory in loaded code areas


Problem


Loaded section of code allocates memory:


  • As a result, the GC selects the allocation window not 1 KB, but 8 KB.
  • If the window does not have enough space, this leads to a GC and expansion of the zakommichenoy zone
  • A heavy stream of new objects will make short-lived objects from other streams quickly go to the older generation with worse garbage collection conditions
  • What will increase garbage collection time?
  • Which will lead to a longer Stop the World even in Concurrent mode

Solution


  • A complete ban on the use of closures in critical parts of the code
  • Complete prohibition of boxing on critical parts of the code (you can use emulation through pulling if necessary)
  • Where you need to create a temporary object for data storage, use structures. Better - ref struct. When the number of fields is more than 2, send by ref

Avoid unnecessary memory allocations in the LOH


Problem


Placing arrays in LOH leads either to its fragmentation or to the weighting of the GC procedure


Solution


  • Use partitioning of arrays into subarrays and a class that encapsulates the logic of working with such arrays (i.e., instead of List & lt; T & gt; where the mega-array is stored, your MyList with array [] [] separates the array into somewhat shorter) br/>
    • Arrays go to SOH
    • After a couple of garbage collections will lie next to the foreground objects and stop affecting the garbage collection
  • Control the use of double arrays longer than 1000 elements.

Where it’s possible and possible to use thread stack


Problem


There are a number of ultra short lived objects or objects that live as part of a method call (including internal calls). They create traffic objects


Solution


  • Use memory allocation on the stack, where possible:
    • It doesn’t load a bunch
    • Does not load GC
    • Freeing memory - instant
  • Use Span T x = stackalloc T []; instead of new T [] where possible
  • Use Span/Memory where possible
  • Translate algorithms to ref stack types (StackList: struct, ValueStringBuilder )

Release objects as early as possible


Problem


Conceived as short-lived, objects fall into gen1, and sometimes into gen2.
This leads to a weighted GC that lasts longer


Solution


  • You need to release the object reference as early as possible
  • If a lengthy algorithm contains code that works with any objects, separated by code. But which can be grouped in one place, it is necessary to group it, allowing thereby to collect them earlier.
    • For example, on line 10 we took out a collection, and on line 120 we filtered it out.

You do not need to call GC.Collect ()


Problem


Often it seems that if you call GC.Collect (), this will fix the situation


Solution


  • It is much more correct to learn the algorithms of the GC, look at the application under ETW and other diagnostic tools (JetBrains dotMemory, ...)
  • Optimize the most problematic areas

Avoid Pinning


Problem


Pinning creates a range of problems:


  • Complicates garbage collection
  • Creates free memory spaces (free-list items, bricks table, buckets)
  • May leave some objects in a younger generation, forming links from the card table

Solution


If there is no other way, use fixed () {}. This method of fixation does not make a real fixation: it occurs only when the GC has worked inside the curly braces.


Avoid finalizing


Problem


Finalization is not deterministic:


  • Undiscovered Dispose () causes finalization with all outgoing links from the object
  • Dependent objects are delayed longer than scheduled
  • Grow older by moving to older generations
  • If they contain links to younger ones, generate links from the card table
  • Complicating the assembly of older generations, fragmenting them and leading to Compacting instead of Sweep

Solution


Carefully call Dispose ()


Avoid a lot of threads


Problem


With a large number of threads, the number of allocation context grows, because they are allocated to each thread:


  • As a result, GC.Collect comes faster.
  • Due to the lack of space in the ephemeral segment, after Sweep, Collect will occur

Solution


  • Control the number of threads by number of cores

Avoid traffic of objects of different size


Problem


When traffic objects of different size and lifetime, fragmentation occurs:


  • Increase Fragmentation ratio
  • Triggering Collection with address change phase in all referencing objects

Solution


If you intend to traffic objects:


  • Check for extra margins by approximating dimensions
  • Check for lack of manipulation of strings: where possible, replace with ReadOnlySpan/ReadOnlyMemory
  • Free the link as soon as possible
  • Take advantage of pulling
  • Caches and pools "warm up" with a double GC to compact objects. Thereby you avoid problems with the card table.

Source text: Optimization of programs under the Garbage Collector