πŸ—‘οΈ GC (Garbage Collection)

GunhoΒ·2024λ…„ 10μ›” 10일
1

Object Oriented Programming (OOP) & Java

λͺ©λ‘ 보기
3/29

⛰️ Overview

As briefly discussed in here, GC (Garbage Collector), a component that belongs to the execution engine is responsible for effective memory management.

GC followed by Heap and JIT compiler is also considered to be a focal component directly related to the performance of a Java application in which if GC becomes appropriately manipulated, enhanced performance of a Java application could be well anticipated.

Oracle (1) Available at here

Performance can be measured in aspects:

  • πŸƒ responsiveness (latency)
    - how quickly an application or system responds to a requested piece of data
  • πŸ‹οΈβ€β™€οΈ throughput
    - how much volume of work has been conducted by an application in a given time.

depending on the application's focus and technical aims, the appropriate type of available GC could vary and this overall will be discussed in greater over the following sections.


πŸ—‘οΈ GC (Garbage Collection)

πŸ—‘οΈ GC strictly refers to the process of identifying the used and unused objects in the heap memory and deleting the unused objects.

In a programming language like C, the responsibility of allocating and deallocating memory from the heap directly falls to the developers, implying this process has to be manually handled. In Java however, this becomes handled automatically by the GC (Garbage Collector), such that it frees developers from incurring memory leaks.

The presence of GC also happens to be the core reason why Java or languages like Java are preferred across diverse developers' communities.

πŸ“ˆ GC Algorithms

Underlying mechanisms or algorithms within the GC have evolved over time, and as a result, lately introduced GC algorithms appear to partly complement the previous algorithms.

The most prevailing GC algorithms are namely:

  • Mark-Sweep
  • Mark-Compact
  • Copying
  • Generational

and these will be discussed in greater detail.


1. Mark-Sweep

Mark-Sweep is the most primitive GC algorithm that its underpinning mechanisms is still extesively applied to all subsequential GC algorithms.

Mark-Sweep simply follows two steps:

  1. Marking
  2. Sweeping
marking Oracle (1) Available at here

At the marking phase, GC marks a live and unused objects as above.

sweeping Oracle (1) Available at here

At the sweeping phase, GC frees all the unused objects and keeps the records of such freed memory addresses. Whenever these addresses are required, freed memory addresses can be addressed.

Mark-Sweep algorithm, however, is highly likely to incur memory issues where, specifically, a memory allocation could fail even though the sum of the freed memory fragments well exceeds the required memory spaces, followed by the fact that the pointers to freed addresses are fragments within the heap memory with fixed ranges. This phenomenon refers to fragmentation, and considerable sequential GC algorithms have been proposed to solve this.


2. Mark-Compact

Mark-Compact addresses the above fragmentation issues, in which after executing the identical marking and sweeping, compacts the remaining used memory spaces so that the memories can be better managed and better allocate objects to the heap memories.

compacting Oracle (1) Available at here

πŸ’‘ overhead refers to the indirect resources including time, memory and etc to fullfill the task.

Mark-Compact, however, supposedly incurs overhead in which the overall compacting process directly results in extra time and memory usage, directly leading to low performance.


3. Copying

Copying algorithm is another proposed solution to address the fragmentation issue that incurred from Mark-Sweeping algorithm.

Copying algorithm sets a logical boundary within the heap memory where an either half is interchangeably available for memory allocation.

Specifically, the algorithm involves the following steps:

1. allocate memory to active memory space.
2. once a memory exceeds a threshold, runs GC (Mark-Sweeping).
3. relocate all the surviving memories to theinactive memory space.
4. former inactive becomes active and vice versa.
5. repeat.

copying Leopold Medium Blog Available at here

Although, Copying algorithm addresses fragementation, it comes at the cost of the poor memory management where a half will be always unsused and overhead resulted from relocating all the surviving memories to the other half.


4. Generational

Generational algorithm is an algorithm that strictly follows, however, with better logical barriers within the heap memory, followed by the identical copying implementation. Specifically, the Generational algorithm has been implicitly derived after an empirical analysis as below informing most objects are short-lived, and incurring the overhead over these short-lived objects may not be optimal.

empirical analysis
heap memory structure
Oracle (1) Available at here

The heap memory, as a result of the analysis, was developed to form more complex logical boundaries while critically improving performance. Under the Generational algorithm, the heap memory is logically divided into:

  • πŸ‘Ά Young Generation
  • πŸ‘¦ Old Generation
  • πŸ‘΄ Permanent Generation

Objects are placed in these generations based on their age or survival duration. The longer an object survives, the more likely it will be promoted to older generations.

Given the above generational logical boundaries, GC for efficiency, can be divided into two types:

  • πŸ‡ Minor (Incremental) GC
  • 🐘 Major (Full) GC

where Minor GC operates Mark-Compact and Copying algorithms on the Young Generation, once the space within the Young Generation increases beyond a certain threshold, while Major GC normally runs the Mark-Compact over both Young and Old generations.

Both Minor and Major GCs can incur the Stop the World event that is directly relevant to the performance level of Java and hence, controlling the Stop the World event is key to improving Java performance. Recently, it appears that there are some Garbage Collectors that do not incur the Stop the World event or incur it minimally over the Minor GC. This will be discussed in greater detail below at the 🧹 Types of Garbage Collectors.

🌎 Stop the World event

an event that pauses a running application and runs GC.

marking Oracle (1) Available at here

πŸ‘Ά Young Generation

The Young Generation can be further divided into more detailed logical boundaries.

  • Eden: responsible for allocating newly created objects.
  • Suvivor Spaces(S0, S1): allocates suviving objects after some Minor GCs.

marking Oracle (1) Available at here

The Copying algorithm strictly applies in the Young Genration in which one of the two Survivor Spaces, either S0 and S1, is always left vacant and upon Minor GC, the surviving objects are copied into the empty Survivor Space. Formerly active Survivor Space then changes into the inactive space left vacant and this keeps repeating.

marking Oracle (1) Available at here

GC allocates a state to each memories called ages where it increases after each Minor GC. If a particular object reaches a certain age, such object becomes promoted to the Old Generation so that such memory no longer becomes the target of Minor GC.

marking Oracle (1) Available at here

πŸ‘¦ Old Generation

The Old Generation contains the memory addresses of the long-surviving classes, specifically the classes that survived a few cycles of the Minor GC from the Young Generation then promoted (reached a certain age), and perhaps Major GC.

marking Oracle (1) Available at here

πŸ‘΄ Permanent Generation

The Permanent Generation (PermGen) contains metadata required by the JVM to describe the classes and methods used in the application. The Permanent Generation is populated by the JVM at runtime based on classes in use by the application. In addition, Java SE library classes and methods may be stored here.

πŸ—’οΈ Notes

As of Java 8, the PermGen space was removed and was replaced with Metaspace, eliminating the β€œjava.lang.OutOfMemoryError: PermGen” issues. Unlike PermGen, which was located within the Java heap, Metaspace operates outside the heap. Class metadata is now mostly allocated from native memory instead of the Java heap.

Metaspace is strictly within the Method Area from Running Data Area and collects the symbolic references of the loaded classes after Class Loaders. (for more details please read this).


🧹 Types of Garbage Collectors

Following the above improvements in GC algorithms, recent Garbage Collectors almost all appear to strictly follow the latest Generational Algorithm.

However, in-depth implementations within each GC and their resulting suitabilities may differ across all GCs such that after a careful examination, the right type of GC has to be chosen.


Serial GC

Serial GC is a GC where both Minor GC and Major GC are done by a single thread. It particularly uses the Mark-Compact algorithm in the Old Generations. Serial GC is particularly considered to be efficient as it incurs no communication overhead between multiple threads.

The Stop the World event occurs for every Minor and Major GCs. Hence Serial GC would be suitable for applications where latency or responsiveness is not their major concern or if bounded by physical limitation (CPU single core)

Parallel GC

The Parallel GC uses the same basic algorithm as the Serial GC. However, while Serial GC processes garbage collection with a single thread, Parallel GC uses multiple threads to handle the garbage collection. As a result, it can process objects faster than Serial GC. Parallel GC is advantageous when there is enough memory and a large number of CPU cores. It is also known as the Throughput GC.

Concurrent Mark Sweep (CMS)

The Concurrent Mark Sweep (CMS) collector, also known as the concurrent low pause collector, is responsible for collecting the Old Generation. Its main goal is to reduce the pauses caused by garbage collection by performing most of its work concurrently with application threads. Unlike other collectors, CMS typically does not copy or compact live objects during collection. It leaves the live objects in place, which can lead to memory fragmentation over time. If fragmentation becomes an issue, increasing the heap size can help mitigate the problem.

G1 GC

The G1 GC or the Garbage First Garbage Collector is a parallel, concurrent, and incrementally compacting low-pause garbage collector.

Young and Old Generational division in the heap memory does not strictly apply to G1 GC as these logical divisions in G1 GC are laid in a noncontiguous pattern.

Below is a good graphical representation of how heap memory can be logically divided where the Red cells correspond to the Young Generation while blue cells become the Old Generation. Survivor Spaces are labeled S, and some memories allocated for the Old Generation that is labeled as H may be exceptionally huge.

Oracle (2) Available at here

G1 GC by its algorithm works efficiently, for instance, by directly allocating a huge object to the Old Generation and by first collecting the region with lesser live data. Combined with all other implementations of the efficient garbage collecting algorithms, G1 GC excels in low pauses or Stop the World events, resulting in improved performances. Hence, since Java 9, it is set as default GC.

ZGC

The Z Garbage Collector (ZGC) is a low-latency, scalable garbage collector. It handles all heavy processing concurrently, ensuring that application threads are not paused for more than a millisecond during garbage collection, impying no Stop the World events.

The Z Garbage Collector (ZGC) is suitable for applications requiring high latencies including financial trade systems, games and etc...

πŸ”¬ Conclusive Insights

The above GC mechanisms imply that GC and its relevant Stop the World event can be intentionally controlled by the technical requirements, where specifically memory sizes for Young and Old Gen can be manipulated.

Specifically, if:

  • GC is required to be completed in a short time span,

    • memory size for Young Gen can be reduced
      (frequent but shorter pause time for Minor GC)

    • memory size for Old Gen can be reduced
      (frequent but shorter pause time for Major GC)

    • both memory size for Young and Old Gen can be reduced
      (frequent but shorter pause time for Minor and Major GCs).

  • GC should not occur frequently and GC over the long time span is allowed,

    • memory size for Young Gen can be increased
      (few but longer pause time for Minor GC)

    • memory size for Old Gen can be increased
      (few but longer pause time for Major GC)

    • both memory size for Young and Old Gen can be increased
      (few but longer pause time for Minor and Major GCs)

  • extreme cases where there must be no GC

    • instantiate all the required objects and never run GC.

πŸ“š References

Oracle (1)
Oracle (2)
IBM
Naver
BETSOL
Leopold Medium Blog

profile
Hello

0개의 λŒ“κΈ€