Java's Garbage Collection for Beginners

Object-Oriented Programming relies heavily on objects (hence the name). We create and use objects in Java all the time and yet we don’t really pay much attention to it. But each one of those objects take up space in the memory. And as we know, memory is finite. In theory, after we create so many objects the memory would be full and no objects would be created anymore. But that doesn’t happen. Why?

Garbage Collection

Garbage Collection is an automatic memory manager. It is the process used by Java to free up unused memory. It does that by cleaning the Heap Memory of unreferenced/unreachable objects and reclaiming the space back to be used by another object in the future. In languages like C++, for example, you have to manually allocate memory (malloc() or calloc()) and then, later, deallocate it (free()).

But what happens if the programmer forgets to free up the memory after there’s no reference to the object? A Memory Leak. This happens when the memory used by the object is not released after it has no reference (the opposite would be Dangling Pointer).

An example of unreachable object that would cause Memory Leak is:

Integer num = new Integer(5); // 1 
num = null; // 2

1 - The Integer object is added to the Heap Memory. Also, the variable “num” is added to the Stack Memory with a pointer to the Integer’s address in Heap Memory, making the Integer a live object

2 - Now the Integer object has no reference which means it’s unreachable. It’s a dead object. If an unreachable object remains in the Heap Memory it might prevent JVM from allocating new objects in it. When this happens Java raises java.lang.OutOfMemoryError.

It might look like a simple process but the Garbage Collector needs a complex strategy to deal with all the live and dead objects.

Tracing Strategy

The GC work can be divided into two tasks:

  1. How it finds dead objects
  2. What it does to them.

The most common Garbage Collection strategy to find dead objects is Tracing. In this strategy the Garbage Collector is responsible for tracing all objects branching from GC roots. Those roots are objects outside Heap Memory, like System classes, Threads or method variables.

This strategy uses these algorithms to deal with dead objects: Copying Algorithm, Mark-and-sweep Algorithm and Mark-and-Compact Algorithm.

Mark-and-sweep Algorithm

This algorithm is divided into two processes: marking live objects and reclaiming dead objects’ spaces. This is the most common and simple algorithm. But its weakness is in its simplicity.

Whenever the GC cycle runs it traverses all objects and marks live ones, and later dead objects will be swept leaving empty spaces in the memory. Those spaces can be small or big and depending on the size of the new object it might not fit on any of those spaces. This whole process leaves the memory fragmented.

Source: How Does Garbage Collection Work in Java? | by Alibaba Tech

Copying Algorithm

In this algorithm, the Heap Memory is divided into two spaces, called From and To. All new objects are created in the From space. When it’s full, the GC copies the live objects into the To space and reclaims the From’s space used by dead objects.

The Copying algorithm deals with Mark-and-sweep’s fragmentation problem, since it copies the live objects and puts them side-by-side in the To space. However, by solving that problem, a new one arises: the Heap Memory is divided in two. That causes the GC cycle to run more often than it would if all the Heap Memory were being used for new objects.

Source: How Does Garbage Collection Work in Java? | by Alibaba Tech

Mark-and-Compact Algorithm

This algorithm is an extension of Mark-and-sweep.

It doesn’t leave the memory fragmented when the space is reclaimed. Instead, it sorts the memory by moving all living objects to one side and then it reclaims the marked space. Mark-and-Compact also solves the problem raised by Copying Algorithm, because it uses the whole Heap Memory.

Even if it solves both of the problems created by the preceding algorithms, it comes with a price. Every time the GC cycle runs it has to sort out every reference address. That makes this algorithm less efficient than the others.

Source: How Does Garbage Collection Work in Java? | by Alibaba Tech

We’ve seen that each algorithm has its strengths and weaknesses. How can we go about choosing one among them?

Why should we choose one? Wouldn’t it be better to use them all?

Generational Collection Algorithm

Enters Generational Collection Algorithm. It combines all the above algorithms and applies them in the scenario they best fit.

Every object created will have an “age”. They get “older” every time a Garbage Collection Cycle happens. Java separates them into two parts in the Memory Heap:

  • Young Generation, divided in:
    • Eden Space
    • Survivor Space
  • Old Generation, divided in:
    • Tenured Space

Eden Space

Newly created objects are added to the Eden Space, that’s why the Mark-and-sweep Algorithm is used in this scenario.

Copying Algorithm wouldn’t be the best fit here because it divides the memory in two and the minor Garbage Collection Cycle would run often. Minor GC cycles are quicker than Major ones and run in Young Generations. Mark-and-compact wouldn’t be the best fit here either, because most objects are short-lived. The large number of dead objects would force the GC to change the reference of the memory’s addresses constantly.

Survivor Space

After a minor GC cycle, all living objects in Eden Space will be moved to Survivor Space. It stores objects that have lived for a while, so it doesn’t require as much space as Eden Space does, that’s why the Copying Algorithm suits here.

Objects coming from Eden Space are stored in Survivor’s From space. If From is full, the objects will go directly to Tenured Space.

After another minor GC the living objects from Eden Space and From space (Survivor Space) are moved to the To space.

This last part deserves attention. The living objects from Eden will be moved to the To Space, not the From Space. This happens because the minor GC will also reclaim dead objects from the From Space, leaving it fragmented. So it might not have enough space for the objects coming from Eden. In the next cycle the objects will be moved to the From Space, including the living objects from To Space until they reach a certain threshold (15 by default) and are moved to the Old Generation.

It’s worth saying that every Garbage Collection is a “Stop-the-World” event. This means all threads are stopped until the Garbage Collection completes.

Tenured Space

After multiple Minor Garbage Collection Cycles, the objects from Young Generation will be moved to Tenured Space (Old Generation).

Since those objects are long-lived and their spaces will hardly be reclaimed, the Mark-and-Compact algorithm will be applied here.

When the Tenured Space is filled up a Major GC will be triggered.

Conclusion

Memory management is tough (ask the C++ people), but we forget about that because the work is done for us automatically. However, every automation comes with a price.

Depending on the application’s size the GC might have a major impact on its performance, due to the Stop-the-World event. It’s up to your team to tweak the GC to fit your needs and achieve the expected performance.

Where can you go from here?

Here are some good articles to learn more about Garbage Collection: