As soon as I started working with Java I found some curious things. I had never worked with a language that requires such careful attention to memory, or one where you have to configure the virtual machine’s resources and everything else. That is not criticism; maybe I had just never gone deep enough into a programming language to need to understand exactly how its core stores data in memory. I have worked with Ruby, PHP and Go (for a short period of time), and I have fun with Python sometimes, and, at least as far as I know, none of them ever required me to think this much about memory management.
The thing is that last month I was working on an application that receives a huge number of requests, and I noticed that the application instance sometimes restarted without any clear cause. The first thing I suspected was a memory problem, so I went to learn more about how Java and the JVM manage memory.
Why did I suspect memory problems? Well, I saw some slow minor GCs in our Datadog metrics, so I assumed they were taking too long to clean the application’s heap memory: long enough to degrade the application instance and force our PaaS to recycle it.
Ok. But the problem itself is not the subject of this article. It forced me to understand more about core Java memory management, and the purpose of this blog is to share my learnings and achievements with the internet, so let’s talk about the Java memory management system.
What Is the Java Virtual Machine and How Does It Work?
The Java Virtual Machine is nothing more than a computer program that loads your compiled Java classes and runs them on the PC. The first thing I thought when I learned about the JVM was that it is a compiler… But not exactly. The Java compiler (javac) is what turns your Java source code into bytecode. The JVM then interprets that bytecode and compiles the most frequently executed parts to machine code at runtime with its JIT compiler, which is why Java is often called a semi-compiled language. So the JVM acts more as a translator between the bytecode and the machine than as a compiler of your source code.
We can abstract the JVM into three main components:
- Class Loader: responsible for loading all classes and storing their metadata in the Metaspace memory. Don’t worry about this name for now, we’ll talk about Metaspace later;
- Runtime Data Areas: probably the most important component of the JVM, this is where all data is stored;
- Execution Engine: where the bytecode is actually executed, either interpreted or compiled to machine code by the JIT compiler.
We will talk about Metaspace in a while. For now I think it is enough to know that the class loader is the component responsible for reading our class files and loading their metadata. The second component of the JVM, and probably the most important, is the Runtime Data Areas.
Runtime Data Areas
It is a logical division of the memory resources available to the JVM. What I mean is that when we run a Java app, it runs inside a Java Virtual Machine; this virtual machine has a certain amount of memory it is allowed to use, and the Runtime Data Areas are a logical division of that available memory.
Each area is responsible for storing a specific kind of information:
- Stack: each Java thread has its own stack memory space, which stores primitive values and references to objects;
- The heap: this is where every object is stored, so every time you write the new keyword in your code, the JVM stores the new instance inside the heap memory area;
- The metaspace: this is where all class metadata is stored;
- PC Register: short for Program Counter Register. Here the JVM stores the address of the instruction each thread is currently executing, which lets every thread resume exactly where it left off and makes concurrency possible in Java;
- Native Method Stack: also known as a C stack, it is used when our code calls a native method.
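To make these areas a bit more concrete, here is a minimal sketch that asks the running JVM which memory pools it has, using the standard java.lang.management API. The class name MemoryAreas and the helper poolNames are my own, made up for this illustration. The pool names you will see (Eden, Survivor, Old Gen, Metaspace and so on) depend on the JVM and on the garbage collector in use, so take the output as an example and not as a rule.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.util.ArrayList;
import java.util.List;

class MemoryAreas {
    // Returns the names of all memory pools of the given type (HEAP or NON_HEAP).
    static List<String> poolNames(MemoryType type) {
        List<String> names = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == type) {
                names.add(pool.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // Pool names vary between JVMs and garbage collectors.
        System.out.println("Heap pools:     " + poolNames(MemoryType.HEAP));
        System.out.println("Non-heap pools: " + poolNames(MemoryType.NON_HEAP));
        // Thread stacks are not reported as memory pools; here we only count the threads.
        System.out.println("Live threads:   "
                + ManagementFactory.getThreadMXBean().getThreadCount());
    }
}
```

On a HotSpot JVM the Metaspace usually shows up among the non-heap pools, while the heap pools already hint at the young and old generations.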
There is a lot to talk about here, and I’ll probably need a few more posts to cover everything. So this will be a series of articles, and this is the first one. Here we will cover two runtime data areas in more detail: the stack and the heap.
What about the stack?
Earlier I said that I had a San Francisco Giants baseball on my table. Through lifelong learning, my brain has successfully associated the name “San Francisco Giants baseball” with the corresponding object. Also, speaking about human brains, there are some logical divisions in them too:
As we can see, science divides the human brain into some areas: frontal lobe, parietal lobe, occipital lobe, temporal lobe and cerebellum. Each one of these areas has its own responsibility and works to solve a specific problem that came up during human evolution. Hum… we were talking about JVM stack areas and now we are talking about human brains?
Yep. And I made this analogy because the JVM stack area is responsible for maintaining and organizing the relationship between a variable and its value. When I say its value, I mean the object reference, or the value itself if it is a primitive. In the same way, my brain maintains a relationship that lets me know that my San Francisco Giants baseball is that object on my desk.
It is important to know that each thread has its own stack space in memory. Inside this space we can have two kinds of data: primitive data types, whose value is stored directly in the stack area, and reference data types, which are only the address of a previously created object stored in the heap. We’ll talk about the heap memory later.
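A small sketch can show the difference between these two kinds of data. The class StackValues and its Counter holder are hypothetical names I invented just for this example. When a method is called, it receives a copy of whatever was stored in the caller’s stack: for a primitive that is the value itself, for an object it is the address of the same object in the heap.

```java
class StackValues {
    // Hypothetical holder class, used only for this illustration.
    static class Counter {
        int value;
    }

    // 'n' is a copy of the caller's primitive; changing it affects only this call.
    static void bumpPrimitive(int n) {
        n = n + 1;
    }

    // 'c' is a copy of the caller's reference; both point to the same heap object.
    static void bumpObject(Counter c) {
        c.value = c.value + 1;
    }

    public static void main(String[] args) {
        int number = 10;                 // primitive: the value itself sits in the stack
        Counter counter = new Counter(); // reference in the stack, object in the heap
        counter.value = 10;

        bumpPrimitive(number);
        bumpObject(counter);

        System.out.println(number);        // prints 10
        System.out.println(counter.value); // prints 11
    }
}
```

The primitive stays at 10 because only its copy was changed, while the object was changed through the shared address.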
So… the stack area creates these links between the running code and the stored objects, which is what allows the program to run well. This could lead to a lot of new questions, like: ok, I understand how Java creates a relationship between the code being executed and the objects stored on the heap, but will it grow forever? Who cleans it? Are only objects stored in the heap? Etc…
This subject is neither trivial nor small, so each step we learn will lead us to more and more questions. These questions will help us understand more about the language and its core, and this knowledge will make us better software engineers. So these questions are important, and I hope I can help you answer them.
Not only objects are stored in the heap
The first time I read about this I thought: “ok, the heap only stores objects”. However, that is not exactly true: if an object has a field of a primitive data type, that value is stored in the heap too, wrapped inside the whole object.
class Baseball {
    int size = 12;
    String color = "white";
    String team = "giants";

    public String pitch() {
        return "threw the ball at a speed of 90mph";
    }
}

public class Main {
    public static void main(String[] args) {
        Baseball ball = new Baseball();
        System.out.println(ball);
    }
}
I mean that, in the example, size is a primitive data type (an int), but it is stored in the heap memory because it is a field of a Baseball object. So what I want to share is that primitive data types are not stored only in stack memory.
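One way to convince yourself of this is to create the object inside a method and use it after that method has returned. The sketch below uses a hypothetical FieldOnHeap class with its own small Baseball, slightly different from the one above because it takes the size in a constructor. The local variable of the create method is gone once the method returns, but the primitive field is still there, because it lives inside the object in the heap.

```java
class FieldOnHeap {
    // Simplified, hypothetical variant of the Baseball class for this illustration.
    static class Baseball {
        int size; // primitive field: stored inside the object, in the heap

        Baseball(int size) {
            this.size = size;
        }
    }

    // The local variable 'ball' exists only while this method is running.
    static Baseball create(int size) {
        Baseball ball = new Baseball(size);
        return ball; // only the address of the object is handed back
    }

    public static void main(String[] args) {
        Baseball ball = create(12);
        // create() has already returned, yet the primitive value is still reachable.
        System.out.println(ball.size); // prints 12
    }
}
```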
It’s time to learn about frames
We haven’t talked about frames yet. The thing is that the stack does not store loose variables: it stores frames, and the variables live inside them. A frame is a space allocated on the stack for each method invocation, and it holds that call’s parameters and local variables. Back in the baseball example, we have a single thread executing the whole code, and this thread has its own stack space (as we discussed earlier). Inside the stack, the JVM creates a frame for the main method and pushes it onto the stack. When we instantiate the Baseball class, the JVM creates the object in the heap and returns its address, which is stored in the current frame.
Now let’s say we execute the method pitch after instantiating a Baseball object. The JVM will create a new frame and push it onto the thread’s stack space. This new frame is now the current frame, and it holds what the call works with: the reference to the ball itself (this) and the reference to the string “threw the ball at a speed of 90mph”, which, being an object, lives in the heap.
As soon as the method finishes, the JVM pops the current frame and goes back to executing the previous frame, which becomes the current frame again.
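We can watch frames being pushed and popped by asking the current thread for its stack trace, which contains one entry per frame. This is only a sketch: FrameDepth and its methods are names I made up, and the absolute numbers depend on how the program is launched, so what matters is the difference between them.

```java
class FrameDepth {
    // Number of frames currently on this thread's stack, as reported by the JVM.
    static int depth() {
        return Thread.currentThread().getStackTrace().length;
    }

    // Calling this method pushes one extra frame before depth() is measured.
    static int depthInsideCall() {
        return depth();
    }

    public static void main(String[] args) {
        int before = depth();           // measured directly from main
        int inside = depthInsideCall(); // measured one call deeper
        int after = depth();            // the extra frame has been popped again

        System.out.println("before: " + before);
        System.out.println("inside: " + inside);
        System.out.println("after:  " + after);
    }
}
```

The value measured inside the extra call should be one higher than the other two, and it drops back as soon as that call returns.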
StackOverflowError
Now that we know how the stack works, it’s simple to understand what leads us to a stack overflow error. Imagine a water bottle filling under a tap, but somehow the bottle is closed. The bottle is made of plastic, and as it fills up the pressure increases. At some point it will burst because it can’t hold any more.
The stack overflow error is this exact case: the thread’s stack has no room left for new frames, usually because of recursion that goes too deep or never ends, so the JVM throws a StackOverflowError.
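The easiest way to see it happen is a method that calls itself without ever stopping. In this sketch, OverflowDemo and its methods are hypothetical names; catching StackOverflowError is done here only to count the frames, it is not something I would recommend in real application code.

```java
class OverflowDemo {
    static int calls = 0;

    // Recursion with no stop condition: every call pushes a new frame.
    static void recurse() {
        calls++;
        recurse();
    }

    // Returns how many calls were made before the stack ran out of space.
    static int framesUntilOverflow() {
        calls = 0;
        try {
            recurse();
        } catch (StackOverflowError e) {
            // The frames are popped while the error propagates, so the thread can continue.
        }
        return calls;
    }

    public static void main(String[] args) {
        System.out.println("Overflow after " + framesUntilOverflow() + " calls");
    }
}
```

The number of calls is different on every machine and configuration. On HotSpot the stack size of each thread can be changed with the -Xss option, which moves this limit up or down.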
It was true, we will talk about heap spaces!
At this point, I hope you have understood, at least superficially, how the JVM works, as well as the stack. We saw how the JVM relates a variable to an object stored in the heap. But what is the heap itself? I know we haven’t talked about the heap in depth, but I think you already have an idea from the earlier sections of this post.
The heap is the memory space reserved to store objects. So every time we use the keyword new, we’re telling the JVM to create a new object based on a class and store it in the heap memory, as we saw in figure 6. After storing the object, the JVM returns its address, which is saved in the thread’s stack space.
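Since the heap is a reserved space, it has a size, and the JVM can tell us about it. The sketch below uses only the standard Runtime class; HeapLimits and usedBytes are names I chose for the example, and the numbers printed will be different on every machine.

```java
class HeapLimits {
    // Heap currently in use: committed heap minus the part that is still free.
    static long usedBytes() {
        Runtime rt = Runtime.getRuntime();
        return rt.totalMemory() - rt.freeMemory();
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;

        // maxMemory: the most heap the JVM will try to use.
        System.out.println("max heap:       " + rt.maxMemory() / mb + " MB");
        // totalMemory: the heap the JVM has reserved from the OS so far.
        System.out.println("committed heap: " + rt.totalMemory() / mb + " MB");
        System.out.println("used heap:      " + usedBytes() / mb + " MB");
    }
}
```

These are the resources I mentioned at the beginning of the article: on HotSpot the initial and maximum heap sizes are usually configured with the -Xms and -Xmx options.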
The heap memory space is divided into two parts, the first one is the young generation space and the second one is the tenured space, known as old generation space. Each one has its own specific usage by the JVM and we will discuss both in this article. But first, let’s talk about the first one: the young generation.
We’re all young, or not anymore.
When I was younger I was able to be in front of a computer for hours and hours, mostly gaming. Nowadays I work about 8 hours looking at a screen and my eyes hurt. Probably because I’m 15 years older than I was back then.
What happens inside the heap is this: when a new object is created, the JVM allocates it in the young generation, and each time the garbage collector runs (yes, this is new in this article) it increments the age counter of every object that survives. Like when I was 15 and, on my birthday, the world added one more year to my age.
After some birthdays (garbage collector runs), it moves the objects that are still alive from the young generation to the old generation.
Let’s dive into the young space
But before we talk about the old generation, let’s dive a bit deeper into the young generation space and how it works. First of all I need to say: the young generation is divided too. There are two spaces in the young generation, the eden space and the survivor space. (The classic HotSpot layout actually keeps two survivor spaces, usually called S0 and S1, but we can treat them as one for now.)
So, as we can see, the heap memory splits into two spaces: the young and old generations. The young generation splits into two other spaces: the eden and the survivor space.
The eden space receives every new object, and now we will start to understand how the garbage collector works: when the eden space is full, the JVM starts running the garbage collector, which is the process responsible for managing the objects, moving them and cleaning up spaces so the program can allocate new objects.
So, when the eden space is full, the garbage collector will run. Its first step is to mark all the objects that are still reachable from an active reference, such as a variable on a thread’s stack or a static field. After this, the GC moves all live objects from the eden to the survivor space and deallocates all objects without marks, I mean, objects that nothing references anymore. This is the Minor GC.
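To see the collector at work, we can create a lot of objects that die young and then ask the JVM how many collections it ran. This is a rough sketch with made-up names (MinorGcDemo, churn, totalCollections). The collector names in the output, and whether any collection happens at all, depend on the JVM, the chosen garbage collector and the heap size.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

class MinorGcDemo {
    // Total number of collections reported by all collectors of this JVM.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount(); // -1 when the collector does not report it
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Allocates many short-lived arrays; none of them stays referenced after its iteration.
    static long churn(int iterations) {
        long checksum = 0;
        for (int i = 0; i < iterations; i++) {
            byte[] garbage = new byte[64 * 1024]; // a new 64 KB object in the young generation
            garbage[0] = (byte) i;
            checksum += garbage[0];
        }
        return checksum;
    }

    public static void main(String[] args) {
        long before = totalCollections();
        churn(20_000); // roughly 1.3 GB of garbage in total
        long after = totalCollections();

        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName() + ": " + gc.getCollectionCount()
                    + " collections, " + gc.getCollectionTime() + " ms");
        }
        System.out.println("Collections during churn: " + (after - before));
    }
}
```

With a generational collector, most of these collections should be reported by the young generation collector, which is exactly the kind of metric I was looking at in our dashboards.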
Well, minor and major GC are a world of their own, and I believe this is enough for one article. There will certainly be more articles about the JVM and its memory management system. If you don’t know why I created this blog, I invite you to read my first blog post. I hope you learned something.