YouTube video summary

Optimizing Java Applications on Kubernetes: beyond the Basics

Technology

06 Jan 2025 · 13 min summary · From InfoQ

Introduction and Overview of Java at Microsoft

The speaker works on the Java Engineering Team at Microsoft, previously worked at Oracle, and is a Java developer who uses a Mac. 38s
Microsoft has its own JDK and helps internal teams optimize Java workloads, with examples including Azure, Minecraft, and LinkedIn. 57s
Java is widely used within Microsoft, including in Azure's control plane, Minecraft servers, and LinkedIn's production environment, with hundreds of thousands of JVM instances running. 1m22s
The talk will cover four main topics: optimizing Java workloads on Kubernetes, container size and startup time, JVM defaults and ergonomics, and Kubernetes' impact on Java workloads. It also introduces a concept called A/B performance testing in production. 3m5s
The talk is not an advanced JVM tuning class, and attendees are encouraged to seek out other resources for learning about JVM tuning, garbage collection logs, and other related topics. 3m50s
The talk aims to provide pointers and opportunities for attendees to learn more about optimizing Java workloads on Kubernetes. 4m28s

Optimizing Container Image Size

Reducing the size of container images is a topic of interest, with the goal of improving startup time and overall performance. 4m41s
Reducing the size of a container image is important, but it's not the most critical factor, especially when the infrastructure runs in a data center with high-speed networking, where image size has little effect on download time 4m52s.
However, reducing the size of an image is crucial for security reasons, as it minimizes the attack surface area by reducing the number of components shipped in the image, making it easier to patch and update, and decreasing dependencies in production 5m40s.
Removing unnecessary dependencies can also make the image easier to audit, which is essential for supply chain security and component governance 6m16s.
To reduce the size of an image, there are three main areas to focus on: the base image layer, the application layer, and the JVM runtime 6m33s.
For the base image layer, using slim versions of Linux distributions, such as Alpine, or building a custom Linux base image can be effective 6m52s.
For the application layer, only including necessary dependencies and breaking down the layer into separate layers for dependencies and application code can help reduce the image size 7m12s.
Using caching can also help, as it allows for reusing the dependencies layer if it hasn't changed, and only rebuilding the application layer 7m43s.
Running as a non-root user hardens the image, and using a JVM runtime trimmed down to only the necessary components helps reduce the image size 7m58s.
The JDK's module system can be used (for example, with the jlink tool) to create a JVM runtime with only the modules the application needs 8m16s.
Building a native image can also be an option for further reducing the image size 8m27s.
Examples of size differences between images include Ubuntu, Debian, and Alpine, with Alpine being the smaller option, but requiring consideration of its musl libc library and potential compatibility issues 8m34s.
The JDK is compatible with Alpine, but there may be issues, so it's essential to test and ensure compatibility, and also note that commercial support from cloud vendors for Alpine might be limited 9m6s.
A classic Dockerfile for a Spring application can be improved by using a custom user instead of running as root, and by separating dependencies into different layers to optimize build and image download 9m30s.
Using the Spring Boot Maven plugin can automate the process of building an optimized Docker image 10m22s.
Creating a custom Java runtime with only the necessary bits can significantly reduce the JVM size, from 334 megabytes to 57 megabytes, and using GraalVM native image can reduce it further to less than 10 megabytes 10m27s.
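One way to check what a trimmed runtime actually contains is to ask the running JVM which modules it was started with. The sketch below is an illustration, not something shown in the talk; the class name `RuntimeModules` is hypothetical. It only uses the standard `ModuleLayer` API (Java 9+).

```java
import java.util.List;
import java.util.stream.Collectors;

public class RuntimeModules {

    /** Returns the sorted names of the modules resolved in the boot layer of the running JVM. */
    public static List<String> bootModuleNames() {
        return ModuleLayer.boot().modules().stream()
                .map(Module::getName)
                .sorted()
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> names = bootModuleNames();
        System.out.println(names.size() + " modules in this runtime:");
        for (String name : names) {
            System.out.println("  " + name);
        }
    }
}
```

Running the same class on a full JDK and on a custom runtime should show a much shorter module list for the custom one, which is where the size reduction comes from.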

Improving Startup Time

The JDK's class data sharing feature can cut startup time roughly in half, and future projects like Project Leyden and Project CRaC aim to improve startup time further 11m4s.
Project Leyden, led by Oracle, and Project CRaC, led by Azul Systems, both target faster startup. Checkpoint/restore technology in particular can significantly improve startup time, but requires framework, library, and runtime support 11m34s.

JVM Defaults and Ergonomics

The JDK has default settings that can be optimized for better performance, and understanding these defaults is essential for optimizing Java applications 13m1s.
The Java runtime stack has defaults that tend to be conservative and work for most applications, but these defaults can be tuned for specific use cases 13m9s.
JVM ergonomics can affect how the JVM runs, and environment settings can impact the JVM's behavior, such as the number of processors available 14m12s.
When running a Java application in Docker, the JVM may not see all available processors, depending on the Docker configuration 14m29s.
If a container is set to use a non-integer number of CPUs (e.g., 1.2 CPUs or 1200m in Kubernetes), the JVM will round up to the nearest whole number of processors 15m27s.
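The rounding rule described above can be sketched as simple arithmetic. This is a model of the behavior as stated in the summary, not JVM source code; the class and method names are hypothetical. The `main` method also prints what the running JVM actually reports, using the standard `Runtime.availableProcessors()` call.

```java
public class CpuErgonomics {

    /** Models how a CPU limit in millicores maps to a whole processor count (rounded up, minimum 1). */
    public static int processorsForMillicores(int millicores) {
        if (millicores <= 0) {
            throw new IllegalArgumentException("millicores must be positive");
        }
        return Math.max(1, (millicores + 999) / 1000);
    }

    public static void main(String[] args) {
        int[] limits = {250, 1000, 1200, 2500};
        for (int limit : limits) {
            System.out.println(limit + "m -> " + processorsForMillicores(limit) + " processor(s)");
        }
        // What this JVM actually sees in its current environment.
        System.out.println("availableProcessors() = " + Runtime.getRuntime().availableProcessors());
    }
}
```

Under this model a 1200m limit is treated as 2 processors, which matters because thread pool and GC thread sizing are derived from that number.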
Memory settings can be tricky, and the JVM's garbage collector and heap size can be affected by the amount of memory available 15m58s.
The JVM's garbage collector can be selected based on the amount of memory available, and the heap size can be set to a percentage of the total memory 16m30s.
If the JVM is not properly tuned, it can result in a bad heap configuration, leading to performance issues 17m7s.
The G1 garbage collector is selected by default when the JVM sees at least 2 CPUs and at least 1792 megabytes of memory; below either threshold, the JVM falls back to the Serial collector 18m24s.
The JVM's default maximum heap size is 50% of available memory in environments with less than 256 megabytes; it then stays at about 127 megabytes up to 512 megabytes, after which it is set to 25% of the available memory 19m40s.
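The thresholds quoted above can be written down as a small model, which makes it easier to see how little heap a container gets by default. This is a simplification built only from the numbers in this summary, not the JVM's actual sizing code; the class and method names are hypothetical. The `main` method also prints the real values for the running JVM via `Runtime.maxMemory()` and the standard `GarbageCollectorMXBean` API.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class JvmDefaultsModel {

    /** Default max heap in MB, following the thresholds quoted in the summary (a simplification). */
    public static long defaultMaxHeapMb(long availableMb) {
        if (availableMb < 256) {
            return availableMb / 2;      // 50% below 256 MB
        }
        if (availableMb <= 512) {
            return 127;                  // flat at about 127 MB up to 512 MB
        }
        return availableMb / 4;          // 25% above 512 MB
    }

    /** Default collector: G1 needs at least 2 CPUs and 1792 MB, otherwise Serial. */
    public static String defaultCollector(int cpus, long availableMb) {
        return (cpus >= 2 && availableMb >= 1792) ? "G1" : "Serial";
    }

    public static void main(String[] args) {
        long[] sizesMb = {128, 256, 512, 1024, 1792, 4096};
        for (long mb : sizesMb) {
            System.out.printf("%5d MB, 2 CPUs: heap=%d MB, gc=%s%n",
                    mb, defaultMaxHeapMb(mb), defaultCollector(2, mb));
        }
        // Actual values for the JVM running this program.
        long actualMb = Runtime.getRuntime().maxMemory() / (1024 * 1024);
        System.out.println("Actual max heap: " + actualMb + " MB");
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println("Actual collector: " + gc.getName());
        }
    }
}
```

In this model a 1 GB container with 2 CPUs ends up with a 256 MB heap and the Serial collector, which illustrates why the defaults are worth checking before deploying.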
The JVM's default settings were designed for shared environments, but in the container world, the JVM needs to be informed manually about the available resources 20m20s.
There are ongoing projects, involving companies like Microsoft, Google, and Oracle, to enhance the ergonomics and defaults of the JVM for container environments 20m40s.
Simply packaging a Java application as a JAR can result in wasted resources, and proper configuration is necessary to optimize performance 20m51s.
There are various garbage collectors in the JVM, including the ZGC, Shenandoah, and G1, and understanding their differences is important for optimization 21m0s.
Epsilon GC is a garbage collector that does not collect anything, making it useful for benchmarking applications but not suitable for production environments 21m8s.
When running applications in the cloud, it's essential to consider that certain areas of memory, such as Metaspace or the code cache, will consume the same amount of memory regardless of the heap size 21m40s.
Configuring the JVM involves setting the heap size to 75% of the memory limit, and using memory calculators can simplify the process 22m20s.
Buildpacks, such as those provided by the Paketo project, can offer optimizations for containers and include memory calculators for building Java workloads 22m29s.
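The 75% guideline is easy to work through by hand. The sketch below is illustrative arithmetic only; the class and method names are hypothetical. The summary does not name a flag, but one common way to express this setting is the JVM option `-XX:MaxRAMPercentage=75.0`, which is mentioned here as an assumption about how the guideline would be applied.

```java
public class HeapPercentage {

    /** Heap size in MB when the heap is set to the given percentage of the container memory limit. */
    public static long heapMbForLimit(long limitMb, double percent) {
        if (limitMb <= 0 || percent <= 0 || percent > 100) {
            throw new IllegalArgumentException("limit must be positive and percent in (0, 100]");
        }
        return (long) (limitMb * percent / 100.0);
    }

    public static void main(String[] args) {
        long[] limitsMb = {512, 1024, 2048, 4096};
        for (long limit : limitsMb) {
            long heap = heapMbForLimit(limit, 75.0);
            // The remainder has to cover Metaspace, code cache, thread stacks and other native memory.
            System.out.println(limit + " MB limit -> heap " + heap + " MB, non-heap budget "
                    + (limit - heap) + " MB");
        }
    }
}
```

The non-heap budget column is the part that does not shrink with the heap, so very small limits leave little room for it even at 75%.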

Optimizing Java Workloads on Kubernetes

Java applications on Kubernetes can be optimized by understanding JVM tuning, as setting -Xmx is not always necessary and can be calculated automatically with tools like Paketo Buildpacks 22m50s.
Horizontal Pod Autoscaler (HPA) is a common scaling solution, but it can be expensive and may not be the most effective approach, as it involves adding more computing power rather than optimizing resource usage 23m30s.
Vertical Pod Autoscaler (VPA) is an alternative scaling solution that allows pods to increase their resource allocation without restarting the container, but it requires the runtime to understand and take advantage of the additional resources 26m3s.
The JVM currently lacks the capability to effectively utilize VPA, but this is being worked on, and other runtimes like Rust may offer better performance in some cases 26m35s.
A company in Latin America migrated from Java to Rust to improve performance, but this could have been achieved through JVM tuning and resource redistribution, highlighting the importance of understanding JVM behavior 24m56s.
Google's Kube Startup CPU Boost is a feature that allows containers to access additional resources for a limited time, and can be used on Azure, offering a potential solution for optimizing JVM performance 26m51s.
Java applications on Kubernetes can initially require more CPU and memory to start up, but CPU usage can be reduced and stabilized after the JVM has completed its initial work, such as JIT compilation and optimization, allowing for more efficient resource allocation 27m16s.
The main issues people face when running Java applications on Kubernetes include limited memory and CPU throttling, which can cause latency and impact application performance 28m1s.
CPU throttling occurs when an application is given a limited amount of CPU time, but the JVM and other runtime components require additional CPU resources, leading to delays and increased latency 28m50s.
Setting a CPU limit can impact the JVM's performance, as it only allows the application to access a certain amount of CPU time within a specified period, and any additional CPU requests will be delayed until the next period 29m45s.
The JVM's garbage collector also requires CPU time to perform its tasks, which can further reduce the available CPU time for the application and increase latency 30m56s.
Understanding CPU throttling and its impact on the JVM is crucial to optimizing Java application performance on Kubernetes and ensuring that the application has sufficient resources to perform its tasks efficiently 29m28s.
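The effect of a CPU limit on latency can be estimated with a small model of quota and period. This is a simplified sketch, not a measurement: it assumes the common 100 ms scheduling period, perfectly parallel work, and no other activity in the container (GC and JIT threads would consume quota too). The class and method names are hypothetical.

```java
public class CpuThrottlingModel {

    /**
     * Estimates wall-clock time (ms) for a burst of work needing cpuMs of CPU time,
     * spread over the given number of threads, under a CPU limit expressed in millicores.
     * Each period grants limitMillicores/1000 * periodMs of CPU time; once it is used up,
     * all threads wait for the next period.
     */
    public static double wallClockMs(double cpuMs, int threads, int limitMillicores, double periodMs) {
        if (threads < 1 || limitMillicores < 1 || periodMs <= 0) {
            throw new IllegalArgumentException("threads, limit and period must be positive");
        }
        if (cpuMs <= 0) {
            return 0.0;
        }
        double quotaMs = periodMs * limitMillicores / 1000.0;
        if (quotaMs >= threads * periodMs) {
            return cpuMs / threads;  // the threads can never exhaust the quota
        }
        // Periods in which the quota is fully consumed before the work finishes.
        long exhaustedPeriods = (long) Math.ceil(cpuMs / quotaMs) - 1;
        double remainingCpuMs = cpuMs - exhaustedPeriods * quotaMs;
        return exhaustedPeriods * periodMs + remainingCpuMs / threads;
    }

    public static void main(String[] args) {
        double period = 100.0;  // assumed scheduling period in ms
        System.out.println("1000m limit: " + wallClockMs(200, 4, 1000, period) + " ms");
        System.out.println("4000m limit: " + wallClockMs(200, 4, 4000, period) + " ms");
    }
}
```

In this model, a request that needs 200 ms of CPU time across 4 threads finishes in 50 ms with a 4000m limit, but takes 125 ms with a 1000m limit, because the threads burn the 100 ms quota in the first 25 ms and then sit idle until the next period.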
The JVM's active processor count flag can be used to specify the number of processors available to the JVM, which can be different from the actual CPU limit, and this can be useful for IO-bound applications 32m0s.
Most microservices on Kubernetes are IO-bound, as they involve network requests and responses, and the active processor count flag can be used to optimize the JVM's thread pool size for such applications 32m40s.
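A quick way to see what the processor count changes is to print the values the JVM derives from it. The sketch below only uses standard APIs; the class name is hypothetical. The flag referred to above is spelled `-XX:ActiveProcessorCount=N` on HotSpot JVMs.

```java
import java.util.concurrent.ForkJoinPool;

public class ProcessorView {

    /** Describes the processor count the JVM sees and one default that is derived from it. */
    public static String describe() {
        int cpus = Runtime.getRuntime().availableProcessors();
        int commonPool = ForkJoinPool.getCommonPoolParallelism();
        return "availableProcessors=" + cpus + ", commonPoolParallelism=" + commonPool;
    }

    public static void main(String[] args) {
        System.out.println(describe());
    }
}
```

Running it once as `java ProcessorView.java` and once as `java -XX:ActiveProcessorCount=4 ProcessorView.java` shows how the flag overrides what the container's CPU limit would otherwise imply.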
Microsoft has provided recommendations for JVM settings on Kubernetes based on CPU limits, and these can be used as a starting point for optimizing JVM performance 32m59s.
When optimizing JVM performance, it's essential to have a clear goal in mind, such as throughput, latency, or cost, and to use the appropriate garbage collector for that goal 33m31s.
Resource distribution can significantly impact JVM performance, and reducing the number of replicas while increasing CPU and memory allocation can lead to improved performance and cost savings 34m14s.
A benchmark study showed that reducing replicas from six to two while increasing CPU and memory allocation improved throughput and latency while reducing costs 35m41s.
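The arithmetic behind this kind of redistribution can be sketched as follows. The per-replica sizes and the fixed per-JVM overhead below are hypothetical numbers chosen for illustration; the summary does not give the benchmark's actual figures. The point is that each JVM pays a roughly fixed non-heap cost, so fewer, larger replicas leave more of the same total memory for the heap.

```java
public class ReplicaTopology {

    /** Total CPU across all replicas, in millicores. */
    public static long totalMillicores(int replicas, int millicoresPerReplica) {
        return (long) replicas * millicoresPerReplica;
    }

    /** Total memory across all replicas, in MB. */
    public static long totalMemoryMb(int replicas, long memoryMbPerReplica) {
        return replicas * memoryMbPerReplica;
    }

    /** Memory left for heap across all replicas after a fixed per-JVM overhead, in MB. */
    public static long usableHeapMb(int replicas, long memoryMbPerReplica, long fixedOverheadMb) {
        return replicas * Math.max(0, memoryMbPerReplica - fixedOverheadMb);
    }

    public static void main(String[] args) {
        long overheadMb = 256;  // hypothetical fixed cost per JVM (Metaspace, code cache, stacks)
        System.out.println("6 x (1000m, 2048 MB): cpu=" + totalMillicores(6, 1000)
                + "m, mem=" + totalMemoryMb(6, 2048)
                + " MB, usable heap=" + usableHeapMb(6, 2048, overheadMb) + " MB");
        System.out.println("2 x (3000m, 6144 MB): cpu=" + totalMillicores(2, 3000)
                + "m, mem=" + totalMemoryMb(2, 6144)
                + " MB, usable heap=" + usableHeapMb(2, 6144, overheadMb) + " MB");
    }
}
```

Both topologies consume the same 6000m and 12288 MB, but the two-replica layout also gives each JVM three processors to work with, which affects GC selection and thread sizing.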
Resource redistribution can be applied to any language and is a viable strategy for optimizing performance and reducing costs in Kubernetes clusters 36m18s.
To optimize Java applications on Kubernetes, one approach is to merge a few pods to improve performance on those nodes, and then apply the rollout to more nodes with only one replica per node, which can be achieved by writing a Kubernetes operator 36m38s.
Another approach is to increase the node pool to taller VMs, increase the resource limits of the pods, and have only three replicas, which can provide spare resources for more workloads while keeping the cost the same 37m20s.
This approach allows for resiliency and provides the ability to do more with the same resources, making it beneficial from a cloud vendor perspective 37m48s.

A/B Performance Testing in Production

The concept of A/B performance testing can be applied to production performance, where a load balancer routes loads to different instances of the application configured differently, such as using different garbage collectors or resource limits 38m13s.
This approach can be used to test different configurations, such as smaller JVMs with more replicas or taller JVMs with fewer replicas, and can be easily implemented on Kubernetes 39m4s.
An example of A/B performance testing is using NGINX to route traffic to different instances of the application, and using round-robin or least connection patterns to distribute the load 39m11s.
Another scenario is to use different garbage collector configurations and tuning, such as using ergonomics default JVM, G1GC, and parallel GC, and configuring the deployment to use least connection or round-robin 39m41s.
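The two distribution patterns mentioned above can be sketched in a few lines. This is an illustration of the selection logic only, not how NGINX implements it; the class name and backend names are hypothetical.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

public class AbRouter {
    private final List<String> backends;
    private final AtomicInteger next = new AtomicInteger();
    private final int[] active;  // in-flight requests per backend

    public AbRouter(List<String> backends) {
        if (backends.isEmpty()) {
            throw new IllegalArgumentException("at least one backend is required");
        }
        this.backends = List.copyOf(backends);
        this.active = new int[backends.size()];
    }

    /** Round-robin: each call returns the next backend in order, wrapping around. */
    public String roundRobin() {
        int index = Math.floorMod(next.getAndIncrement(), backends.size());
        return backends.get(index);
    }

    /** Least connections: picks the backend with the fewest in-flight requests and counts one more. */
    public synchronized String leastConnections() {
        int best = 0;
        for (int i = 1; i < active.length; i++) {
            if (active[i] < active[best]) {
                best = i;
            }
        }
        active[best]++;
        return backends.get(best);
    }

    /** Marks one request on the given backend as finished. */
    public synchronized void release(String backend) {
        int index = backends.indexOf(backend);
        if (index >= 0 && active[index] > 0) {
            active[index]--;
        }
    }

    public static void main(String[] args) {
        AbRouter router = new AbRouter(List.of("deployment-g1", "deployment-parallel"));
        for (int i = 0; i < 4; i++) {
            System.out.println("request " + i + " -> " + router.roundRobin());
        }
    }
}
```

Round-robin gives each configuration the same number of requests, which suits a like-for-like comparison, while least connections tends to send more traffic to whichever configuration finishes requests faster.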
By combining these approaches, it is possible to achieve optimal performance and resource utilization on Kubernetes, as shown in the Azure dashboard 40m7s.
A demonstration is shown using a CPU-bound emulation, specifically a prime factor test, to compare the performance of different deployments in a Kubernetes cluster 40m34s.
The test is run on different topologies, including 2x2 and 2x4, and the results are displayed in a dashboard, showing live metrics such as request rate and CPU usage 42m30s.
The demonstration highlights the ability to compare the performance of different deployments in production, including different garbage collectors, JVM tuning flags, and parameters, without affecting the application's functionality 43m43s.
The test is run on a container in a cluster, and the results are displayed in a dashboard, allowing for real-time comparison of the performance of different deployments 41m41s.
The demonstration also shows how to define roles for different deployments using environment variable names, which can be used to differentiate between deployments in the dashboard 43m4s.
The importance of testing in production is emphasized, as it allows for a more accurate evaluation of performance under real-world conditions, which may not be replicable in a lab environment 44m6s.
The demonstration is run on a live cluster, and the cost of running the demo is mentioned, highlighting the importance of efficient resource usage 44m18s.

Key Takeaways and Future Directions

The main takeaways for optimizing Java applications on Kubernetes include reducing the size of container images, focusing primarily on security rather than size, and addressing potential issues with loading images into nodes. 45m37s
Startup time can be optimized by utilizing JVM features such as Class Data Sharing, which is available in Java 11, 17, and 21, and by evaluating Project CRaC and Project Leyden for modernization. 46m18s
Understanding the runtime defaults and capabilities of the JVM is crucial, and observing memory, CPU, garbage collection, and JIT compilation in production can help identify areas for improvement. 46m39s
It is essential to understand the impact of resource constraints on the runtime stack and ensure sufficient resources are allocated for proper behavior. 46m59s
Horizontal scaling is not a silver bullet, and vertical scaling should also be considered to optimize performance. 47m16s
Performance tuning in production is expected to be a significant focus area, and utilizing staging environments for testing can be beneficial. 47m22s
Microsoft is researching the addition of CRaC support in its OpenJDK distribution, and while there are some complexities, frameworks like Spring have implemented checkpoint restore flows. 48m3s
Framework teams are required to build capabilities into the application framework for certain features, and efforts are being made to explore this possibility 49m10s.
JSR (Java Specification Request) does not specifically focus on enhancing the JVM for performance issues, but there are ongoing projects by Google, Microsoft, and Oracle to make the JVM heap dynamic, allowing it to grow and shrink as needed 49m29s.
A dynamic JVM heap would enable in-place scaling and vertical scaling capabilities to be taken advantage of by the JVM 49m51s.
Oracle is working on the ZGC (Z Garbage Collector) to address issues with garbage collectors defining memory areas and managing objects 50m5s.
Oracle is also working on adaptable heap sizing for the ZGC, while Google has done work on the G1 GC, and Microsoft is exploring serial GC for dynamic heap management 50m17s.
