XtremeCloud Data Grid Performance Guide

For use with XtremeCloud Data Grid 3.0.1.

Last Updated: 10-25-2019

Legal Notice

The text of and illustrations in this document are licensed by Eupraxia Labs under a Creative Commons Attribution–Share Alike 3.0 Unported license (“CC-BY-SA”). An explanation of CC-BY-SA is available here.

We are providing the URL here of the original documentation, published by Red Hat.

In accordance with CC-BY-SA, if you distribute this document or an adaptation of it, you must provide the URL for this original version by Eupraxia Labs.

Eupraxia Labs, as the licensor of this document, waives the right to enforce, and agrees not to assert, Section 4d of CC-BY-SA to the fullest extent permitted by applicable law.

Red Hat, Red Hat Enterprise Linux, the Shadowman logo, the Red Hat logo, JBoss, OpenShift, Fedora, the Infinity logo, and RHCE are trademarks of Red Hat, registered in the United States and other countries. Linux ® is the registered trademark of Linus Torvalds in the United States and other countries. Java ® is a registered trademark of Oracle and/or its affiliates. XFS ® is a trademark of Silicon Graphics International Corp. or its subsidiaries in the United States and/or other countries.

MySQL ® is a registered trademark of MySQL AB in the United States, the European Union and other countries.

Node.js ® is an official trademark of Joyent. Eupraxia Labs is not formally related to or endorsed by the official Joyent Node.js open source or commercial project.

The OpenStack ® Word Mark and OpenStack logo are either registered trademarks/service marks or trademarks/service marks of the OpenStack Foundation, in the United States and other countries and are used with the OpenStack Foundation’s permission. We are not affiliated with, endorsed or sponsored by the OpenStack Foundation, or the OpenStack community.

All other trademarks are the property of their respective owners.

Abstract

This guide describes performance tuning for XtremeCloud Data Grid-web 3.0.1.

Preface

This guide will give you information and tweaks about tuning XtremeCloud Data Grid-web performance (server mode only).

Generally speaking, it is unnecessary to adjust any values since we’ve already done the heavy lifting for XtremeCloud SSO. However, due to the flexibility of our Helm Charts (provided to subscribed customers) these values can be adjusted should further optimization be desired.

Note: XtremeCloud Single Sign-On is the only currently certified multi-cloud application to work with XtremeCloud Data Grid-web.

Chapter 1 - Capacity Planning

Data in XtremeCloud Data Grid-web is in a serialized form. The data size can be estimated using sophisticated tools like Java Object Layout (JOL) and the total amount of required memory can be roughly estimated.

Java Object Layout (JOL) is the tiny toolbox to analyze in memory object layout schemes in JVMs. It allows you to make an estimate how much memory the object takes. This allows to make a simplest but most efficient performance improvements.

Use the following formulas:

Equation 1.1. Total Data Set in Server Mode

Total Data Set = Number Of Entries * (Serialized Key Size + Serialized Value Size + 200 b (Overhead))

The term overhead is used here as an average amount of additional memory (e.g. expiration or eviction data) needed for storing an Entry in a Cache.

In case of Local or Replicated mode, all data needs to fit in memory, so calculating the amount of required memory is trivial.

Calculating memory requirements for Distributed mode is slightly more complicated and requires using the following:

Equation 1.2. Required memory in Distributed mode

Required Memory = Total Data Set*(Node Failures + 2)/(Nodes - Node Failures)

Where:

Total Data Set - Estimated size of all data
Nodes - The number of Kubernetes pods in the cluster
Node Failures - Number of possible Kubernetes pod failures (also number of owners - 1 )

The calculated amount of memory should be used for setting Xmx and Xms parameters.

The JVM, as well as XtremeCloud Data Grid-web, require additional memory for other tasks like searches, allocating network buffers etc. It is advised to allocate no more than 50% of memory with living data when using XtremeCloud Data Grid-web solely as a caching data store, and no more than 33% of memory with living data when using XtremeCloud Data Grid-web to store and analyze the data using querying, distributed execution or distributed streams.

When considering large heaps, make sure there’s enough CPU to perform garbage collection efficiently.

Chapter 2 - Java Virtual Machine Settings

Java Virtual Machine tuning might be divided into sections like memory or GC. Below is a list of helpful configuration parameters and a guide how to adjust them.

2.1 Memory Settings

Adjusting memory size is one of the most crucial steps in XtremeCloud Data Grid-web tuning. The most commonly used JVM flags are:

-Xms - Defines the minimum heap size allowed.

-Xmx - Defines the maximum heap size allowed.

-Xmn - Defines the minimum and maximum value for the young generation.

-XX:NewRatio - Define the ratio between young and old generations. Should not be used if -
Xmn is enabled.

Using Xms equal to Xmx will prevent the JVM from dynamically sizing memory and will likely decrease GC pauses caused by resizing. It is a good practice to specify the Xmn parameter. This guarantees proper behavior during load peak since XtremeCloud Data Grid-web will generates a significant amount of small, short living objects.

2.2 Garbage Collection

The main goal is to minimize the amount of time that the JVM is paused. Consequently, Concurrent Mark Sweep (CMS) is the suggested garbage collector (GC) for XtremeCloud Data Grid-web applications.

The Concurrent Mark Sweep (CMS) collector is designed for applications, like XtremeCloud Data Grid-web, that require shorter garbage collection pauses and that can afford to share processor resources with the garbage collector while the application is running. Typically applications that have a relatively large set of long-lived data (a large tenured generation) and run on machines with two or more processors tend to benefit from the use of this collector. However, even with less processors available, this collector should be considered for any application with a low pause time requirement. The most frequently used JVM flags are:

-XX:MaxGCPauseMillis - Sets a target for the maximum GC pause time. Should be tuned to
meet the SLA.

-XX:+UseConcMarkSweepGC - Enables usage of the CMS collector.

-XX:+CMSClassUnloadingEnabled - Allows class unloading when the CMS collector is
enabled.

-XX:+UseParNewGC - Utilize a parallel collector for the young generation. This parameter
minimizes pausing by using multiple collection threads in parallel.

-XX:+DisableExplicitGC - Prevent explicit garbage collections.

-XX:+UseG1GC - Turn on G1 Garbage Collector.

2.3: Other Settings

There are two additional parameters which are suggested for use:

-server - Enables server mode for the JVM.

-XX:+ UseLargePages - Instructs the JVM to allocate memory in Large Pages. These pages
must be configured at the OS level for this parameter to function successfully.

2.4: Example Configuration

In most of the cases we suggest using CMS. However when using the latest JVM, G1 might perform slightly better.

32GB JVM:

-server
-Xmx32G
-Xms32G
-Xmn8G
-XX:+UseLargePages
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-XX:+DisableExplicitGC

32GB JVM with G1 Garbage Collector

-server
-Xmx32G
-Xms32G
-Xmn8G
-XX:+UseG1GC

Chapter 3 - Network Configuration

XtremeCloud Data Grid-web uses TCP/IP for sending packets over the network (for both cluster communication when using TCP stack or when communicating with Hot Rod clients, such as XtremeCloud Single Sign-On).

In order to achieve the best results, it is recommended to increase TCP send and receive window size (refer to you OS manual for instructions). The recommended values are:

send window size - 640 KB

receive window size - 25 MB

Chapter 4 - Number of Threads

Xtremecloud Data Grid-web tunes its thread pools according to the available CPU cores. Under Linux this will also take into consideration taskset / CGroup quotas. It is possible to override the detected value by specifying the system property infinispan.activeprocessorcount.

Note: Java 10 and later can limit the number of active processor using the VM flag - XX:ActiveProcessorCount=xx.

Chapter 5 - Number of Threads

The Hot Rod Server, used in our multi-cloud applications, uses worker threads which are activated by our XtremeCloud SSO requests. It’s important to match the number of worker threads to the number of concurrent client requests:

Hot Rod Server worker thread pool size:

<hotrod-connector socket-binding="hotrod" cache-container="local" worker-threads="200">
<!-- Additional configuration here -->
</hotrod-connector>

Chapter 6 - Cache Store Performance

In order to achieve the best performance, please follow the recommendations below when using Cache Stores:

Use async mode (write-behind) if possible
Prevent cache misses by preloading data

For a JDBC Cache Store:

Use indexes on id column to prevent table scans
Use PRIMARY_KEY on id column

Configure batch-size, fetch-size, etc

Chapter 7 - Hints for Program Developers

There are also several hints for developers which can be easily applied to the client application and will boost up the performance.

7.1: Ignore Return Value

When you’re not interested in returning value of the #put(k, v) or #remove(k) method, use Flag.IGNORE_RETURN_VALUES flag as shown below:

Using Flag.IGNORE_RETURN_VALUES

It is also possible to set this flag using ConfigurationBuilder

Using ConfigurationBuilder settings

7.2: Use Externalizer for Marshalling

XtremeCloud Data Grid-web uses Wildfly Marshalling to transfer objects over the wire. The most efficient way to marshall user data is to provide an AdvancedExternalizer. This solutions prevents Wildfly Marshalling from sending the class name over the network and preserves bandwidth:

User entity with an Externalizer:

Cache noPreviousValueCache =
cache.getAdvancedCache().withFlags(Flag.IGNORE_RETURN_VALUES);
noPreviousValueCache.put(k, v);

ConfigurationBuilder cb = new ConfigurationBuilder();
cb.unsafe().unreliableReturnValues(true);

import org.infinispan.marshall.AdvancedExternalizer;

public class Book {

final String name;
final String author;

public Book(String name, String author) {
this.name = name;
this.author = author;
}

public static class BookExternalizer
implements AdvancedExternalizer<Book> {

@Override
public void writeObject(ObjectOutput output, Book book)
throws IOException {
output.writeObject(book.name);
output.writeObject(book.author);
}

The Externalizer must be registered in cache configuration. See configuration examples below:

###### Adding Externalizer using XML

###### Adding Externalizer using Java

#### 7.3: Storing Strings Efficiently

If your strings are mostly ASCII, convert them to UTF-8 and store them as byte[] :

Using String#getBytes(“UTF-8”) allows to decrease size of the object

Consider using G1 GC with additional JVM flag - XX:+UseStringDeduplication. This allows to decrease memory footprint (see JEP 192 for details).

#### 7.4: Use Simple Cache for Local Caches

When you don’t need the full feature set of caches, you can set local cache to “simple” mode and achieve non-trivial speedup while still using Red Hat Data Grid API.

This is an example comparison of the difference, randomly reading/writing into cache with 2048 entries as executed on 2x8-core Intel® Xeon® CPU E5-2640 v3 @ 2.60GHz:

Table 7.1. Number of operations per second (± std. dev.)

@Override public Person readObject(ObjectInput input) throws IOException, ClassNotFoundException { return new Person((String) input.readObject(), (String) input.readObject()); }

@Override public Set<Class<? extends Book» getTypeClasses() { return Util.<Class<? extends Book»asSet(Book.class); }

@Override public Integer getId() { return 2345 ; } } }

GlobalConfigurationBuilder builder = … builder.serialization().addAdvancedExternalizer(new Book.BookExternalizer());

Cache type single-threaded cache.get(…)

single-threaded cache.put(…)

32 threads cache.get(…)

32 threads cache.put(…)

Local cache 14,321,510 ± 260,

1,141,168 ± 6,079 236,644,227 ± 2,657,

2,287,708 ± 100,

Simple cache 38,144,468 ± 575,

11,706,053 ± 92,

836,510,727 ± 3,176,

47,971,836 ± 1,125,

CHM 60,592,770 ± 924,

23,533,141 ± 98,

1,369,521,754 ± 4,919,

75,839,121 ± 3,319, ``` The CHM shows a comparison for ConcurrentHashMap from JSR-166 with plugable equality/hashCode function, which is used as the underlying storage in XtremeCloud Data Grid-web.

Even though we use JMH to prevent some common pitfalls of micro-benchmarking, consider these results only an approximation and should be optimized in a carefully planned and executed Load Testing and Performance Tuning process.