When to Use Struct Instead of Class to Increase Array Performance by up to 90%?

When to Use Struct Instead of Class to Increase Array Performance by up to 90%?

I am using C#, but you can apply the same idea to many other languages. I also want to be honest and say that a 90% boost is not the case in all situations. It can happen when you work with simple data structures in big arrays that you reused, reorder, or repopulate a lot.

Taking Advantage of The CPU Cache

I will not go into too much detail about how the CPU cache works, but I will give you a short explanation that you can use as a jumping-off point to do more research yourself.

The CPU has a cache that it can use to access data much quicker than from RAM. To speed things up, the CPU tries to predict what memory you will use next and move that into the cache. When using a Class array, what you get is an array of references that point to RAM positions where the actual data is stored. The CPU must therefore leave the cache to get it. Because Struct is not a reference type, the data itself goes to the cache. In theory, the CPU does not have to leave the cache and can operate a lot faster. Of course, it depends on many factors, including which CPU cache we are using, but generally, cache reading speeds are between 10 to 100 times faster than RAM.

Nowadays, the CPU is very smart, and combined with a good compiler having references to the RAM is not that big of a deal. If the array is created and filled in a structured way, they can find ways to optimize it anyway. I know some people who are going to say something a little like this “We as programmers should never try to help the CPU or compiler at the language level of C#. A good compiler should always be able to allocate memory and optimize better than we can.” Generally, this is true, but also a slightly naive point of view. The CPU and compiler are not all-knowing. In most cases, they indeed know best, but because we as programmers have a better understanding of what our code is used for, we can sometimes help guide the CPU in the right direction. This is an example of that.

Over time, as we reorder and repopulate our Class array, the data will no longer be tightly packed together. To simplify, this happens because Class is a reference type and can be placed more “freely” in memory. As our data gets more fragmented, it is challenging for the CPU to move it into the cache. The CPU moves data in chunks, so having relevant data close to each other is best. We don’t have this problem with Struct arrays. When creating a Struct array, we get a fixed amount of memory that fits the exact amount we need. We can work on the array without fragmentation. All our data is kept inside the allocated memory and close to each other.

When Not to Do This?

Let’s start by looking at some examples of when not to use Struct instead of Class.

You are working with small arrays. You need a reasonably big array for it to be measurable.
You have too big pieces of data. The CPU cannot cache enough of it, and it ends in RAM.
You have reference types like String in your Struct. They can point to RAM just like Class.
You don’t use the array enough. We need fragmentation for this to work.
You are using an advanced collection like List. We need fixed memory allocation.
You are not accessing the array directly. If you want to pass the data around to functions, use a Class.
If you are not sure, a bad implementation can be worse than just keeping to a Class array.
You still want Class functionality. Do not make hacky code because you want both Class functionality and Struct performance.

When to Use This?

Whether or not this is the right solution for you depends on what you are working on and how your data is structured, but here are some examples of systems where it could make sense.

Water simulation where you have a big array of velocity vectors.
City building game with a lot of game objects that have the same behavior. Like cars.
Real-time particle system.
CPU rendering using a big array of pixels.

A 90% boost is a lot, so if it sounds like something for you, I highly recommend doing some tests yourself. I would also like to point out that we can only make assumptions based on the industry norms because we are down at the hardware level. Some CPUs may have great results, while others may not show any difference. The framework and language you use can also have a significant impact, as well. Just keep this in mind.

Benchmark

First, I am benchmarking arrays that are filled in order with no use before running the test. Then I benchmark with arrays shuffled 100%, 50%, 25%, 10%, and 1% to simulate use over time.

The benchmark ran on an “Intel Core i5-9600KF Coffee Lake S, 6 core, 3.7 GHz” at 50 million iterations.

As we expected, there is little difference between Struct and Class arrays when they are both fresh, and everything is in order. When we begin to shuffle, it is clear that Struct is faster than Class. At 1% shuffle, it takes Class over double as long, and at 100%, it takes ten times as long.

Code

Here is the code I used to run the benchmark. If you don’t want to shuffle the arrays remove lines 12 and 29. To change how much of the array is shuffled, adjust the second parameter to the percentage you want.

using System;
using System.Diagnostics;

internal class Program
{
    private const int iterations = 50000000;

    private static void Main(string[] args)
    {
        {
            StructData[] array = new StructData[iterations];
            ShuffleArray(ref array, 100);

            Stopwatch stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                array[i].value++;
            }
            long time = stopwatch.ElapsedMilliseconds;
            Console.WriteLine($"Struct: {time}");
        }

        {
            ClassData[] array = new ClassData[iterations];
            for (int i = 0; i < iterations; i++)
            {
                array[i] = new ClassData();
            }
            ShuffleArray(ref array, 100);

            Stopwatch stopwatch = Stopwatch.StartNew();
            for (int i = 0; i < iterations; i++)
            {
                array[i].value++;
            }
            long time = stopwatch.ElapsedMilliseconds;
            Console.WriteLine($"Class: {time}");
        }

        Console.ReadLine();
    }

    private static void ShuffleArray<T>(ref T[] array, int percent)
    {
        Random random = new Random();
        int jump = 100 / percent;
        for (int i = 0; i < array.Length / jump; i++)
        {
            int index1 = i * jump;
            int index2 = random.Next(array.Length / jump) * jump;

            T tmp = array[index1];
            array[index1] = array[index2];
            array[index2] = tmp;
        }
    }
}

internal class ClassData
{
    public int value;
}

internal struct StructData
{
    public int value;
}

Final Thoughts

Even though you might not be able to implement this in your own project, I still think there is something to get out of this post. As programmers and problem solvers, we like to make things abstract, but sometimes it is good to take a step back and have a look at how things work on a more practical level, like with the CPU cache.

Cover photo by Harley-Davidson on Unsplash