Today, we will talk about assessing algorithm complexity and clearly demonstrate how this complexity affects the performance of the code.
In the last article, we discussed code benchmarking and how to use it to evaluate the performance of code in .NET. In this article , we will focus on assessing the complexity of algorithms. And to make everything clear, together we will try to assess the algorithms used to solve a common interview task at Google.
There are many such tasks on platforms like LeetCode, CodeWars, and others. Their value lies not in learning various sorting algorithms that you may never write in practice, but in understanding typical problems you may encounter during software development.
Why assess algorithm complexity, and what methods exist?
Understanding algorithm complexity is important because:
More often, the focus is on assessing the algorithm by time (time complexity) - how much time it will take to execute. The execution time depends on the number of elementary operations that the computer performs.
For each algorithm, several complexity assessments can be made:
Big O (O(f)) - allows you to assess the upper bound on the complexity of algorithms. This is the ratio of the amount of input data for the algorithm to the time in which the algorithm can process it. In simple terms, this is the maximum execution time of the algorithm when working with large amounts of data.
Omega (Ω(f)) - allows you to estimate the lower bound of complexity - how long the algorithm will take to run in the best case.
Theta (Θ(f)) - allows you to get a “dense” complexity estimate, that is, where the speed of operation in the worst and best cases will be proportional to one function. This is not applicable to all algorithms.
IT companies focus on Big O because it shows how performance scales with more input data. The other types aren’t used as often.
The function that limits the complexity is indicated in brackets after O. In this case, n is the size of the input data.
For example, O(n) means complexity grows linearly. In this case, the execution time of the algorithm increases in direct proportion to the size of the input data.
If you imagine a graph of common algorithm complexities, it will look like this:
If we divide the complexity into zones, then the complexities of the O(log n), O(1) or O(C) type can be classified as the "Excellent" zone. Such algorithms, regardless of the volume of data, will be executed very quickly - almost instantly.
Algorithms with O(n) complexity can be classified as the "Average" zone - their complexity grows predictably and linearly. For example, if your algorithm processes 100 elements in 10 seconds, it will process 1000 in about 100 seconds. Not the best result, but predictable.
Algorithms from the red zone with complexities of O(n^2) and higher are difficult to classify as high-performance. But! Here, everything strongly depends on the volume of input data. If you are sure that you will always have a small amount of data (e.g., 100 elements), and it will be processed in an acceptable time for you, then such algorithms can also be used. But if you are not sure about the constancy of the data volumes (10,000 elements may come instead of 100), it is better to think about optimising the algorithms.
It’s important to note that the time complexity assessment is a theoretical assessment. It does not take into account internal optimizations and the processor cache; in reality, the picture may be different.
It's important to assess algorithm complexity in terms of memory too, not just time. This is often forgotten when studying the topic.
For example, to speed up calculations, you can create some intermediate data structure such as an array or stack to cache the results. This will lead to additional memory costs but can significantly speed up calculations.
Memory complexity is also called space complexity and is estimated using the same notation as for time — big O. For example, memory complexity O(n^2) means that in the worst case, the algorithm will not need more memory than proportional to n^2.
When assessing the complexity of algorithms by memory, a simplified model known as the RAM machine is used. In this model, reading or writing to any memory cell is treated as a single operation. This makes the time for both computational and memory operations equal, which simplifies analysis. It closely mirrors working with RAM but doesn’t account for processor registers, disk operations, or garbage collection.
We’ll provide examples in C#, although pseudocode would suffice. We trust these examples will still be clear and easy to follow.
Example 1:
Let's start with a simple algorithm for assigning a variable:
private void Alg1(int[] data, int target)
{
var a = data[target];
}
What is its complexity in time and memory?
The data array’s unknown dimensions might be misleading, but it’s incorrect to consider them when assessing the complexity of the internal algorithm.
Rule 1: External data is not taken into account in the complexity of the algorithm.
It turns out that our algorithm consists of only one line:
var a = data[target];
Access to an array element by index is a known operation with complexity O(1) or O(C). Accordingly, the entire algorithm will take us O(1) in time.
Additional memory is allocated only for one variable. This means that the amount of data that we will transfer (doesn't matter 1,000 or 10,000) will not affect the final result. Accordingly, our memory complexity remains O(1) or O(C). Such in-place algorithms may use extra memory, but its size isn’t tied to the input data volume.
To simplify, we’ll write O(C) instead of O(1), as C in this case is a constant. Whether it’s 1, 2 or even 100 - for modern computers this number is not important, since both 1 and 100 operations are performed at almost the same time.
Example 2:
Let's consider the second algorithm, which is very similar to the first:
private void Alg2(int[] data, int target)
{
var a = data[target];
var b = data[target + 1];
}
Does the input array size affect the number of operations in it? No.
And how about the allocated memory? Also no.
The time complexity of this algorithm could be estimated as O(2*C) — since we perform twice as many operations as in the previous example, 2 assignments instead of 1. But we have a rule for this too:
Rule 2: Omit constant factors if they do not affect the result dramatically.
If we take this rule into account, the complexity of this algorithm will be the same as in the first example — O(C) in time and O(C) in memory.
Example 3:
We will add to our algorithm a loop for processing data:
private int Alg3(int[] data)
{
var sum = 0;
for (int i = 0; i < data.Length; i++)
{
sum += data[i];
}
return sum;
}
As we can see, the number of operations in the loop directly depends on the amount of input data: more elements in data - more processing cycles to reach the final result.
At first glance, if we account for each line of code, we’d get something like this:
private int Alg3(int[] data)
{
var sum = 0; // O(C)
for (i=0; i < data.Length; i++) // O(n)
{
sum += data[i]; // O(C)
}
return sum;
}
And then the final complexity of the algorithm will be O(C)+O(n). But here again, a new rule intervenes:
Rule 3: Omit evaluation elements that are less than the maximum complexity of the algorithm.
Let us explain: if you have O(C)+O(n), the resulting complexity will be O(n) since O(n) will always grow faster than O(C).
Another example is O(n)+O(n^2). With such complexity, n2 always grows faster than n, which means we discard O(n) and only O(n^2) remains.
So, the complexity of our third example is O(n). In memory, it remains unchanged, it is O(C).
Example 4:
We calculate the sum of all possible pairs of values from the array:
private int Alg4(int[] data)
{
var sum = 0;
for (int i = 0; i ‹ data.Length; i++)
{
for (int j = 0; j ‹ data.Length; j++)
{
sum += data[i]*data[j];
}
}
return sum;
}
And to process it, we need two loops. Both of these loops will depend on the dimensionality of the input data.
Rule 4: Nested complexities are multiplied.
The complexity of the outer loop is O(n), and the inner loop is also O(n). According to the rule, these two complexities must be multiplied. As a result, the total complexity of the entire algorithm becomes O(n^2). In terms of memory, without changes - it is O(C).
A tricky example:
Let's feed a two-dimensional array to the input and calculate the sum of the values:
private int Alg4_tricky_case(int[][] data)
{
var sum = 0;
for (int i = 0; i < data.Length; i++)
{
for (int j = 0; j < data.Length; j++)
{
sum += data[i][j];
}
}
return sum;
}
Again we see nested loops - and if the input array has N x M elements, then the complexity is O(N * M), not O(n^2). Here, the time is proportional to N * M, making it linear at O(n).
Example 5:
And what do we see here? A loop - the complexity is already known to us - O(n). But inside, the Alg4() function from the previous example is called:
private int Alg5(int[] data)
{
var sum = 0;
for (int i = 0; i ‹ data.Length; i++)
{
sum += Alg4(data);
}
return sum;
}
If we recall its complexity of O(n^2), as well as rule 4, we will get that the complexity of this algorithm is O(n^3) with all its visual minimalism. With the time complexity unchanged.
Rule 5: Include in the assessment of the algorithm's overall complexity the assessment of all nested function calls.
Understanding the complexity of syntactic sugar methods like LINQ, basic collections and data types is crucial for predicting behaviour with larger data sets. Without this, you risk high algorithm complexity, which can lead to performance issues as data grows.
Here’s an example of a minimalistic algorithm that looks good and compact (this is by no means intended asa reference code), but can become a time bomb when working with large volumes of data:
private List<int> Alg6(int[] data)
{
List<int> dups = new List<int>();
for (var i = 0; i < data.Length; i++)
{
var currentItem = data[i];
var newArr = data.Skip(i + 1).ToArray();
var duplicates = newArr.Where(x => newArr.Contains(currentItem));
dups.AddRange(duplicates);
}
return dups;
}
What do we see here? Loop = O(n), Where = O(n), Contains = O(n), since newArr is an array.
So, the time complexity of this algorithm is O(n^3).
Additionally, ToArray() allocates extra memory to create a copy of the array at each iteration, meaning the memory complexity is O(n).
For our final assessment, let's consider a task commonly given in interviews at Google.
In short, the goal of the algorithm is to find any two numbers in a sorted array that sum up to the target number.
Solution 1: full pass through the array
public static int[] FindPairWithFullWalkthrough(int[] data, int target)
{
for (int i = 0; i < data.Length; i++)
{
for (int j=i+1; j < data.Length; j++)
{
if (data[i] + data[j] == target)
return new[] { data[i], data[j] };
}
}
return new int[0];
}
Time complexity: O(n2)
Memory complexity: O(C)
This is a straightforward solution. It’s not the most optimal, as the time complexity increases quickly with the number of elements, but we don’t consume much additional memory.
Solution 2: use HashSet
public static int[] FindPairUsingHashSet(int[] data, int target)
{
HashSet<int> set = new HashSet<int>();
for (int i = 0; i < data.Length; i++)
{
int numberToFind = target - data[i];
if (set.Contains(numberToFind))
return new [] { data[i], numberToFind };
set.Add(data[i]);
}
return new int[0]
}
We go through the array and add the elements we’ve already checked to the HashSet. If the HashSet contains the missing element needed for the sum, then we’re all set and can return the result. Adding and searching in the HashSet is done in O(C) time.
Time complexity: O(n)
Memory complexity: O(n)
This is just an example of how you can improve performance by allocating additional memory for intermediate structures.
Solution 3: use binary search
public static int[] FindPairUsingBinarySearch(int[] data, int target)
{
for (int i = 0; i < data.Length; i++)
{
int numberToFind = target - data[i];
int left = i + 1;
int right = data.Length - 1;
while (left <= right)
{
int mid = left + (right - left) / 2;
if (data[mid] == numberToFind)
{
return new[] { data[i], data[mid] };
}
if (numberToFind < data[mid])
{
right = mid - 1;
}
else
{
left = mid + 1;
}
}
}
return new int[0];
}
The binary search algorithm has a well-known complexity of O(log(n)). The O(n) comes from the outer loop for, and everything inside the while loop is the binary search algorithm. According to Rule 4, the complexities are multiplied.
Time complexity: O(n*log(n))
Memory complexity: O(C)
Solution 4: use the two-pointer method
public static int[] FindPairUsingTwoPointersMethod(int[] data, int target)
{
int left = 0;
int right = data.Length - 1;
while (left < right)
{
int sum = data[left] + data[right];
if (sum == target) return new[] { data[left], data[right] };
if (sum < target)
{
left++;
}
else
{
right--;
}
}
return new int[0];
}
We move the left and right pointers to the centre until they converge or a pair of values that suit us is found.
Time complexity: O(n)
Memory complexity: O(C)
This is the most optimal solution, as it doesn't use additional memory and performs the fewest number of operations.
Benchmarking solutions
Now, knowing the complexities of all four solution options, let's benchmark this code and see how the algorithms will behave on different data sets. The information from our previous article will guide us in this process. The results are as follows:
What do we see here?
For the baseline of the solution, we use the direct pass through the FindPairWithFullWalkthrough array. On 10 elements, it works in an average of 20 nanoseconds, ranking second in performance.
Only our most optimal solution option, FindPairUsingTwoPointersMethod, runs faster on a small amount of data.
The option with HashSet took 8 times longer to process small data sets and required additional memory allocation, which would eventually need to be managed by the Garbage Collector.
On 1,000 elements, the full pass-through solution (FindPairWithFullWalkthrough) started to noticeably lag behind the other algorithms. The reason is its O(n^2) complexity, which increases much faster than the other complexities.
On 10,000 elements, the full-pass algorithm took 9.7 seconds to complete, while the others finished in 0.1 seconds or less. Our most optimal solution found a result in just 3 milliseconds.
Why did binary search outperform HashSet? After all, in theory, O(n * log(n)) should be slower than O(n). The reason is that on real computers, not theoretical ones, memory allocation and deallocation don’t happen instantly - Garbage Collection is triggered every now and then. This is confirmed by the high standard deviation (StdDev) values in the HashSet benchmark.
We’ve learned how to assess the complexity of algorithms and how to use BenchmarkDotNet to trace the relationship between algorithm complexity and the execution time of the code. This will allow you to roughly estimate whether your code is efficient or not, even before running benchmarks.
It's easy to start working with us. Just fill the brief or call us.