Background
Whenever we evaluate an algorithm, we compare its complexity, or its Big O.
But do you know what Big O actually is, and what it means?
In this article, we are going to discuss exactly that.
Complexity
Complexity is defined as the amount of computational resources (time and memory) required to run an algorithm.
In other words, it describes the efficiency of the algorithm.
We can examine computational complexity through Time and Space Complexity.
Time Complexity is the amount of computational time it takes for the algorithm to complete. Space Complexity is the amount of memory it takes to run the algorithm.
Most of the time, however, we will see algorithms compared by their time complexity.
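As a quick illustration (my own example, not part of the analysis below), consider summing an array: the loop body runs once per element, so the running time grows linearly with the input, while only one extra variable is needed no matter how large the input is.

```cpp
#include <vector>

// Time complexity: the loop runs once per element, so it grows linearly with n.
// Space complexity: only one extra variable (total) is used, i.e. constant memory.
int sum(const std::vector<int>& nums) {
    int total = 0;
    for (int x : nums) {
        total += x;
    }
    return total;
}
```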
So how can we describe complexity? Or, in other words, how can we describe an algorithm's efficiency?
This is where Asymptotic Analysis comes in.
Asymptotic Analysis
In mathematical analysis, asymptotic analysis, also known as asymptotics, is a method of describing limiting behavior. — Wikipedia
The popular notations in asymptotic analysis are:
- Big O notation: O(expression)
- Omega notation: Ω(expression)
- Theta notation: Θ(expression)
Big O is used to describe the worst-case complexity, i.e., the maximum amount of time an algorithm requires considering all input values [ geeksforgeeks ].
Omega describes the best-case scenario, and Theta describes the average case.
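Strictly speaking, these notations describe bounds on a function's growth rather than the cases themselves; pairing them with worst, best, and average cases, as above, is a common convention. For reference, the standard definitions (my own addition, not from the original article) are:

\[\begin{aligned}
f(n) = O(g(n)) &\iff \exists\, c > 0,\ n_0 > 0 : 0 \le f(n) \le c\,g(n) \text{ for all } n \ge n_0\\
f(n) = \Omega(g(n)) &\iff \exists\, c > 0,\ n_0 > 0 : 0 \le c\,g(n) \le f(n) \text{ for all } n \ge n_0\\
f(n) = \Theta(g(n)) &\iff f(n) = O(g(n)) \text{ and } f(n) = \Omega(g(n))
\end{aligned}\]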
Now that we know the definitions, how do we actually find these bounds?
Calculating the Time Complexity of Insertion Sort
In this section, we will take Insertion Sort as an example.
Before examining the pseudocode, there are a few things you need to know.
Every line of code that gets executed requires a certain amount of time; we usually assume each line $i$ has a constant cost ( $c_i$ ).
To find the total cost of a given line of code, we simply find the number of times the line runs, $n$, and multiply it by its cost:
\[\begin{aligned} TotalTime = n\times{c_i} \end{aligned}\]
where $i$ is the line number.
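For example (a made-up single line, for illustration only): if a statement inside a loop costs $c_4$ per execution and the loop makes it run $n - 1$ times, that line contributes \(c_4(n - 1)\) to the total runtime. This is exactly the bookkeeping we will do for every line of Insertion Sort below.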
Now, let's take a look at the pseudocode:
```
int n = nums.size();

for i in 2 to n {                       // nums[1:n], 1-indexed
    int target = nums[i];
    int j = i - 1;
    while (j > 0 && nums[j] > target) {
        nums[j + 1] = nums[j];
        j--;
    }
    nums[j + 1] = target;
}
```
Here are the cost and the number of times each line runs:
| line # | code | cost | # of runs |
|---|---|---|---|
| 1 | int n = nums.size(); | $c_1$ | 1 |
| 3 | for i in 2 to n | $c_3$ | n |
| 4 | int target = nums[ i ]; | $c_4$ | n - 1 |
| 5 | int j = i - 1; | $c_5$ | n - 1 |
| 6 | while ( j > 0 && nums[ j ] > target ) | $c_6$ | \(\sum_{i = 2}^{n}m_i\) |
| 7 | nums[ j + 1 ] = nums[ j ]; | $c_7$ | \(\sum_{i = 2}^{n}(m_i - 1)\) |
| 8 | j--; | $c_8$ | \(\sum_{i = 2}^{n}(m_i - 1)\) |
| 10 | nums[ j + 1 ] = target; | $c_{10}$ | n - 1 |
Since the number of iterations of the inner loop depends on the value of $i$, we use $m_i$ to represent the number of times the while condition on line 6 is evaluated.
Here is a great tutorial that explains why the run counts are what they are.
If you are curious why the loop conditions run n times while their bodies run n - 1 times, it is simply because the condition must be evaluated one extra time in order to exit the loop.
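For readers who want to run the algorithm, here is a C++ translation of the pseudocode above (my own sketch, not from the original post). Note that it is 0-indexed, so the outer loop starts at index 1 instead of 2, and the while condition uses j >= 0 instead of j > 0:

```cpp
#include <vector>

// Insertion Sort: a 0-indexed translation of the 1-indexed pseudocode above.
void insertionSort(std::vector<int>& nums) {
    int n = nums.size();
    for (int i = 1; i < n; i++) {
        int target = nums[i];  // value to insert into the sorted prefix nums[0..i-1]
        int j = i - 1;
        // Shift every element larger than target one position to the right.
        while (j >= 0 && nums[j] > target) {
            nums[j + 1] = nums[j];
            j--;
        }
        nums[j + 1] = target;  // drop target into its sorted position
    }
}
```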
So if we sum up the total time it takes to run Insertion Sort, we get:
\[\begin{aligned}T(n) = &c_1 + c_3n + c_4(n - 1)+c_5(n-1)+\\&c_6\sum_{i = 2}^{n}m_i+c_7\sum_{i = 2}^{n}(m_i - 1)+\\&c_8\sum_{i = 2}^{n}(m_i - 1)+c_{10}(n - 1)\end{aligned}\]
Now, with the formula prepared, we can examine the algorithm's performance in the best-case, worst-case, and average-case scenarios.
Best Case
The best case scenario is obviously when the array is already sorted.
In this case, the inner loop body never executes, and the while condition is evaluated exactly once per outer iteration, so $m_i = 1$.
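Substituting $m_i = 1$ into the two summations gives
\[\sum_{i = 2}^{n}m_i = n - 1,\qquad \sum_{i = 2}^{n}(m_i - 1) = 0\]
so the $c_7$ and $c_8$ terms vanish entirely: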
\[\begin{aligned}T(n)&=c_1+c_3n + c_4(n-1) + c_5(n-1)\\&+ c_6(n-1)+c_{10}(n-1)\end{aligned}\]
\[\begin{aligned}T(n)= &(c_3+c_4+c_5+c_6+c_{10})n \\+ &(c_1-c_4-c_5-c_6-c_{10})\end{aligned}\]
Hence, in the best case, the time complexity is Ω(n), i.e., a linear function of n [ ref ].
Worst Case
In the worst-case scenario, the array is in reverse order, e.g. [ 6, 5, 4, 3, 2, 1 ] if we want ascending order.
In this case, each value nums[ i ] must be shifted past all of the sorted values before it:
| array | action |
|---|---|
| [ 6, 5, 4, 3, 2, 1 ] | take 6 ( nums[ 1 ] ) as the initial sorted prefix |
| [ 5, 6, 4, 3, 2, 1 ] | insert 5 ( nums[ 2 ] ) by shifting 6 one step |
| [ 4, 5, 6, 3, 2, 1 ] | insert 4 ( nums[ 3 ] ) by shifting 5 and 6 two steps |
And so forth.
Hence, $m_i = i$ [ ref ]:
\[\begin{aligned}\sum_{i = 2}^{n}m_i = \sum_{i = 2}^{n}i &= \left(\sum_{i = 1}^{n}i\right) - 1\\&= \frac{(n + 1)n}{2} - 1\end{aligned}\]
Similarly [ ref ]:
\[\begin{aligned}\sum_{i = 2}^{n}(m_i - 1) = \sum_{i = 2}^{n}(i - 1) &= \sum_{i = 1}^{n - 1}i\\&= \frac{(n - 1 + 1)(n - 1)}{2} = \frac{n(n - 1)}{2}\end{aligned}\]
As a result:
\[\begin{aligned}T(n) = &c_1+c_3n + c_4(n-1) + c_5(n-1)+ \\&c_6\left(\frac{(n+1)n}{2} -1\right)+c_7\frac{n(n-1)}{2}+\\&c_8\frac{n(n-1)}{2} + c_{10}(n-1)\end{aligned}\]
\[\begin{aligned}T(n) = &n^2\left(\frac{c_6}{2} + \frac{c_7}{2} + \frac{c_8}{2}\right) +\\& n\left(c_3+c_4+c_5+\frac{c_6}{2} - \frac{c_7}{2} - \frac{c_8}{2} + c_{10}\right)+\\&( c_1 - c_4 - c_5 - c_6 - c_{10})\end{aligned}\]
From this, we can express the worst-case runtime as a quadratic function:
\[an^2+bn+c\]
As a result, in the worst-case scenario, Insertion Sort has a Big O of O(n²).
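If you want to verify these counts empirically, here is a small sketch (my own addition; countChecks is a hypothetical helper) that counts how many times the while condition is evaluated on a reversed array and compares the result with \(\frac{(n + 1)n}{2} - 1\):

```cpp
#include <cassert>
#include <vector>

// Counts how many times the while condition is evaluated (the sum of all m_i).
long long countChecks(std::vector<int> nums) {
    long long checks = 0;
    for (int i = 1; i < static_cast<int>(nums.size()); i++) {
        int target = nums[i];
        int j = i - 1;
        while (true) {
            checks++;  // one evaluation of the loop condition
            if (!(j >= 0 && nums[j] > target)) break;
            nums[j + 1] = nums[j];
            j--;
        }
        nums[j + 1] = target;
    }
    return checks;
}

int main() {
    const int n = 100;
    std::vector<int> reversed(n);
    for (int i = 0; i < n; i++) reversed[i] = n - i;  // [100, 99, ..., 1]
    // Worst case: the sum of m_i should equal (n + 1) * n / 2 - 1 = 5049.
    assert(countChecks(reversed) == (long long)(n + 1) * n / 2 - 1);
    return 0;
}
```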
The worst-case scenario is usually what we focus on, because it gives us an upper bound on the runtime for any input, which is essential for real-time computation [ ref ].
Average Case
Average-case analysis is seldom discussed separately, because it is often roughly as bad as the worst case.
This is true for Insertion Sort: on average, each new value has to be shifted past about half of the already-sorted values, thus:
\[m_i = \frac{i}{2}\]
which also gives a time complexity of Θ(n²).
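To see why, substitute $m_i = \frac{i}{2}$ into the summation, which halves only the constant factor, not the growth rate:
\[\sum_{i = 2}^{n}m_i = \sum_{i = 2}^{n}\frac{i}{2} = \frac{1}{2}\left(\frac{(n + 1)n}{2} - 1\right)\]
The dominant term is still proportional to $n^2$, so the average case is quadratic as well.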
Space Complexity of Insertion Sort
Since Insertion Sort is an in-place algorithm, it needs no extra space beyond a few variables, so it has a space complexity of O(1).
Summary
I hope this gives you a basic understanding of how complexity is calculated.
Further Readings
Coming Soon
About Post
- Author: Jimmy Liu
- Link: https://kuopingl.github.io//2022/10/23/algo-complexity/
- Copyright: Free to share and adapt, but remember to give proper credit (CC BY-SA 3.0)