Do Not Use the Elbow Method (Seriously)
The fast-paced growth of data places more responsibility on data analysts, who need to extract the most from data-driven insights, which is crucial for crafting effective strategies. As you can tell from the title, we will discuss clustering with k-means. A key challenge in k-means is determining the optimal number of clusters, which directly impacts the granularity and relevance of the segmentation.
Most of us reach for the elbow method first and only then move to other techniques for double-checking or for a better solution. Its simplicity and intuitive graphical approach are hard to ignore. However, on high-dimensional and complex datasets such as user behavior tables, demographic information, and combined marketing metrics, the elbow method shows its limitations: it may not capture the data distribution well. Consequently, other techniques have been developed to overcome these challenges.
Let Me Remind You Of The Elbow Method
The Elbow Method is a straightforward way to decide how many clusters to use when grouping data with k-means clustering. For those who do not remember its exact working mechanism, let me explain it briefly. The elbow point of the resulting curve suggests the best number of clusters to use.

In more technical terms: you run the clustering algorithm multiple times, increasing the number of clusters each time, say from 1 up to 10. For each number of clusters, the model computes a value called inertia, which measures how well the data points fit within their assigned clusters; the lower the inertia, the better the fit. You then plot these inertia values on a graph, with the number of clusters on the x-axis and the inertia on the y-axis. As you increase the number of clusters, the inertia naturally decreases because the data points are grouped more finely, and the point where this decrease slows down sharply forms the "elbow" that suggests how many clusters to pick.
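As a rough illustration of that loop, here is a minimal sketch assuming scikit-learn and matplotlib; the synthetic data and the 1-to-10 range are arbitrary choices, not the exact setup used in this article:

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Illustrative data; in practice X would be your own feature matrix.
X, _ = make_blobs(n_samples=500, centers=5, random_state=42)

k_values = range(1, 11)
inertias = []
for k in k_values:
    # Fit k-means for each candidate number of clusters and record its inertia.
    model = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias.append(model.inertia_)

# Plot inertia against k; the "elbow" is where the curve's decrease slows sharply.
plt.plot(k_values, inertias, marker="o")
plt.xlabel("Number of clusters (k)")
plt.ylabel("Inertia")
plt.title("Elbow method")
plt.show()
```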
How Much Do You Believe In Your Elbow Method?
To test its competence, I used Python to generate four different datasets with 3, 5, 10, and 20 clusters respectively, each with only two features (x, y). You can see the scatter plots of each of them below; the clusters are clearly separated, from three up to twenty. Since you can find the full Python code for each method, only brief illustrative sketches are included below.
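For orientation, datasets like these can be generated along the following lines (a sketch assuming scikit-learn's make_blobs; the sample size and random seed are illustrative, not the exact values used here):

```python
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs

# Four two-dimensional datasets with 3, 5, 10 and 20 ground-truth clusters.
# n_samples and random_state are arbitrary illustrative values.
datasets = {
    k: make_blobs(n_samples=1000, n_features=2, centers=k, random_state=42)[0]
    for k in (3, 5, 10, 20)
}

# Quick scatter plots of the four datasets.
fig, axes = plt.subplots(1, 4, figsize=(16, 4))
for ax, (k, X) in zip(axes, datasets.items()):
    ax.scatter(X[:, 0], X[:, 1], s=5)
    ax.set_title(f"{k} clusters")
plt.show()
```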

Below are the charts showing the results obtained from the elbow method. In each run, the range of candidate clusters was increased to find a better fit for the datasets with more clusters, such as 10 and 20. Even with the adjustments I made to the snippet, the detected elbow never went beyond 5 clusters. Before examining the other techniques, let's not be prejudiced against the elbow method; however, I should warn you that at this point it already appears to have a limited field of application.

Which Method Should You Use Instead?
TL;DR: Four alternative techniques for choosing the number of clusters in k-means are presented below, and we will discuss which one is best at the end. For now, you will see non-technical explanations and the plots of the results.
The first method is the Silhouette Method. It measures how similar each data point is to its own cluster compared to other clusters. For each data point, it computes a score between -1 and 1: a score close to 1 means the point is well matched to its own cluster, while a score close to -1 means it is probably assigned to the wrong cluster. These scores are then averaged over all data points for each candidate number of clusters, and the number of clusters with the highest average silhouette score is chosen.
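A minimal sketch of this selection loop, assuming scikit-learn's silhouette_score on an illustrative synthetic dataset (the 2-to-20 candidate range is an arbitrary choice):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

X, _ = make_blobs(n_samples=1000, centers=5, random_state=42)  # illustrative data

scores = {}
for k in range(2, 21):  # the silhouette score needs at least 2 clusters
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

# Pick the k with the highest average silhouette score.
best_k = max(scores, key=scores.get)
print(best_k, scores[best_k])
```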

The second method is the Calinski-Harabasz Index. It evaluates how well separated the clusters are and how compact each one is, by computing the ratio of the variance between clusters to the variance within clusters. A higher score means that clusters are dense and well separated from each other.
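Under the same assumptions, a sketch using scikit-learn's calinski_harabasz_score, where the highest value is picked:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import calinski_harabasz_score

X, _ = make_blobs(n_samples=1000, centers=5, random_state=42)  # illustrative data

ch_scores = {}
for k in range(2, 21):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    ch_scores[k] = calinski_harabasz_score(X, labels)

# Higher ratio of between- to within-cluster variance is better.
best_k = max(ch_scores, key=ch_scores.get)
```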

The Davies-Bouldin Index calculates the average similarity between each cluster and its most similar one. The index considers both the spread within clusters and the distance between clusters. A lower score indicates better clustering, because it means the clusters are compact and distinct from each other.
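Likewise, a sketch with scikit-learn's davies_bouldin_score, where the lowest value is picked:

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import davies_bouldin_score

X, _ = make_blobs(n_samples=1000, centers=5, random_state=42)  # illustrative data

db_scores = {}
for k in range(2, 21):
    labels = KMeans(n_clusters=k, n_init=10, random_state=42).fit_predict(X)
    db_scores[k] = davies_bouldin_score(X, labels)

# Lower Davies-Bouldin means more compact, better-separated clusters.
best_k = min(db_scores, key=db_scores.get)
```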

The last method is far more statistical than the others. The Gap Statistic compares the clustering results with those expected from a random (uniform) distribution of the data. It calculates the difference between the within-cluster variation for different numbers of clusters and that expected under a null reference distribution. The optimal number of clusters is the one that maximizes this gap.
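scikit-learn does not ship a gap statistic, so below is a simplified sketch of the idea: reference data is drawn uniformly over the bounding box of the real data, and the k that maximizes the gap is picked (the classical version also applies a one-standard-error rule, omitted here for brevity):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

def gap_statistic(X, k_max=10, n_refs=10, random_state=42):
    """Simplified gap statistic: compare log(inertia) on the data with the
    average log(inertia) on uniform reference datasets of the same shape."""
    rng = np.random.default_rng(random_state)
    mins, maxs = X.min(axis=0), X.max(axis=0)
    gaps = {}
    for k in range(1, k_max + 1):
        inertia = KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(X).inertia_
        ref_inertias = []
        for _ in range(n_refs):
            # Reference data drawn uniformly over the bounding box of X.
            ref = rng.uniform(mins, maxs, size=X.shape)
            ref_inertias.append(
                KMeans(n_clusters=k, n_init=10, random_state=random_state).fit(ref).inertia_
            )
        gaps[k] = np.mean(np.log(ref_inertias)) - np.log(inertia)
    return gaps

X, _ = make_blobs(n_samples=500, centers=5, random_state=42)  # illustrative data
gaps = gap_statistic(X)
best_k = max(gaps, key=gaps.get)  # the k that maximizes the gap
```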

What Now?
All these results indicate that the elbow method has the lowest accuracy. As the charts show, the numbers of clusters determined by the other techniques are close to one another. The graphs below show all the results in a column layout for the corresponding cluster levels and techniques.
Evaluating them all together, we find that the Davies-Bouldin Index and the Silhouette Score struggle at higher numbers of clusters, whereas the other two return cluster counts closer to the true values for each dataset. Even though four datasets are not enough to make strong accuracy claims, it is evident that the Gap Statistic and the Calinski-Harabasz Index give the best results, and the analysis suggests using these two when defining the clusters in your data.


Use Cases
As we discussed at the beginning, clustering is one of the most important steps in understanding user behavior and more. For example, it is used in customer segmentation, market basket analysis, churn prediction, and even recommendation systems.
It is essential not to underestimate the impact of the clusters in your data. Additionally, combining different metrics and considering multiple dimensions can enhance the clustering results. By using various features and exploring their relationships, it becomes possible to boost digital marketing performance, as well as remarketing and targeting strategies.