Python Tutorial: Public vs Private leaderboard


In the previous lesson, we prepared and saved our first submission to a .csv file. Now, we will talk about how Kaggle processes the submissions.

Each competition specifies a single metric that is used to rank the participants. The better the metric value our model achieves, the higher the position we obtain. So, our goal in the competition is to build a model that optimizes the given metric. Here is a list of metrics most frequently used in the competitions and types of problems they appear in.

While preparing the submission, we have to make predictions for all the observations in the test set. However, Kaggle internally splits the test data into two subsets: Public and Private. This split is unknown to us, and it is the same for all participants. During the competition, we can see the results and standings only on the Public test data. The Private test data is used only to determine the final standings at the end of the competition.
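To make the split concrete, here is a minimal sketch of how a hidden but fixed split could be simulated. It does not describe Kaggle's actual mechanism: the helper name `split_public_private`, the 30% Public share, and the seed are all assumptions chosen for illustration.

```python
import random

def split_public_private(test_ids, public_fraction=0.3, seed=42):
    """Hypothetical helper: assign every test id to the Public or Private subset."""
    rng = random.Random(seed)  # fixed seed, so the split is identical for everyone
    shuffled = list(test_ids)
    rng.shuffle(shuffled)
    n_public = round(len(shuffled) * public_fraction)
    return set(shuffled[:n_public]), set(shuffled[n_public:])

# 1000 made-up test ids; participants would never see which subset a row is in.
public_ids, private_ids = split_public_private(range(1000))
print(len(public_ids), len(private_ids))
```

Every test row lands in exactly one subset, and calling the helper again with the same seed reproduces the same split, which mirrors the "same for all participants" property.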

In the previous lesson, we prepared the submission file and wrote it to disk. Now, we can go to the competition website and upload our submission. Usually, competitions limit us to about 5 submissions per day.

Once we submit our file, Kaggle internally calculates the competition metric on the whole test set but shows the result only on the Public part. So, we see the standings on the so-called ‘Public Leaderboard’ (denoted as LB). The Private Leaderboard score, on the other hand, stays unknown until the competition deadline. For example, if we’ve submitted a file called submission_1.csv and the competition metric is Mean Squared Error, then during the competition we will know its Mean Squared Error only on the Public Leaderboard.
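The scoring step can be sketched as follows, assuming Mean Squared Error as the metric. The targets, the predictions standing in for submission_1.csv, and the 30/70 split by row position are all synthetic assumptions; the real split is hidden.

```python
import random

def mean_squared_error(y_true, y_pred):
    """Average squared difference between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

rng = random.Random(0)
n = 1000
y_true = [rng.uniform(0, 10) for _ in range(n)]  # hidden test targets (synthetic)
y_pred = [t + rng.gauss(0, 1) for t in y_true]   # stands in for submission_1.csv

# Assumed hidden split: the first 30% of rows are Public, the rest Private.
public_rows = range(0, 300)
private_rows = range(300, n)

public_mse = mean_squared_error([y_true[i] for i in public_rows],
                                [y_pred[i] for i in public_rows])
private_mse = mean_squared_error([y_true[i] for i in private_rows],
                                 [y_pred[i] for i in private_rows])

print(f"Public LB  MSE: {public_mse:.4f}  (visible during the competition)")
print(f"Private LB MSE: {private_mse:.4f}  (revealed after the deadline)")
```

Both numbers come from the same submission file; only the subset of rows being scored differs.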

Since we can track results only on the Public Leaderboard, we could potentially overfit to it. So, what is overfitting? Suppose we’re developing a Machine Learning model and measuring the error rate on both train and test data. As we increase the model complexity, the train error generally goes down. This happens because the model learns the train data so well that it makes very few errors on it. However, at some point the test error could start to go up.

This is exactly where overfitting starts. From this moment, the model picks up very specific dependencies in the train data (lowering the train error) that do not generalize well (increasing the test error).
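The train and test behaviour described above can be explored on synthetic data. The sketch below is not from the lesson: it swaps in a simple k-nearest-neighbours regressor written in plain Python, where a smaller k means a more flexible (more complex) model. The data, the noise level, and the helper names are assumptions.

```python
import math
import random

def make_data(n, rng):
    """Noisy samples of sin(x) on [0, 6]; purely synthetic."""
    xs = [rng.uniform(0, 6) for _ in range(n)]
    ys = [math.sin(x) + rng.gauss(0, 0.3) for x in xs]
    return xs, ys

def knn_predict(x, train_x, train_y, k):
    """Average the targets of the k training points closest to x."""
    nearest = sorted(range(len(train_x)), key=lambda i: abs(train_x[i] - x))[:k]
    return sum(train_y[i] for i in nearest) / k

def knn_mse(xs, ys, train_x, train_y, k):
    """Mean squared error of the k-NN model on the points (xs, ys)."""
    return sum((knn_predict(x, train_x, train_y, k) - y) ** 2
               for x, y in zip(xs, ys)) / len(xs)

rng = random.Random(1)
train_x, train_y = make_data(100, rng)
test_x, test_y = make_data(100, rng)

errors = {}
for k in (50, 20, 10, 5, 1):  # smaller k = more complex model
    errors[k] = (knn_mse(train_x, train_y, train_x, train_y, k),
                 knn_mse(test_x, test_y, train_x, train_y, k))
    print(f"k={k:2d}  train MSE={errors[k][0]:.3f}  test MSE={errors[k][1]:.3f}")
```

With k = 1 each training point is its own nearest neighbour, so the train error is exactly zero, while the error on unseen points stays above zero because the model has memorized the noise.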

The same could happen with the Public and Private Leaderboards. If we only look at the results on the Public Leaderboard, we could overfit to it. In that case, our Private Leaderboard score, and with it our final place in the competition, could be considerably worse. To beat overfitting in both real-life projects and competitions, we need to use a good validation strategy. We will talk about it in the next chapter.
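One way to see how picking submissions by Public score alone can mislead is a small simulation. It is a sketch under stated assumptions: 200 synthetic submissions of identical true quality, Mean Squared Error as the metric, and a 100/900 Public/Private split. None of these numbers come from a real competition.

```python
import random

def mse(y_true, y_pred):
    """Mean squared error between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

rng = random.Random(7)
n_public, n_private = 100, 900  # assumed sizes of the hidden split
y_true = [rng.uniform(0, 10) for _ in range(n_public + n_private)]

# 200 submissions of identical true quality: the targets plus unit-variance noise.
submissions = [[t + rng.gauss(0, 1) for t in y_true] for _ in range(200)]

public_scores = [mse(y_true[:n_public], s[:n_public]) for s in submissions]
private_scores = [mse(y_true[n_public:], s[n_public:]) for s in submissions]

# Choose the submission that looks best on the Public Leaderboard.
best = min(range(len(submissions)), key=public_scores.__getitem__)
print(f"Chosen submission: Public MSE={public_scores[best]:.3f}, "
      f"Private MSE={private_scores[best]:.3f}")
```

Because every submission here has the same underlying quality, a Public score that stands out from the rest reflects luck on the Public rows rather than a better model, and that advantage does not carry over to the Private rows.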

The difference between the Public and Private Leaderboard standings is called a ‘shake-up’. The size of the shake-up varies greatly from one competition to another.
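A shake-up can be quantified as the change in each participant's rank between the two leaderboards. The sketch below uses made-up team names and scores, and assumes a metric where higher is better; the `ranks` helper is hypothetical.

```python
# Made-up scores for four hypothetical teams (higher is better).
public_scores = {"team_a": 0.91, "team_b": 0.89, "team_c": 0.88, "team_d": 0.85}
private_scores = {"team_a": 0.84, "team_b": 0.88, "team_c": 0.90, "team_d": 0.86}

def ranks(scores, higher_is_better=True):
    """Map each team to its 1-based leaderboard position."""
    ordered = sorted(scores, key=scores.get, reverse=higher_is_better)
    return {team: position for position, team in enumerate(ordered, start=1)}

public_rank = ranks(public_scores)
private_rank = ranks(private_scores)

# Positive value: the team moved up on the Private Leaderboard.
shake_up = {team: public_rank[team] - private_rank[team] for team in public_scores}
for team, change in shake_up.items():
    print(f"{team}: Public #{public_rank[team]} -> Private #{private_rank[team]} ({change:+d})")
```

In this toy example the Public leader drops three places, the kind of movement that would be considered a noticeable shake-up on such a small board.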

In the image on the left, we see an example of a competition with a small shake-up. Positions on the Private Leaderboard moved only about 2-3 places up or down compared to the Public Leaderboard.

The image on the right, by contrast, shows a competition with a huge shake-up. The winner of the competition was only in 1485th place on the Public Leaderboard.

Now you’re aware of overfitting and of the difference between the Public and Private Leaderboards in Kaggle competitions. Let’s explore overfitting in practice!

Post Author: hatefull