
XGBoost in Python (Quickly) Explained (3min Read)


What is it?

XGBoost is an algorithm used for supervised learning problems.

It stands for Extreme Gradient Boosting.

How does it work?

Imagine a sequence of models, where each model is trained on the errors of its predecessor.

Where is it applied?

Classification And Regression Trees (CART) are the base learners, and gradient boosting is applied on top of them.

It is an iterative process where, in each iteration, you train a new tree based on the information from the previous ones.

For example, with N models, the simplified logical scheme is the following:

  • Train (X, y)
  • Tree 1
  • Predict
  • Define the error r1 = y - y1*
  • Train (X, r1)
  • Tree 2
  • Predict
  • Define the new error r2 = r1 - r1*

…

  • Train (X, rN-1)
  • Tree N
  • Predict
  • Define the last error rN = rN-1 - rN-1*
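The residual-fitting loop above can be sketched directly in Python. This is a minimal illustration, not the article's own code: I use scikit-learn's DecisionTreeRegressor as the base learner, and the toy dataset, tree depth, and number of trees are all illustrative choices.

```python
# Minimal sketch of the residual-fitting scheme: each new tree is
# trained on the residuals left by the trees before it.
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.RandomState(0)
X = rng.uniform(0, 10, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, 200)

n_trees = 10
trees = []
residual = y.copy()                                   # r0 = y
for _ in range(n_trees):
    tree = DecisionTreeRegressor(max_depth=2)
    tree.fit(X, residual)                             # Train (X, r)
    residual = residual - tree.predict(X)             # r_new = r - r*
    trees.append(tree)

# The ensemble prediction is the sum of all the trees' predictions
y_pred = sum(tree.predict(X) for tree in trees)
```

Each iteration shrinks the remaining residual, so the summed prediction fits the training data better and better, which is exactly why overfitting becomes a concern (see "Dark Side" below).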

Here is a personal drawing to understand it better:

Extreme Gradient Boosting algorithm explained in its steps until convergence

Library for Python

Sklearn (here is the official documentation; read it if you have time!) has a specific class:

class sklearn.ensemble.GradientBoostingClassifier

When to use it?

Several Kaggle competitions were won using the XGBoost algorithm. Which competitions?

  1. Avito Challenge, "Predict demand for an online classified ad" – Binary Classification Problem
  2. Otto Group Challenge, "Classify products into the correct category" – Multi-label Classification Problem
  3. Rossmann, "Forecast sales using store, promotion, and competitor data" – Regression Problem

Dark Side

You must be careful about overfitting.

Gradient boosted trees learn quickly, and fast learners tend to overfit.

To avoid overfitting you can use shrinkage, also called the learning rate.

Applying a weighting factor to every new tree in the sequence is a validated way to slow down learning in gradient boosting.

How in Python?

You can find a lot of tutorials in the official sklearn documentation; the core process is a standard fit/predict flow.
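A minimal sketch of that fit/predict flow with GradientBoostingClassifier; the toy dataset and the parameter values here are my own illustrative choices, not from the sklearn tutorials:

```python
# Core process: instantiate the model, fit it on training data,
# then score it on held-out data.
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=500, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = GradientBoostingClassifier(n_estimators=100, random_state=0)
clf.fit(X_train, y_train)
accuracy = clf.score(X_test, y_test)
```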

To tune the shrinkage there is a specific argument inside the class:

"learning_rate : float, optional (default=0.1)

          learning rate shrinks the contribution of each tree by learning_rate"
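In practice, tuning the shrinkage just means passing different learning_rate values to the class. A small sketch, where the dataset and the two compared values (1.0 and 0.1) are illustrative assumptions of mine:

```python
# Compare two shrinkage settings: a smaller learning_rate shrinks each
# tree's contribution, slowing learning as a guard against overfitting.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

scores = {}
for lr in (1.0, 0.1):
    model = GradientBoostingClassifier(learning_rate=lr, n_estimators=100,
                                       random_state=0)
    model.fit(X_train, y_train)
    scores[lr] = model.score(X_test, y_test)
```

With a lower learning_rate you usually need more trees (n_estimators) to reach the same training fit, so the two parameters are tuned together.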


Want to go deeper?

Read this interesting Q&A on Stack Overflow and the first paper on gradient boosting at this link (a tough paper).

If you liked it or found it useful, share it on social networks; with just one click you can raise an opportunity.
If you think something needs to be fixed, or you found a typo, write me!

Thanks for reading my article!


