Predicting the value of a house

Top 3% in a data science competition
pricing
prediction
Author

Jesse van Elteren

Published

July 26, 2020

Recently I’ve made my way into Kaggle. If it’s new to you, I highly recommend checking it out. Kaggle is a platform where organizations host data science competitions. They come up with a data science challenge, make the data available to Kaggle users, and many data scientists worldwide compete to get the highest score on the leaderboard. After a defined period the competition ends and the winner is awarded a (monetary) prize.

Participants also share their code (kernels) and discuss the data. This makes it an excellent platform to learn. The competitions can be a bit intimidating, since they can have extremely large datasets (100GB upwards), the objectives can be challenging (imaging, audio, text, combinations) and figuring out how to submit is not always trivial. But of course you can start with simpler competitions such as the classic Titanic example.

Since I’m quite familiar with tabular data, I decided to give the housing competition a try. The training data consists of many features describing about 1500 houses, along with their selling prices. After numerous experiments I ended up with a top 3% score on the leaderboard before throwing in the proverbial towel. The rush of inching up the leaderboard made it a great experience!

Afterwards I wrote about the main insights, lessons and questions.

Next up: probably an imaging competition with PyTorch or fast.ai.