The Utrecht Housing dataset: A housing appraisal dataset
Sieuwert van Otterloo
Artificial Intelligence

Van Otterloo, S and Burda, P. 2025. The Utrecht Housing dataset: A housing appraisal dataset. Computers and Society Research Journal (2025), 1
DOI: https://doi.org/10.54822/QVHM1662
Abstract
This paper introduces a real-world dataset for analysing and predicting house prices. The dataset consists of actual data on the Dutch housing market collected in 2024 for a total of 153 houses in one city (Utrecht in The Netherlands).
The dataset incorporates diverse variables on individual houses, including property characteristics (e.g., house type, build year, geolocation, area, energy label) and market metrics (e.g., asking price, final price). The data was collected from two public sources.
The dataset was created to help researchers and educators demonstrate machine learning principles on several problem types. It can be used for classification (energy label and energy efficiency) and for regression (price estimation). There are ten original input features and one derived feature. The dataset can be freely used without restrictions under a Creative Commons license and is available via the open data platform Kaggle.
Download dataset and paper
The Utrecht housing dataset is available on Kaggle via the following link: https://www.kaggle.com/datasets/ictinstitute/utrecht-housing-dataset/data . The current version (2.0, published 2025) replaces earlier versions that were synthetic. The dataset is freely available under the Kaggle open data license.
The full paper can be downloaded here: Otterloo-Burda-Utrecht_Housing_dataset_CSR2025
Contents of the dataset
The dataset contains data on 153 different houses in or around the Dutch city of Utrecht. The following features are available for each house:
- zipcode4, zipcode6, zipcode6id, housetype, lot-area, house-area, garden-size, rooms, bathrooms, x-coor, y-coor, buildyear, retailvalue, askingprice, energylabel, energyeff, valuationdate, street, subdistrict, district, city, dist-from-train
The housing dataset can be used for several regression and classification problems. The chart below shows that house area can be used to predict retail value.
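As a minimal sketch of that relationship, the snippet below fits a straight line through (house-area, retailvalue) pairs with ordinary least squares. The five rows are invented for illustration and are not taken from the dataset; the units (square metres and euros) and the helper name fit_line are likewise assumptions. Reading the actual Kaggle CSV is left out.

```python
# Toy rows: (house-area, retailvalue). These numbers are invented for
# illustration and are NOT taken from the Utrecht Housing dataset.
rows = [
    (60, 310_000),
    (80, 370_000),
    (100, 470_000),
    (120, 530_000),
    (150, 665_000),
]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x (hypothetical helper)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


intercept, slope = fit_line(rows)

# Predict the retail value of a (hypothetical) house with an area of 90.
predicted_value = intercept + slope * 90
print(f"value = {intercept:.0f} + {slope:.0f} * area")
print(f"predicted value for area 90: {predicted_value:.0f}")
```

On the real data the same fit could be done with a library such as scikit-learn; the hand-written version is used here only to keep the example free of dependencies.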
Possible use cases or challenges
A typical prediction problem is to predict the retail value (retailvalue) or the asking price (askingprice). These values must be chosen when listing a house for sale or making a bid. This can be done with linear regression or with more advanced methods, provided the assumptions of the chosen method hold. Many interesting variations on this main prediction problem are possible; a few are listed below.
- Try making two separate models, one for appartement (apartment) and one for woonhuis (house). Compare the performance of this approach against a combined model.
- Try making two separate models, one for energy-efficient houses and one for non-efficient houses. Compare the performance of this approach against a combined model.
- Use x-coor and y-coor to create new features, such as distance to schools, train station, parks, coffee factory etc. Try to find a correlation to house value for each distance.
- Try to make predictions using only zipcode4 and house-area. Then rank the zipcodes based on the effect of each zipcode on the predicted value. Try to see if the ranking changes when you add build year or garden size.
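The first variation above (separate models per house type versus one combined model) could be sketched as follows. Because a least-squares model fitted per group can never do worse than the combined model on its own training data, the sketch compares the two approaches with leave-one-out cross-validation instead, which also suits a dataset of this size. The eight rows are invented, the single feature (house-area) is a simplification, and the helper names are hypothetical.

```python
# Invented rows: (housetype, house-area, retailvalue). NOT real dataset values.
houses = [
    ("appartement", 50, 290_000),
    ("appartement", 65, 345_000),
    ("appartement", 80, 410_000),
    ("appartement", 95, 455_000),
    ("woonhuis", 100, 430_000),
    ("woonhuis", 120, 505_000),
    ("woonhuis", 140, 560_000),
    ("woonhuis", 170, 670_000),
]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def loo_errors(points):
    """Absolute prediction error for each point when it is held out of the fit."""
    errors = []
    for i, (x, y) in enumerate(points):
        train = points[:i] + points[i + 1:]
        intercept, slope = fit_line(train)
        errors.append(abs(y - (intercept + slope * x)))
    return errors


# Combined model: one line fitted on all houses.
combined_errors = loo_errors([(area, value) for _, area, value in houses])

# Separate models: one line per house type, errors pooled afterwards.
separate_errors = []
for housetype in ("appartement", "woonhuis"):
    group = [(area, value) for t, area, value in houses if t == housetype]
    separate_errors.extend(loo_errors(group))

combined_mae = sum(combined_errors) / len(combined_errors)
separate_mae = sum(separate_errors) / len(separate_errors)
print(f"combined model MAE: {combined_mae:.0f}")
print(f"separate models MAE: {separate_mae:.0f}")
```

The same comparison applies to the energy-efficient versus non-efficient split by grouping on energyeff instead of housetype.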
Another goal is to classify the energy efficiency of houses, specifically the energy label as an example of multi-class classification, and energy efficiency (energyeff) as an example of binary classification. Suitable techniques would be logistic regression and decision trees.
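To illustrate the binary case, the sketch below trains a decision stump, that is, a decision tree of depth one, that predicts energyeff from buildyear. It rests on several assumptions that should be checked against the real data: energyeff is taken to be encoded as 0 or 1, the rule is restricted to "newer houses are efficient", and the eight rows are invented. The function name fit_stump is hypothetical.

```python
# Invented rows: (buildyear, energyeff), with energyeff assumed to be
# 1 = efficient and 0 = not efficient. NOT real dataset values.
rows = [
    (1935, 0), (1960, 0), (1978, 0), (1985, 1),
    (1992, 0), (1998, 1), (2005, 1), (2015, 1),
]


def fit_stump(samples):
    """Find the threshold t that maximises the training accuracy of the
    rule 'buildyear >= t -> efficient' (a depth-one decision tree)."""
    years = sorted({year for year, _ in samples})
    # Candidate thresholds: midpoints between consecutive distinct years.
    candidates = [(a + b) / 2 for a, b in zip(years, years[1:])]
    best_threshold, best_accuracy = None, -1.0
    for t in candidates:
        correct = sum((year >= t) == bool(label) for year, label in samples)
        accuracy = correct / len(samples)
        if accuracy > best_accuracy:
            best_threshold, best_accuracy = t, accuracy
    return best_threshold, best_accuracy


threshold, accuracy = fit_stump(rows)
print(f"predict efficient if buildyear >= {threshold}")
print(f"training accuracy: {accuracy:.3f}")
```

A full decision tree would repeat this split search recursively over several features and usually uses an impurity measure such as Gini rather than raw accuracy; training accuracy as reported here also overstates performance on unseen houses.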
How to cite this article
We recommend using the following BibTeX entry to cite the paper:
@article{Otterloo25,
  title={The Utrecht Housing dataset: A housing appraisal dataset},
  author={Van Otterloo, S and Burda, P},
  journal={Computers and Society Research Journal},
  number={1},
  year={2025},
  note={\url{https://doi.org/10.54822/QVHM1662}}
}
It should look similar to this, depending on your template:
van Otterloo and Burda – The Utrecht Housing dataset: A housing appraisal dataset, Computers and Society Research Journal (2025), 1, https://doi.org/10.54822/QVHM1662

Dr. Sieuwert van Otterloo is a court-certified IT expert with interests in agile, security, software research and IT-contracts.