The Utrecht Housing dataset: A housing appraisal dataset

Sieuwert van Otterloo | Artificial Intelligence

Van Otterloo, S and Burda, P. 2025. The Utrecht Housing dataset: A housing appraisal dataset. Computers and Society Research Journal (2025), 1

DOI: https://doi.org/10.54822/QVHM1662

Abstract

This paper introduces a real-world dataset for analysing and predicting house prices. The dataset consists of actual data on the Dutch housing market collected in 2024 for a total of 153 houses in one city (Utrecht in The Netherlands).
The dataset incorporates diverse variables on individual houses, including property characteristics (e.g., house type, build year, geolocation, area, energy label) and market metrics (e.g., asking price, final price). The data was collected from two public sources.
The dataset was created to help researchers and educators demonstrate machine learning principles on several problem types. It can be used for classification (energy label and energy efficiency) and for regression (price estimation). There are ten original input features and one derived feature. The dataset can be used freely, without restrictions, under a Creative Commons license and is available via the open data platform Kaggle.
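To illustrate the classification use case, here is a minimal sketch of a k-nearest-neighbours classifier that could predict the energy label from numeric features. The choice of k-NN, the function names (train_test_split, knn_predict, accuracy) and the idea of using inputs such as buildyear and house-area are assumptions for illustration only; the paper does not prescribe a model.

```python
import math
import random
from collections import Counter


def train_test_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of rows and split it into (train, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed for a reproducible split
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def knn_predict(train, features, k=3):
    """Predict a label by majority vote among the k nearest training rows.

    train is a list of (feature_tuple, label) pairs, e.g.
    ((buildyear, house_area), energylabel) (hypothetical choice of inputs).
    Distance is Euclidean, so features should be on comparable scales.
    """
    nearest = sorted(train, key=lambda row: math.dist(row[0], features))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k=3):
    """Fraction of test rows whose label knn_predict predicts correctly."""
    correct = sum(knn_predict(train, f, k) == label for f, label in test)
    return correct / len(test)
```

Because Euclidean distance is sensitive to scale, a feature measured in years and one measured in square metres would need standardizing before this sketch gives meaningful neighbours.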

Download dataset and paper

The Utrecht housing dataset is available on Kaggle at https://www.kaggle.com/datasets/ictinstitute/utrecht-housing-dataset/data. The current version (2.0, published in 2025) replaces earlier versions, which contained synthetic data. The dataset is freely available under the Kaggle open data license.

The full paper can be downloaded here: Otterloo-Burda-Utrecht_Housing_dataset_CSR2025

Contents of the dataset

The dataset contains data on 153 houses in or around the Dutch city of Utrecht. The following features are available for each house:

  • zipcode4, zipcode6, zipcode6id, housetype, lot-area, house-area, garden-size, rooms, bathrooms, x-coor, y-coor, buildyear, retailvalue, askingprice, energylabel, energyeff, valuationdate, street, subdistrict, district, city, dist-from-train
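A minimal sketch of reading the data into Python, assuming the Kaggle download is a CSV file whose header row uses the feature names listed above. The set of numeric columns is our assumption (energyeff is left as text because its encoding is not described here), and load_houses is a hypothetical helper, not part of the dataset.

```python
import csv

# Assumption: these columns hold numbers; all other columns stay as text.
NUMERIC_COLUMNS = {
    "lot-area", "house-area", "garden-size", "rooms", "bathrooms",
    "x-coor", "y-coor", "buildyear", "retailvalue", "askingprice",
    "dist-from-train",
}


def to_number(value):
    """Convert a CSV cell to float, or None if it is empty, missing or non-numeric."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def load_houses(lines):
    """Read CSV lines (e.g. an open file) into a list of dicts, one per house."""
    houses = []
    for row in csv.DictReader(lines):
        houses.append({
            name: to_number(value) if name in NUMERIC_COLUMNS else value
            for name, value in row.items()
        })
    return houses
```

Usage, with a hypothetical file name: `with open("utrecht_housing.csv", newline="") as f: houses = load_houses(f)`. Identifiers such as zipcode4 are deliberately kept as strings, since they are labels rather than quantities.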

The housing dataset can be used for several regression and classification problems. The chart below shows that house area can be used to predict retail value.

[Chart: house area versus retail value]

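The relationship between house area and retail value described above can be quantified with a simple linear regression. A minimal sketch, assuming the house-area and retailvalue columns have been read into two lists of numbers with incomplete rows removed; ordinary least squares is our choice of method, and fit_line is a hypothetical name, not necessarily what the paper used.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept.

    Returns (slope, intercept, r_squared).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long sequences with at least two points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # R^2 = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    # If all y values are equal the fit is exact (slope 0), so report 1.0.
    r_squared = 1 - ss_res / ss_tot if ss_tot else 1.0
    return slope, intercept, r_squared
```

Called as `fit_line(areas, values)`, the slope is the estimated change in retail value per unit of house area, and r_squared indicates how much of the variation in retail value the area alone explains.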
Author: Sieuwert van Otterloo
Dr. Sieuwert van Otterloo is a court-certified IT expert with interests in agile, security, software research and IT contracts.