A Beginner's Guide to Clean Data

Practical advice to spot and avoid data quality problems

by Benjamin Greve

This is a free version of my book "A Beginner's Guide to Clean Data: Practical advice to spot and avoid data quality problems". If you like the content, feel free to buy this book on Amazon and/or leave a positive review there.


This book will help you become a better data scientist by showing you the things that can go wrong when working with data, particularly low-quality data. A key difference between a junior and a senior data scientist is an awareness of potential pitfalls. An experienced data scientist expects them, navigates around them and avoids costly iteration cycles. After reading this book, you will be able to spot data quality problems and deal with them before they break your work, saving yourself a lot of time.

Over the past six years of working in data science, I have made all the mistakes described in this book. Every time, it cost me hours, sometimes days, to figure out what the problem was and to fix it. This type of iterative work is what data scientists mean when they say they spend most of their time on data preparation. Yet, for some reason, the art of preparing data and ensuring a sufficiently high level of quality is largely ignored by textbooks, university programs, online courses and industry conferences. That's why I felt the need to write this book and share some of my experiences. It is the hands-on advice that I wish I had had when I started my career as a data scientist.

About the author

My name is Benjamin Greve and I'm a data scientist, mathematician and clean data enthusiast from Germany. I teach machines how to solve complex tasks so that people can concentrate on the things that really matter.

To contact me, use one of the following channels:


Copyright (c) 2019 Benjamin Greve. All rights reserved.

Some of the advice from this book is opinion-based. If you disagree or your personal experience differs from mine, feel free to contact me on any of the above-mentioned channels. I'm constantly learning new things and I'd love to discuss all things data with you.

Last updated