At this point, I should also mention default values in databases. The SQL standard allows you to specify default values for a column in the CREATE TABLE
statement. Using default values for columns that should not be NULL seems like a good idea at first. All subsequent ETL steps, analyses or data products further down the pipeline can rely on the data without having to worry about missing data. The problem is that you can end up shooting yourself in the foot because you lose awareness of missing values. When default values are automatically inserted, you will not notice problems earlier in the data preparation that may lead to an explosion in the number of missing values. Instead of breaking, your data product will continue to work, but with errors that you are not aware of. In many cases, this is a much more dangerous scenario. Errors in preceding ETL steps could go unnoticed for weeks, months or years!