NULL
FIRST_NAME != 'Alice'
or FIRST_NAME <> 'Alice'
to confirm that the first name of the person we are looking for is not Alice, we DO NOT write FIRST_NAME != NULL
or FIRST_NAME <> NULL
. This is due to the fact that NULL values have all kinds of funny behavior, especially when it comes to comparisons and joins, as we will explore later in this book. They can also lead to problems when you aggregate the data. Calculating the sum over a column where half of the observations have empty fields will give you a result that is pretty much useless. Calculating the average can still work okay, but only if the observations that have non-missing values are a representative sample of all observations., SUM(CASE WHEN CUSTOMER_ID IS NULL THEN 1 ELSE 0 END AS NULL_VALUES_CUSTOMER_ID
, SUM(CASE WHEN FIRST_NAME IS NULL THEN 1 ELSE 0 END AS NULL_VALUES_FIRST_NAME
, SUM(CASE WHEN LAST_NAME IS NULL THEN 1 ELSE 0 END AS NULL_VALUES_LAST_NAME
DBC.COLUMNSV
[2], to generate the SQL code that will count the number of missing values for each column in your table. More specifically, it gives you the SELECT
part of your statement. Copy the resulting table into your SQL query tool, complete the statement and execute it. Occasionally, when the SUM
gets too big, you can get a numeric overflow. Just add a CAST(... AS BIGINT)
around the SUM
to avoid that problem.CAST(COUNT(*) AS FLOAT)
- so that the result is also a floating point number with the exact quotient rather than the result of integer division.DBC.COLUMNSV
. Usually, every database system will have such metadata tables. If you are working with another database system, just replace the metadata tables with the ones in your system.