# Trailing line breaks

Trailing line breaks are usually a minor problem, but it is useful to be aware of them. Some CSV files have a few additional blank lines at the end. This can confuse some CSV parsers into thinking that they are dealing with actual records where all the fields are empty. Importing a file like that can lead to records with many missing values and distort summary statistics.

```
'ID','FIRST_NAME'\r\n
1,'Adam'\r\n
2,'Billy'\r\n
3,'Elizabeth'\r\n
\r\n
\r\n
```

A bad parser could read the above CSV and interpret the content as

| `ID`   | `FIRST_NAME` |
| ------ | ------------ |
| 1      | Adam         |
| 2      | Billy        |
| 3      | Elizabeth    |
| `NULL` | `NULL`       |
| `NULL` | `NULL`       |

To avoid this, check the last few lines of the CSV file manually, for example with the command line tool `tail` on Linux or by looking at the CSV file in Notepad++, if it isn't too big. You can also check for NULL values in a column that should absolutely not be NULL. If you find such a record and all the other columns are NULL too, that is an indicator for trailing line breaks as well.

You can also read the file in as a plain text file and calculate the number of characters in every line. If there are N columns, there should be at least N-1 column separators, i.e. `,` or `;` characters. Any column with less characters is obviously an error and should be inspected and possibly excluded.

Takeaways:

* Inspect the last few lines of a CSV file using the tail command or Notepad++
* Check for rows where all values are NULL Count the number of column separators in each row. With N columns, there should be at least N-1 occurrences of the column separator
