Whatever the reason for the gap between expectation and reality, check your assumptions about the values that actually occur. If you have columns with nominal data, group by these columns to display all distinct values and compare them to what you expected. If you see an unexpected value, check how often it occurs in the data. Is it a one-off, a single record among a million others that you can simply ignore or exclude, or is it a recurring problem? Check what the rest of the record, i.e. the other columns, looks like. Maybe these columns will give you a hint as to what's going on with these strange records. Check whether the combination of values in the other columns makes sense. If there is only a single record or a small number of records with a particular combination of values, take a closer look. Is this a plausible combination of values?
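To make these checks concrete, here is a minimal sketch in plain Python. The records, the column names (`country`, `status`, `plan`), the set of expected values, and the helper functions are all hypothetical and only illustrate the idea; in practice you would probably run the same grouping in SQL or a data-frame library.

```python
from collections import Counter

# Hypothetical records; column names and values are made up for illustration.
records = [
    {"country": "DE", "status": "active", "plan": "basic"},
    {"country": "DE", "status": "active", "plan": "pro"},
    {"country": "FR", "status": "active", "plan": "basic"},
    {"country": "FR", "status": "inactive", "plan": "basic"},
    {"country": "DE", "status": "actve", "plan": "pro"},  # unexpected value
    {"country": "FR", "status": "inactive", "plan": "pro"},
]

# Assumption: these are the only values we expect in the "status" column.
EXPECTED_STATUS = {"active", "inactive"}


def value_counts(rows, column):
    """Group by one column and count how often each distinct value occurs."""
    return Counter(row[column] for row in rows)


def unexpected_values(rows, column, expected):
    """Return the values (with counts) that are not in the expected set."""
    counts = value_counts(rows, column)
    return {value: n for value, n in counts.items() if value not in expected}


def rare_combinations(rows, columns, max_count=1):
    """Return value combinations that occur at most max_count times."""
    combos = Counter(tuple(row[c] for c in columns) for row in rows)
    return {combo: n for combo, n in combos.items() if n <= max_count}


# 1. Display all distinct values and how often each occurs.
status_counts = value_counts(records, "status")

# 2. Which values were not expected, and is it a one-off or a recurring problem?
odd_status = unexpected_values(records, "status", EXPECTED_STATUS)

# 3. Look at the rest of the affected records for hints.
odd_rows = [row for row in records if row["status"] in odd_status]

# 4. Find combinations of values that occur only once and deserve a closer look.
rare = rare_combinations(records, ["country", "status"])

print(status_counts)
print(odd_status)
print(odd_rows)
print(rare)
```

A rare combination is not necessarily wrong. The sketch only narrows down which records to inspect; whether a combination is plausible remains a judgment call that depends on your knowledge of the data.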