Data export and import

Many software programs offer the option to import or export data. CSV files are usually the preferred format because they are simple, human-readable and easily accessible.

Analytics software like R has a very powerful CSV parser that allows you to import CSV files of any kind and account for several different parameters. You can specify the field separator, the row terminator, the decimal separator, the quotation character, the way characters in text fields are escaped or the file encoding, among other things. This allows you to import and export files in pretty much every CSV-like format.

The code block below shows the head of the read.table() function in base R with all the parameters that you can specify, along with their default values.

read.table(file, header = FALSE, sep = "", quote = "\"'",
    dec = ".", numerals = c("allow.loss", "warn.loss", "no.loss"),
    row.names, col.names, as.is = !stringsAsFactors,
    na.strings = "NA", colClasses = NA, nrows = -1,
    skip = 0, check.names = TRUE, fill = !blank.lines.skip,
    strip.white = FALSE, blank.lines.skip = TRUE,
    comment.char = "#", allowEscapes = FALSE, flush = FALSE,
    stringsAsFactors = default.stringsAsFactors(),
    fileEncoding = "", encoding = "unknown", text, skipNul = FALSE
)

However, most programs use parsers that are not this powerful and customizable. In many cases, they have fixed parameters, expecting for example that the field separator is always a comma and that quotation is always used for text fields. If you are lucky, the documentation is comprehensive enough to figure out what parameters the program's parser expects, but don't count on that. In that case, there is little you can do except rely on trial-and-error.

My main advice here is to become familiar with how your program works and learn the parameters it uses. Never just assume a certain format, always test your assumptions. When you are sharing a CSV file with someone, make sure to add all the information about the format that people will need to parse it. If you are writing a program that takes CSV files as input, document all your choices regarding the format so that people know what type of CSV your program accepts.

Takeaways:

  • Know the format that your CSV parser expects

  • When you give someone a CSV file, specify exactly what format you used

Last updated