> For the complete documentation index, see [llms.txt](https://b-greve.gitbook.io/beginners-guide-to-clean-data/llms.txt). Markdown versions of documentation pages are available by appending `.md` to page URLs; this page is available as [Markdown](https://b-greve.gitbook.io/beginners-guide-to-clean-data/type-and-format-related-problems/whitespace-padded-strings.md).

# Whitespace-padded strings

All databases have some quirks. One of the more subtle and occasionally annoying quirks I have come across are fixed-length string types. This data type is often used for text columns where the data is known to have a constant or at least bounded length, for example when the column is a 32 character long alphanumeric hash.

The advantages of fixed-length string types are usually performance benefits and having a constraint that cannot be violated, which helps to create a dataset that is more resilient to human error.

When you know that all your text entries are of the same length, the use of these data types, if available, is strongly recommended. However, when you use a type like this and insert a string that is shorter than the fixed length, it will usually be padded with spaces. This becomes a problem when you are checking the value of a string since 'hello world' is not the same as `'hello world     '` with five spaces at the end. Of couse, this problem of leading or trailing whitespace characters is not limited to fixed-length string types.

A quick way to check if you have this type of problem comes with the `TRIM()` function in SQL or `trimws()` in R. These functions allow you to remove leading and trailing white spaces. Simply check if `TRIM(MY_COLUMN) = MY_COLUMN`. If this evaluates to FALSE at any point, you have leading or trailing white spaces in one of your text fields. Please note that this doesn't work with all databases, because they will sometimes ignore the leading and trailing spaces. If you want to be on the safe side, check if `LENGTH(TRIM(MY_COLUMN)) = LENGTH(MY_COLUMN)`. Alternatively, you can check if the string satisfies a regular expression like `'^[ ]+'` or `'[ ]+$'`. Be aware that the space character `U+0020` is not the only type of whitespace out there. In text data, you will often find the non-breaking space character `U+00A0`, the tab character `U+0009` or even more obscure characters. Therefore, it is safer to use regular expressions like `'^[\s]+'` or `'[\s]+$'`.

Takeaways:

* When you are working with a text column, check if is a fixed-length data type
* Use the `TRIM()` function to check for leading and trailing whitespace characters


---

# Agent Instructions
This documentation is published with GitBook. GitBook is the documentation platform designed so that both humans and AI agents can read, navigate, and reason over technical content effectively. Learn more at gitbook.com.

## Querying This Documentation
If you need additional information that is not directly available in this page, you can query the documentation dynamically by asking a question.

Perform an HTTP GET request on the current page URL with the `ask` query parameter, and the optional `goal` query parameter:

```
GET https://b-greve.gitbook.io/beginners-guide-to-clean-data/type-and-format-related-problems/whitespace-padded-strings.md?ask=<question>&goal=<endgoal>
```

`ask` is the immediate question: it should be specific, self-contained, and written in natural language.
`goal` is optional and describes the broader end goal you are ultimately trying to accomplish on behalf of the user. GitBook uses it to tailor the answer towards what is most useful for that goal.

The response will contain a direct answer to the question and relevant excerpts and sources from the documentation.

Use this mechanism when the answer is not explicitly present in the current page, you need clarification or additional context, or you want to retrieve related documentation sections.
