A simple example would be the color of your product. Perhaps your company sells T-shirts and you want to find out which colors are the most popular with your customers. However, when you look into the data, you discover that aside from plain one-colored shirts, you also sell some with patterns and prints in various colors. In addition to the column COLOR, which you thought would give you all the information you need, there is another column called PATTERN_COLOR that contains a comma-separated list of the most notable colors in the shirt's pattern or print. This means that a T-shirt may qualify as 'blue' not only when COLOR = 'blue', but also when 'blue' is one of the entries in PATTERN_COLOR. It also means that a white-and-blue striped shirt can belong to the set of white shirts AND the set of blue shirts at the same time. The simple question has just become a lot more complex, and you need to decide how you want to deal with this complexity.
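To make the overlap concrete, here is a minimal sketch in Python. It assumes each shirt is a record with the two columns named in the text, COLOR and PATTERN_COLOR. The sample rows and the helper names `colors_of` and `has_color` are hypothetical, invented only for illustration. A shirt counts as a given color if that color appears either in COLOR or anywhere in the comma-separated PATTERN_COLOR list.

```python
from collections import Counter

# Hypothetical sample rows: the column names follow the text,
# but the values are invented for illustration.
shirts = [
    {"id": 1, "COLOR": "blue", "PATTERN_COLOR": ""},
    {"id": 2, "COLOR": "white", "PATTERN_COLOR": "blue"},  # white-and-blue stripes
    {"id": 3, "COLOR": "red", "PATTERN_COLOR": None},      # missing pattern data
    {"id": 4, "COLOR": "white", "PATTERN_COLOR": ""},
]


def colors_of(shirt):
    """Return every color a shirt qualifies for (hypothetical helper)."""
    colors = {shirt["COLOR"].strip().lower()}
    pattern = shirt.get("PATTERN_COLOR") or ""  # treat None as an empty list
    # Split the comma-separated list, trimming spaces and dropping empty entries.
    colors.update(c.strip().lower() for c in pattern.split(",") if c.strip())
    return colors


def has_color(shirt, color):
    """True if the color appears in COLOR or in PATTERN_COLOR."""
    return color.lower() in colors_of(shirt)


# Each shirt is counted once per color it qualifies for, so the
# striped shirt (id 2) is counted under both 'white' and 'blue'.
counts = Counter(c for shirt in shirts for c in colors_of(shirt))

blue_ids = [s["id"] for s in shirts if has_color(s, "blue")]
white_ids = [s["id"] for s in shirts if has_color(s, "white")]
print(blue_ids, white_ids)  # [1, 2] [2, 4]
```

Note that the per-color counts add up to 5 even though there are only 4 shirts. With this definition, color shares will not sum to 100%. That is exactly the kind of consequence you have to decide how to handle.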