Naming tables and columns

Name your tables and columns consistently. As the saying goes, naming things is one of the hardest problems in computer science. Well named objects, be it classes, methods, functions, variables, tables or columns, will make your life a lot easier while poor and inconsistent naming can make even a simple program impossible to comprehend. The book 'Clean Code' by Robert C. Martin offers some very helpful naming conventions and suggestions for programming languages. In databases, naming tables and columns works a little bit different.

Most important, name your tables and columns consistently. If you worked out a naming convention, try to stick to it. Unfortunately, this is not always feasible. As your project progresses, you may realize that your early naming conventions become obsolete or misleading. Here are a few suggestions that I find helpful:

Avoid abbreviations. Abbreviations are terrible for people who are new to a project. We tend to forget how annoying it is when people use a lot of abbreviations because we get used to those that are used within our teams or departments. If you must use abbreviations, limit it to the most important ones and, for god's sake, keep a dictionary of abbreviations somewhere and put that on the on-boarding list for every new team member!

Put the dimension into the name. What does a row in your table represent? Is it an offer, a customer, a product, a tuple of offer and customer, a transaction? Whatever it is, having the name of the dimension in the table name lets people quickly grasp what the table is all about. DAILY_REVENUE_PER_CUSTOMER let's you know what a row most likely represents while REVENUE is contains no information about the dimension.

Does the table contain a subset from a larger pool of data? If it's a simple filter like 'customers who were active, i.e. purchased something, within the last year', then a suffix like _ACTIVE_1YR or _ACTIVE_52WEEKS is useful. Of course, it's not always feasible to put every filter and every dimension into the table name. At some point, it simply gets too long to be useful.

Try to identify the key metric within the table. If it's a single metric, this should be in the table name. For example, a table containing the revenue for each combination of customer and product within the last 52 weeks could be named CUSTOMER_PRODUCT_REVENUE_W52. If the table contains several different metrics, try to identify what they have in common. Maybe it is all sales statistics like total sales, average weekly sales, standard deviation of sales, number of transactions and so on. In this case CUSTOMER_PRODUCT_SALES_STATISTICS_W52 might be a reasonable name.

You also need to remember that your audience consists of humans, not robots. Naming conventions can sometimes become very formal and lead to long, impossible to pronounce names. As Robert C. Martin points out: If you cannot pronounce it, you cannot talk about it, which also means that you cannot explain it to your team. Sometimes a short, descriptive name can be better than a long, technical one - even when it doesn't fully follow your naming conventions. For example: CUSTOMERMANAGEMENT_CUST_W52_N_PURCHASE_GT_1 is a reasonable name for the subset of customers who made at least one purchase in the last year, but CUSTOMERS_WHO_SHOPPED_LAST_YEAR is a lot easier to understand. If you are like me and you still want the name of the 'module' or process that creates this table in the name, CUSTOMERMANAGEMENT_SHOPPED_LAST_YEAR would be a good compromise.

Sometimes, you may need temporary tables that are only supposed to hold intermediate results. This can be actual temporary tables that get deleted once you don't need them anymore - like the volatile tables in Teradata that exist only in the current session and get deleted when the sessions ends. But it can also be actual, lasting tables. There are two ways that you should mark these tables.

Firstly, you should mark them as temporary tables in their name. I simple use the prefix TEMP_ or STAG_ for 'temporary' or 'staging'. Secondly, you should make sure that the name of the table shows the process it belongs to. If there are five intermediate result tables during the calculation of your customer datamart, add some token to their name to make clear where they belong to. In this particular case, use the prefix TEMP_CUSTOMERDATAMART_. This may seem intimidating because the resulting table names can get very long, especially when you take into account all the other advice from above. But if some person who is unfamiliar with your project ever has to look into your code or database, he will be glad that you chose such descriptive table names.

As you may already have noticed, some of the above rules contradict each other. If you stick faithfully to your naming conventions, many table names will become long and hard to pronounce. On the other hand, if you just use descriptive names, they will probably not be compatible with your naming conventions. As a rule of thumb, the more often you need to talk about a table, the more important a descriptive name becomes. A central table like sales statistics on a customer level can have a simple name like CUSTOMER_SALES_STATISTICS while a more 'technical' table like a temporary table used for calculating the customer sales statistics should follow the naming conventions more closely, e.g. TEMP_CUSTOMERSALESSTATS_RELEVANT_CUSTOMERS. The same goes for abbreviations. If you stick to the rule of no abbreviations, you'll end up will long names that are kind of unwieldy. If all your temporary tables start with TEMP_CUSTOMERSALESSTATS_, it probably makes sense to abbreviate it to something slightly shorter like TEMP_CUSTSTATS_ - even if that violates the rule.

Takeaways:

  • Avoid abbreviations. If you can't, have a dictionary with all the abbreviations.

  • Use simple names designed for humans, not machines.

  • Use names that a human can pronounce. Add context to your table name, e.g. the name of the module that creates or uses the table.

Last updated