Database Field Naming Notes

Database Field Naming Notes

Let’s look at a few ways we can name our database fields. This is mostly subjective, with some personal opinion mixed in for added velocity.

Naming tables

To me, this is straight forward.

  1. All table names should be lower snake_case e.g. account_name or account_email_option.
  2. Table names should be singular e.g. account, person, email, setting, etc. It’s easier to just keep singular names and not worry about using the plural form. This is SQL, not English class.
  3. Joining tables should be a merge of the tables they join e.g. account_person, the order isn’t important.
  4. No upper case, ever
  5. No spaces, ever

Try to avoid meaningless words like data, configurations, items, content. They are so general in nature, they could mean anything or nothing.

IDs and the evil auto-increment

Almost every table having unique rows should have an id field. Please do NOT name it accounts.account_id, if you do this, you should be sad and feel ashamed. Such meta in the field names is just verbose nonsense. I feel differently about this now. Using just id makes your SQL very ambiguous when you start joining many tables. Now I subscribe to any IDs being named with the table name, e.g. property.property_id should be the ID of the property table. This also aids in writing SQL, now you can simply write

SELECT * FROM property 
JOIN property_unit ON property_id

Much less verbose join than the standard JOIN property_unit ON property.id = property_unit.property_id

It should go without saying, id should be unique.

Why is auto-increment so bad?

We should address why auto-increment is a short sighted database design decision. If you like auto-increment, it’s likely you’ve only worked on smaller database projects. When a project grows bigger you need flexible solutions auto-increment is incapable of.

  1. Seed Data: When using auto-increment, it’s impossible to create preexisting seed data for your database which does not conflict when reloading or working with data already in the database. This is a big problem. You will want to load up seed data, or preexisting rows in a table and using auto-increment, you would first have to insert the row, then get the inserted ID (hint in PG you would using RETURNING * in your INSERT statement)

  2. Sequences Auto increment uses an internal sequence, and different environments can easily get out of sync. As an experiment trying this query.

    CREATE TABLE testing (
        id SERIAL
        name TEXT
    );
    INSERT INTO testing (name) VALUES (E'Shane'),(E'Barry'),(E'Bob');
    DELETE FROM testing;
    INSERT INTO testing (name) VALUES (E'Shane'),(E'Barry'),(E'Bob');
    SELECT * FROM testing;

    You’ll notice the ids will be 4,5,6. Why is this? Well, the sequence does not get reset when you remove rows. It just keeps going. When you’re developing on your machine and someone else is adding rows, then your sequences will not be in line. This will need to be manually managed to make sure staging and productio do NOT get different ids for rows you may want to insert.

  3. Uniqueness: Auto-increment ids are NOT unique. Enough said I guess. This causes a problem when you may want to combine similar tables.

  4. Predictable: Auto-increment ids are entirely predictable. If ids get exposed to outside operators, this will open attack surfaces for bad actors, i.e. hackers love auto-incremented integers as ids.

So, if I can’t use auto-increment, what should I use?

Good question, it’s rather simple. The solution to the unique, unpredictable, pre-generated values for IDs is to use a UUID (or GUID).

Some minor drawbacks to using a UUID

  1. Cannot sort by UUID. Since they are not predictable or orderly, it’s impossible to sort them in any useful way. But lets face it, why are you sorting by an ID in the first place? NOTE: A new version of UUID will be sortable.
  2. They are long and ugly. There is no getting around this fact, compared to normal ID integers, UUIDs are long and ugly. They are difficult to relate verbally and would take more space in a URL. UUIDs are not really for human consumption.
  3. They take more space in the database. I consider this a moot point. The days of shaving bytes of data to conserve space have long since past. These minor differences are virtually irrelevant when we have hard drives with terabytes of space.

How to use a UUID in PostgreSQL?

First, you’ll need to add support for it. It’s likely your version of PG already has it ready at your finger tips, but you need to enable the following plugins when you design your database.

CREATE EXTENSION IF NOT EXISTS "pgcrypto";
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

Next, you would use it for your IDs like such

CREATE TABLE people (
  id UUID PRIMARY KEY DEFAULT gen_random_uuid()
);

You’ll see in this type of setup, we can provide a pre-generated UUID or fall back to an auto-generated one. The pre-existing would be when you have seed data providing the ID.

Field Naming (non ID)

We’ve come to the part of the show where opinions will be most visible. What do you name your fields? First, I’m a big proponent of providing comments for all fields. Comments in postgres can be added to almost anything. Here we see a comment added to a column.

COMMENT ON COLUMN people.first_name IS 'The first name of the person';

Then, I would think of a name which fully describes the field.

Bad: images.url
Good: images.source_url

images.url is not a good choice, because an image could have different types of URLs might needing to be tracked. Think maybe images.cdn_url or images.imported_url or images.placeholder_url. Remember: Try not to be short sighted in your database design, assume your schema will change, and prepare for it.

Again, avoid watered down, meaningless words like type, class, data, content

Next, make sure you are not using reserved words in SQL or in the app language primarily in use. A perfect example is class. While in most cases, it will not be a problem, it does get confusing when you are working with data and has conflicting reserved words.

Foreign Keys and the _id

A lot of database designs will use the convention emails.person_id as a foreign key to the field people.id. I can see the value in this, because it provides hints in your application layer. The value is just an ID to a foreign field. I do not like this convention, as I mentioned previously it’s too meta. As a database administrator you know it’s a foreign key, so it’s almost certainly an ID, putting it in the field name is redundant. As for the application layer, they should be cognizant of table structure and understand emails.person is an ID.

To recap

  1. Add comments to your fields
  2. Avoid generic, easily confused naming
  3. Stay clear of obvious reserved words
  4. Append _id to foreign key fields? It’s up to you

Conclusion

We’ll stop here, but you can imagine there are many different ways that we could postulate about database field naming. Some are better than others, but in the end, it needs to make sense to you and your teams. Do the best you can, adapt, and overcome.

Update

I’ve been slightly swayed by the article How I Write SQL by Sehrope Sarkuni. It makes some good points.

  • Names of tables should be singular
  • FK should be like account_id instead of just account

Above all

The only thing worse than bad naming conventions is multiple naming conventions. If your existing project already has a standard approach to naming its database objects then keep using it.