https://www.depesz.com/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/
Lesson Learned: Use TEXT
, unless you have a specific need to limit the characters
CREATE TABLE people (
first_name TEXT NOT NULL,
last_name TEXT NOT NULL
);
The debate rages on, “To use ENUMs, or NOT use ENUMs”. When you use ENUMs, data integrity is enforced, this is a good thing. Think about allowing a free form TEXT
field like colors. You might get values like this in your database.
Grey
GRAY
gray
grey
If your solution is “the app will lower/upper case the value”, then you are essentially making your app the only consumer of the database. What happens when another process starts using your database? I view this as a very limited database design mindset when you only view the database as being the backend to one type of frontend. You should design your database as a database that exists to serve and protect the data it houses. Too many times I see developers treating their database as dumb data stores, and hold much of the logic in the app that could easily reside in the database. When you go down this route, you may be leveraging your strengths (app development), but losing out on very powerful tools that reside in the database engine. It also requires you to implement logic in multiple places rather than just depend on an intelligent data store.
These are functions your database can easily handle without breaking a sweat
updated_at
✅OK, that was a rabbit trail, back to the main subject ENUMs
Let’s look at a couple of notes about ENUM types. First, we’ll start with this ENUM type.
CREATE TYPE phone_type as ENUM ('MOBILE', 'WORK', 'HOME')
You can add to an ENUM type, but not remove. Add another enum value with ALTER TYPE item_status ADD VALUE IF NOT EXISTS 'OTHER'
. Just know that OTHER
is now permanently in that type. It can’t be easily removed.
Altering an ENUM type must be done outside of a transaction. Nuff said.
Get a list of values in an ENUM as rows in a query
SELECT unnest(enum_range(NULL::phone_type));
SELECT * FROM phones ORDER BY phone_type
or
SELECT * FROM movies WHERE rating <= PG-13
is_default
fieldThis is part of learning what you really want from your database. We had a many to many table, let’s call it people_addresses
for example. We wanted to know the first entry and then set the is_default = true
for that record. Sounds simple right?
The first attempt solving this solution was to have the application do a check for any record for a peron’s ID in the people_addresses
table, if no record was found, then we set is_default = true
when inserting that relation. It worked, but in reality, was subject to race conditions from other applications. It would work fine enough for low volume sites.
One of our talented engineers decided to use a table trigger and automagically set the is_default
tag. So this was the trigger that seemed to work during testing
CREATE OR REPLACE FUNCTION people_address_is_default()
RETURNS trigger AS $$
DECLARE
is_default boolean;
BEGIN
is_default := (SELECT NOT EXISTS(SELECT person FROM people_addresses WHERE person = NEW.person));
NEW.is_default = is_default;
RETURN NEW;
END;
$$ language 'plpgsql';
You’ll notice that the table trigger will check if another record exists and automatically set the is_default
value. Pretty slick. The only problem we saw was another race condition (see a pattern). When the client would call the same endpoint with different values. Each insert ran in its own transaction and therefore the is_default
was set to true.
Stepping back and looking at the problem, we really didn’t care about an is_default
, we mostly cared about detecting the first insert. This can be done with a timestamp. So instead we decided to use a created_at
field with a default value of now()
. The only drawback now is we have to derive the is_default
value by looking at other records and determining if this is the oldest timestamp.
Postgres timestamp precision? During the discussions about using a timestamp for determining the first row inserted, we wondered, “What precision does PG use for storing timestamps? Milliseconds? Nanoseconds?”. The correct answer is a microsecond. Just for reference, 1 millisecond (ms) = 1000 microseconds (μs). We should be pretty safe using microseconds to determine which record was inserted first.
From their own documentation on Date/Time Types
Note: When timestamp values are stored as eight-byte integers (currently the default), microsecond precision is available over the full range of values.
We found UNION ALL
to be a powerful way to combine multiple tables together that had similar types of content into a single view. For argument sake, we didn’t go the route of complete normalization because of some difficultly setting up the data. So we would create a view that looked like this (a poor example, but simple nonetheless).
CREATE VIEW people (
id
first_name
last_name
date_of_birth
) AS
SELECT id, first_name, last_name, date_of_birth FROM adults
UNION All
SELECT id, first_name, last_name, date_of_birth FROM children
UNION All
SELECT id, first_name, last_name, date_of_birth FROM teachers
This worked well to have a unified table for the app to connect and get the data it needed, but has some drawbacks.
This pattern works well as you are discovering your data requirements, but should moved to more solid structures as your understanding of how data interacts progresses.
I’m going to briefly touch on ID
s, but this subject deserves its own blog post (look for it soon).
id
field. It’s just easier to lookup, select, join, and delete.ID
s. I think it’s too meta to name your id’s like account_id
or person_id
. I worked on project once where all the tables had fields named account.account_id
and account.account_created_at
. Incorporating the table name in the field is super lame.That’s all for now. Look for new blog posts soon that will dive into performance and that promised field naming post about IDs.