其他-Is storing a delimited list in a database column really that bad?

Bill Karwin 2021-02-10 14:01:20

In addition to violating First Normal Form because of the repeating group of values stored in a single column, comma-separated lists have a lot of other more practical problems:

Can’t ensure that each value is the right data type: no way to prevent 1,2,3,banana,5
Can’t use foreign key constraints to link values to a lookup table; no way to enforce referential integrity.
Can’t enforce uniqueness: no way to prevent 1,2,3,3,3,5
Can’t delete a value from the list without fetching the whole list.
Can't store a list longer than what fits in the string column.
Hard to search for all entities with a given value in the list; you have to use an inefficient table-scan. May have to resort to regular expressions, for example in MySQL:
idlist REGEXP '[[:<:]]2[[:>:]]' or in MySQL 8.0: idlist REGEXP '\\b2\\b'
Hard to count elements in the list, or do other aggregate queries.
Hard to join the values to the lookup table they reference.
Hard to fetch the list in sorted order.
Hard to choose a separator that is guaranteed not to appear in the values

To solve these problems, you have to write tons of application code, reinventing functionality that the RDBMS already provides much more efficiently.

Comma-separated lists are wrong enough that I made this the first chapter in my book: SQL Antipatterns: Avoiding the Pitfalls of Database Programming.

There are times when you need to employ denormalization, but as @OMG Ponies mentions, these are exception cases. Any non-relational “optimization” benefits one type of query at the expense of other uses of the data, so be sure you know which of your queries need to be treated so specially that they deserve denormalization.

Frank Heikens 2011-11-24 21:18:25

An ARRAY (of any datatype) can fix the exception, just check PostgreSQL: postgresql.org/docs/current/static/arrays.html (@Bill: Great book, a must read for any developer or dba)

Craig Ringer 2013-12-31 08:54:15

For PostgreSQL-specific discussion see dba.stackexchange.com/q/55871/7788 . Comma-separated is just as awful, but an array field can be an acceptable performance optimisation under some circumstances if applied carefully and with consideration of the consequences.

Bill Karwin 2014-09-25 05:32:43

@CraigRinger, yes, it's a type of denormalization. When used carefully, denormalization can be just the right thing to do for a certain query you are trying to optimize, but it must be done with full understanding that it harms other queries. If those other queries aren't important to your application, then the pain is less.

jmcclure 2015-09-23 01:09:35

I know its not recommended, but playing devils advocate: most of these can be taken off if there is a ui that handles uniqueness and data types (otherwise would error or misbehave), ui drops and creates it anyway, there is a driver table where the values come from to make them unique, field like '%P%' can be used, values being P, R, S, T, counting doesn't matter, and sorting doesn't matter. Depending on ui, values can be split[] e.g. to check checkboxes in a list from driver table in least common scenario without having to go to another table to get them.

Bill Karwin 2018-02-28 16:33:50

@PrabhuNandanKumar, I would store 174 rows in a second table that references your first table. Do not store 174 columns with similar data.

Is storing a delimited list in a database column really that bad?

热门帖子

热门github