回答如下:
答案有点取决于你是否仅限于使用免费软件,例如PostGreSQL(不完全符合SQL标准),或者你正在考虑使用SQL(即符合SQL标准)和大型数据库。
对于符合SQL标准的开放式架构数据库,其中有许多应用程序使用一个数据库,并且许多用户使用不同的报告工具(不仅仅是应用程序)来访问数据,标准、规范化和开放式架构要求非常重要。
尽管有些人试图改变“规范化”等定义以适应其不断变化的目的,但规范化(科学)并没有改变。
if you have data values such as {Open; Closed; etc
} repeated in data tables, that is data duplication, a simple Normalisation error: if you those values change, you may have to update millions of rows, which is very limited design.
Such values should be Normalised into a Reference or Lookup table, with a short CHAR(2)
PK:
O Open
C Closed
U [NotKnown]
The data values {Open;Closed;etc
} are no longer duplicated in the millions of rows. It also saves space.
the second point is ease of change, if Closed
were changed to Expired
, again, one row needs to be changed, and that is reflected in the entire database; whereas in the un-normalised files, millions of rows need to be changed.
Adding new data values, eg. (H,HalfOpen
) is then simply a matter of inserting one row.
in Open Architecture terms, the Lookup table is an ordinary table. It exists in the [SQL compliant] catalogue; as long as the FOREIGN KEY
relation has been defined, the report tool can find that as well.
ENUM
is a Non-SQL, do not use it. In SQL the "enum" is a Lookup table.
The next point relates to the meaningfulness of the key.
- If the Key is meaningless to the user, fine, use an {
INT;BIGINT;GUID;etc
} or whatever is suitable; do not number them incrementally; allow "gaps".
- But if the Key is meaningful to the user, do not use a meaningless number, use a meaningful Relational Key.
Now some people will get in to tangents regarding the permanence of PKs. That is a separate point. Yes, of course, always use a stable value for a PK (not "immutable", because no such thing exists, and a system-generated key does not provide row uniqueness).
{M,F
} are unlikely to change
if you have used {0,1,2,4,6
}, well don't change it, why would you want to. Those values were supposed to be meaningless, remember, only a meaningful Key need to be changed.
if you do use meaningful keys, use short alphabetic codes, that developers can readily understand (and infer the long description from). You will appreciate this only when you code SELECT
and realise you do not have to JOIN
every Lookup table. Power users too, appreciate it.
Since PKs are stable, particularly in Lookup tables, you can safely code:
WHERE status_code = 'O' -- Open
You do not have to JOIN
the Lookup table and obtain the data value Open
, as a developer, you are supposed to know what the Lookup PKs mean.
最后,如果数据库很大并且除了支持 OLTP 外还支持 BI、DSS 或 OLAP 功能(因为规范化的数据库可以这样做),那么查找表实际上是一个维度或向量,在“维度-事实”分析中。如果没有它,则必须添加它以满足该软件的要求,然后才能进行此类分析。
如果你从一开始就对数据库进行这样的处理,就不必在以后升级它(和代码)。
你的示例
SQL 是一种低级语言,因此它很繁琐,特别是当涉及到 JOIN 时。这就是我们所拥有的,所以我们需要接受这种负担并处理它。你的示例代码很好。但更简单的形式也可以完成同样的事情。
报告工具将生成:
SELECT p.*,
s.name
FROM posts p,
status s
WHERE p.status_id = s.status_id
AND p.status_id = 'O'
另一个例子
对于银行系统,我们使用有意义的短代码(因为它们是有意义的,所以我们不会随季节更改它们,我们只会添加它们),给定一个查找表,如下(精心选择,类似于ISO国家代码):
Eq 股票
EqCS 普通股
OTC 场外交易
OF 场外期货
这样的代码很常见:
WHERE InstrumentTypeCode LIKE "Eq%"
GUI的用户将从下拉列表中选择值,其中显示为
{普通股;场外交易
},
而不是 {Eq;OTC;OF
}, 也不是 {M;F;U
}。
如果没有查找表,则无法在应用程序或报告工具中执行此操作。