Data Naming Standards

This page describes a high level approach for naming database objects. It is a combination of the abbreviations that I have used at many clients. It also includes some basic principles from Ralph Kimball’s Data Warehouse Lifecycle Toolkit. It is an intentionally lightweight standard.

Medallion Architecture Considerations

The naming of objects in the initial landing area of the raw / bronze layer is an exception to the following naming standards. If you are using your bronze layer for raw / cold data storage (a common, recommended practice) then keeping the entry point object names aligned to the source system(s) is something to consider. This helps keep your ETL pipelines lean and makes data lineage tracking easier. Apply a consistent naming standard in the first layer built over the raw / cold data storage. This may be prime datasets, interim datasets in the pipeline leading to the silver layer, etc., depending on how you structure your pipelines.

Table Names

Table names will follow the pattern of TYPE (optional) – SUBJECT – MODIFIER(s). The table name is singular, all upper case, with underscores representing spaces. There are no hyphens, spaces, or special characters that would require the table name to be escaped or bracketed in a query tool. Numbers can be used as part of a table name; however, a table name wholly composed of numbers is not allowed. The table name cannot be a reserved word for the underlying platform.
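These rules are easy to check mechanically in a pipeline or code review step. The following is a minimal Python sketch based on the rules above; the reserved word set is a made-up placeholder and should be replaced with your platform's actual list.

```python
import re

# Placeholder only: substitute the reserved word list for your platform.
RESERVED_WORDS = {"TABLE", "SELECT", "ORDER", "GROUP", "USER"}

# Upper case letters, digits, and underscores only (no hyphens, spaces, or special characters).
NAME_PATTERN = re.compile(r"^[A-Z0-9_]+$")


def is_valid_table_name(name: str) -> bool:
    """Check a table name against the standard described above."""
    if not NAME_PATTERN.match(name):
        return False
    if name.replace("_", "").isdigit():  # a name wholly composed of numbers is not allowed
        return False
    if name in RESERVED_WORDS:  # reserved words are not allowed
        return False
    return True


if __name__ == "__main__":
    for candidate in ("DIM_CUSTOMER", "dim-customer", "12345", "ORDER"):
        print(candidate, is_valid_table_name(candidate))
```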

Table names that represent an association between two entity types should list both entities and have a modifier that makes the association explicit.

Types are optional descriptors that categorize some tables into broad categories. Some types are architecture or ETL specific.

If you are using a Medallion Architecture then creating schemas or storage containers for each layer is advised.

Some types are data model specific (Star Schema, Data Vault). The naming standard applies regardless of the underlying object type (database table, text file, Parquet / Avro tables, etc.). For the purposes of simplicity this document uses table/column nomenclature.

Type | Abbreviation | Definition and Use
Bridge | BRD | Use for Data Vault architecture bridge tables. Bridge tables consolidate hubs for ease of querying across hubs.
Dimension | DIM | Use for Star Schema architecture dimension tables loaded by an ETL process. You may want to use a modifier to distinguish between current and time-of-event dimension tables.
Fact | FCT | Use for Star Schema architecture fact tables.
Hub | HUB | Use for Data Vault architecture hub tables.
Link | LNK | Use for Data Vault architecture link tables.
Point in Time | PIT | Use for Data Vault architecture point-in-time tables. PIT tables are useful for performance and ease of querying.
Prime | PRM | Minimally transformed raw tables intended for initial exploration and as the basis for creating worksets. They are full-fidelity versions of raw tables; no rows or columns have been removed or altered.
Raw | RAW | Use when landing incoming data without any transformations, filtering, data cleansing, constraints, etc. The purpose of RAW tables is long-term storage (could be compressed, offline, etc.).
Reference | REF | Use for Star Schema style dimension tables or code lookup tables that are not populated by ETL (manual loads, UI-maintained tables, etc.). For example, a list of State Codes and Names.
Satellite | SAT | Use for Data Vault architecture satellite tables.
Shadow | SHD | Use for ETL work tables that mirror other tables (partition swaps, table renames, etc.). This is an instance where a table name could have multiple type values, as SHD_ should preface the name of the table being mirrored. Ex. SHD_CUSTOMER is the shadow table for DIM_CUSTOMER.
View | VW | Used for logical views that pre-join other objects. Materialized views / objects should not use the view prefix but instead should use another type from this list (most likely work/workset). See the note below.
Work | WRK | Use for intermediate ETL work tables that need to be instantiated but are not generally useful for end-user reporting or querying.
Workset | WRS | Use for transformed PRIME data that is intended for general querying / reporting and not stored in a Star Schema model. Intended for Big Data implementations but applicable to relational databases as well. Also used for materialized views that represent pre-joined objects, pre-calculated reports, etc.

NOTE: Identifying views by name is a common practice, but I would argue that this is a pattern that causes more problems than it solves. The underlying storage mechanism should not be an end-user concern. Views become materialized views or tables as part of tuning. Naming an object with VW forces a code change if you switch to a materialized view or table.

Subject

The subject should be a singular noun that represents the entity. Abbreviations/acronyms are acceptable if well known (ex. GL for General Ledger). Application specific or obscure abbreviations/acronyms should be considered more carefully, weighing brevity against readability. Otherwise subjects should generally be spelled out.

Modifiers

Modifiers are optional. For example, you may have a FCT_TRANSACTION table. There’s no need for a modifier if that table represents all transactions in the model. However, you may have a FCT_TRANSACTION_CUSTOMER table where Transaction is the subject and Customer is the modifier. This assumes the possibility of additional transaction types in the model. If multiple modifiers are required then they should be ordered from least specific to most specific, as shown in the sketch below.
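If you generate object names in ETL code, a small helper can keep the TYPE – SUBJECT – MODIFIER(s) ordering consistent. This is only an illustrative sketch; the helper name and signature are my own, not part of the standard.

```python
from typing import Optional, Sequence


def compose_table_name(subject: str, modifiers: Sequence[str] = (), type_prefix: Optional[str] = None) -> str:
    """Assemble an optional TYPE prefix, a SUBJECT, and MODIFIER(s) into a table name.

    Modifiers should already be ordered from least specific to most specific.
    """
    parts = ([type_prefix] if type_prefix else []) + [subject, *modifiers]
    return "_".join(part.upper().replace(" ", "_") for part in parts)


# e.g. compose_table_name("TRANSACTION", ["CUSTOMER"], type_prefix="FCT")
#      returns "FCT_TRANSACTION_CUSTOMER"
```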

Data Sourcing

Table names as a general rule do not reference the data source name. However, if multiple data sources exist for an entity type then the data source should be called out as the first modifier in the table name (Ex. RAW_CUSTOMER_TRANSUNION_CREDIT_DATA, RAW_CUSTOMER_EXPERIAN_CREDIT_DATA).

Column Names

Column names will follow the pattern of SUBJECT – MODIFIER(s) – CLASS. The column name is singular, all upper case, with underscores representing spaces. There are no hyphens, spaces, or special characters that would require the column name to be escaped or bracketed in a query tool. Numbers can be used as part of a column name. The column name cannot be a reserved word for the underlying platform.
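As with table names, the column pattern can be checked mechanically. Below is a minimal Python sketch that splits a column name into subject, modifiers, and class using the class abbreviations listed later on this page; it naively treats the first token as the subject, so multi-word subjects would need extra handling.

```python
# Class abbreviations from the Class table later in this document.
CLASS_ABBREVIATIONS = {
    "AMT", "LOB", "CD", "DT", "DSC", "FLG", "ID", "NME", "NO",
    "PCT", "QTY", "TX", "TM", "TS", "VLU", "XML", "JSON",
}


def split_column_name(name: str) -> tuple:
    """Split a SUBJECT_MODIFIER(s)_CLASS column name into (subject, modifiers, class)."""
    tokens = name.split("_")
    if len(tokens) < 2 or tokens[-1] not in CLASS_ABBREVIATIONS:
        raise ValueError(f"{name} does not end in a recognized class suffix")
    subject, *modifiers, class_suffix = tokens
    return subject, modifiers, class_suffix


# e.g. split_column_name("CUSTOMER_BIRTH_DT") returns ("CUSTOMER", ["BIRTH"], "DT")
```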

Subject

The subject should be a singular noun that represents the entity. Abbreviations/acronyms are acceptable if well known (ex. GL for General Ledger). Application specific or obscure abbreviations/acronyms should be considered more carefully, weighing brevity against readability. Otherwise subjects should generally be spelled out.

Modifier

Modifiers are optional. For example, you may have a CUSTOMER_ID column. There’s no need for a modifier in this case. However, a birth date would be CUSTOMER_BIRTH_DT, as CUSTOMER_DT alone would be ambiguous. If multiple modifiers are required then they should be ordered from least specific to most specific.

For timestamps you may want to consider using modifiers if you have not standardized on a given timezone. Time zones deserve their own article!

Class

The Class suffix classifies the type of information being represented by the column.  Class values are assigned according to the nature of the data; they are only loosely tied to specific data types.

Class | Abbreviation | Definition
Amount | AMT | A numeric measurement of a monetary value.
LOB | LOB | A column that stores a large object of some sort that may or may not be human readable. Typically requires special handling for ETL, querying, and reporting.
Code | CD | A set of one or more characters or letters that represents a more meaningful piece of information. Codes typically represent a small, rarely changing set of values. Not used for Y/N.
Date | DT | A date at the day grain, with no time component.
Description | DSC | A character string that defines an abbreviation, code, or otherwise shortened term. Typically used for defining the meaning of code fields.
Flag | FLG | Represents a binary value, typically in a Boolean data type. Y/N or 1/0 (useful for data science) are also acceptable. Nulls are strongly discouraged!
Identifier | ID | A unique label/value for an entity. Can be either a natural or surrogate key. If your system has both, consider using modifiers to differentiate between the two.
Name | NME | A character string that gives the proper name or title of an entity. Ex. STATE_NME, CUSTOMER_FIRST_NME. Names are proper but not necessarily unique (not an identifier).
Number | NO | A value that is not for the purpose of measuring a quantity. It can contain non-numeric characters. Use when arithmetic operations are not applicable and the business term/definition calls it a number. Ex. PURCHASE_ORDER_NO.
Percent | PCT | A numeric field that represents a percentage or factor. Can be either the percentage or the decimal representation of a percentage depending upon usage. It is highly encouraged that the representation be consistent across the system or enterprise.
Quantity | QTY | A numeric value that represents the counted value of an event or object.
Text | TX | An unconstrained set of characters that does not otherwise represent another character-based value on this list. This is the catch-all character type.
Time | TM | A time-of-day value that is at a minimum at the minute grain. Does not contain a date component.
Timestamp | TS | A date and time-of-day value that is at a minimum at the minute grain.
Value | VLU | A catch-all abbreviation for numeric values that can be acted upon arithmetically but cannot be represented by another class on this list. Alternately, you may choose to define additional classes based on the problem domain if a given type appears regularly (ex. RATE).
XML | XML | A character or LOB data type that contains XML encoded data.
JSON | JSON | A character or LOB data type that contains JSON encoded data.
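One practical use of the class suffixes is a lightweight check that a column's declared data type roughly matches its suffix. The mapping below is only a sketch of the idea; the type families are my own assumptions, not part of the standard, and the actual types are platform specific.

```python
# Illustrative assumption: the broad type family usually expected for each class suffix.
EXPECTED_TYPE_FAMILY = {
    "AMT": "numeric",
    "PCT": "numeric",
    "QTY": "numeric",
    "VLU": "numeric",
    "DT": "date",
    "TM": "time",
    "TS": "timestamp",
    "FLG": "boolean / single character",
    "ID": "numeric or character",
    "CD": "character",
    "DSC": "character",
    "NME": "character",
    "NO": "character",
    "TX": "character",
    "LOB": "large object",
    "XML": "character or large object",
    "JSON": "character or large object",
}


def expected_family(column_name: str) -> str:
    """Return the expected type family for a column based on its class suffix."""
    suffix = column_name.rsplit("_", 1)[-1]
    return EXPECTED_TYPE_FAMILY.get(suffix, "unknown class suffix")


# e.g. expected_family("ORDER_TOTAL_AMT") returns "numeric"
```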

