Data Naming Standards

This page describes a high level approach for naming database objects. It is a combination of the abbreviations that I have used at many clients. It also includes some basic principles from Ralph Kimball’s Data Warehouse Lifecycle Toolkit. It is an intentionally lightweight standard.

Medallion Architecture Considerations

The naming of objects in the initial landing area of the raw / bronze layer is an exception to the following naming standards. If you are using your bronze layer for raw / cold data storage (a common, recommended practice) then keeping the entry point object names aligned to the source system(s) is something to consider. This helps keep your ETL pipelines lean and makes data lineage tracking easier. Apply a consistent naming standard in the first layer built over the raw / cold data storage. This may be prime datasets, interim datasets in the pipeline leading to the silver layer, etc., depending on how you structure your pipelines.

Table Names

Table names will follow the pattern of TYPE (optional) – SUBJECT – MODIFIER(s). The table name is singular, all upper case, with underscores representing spaces. There are no hyphens, spaces, or special characters that would require the table name to be escaped or bracketed in a query tool. Numbers can be used as part of a table name; however, a table name wholly composed of numbers is not allowed. The table name cannot be a reserved word for the underlying platform.
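These rules are easy to check mechanically in a pipeline or code review step. The following is a minimal Python sketch based on the rules above; the reserved word set is a made-up placeholder and should be replaced with your platform's actual list.

```python
import re

# Placeholder only: substitute the reserved word list for your platform.
RESERVED_WORDS = {"TABLE", "SELECT", "ORDER", "GROUP", "USER"}

# Upper case letters, digits, and underscores only (no hyphens, spaces, or special characters).
NAME_PATTERN = re.compile(r"^[A-Z0-9_]+$")


def is_valid_table_name(name: str) -> bool:
    """Check a table name against the standard described above."""
    if not NAME_PATTERN.match(name):
        return False
    if name.replace("_", "").isdigit():  # a name wholly composed of numbers is not allowed
        return False
    if name in RESERVED_WORDS:  # reserved words are not allowed
        return False
    return True


if __name__ == "__main__":
    for candidate in ("DIM_CUSTOMER", "dim-customer", "12345", "ORDER"):
        print(candidate, is_valid_table_name(candidate))
```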

Table names that represent an association between two entity types should list both entities and have a modifier that makes the association explicit.

Types are optional descriptors that categorize some tables into broad categories. Some types are architecture or ETL specific.

If you are using a Medallion Architecture then creating schemas or storage containers for each layer is advised.

Some types are data model specific (Star Schema, Data Vault). The naming standard applies regardless of the underlying object type (database table, text file, Parquet / Avro tables, etc.). For the purposes of simplicity this document uses table/column nomenclature.

Type | Abbreviation | Definition and Use
Bridge | BRD | Use for Data Vault architecture bridge tables. Bridge tables consolidate hubs for ease of querying across hubs.
Dimension | DIM | Use for Star Schema architecture dimension tables loaded by an ETL process. You may want to use a modifier to distinguish between current and time-of-event dimension tables.
Fact | FCT | Use for Star Schema architecture fact tables.
Hub | HUB | Use for Data Vault architecture hub tables.
Link | LNK | Use for Data Vault architecture link tables.
Point in Time | PIT | Use for Data Vault architecture point-in-time tables. PIT tables are useful for performance and ease of querying.
Prime | PRM | Minimally transformed raw tables intended for initial exploration and as the basis for creating worksets. They are full-fidelity versions of raw tables; no rows or columns have been removed or altered.
Raw | RAW | Use when landing incoming data without any transformations, filtering, data cleansing, constraints, etc. The purpose of RAW tables is long-term storage (could be compressed, offline, etc.).
Reference | REF | Use for Star Schema style dimension tables or code lookup tables that are not populated by ETL (manual loads, UI-maintained tables, etc.). For example, a list of State Codes and Names.
Satellite | SAT | Use for Data Vault architecture satellite tables.
Shadow | SHD | Use for ETL work tables that mirror other tables (partition swaps, table renames, etc.). This is an instance where a table name could have multiple type values, as SHD_ should preface the name of the table being mirrored. Ex. SHD_CUSTOMER is the shadow table for DIM_CUSTOMER.
View | VW | Used for logical views that pre-join other objects. Materialized views / objects should not use the view prefix but instead should use another type from this list (most likely work/workset). See the note below.
Work | WRK | Use for intermediate ETL work tables that need to be instantiated but are not generally useful for end-user reporting or querying.
Workset | WRS | Use for transformed PRIME data that is intended for general querying / reporting and not stored in a Star Schema model. Intended for Big Data implementations but applicable to relational databases as well. Also used for materialized views that represent pre-joined objects, pre-calculated reports, etc.

NOTE: Identifying views by name is a common practice, but I would argue that this is a pattern that causes more problems than it solves. The underlying storage mechanism should not be an end-user concern. Views become materialized views or tables as part of tuning. Naming an object with VW forces a code change if you switch to a materialized view or table.

Subject

The subject should be a singular noun that represents the entity. Abbreviations/acronyms are acceptable if well known (ex. GL for General Ledger). Application specific or obscure abbreviations/acronyms should be considered more carefully, weighing brevity against readability. Otherwise subjects should generally be spelled out.

Modifiers

Modifiers are optional. For example, you may have a FCT_TRANSACTION table. There’s no need for a modifier if that table represents all transactions in the model. However, you may have a FCT_TRANSACTION_CUSTOMER table where Transaction is the subject and Customer is the modifier. This assumes the possibility of additional transaction types in the model. If multiple modifiers are required then they should be ordered from least specific to most specific, as shown in the sketch below.
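If you generate object names in ETL code, a small helper can keep the TYPE – SUBJECT – MODIFIER(s) ordering consistent. This is only an illustrative sketch; the helper name and signature are my own, not part of the standard.

```python
from typing import Optional, Sequence


def compose_table_name(subject: str, modifiers: Sequence[str] = (), type_prefix: Optional[str] = None) -> str:
    """Assemble an optional TYPE prefix, a SUBJECT, and MODIFIER(s) into a table name.

    Modifiers should already be ordered from least specific to most specific.
    """
    parts = ([type_prefix] if type_prefix else []) + [subject, *modifiers]
    return "_".join(part.upper().replace(" ", "_") for part in parts)


# e.g. compose_table_name("TRANSACTION", ["CUSTOMER"], type_prefix="FCT")
#      returns "FCT_TRANSACTION_CUSTOMER"
```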

Data Sourcing

Table names as a general rule do not reference the data source name. However, if multiple data sources exist for an entity type then the data source should be called out as the first modifier in the table name (Ex. RAW_CUSTOMER_TRANSUNION_CREDIT_DATA, RAW_CUSTOMER_EXPERIAN_CREDIT_DATA).

Column Names

Column names will follow the pattern of SUBJECT – MODIFIER(s) – CLASS. The column name is singular, all upper case, with underscores representing spaces. There are no hyphens, spaces, or special characters that would require the column name to be escaped or bracketed in a query tool. Numbers can be used as part of a column name. The column name cannot be a reserved word for the underlying platform.
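As with table names, the column pattern can be checked mechanically. Below is a minimal Python sketch that splits a column name into subject, modifiers, and class using the class abbreviations listed later on this page; it naively treats the first token as the subject, so multi-word subjects would need extra handling.

```python
# Class abbreviations from the Class table later in this document.
CLASS_ABBREVIATIONS = {
    "AMT", "LOB", "CD", "DT", "DSC", "FLG", "ID", "NME", "NO",
    "PCT", "QTY", "TX", "TM", "TS", "VLU", "XML", "JSON",
}


def split_column_name(name: str) -> tuple:
    """Split a SUBJECT_MODIFIER(s)_CLASS column name into (subject, modifiers, class)."""
    tokens = name.split("_")
    if len(tokens) < 2 or tokens[-1] not in CLASS_ABBREVIATIONS:
        raise ValueError(f"{name} does not end in a recognized class suffix")
    subject, *modifiers, class_suffix = tokens
    return subject, modifiers, class_suffix


# e.g. split_column_name("CUSTOMER_BIRTH_DT") returns ("CUSTOMER", ["BIRTH"], "DT")
```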

Subject

The subject should be a singular noun that represents the entity. Abbreviations/acronyms are acceptable if well known (ex. GL for General Ledger). Application specific or obscure abbreviations/acronyms should be considered more carefully, weighing brevity against readability. Otherwise subjects should generally be spelled out.

Modifier

Modifiers are optional. For example, you may have a CUSTOMER_ID column. There’s no need for a modifier in this case. However, a birth date would be CUSTOMER_BIRTH_DT, as CUSTOMER_DT alone would be ambiguous. If multiple modifiers are required then they should be ordered from least specific to most specific.

For timestamps you may want to consider using modifiers if you have not standardized on a given timezone. Time zones deserve their own article!

Class

The Class suffix classifies the type of information being represented by the column.  Class values are assigned according to the nature of the data; they are only loosely tied to specific data types.

Class | Abbreviation | Definition
Amount | AMT | A numeric measurement of a monetary value.
LOB | LOB | A column that stores a large object of some sort that may or may not be human readable. Typically requires special handling for ETL, querying, and reporting.
Code | CD | A set of one or more characters or letters that represents a more meaningful piece of information. Codes typically represent a small, rarely changing set of values. Not used for Y/N.
Date | DT | A date at the day grain, with no time component.
Description | DSC | A character string that defines an abbreviation, code, or otherwise shortened term. Typically used for defining the meaning of code fields.
Flag | FLG | Represents a binary value, typically in a Boolean data type. Y/N or 1/0 (useful for data science) are also acceptable. Nulls are strongly discouraged!
Identifier | ID | A unique label/value for an entity. Can be either a natural or surrogate key. If your system has both, consider using modifiers to differentiate between the two.
Name | NME | A character string that gives the proper name or title of an entity. Ex. STATE_NME, CUSTOMER_FIRST_NME. Names are proper but not necessarily unique (not an identifier).
Number | NO | A value that is not for the purpose of measuring a quantity. It can contain non-numeric characters. Use when arithmetic operations are not applicable and the business term/definition calls it a number. Ex. PURCHASE_ORDER_NO.
Percent | PCT | A numeric field that represents a percentage or factor. Can be either the percentage or the decimal representation of a percentage depending upon usage. It is highly encouraged that the representation be consistent across the system or enterprise.
Quantity | QTY | A numeric value that represents the counted value of an event or object.
Text | TX | An unconstrained set of characters that does not otherwise represent another character-based value on this list. This is the catch-all character type.
Time | TM | A time-of-day value that is at a minimum at the minute grain. Does not contain a date component.
Timestamp | TS | A date and time-of-day value that is at a minimum at the minute grain.
Value | VLU | A catch-all abbreviation for numeric values that can be acted upon arithmetically but cannot be represented by another class on this list. Alternately, you may choose to define additional classes based on the problem domain if a given type appears regularly (ex. RATE).
XML | XML | A character or LOB data type that contains XML encoded data.
JSON | JSON | A character or LOB data type that contains JSON encoded data.
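One practical use of the class suffixes is a lightweight check that a column's declared data type roughly matches its suffix. The mapping below is only a sketch of the idea; the type families are my own assumptions, not part of the standard, and the actual types are platform specific.

```python
# Illustrative assumption: the broad type family usually expected for each class suffix.
EXPECTED_TYPE_FAMILY = {
    "AMT": "numeric",
    "PCT": "numeric",
    "QTY": "numeric",
    "VLU": "numeric",
    "DT": "date",
    "TM": "time",
    "TS": "timestamp",
    "FLG": "boolean / single character",
    "ID": "numeric or character",
    "CD": "character",
    "DSC": "character",
    "NME": "character",
    "NO": "character",
    "TX": "character",
    "LOB": "large object",
    "XML": "character or large object",
    "JSON": "character or large object",
}


def expected_family(column_name: str) -> str:
    """Return the expected type family for a column based on its class suffix."""
    suffix = column_name.rsplit("_", 1)[-1]
    return EXPECTED_TYPE_FAMILY.get(suffix, "unknown class suffix")


# e.g. expected_family("ORDER_TOTAL_AMT") returns "numeric"
```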

