Moving beyond basic SQL skills

M

Introduction

You know your way around the select statement, you can join tables in various ways, use aggregate functions, can alter existing tables and create new objects, etc. But sometimes your SQL just doesn’t work, or it somewhat works giving odd results. So what’s the next step to take in improving your SQL skills?

I’m decent with SQL, good enough at least that I’m often called on to debug other people’s code. There’s one problem area that comes up more than all others, one that manifests itself in many different ways. But at it’s core the problem is the same. What is it? Simply, the coder does not understand nulls and how they impact SQL processing.

So what is null and how should we think about it? Here are some definitions, pick one or more that resonate:

  • null is the representation of “missing information and inapplicable information” – E.F. Codd
  • null means a value is not known or doesn’t exist
  • null is not a value, null is the absence of value
  • null is a state, not a value

Zero is a value, an empty string ” is a value. If we don’t know the answer then we use null to represent this. How many angels can dance on the head of a pin? null angels

You probably know this already, but this has many implicatations and some of them are subtle. For example, assume you have (a == b) in a case statement or a where clause. You’ve worked with several programming languages, you see what appears to be a boolean expression and can intuit what happens when the expression is either true or false. However, this isn’t a boolean expression, it’s a ternary expression. If you think it’s a boolean expression then you are missing 1/3 of the possibilities as the possible outcomes are true, false, and null.

Truth Tables

Assume we have two variables, a and b. Each variable may either be True, False, or null. What do we get when we combine them in different ways?

a b a OR b a AND b a = b
True True True True True
True False True False False
True null True null null
False True True False False
False False False False True
False null null False null
null True True null null
null False null False null
null null null null null

Notice that the presence of a null in either column can cause the result to be null. For equality tests, the presence of a null in either column results in a null. For OR you can get True results if one or both columns are True. But a False / null combination results in null, you can’t definitively say that the expression is either True or False. Likewise with AND a True/null results in null. The last line is especially interesting, a null/null equality check does not return True.

It’s important to understand the truth tables for AND, OR, and EQUALITY.

Examples

(2 = null) -> result is null.

A known value does not equal null.

(null and true) -> result is null.

An unknown value isn’t definitely true.

(null or true) -> result is true.

The null is meaningless in this expression.

(null = null) -> result is null.

The unknown doesn’t necessarily equal the unknown.

(null != null) -> result is null

The unknown doesn’t necessarily not equal the unknown.

Finally, 99.99% of the time you are making a mistake if you type these

some_val = null
some_val != null

OK I get it, but do you?

select
case i
when null then 'null'
when 0 then 'zero'
else 'non-zero'
end as foo
from bar;

This will never return ‘null’, because i = null is always false. There are many instances where an equality test is being performed without an equality sign.

-- Correct syntax
select
case i
when i is null then 'null'
when 0 then 'zero'
else 'non-zero'
end as foo
from bar;

How about joins?

select
*
from
foo
where
bar = null;

The result set is zero rows, even where the bar column contains nulls.

select
*
from
foo
where
(bar = 1) or
(bar != 1);

versus

select
*
from
foo;

These are not the same, the first query will not return records where bar is null.

select
*
from
foo
left join bar on foo.key = bar.key
where
bar.another_var = 'value';

This is an inner join! If bar doesn’t have a matching record then all columns in bar are null. Therefore, they fail the equality test. This is a very common mistake!

select
*
from
foo
where
some_val in (select some_val from bar);

Notes: The rows where some_val is null in either table will never be matched even if both tables have rows where some_val is null.

null in (null)

Notes: this is an unknown

What about math?

null * 2 = null

null / 0 = ??

NOTES: could be null, could give division by zero error (platform dependent. That’s right, you can’t count on null behaviour to be consistent across database platforms. Yay!)

What about counting?

These return the same count…

count()
count(1)
count('foo')

…but this is not when some_col contains null values.

count(some_col)

How about sums and averages?

sum(i) / count(*)

…is not the same as…

avg(i) when i contains nulls

Notes: avg(), sum() excludes nulls when calculating. This may not be what you want! Also, you can’t trust that all databases handle this case the same.

Grouping and sorting

(null = null) -> null

But for purposes of grouping and sorting, all nulls are equal!

For sorting, nulls will typically come first. Some SQL interpreters allow you to specify where nulls will appear.

order by some_val nulls last;

Dealing with nulls

Often the best way to handle nulls is to replace them with a known value or filter them out. The null functions that are available to you will vary by database platform. Most have some variation on the following:

The correct way to test for the presence of nulls, use in where clauses and case statements.

 case
some_col is null
some_col is not null

Useful functions

Null functions vary between database implementations, but you can almost always find some mix of the following.

coalesce(col1, col2, col3)

This returns the first variable encountered that contains a value

nvl(expression, value)

This returns value if expression is null. Very handy!

concetenate(col1, col2, col3)
concat(col1, col2, col3)

These return a concatenated string that ignores null values instead of returning null for the entire expression.
‘string’ || null will result in a null on many, but not all database platforms (sideeye Oracle).

It could be worse!

  • E.F. Codd suggested two values for null, “missing but applicable” and “missing but inapplicable”
  • “high null” and “low null” have also been suggested
  • Either would result in 4 value logic when most coders think it’s 2 value logic
  • Nulls are confusing in SQL because they break fundamental assumptions built into relational theory (closed universe)

Summary

The horse is dead and has been flogged appropriately. My recommendation is to know beforehand which columns have null and to plan accordingly.

By Glen Pennington

Glen Pennington

I have over 30 years of IT experience in the application of computer technology to the fields of consumer credit, marketing, accounting, finance, sensory analysis, and manufacturing. I have deep experience in database development using metadata driven solutions for ETL (Extract Transform Load), Data Profiling, and Data Quality. I have extensive experience developing systems in many languages and on several platforms (Mac, Unix, Windows) and can perform as both a developer and as a project manager (certified SCRUM Master and SAFe Agilist).

This blog contains my musings on various IT topics.

Get in touch