Moving beyond basic SQL skills

Introduction

You know your way around the select statement, you can join tables in various ways, use aggregate functions, can alter existing tables and create new objects, etc. But sometimes your SQL just doesn’t work, or it somewhat works giving odd results. So what’s the next step to take in improving your SQL skills?

I’m decent with SQL, good enough at least that I’m often called on to debug other people’s code. There’s one problem area that comes up more than all others, one that manifests itself in many different ways. But at it’s core the problem is the same. What is it? Simply, the coder does not understand nulls and how they impact SQL processing.

So what is null and how should we think about it? Here are some definitions, pick one or more that resonate:

null is the representation of “missing information and inapplicable information” – E.F. Codd
null means a value is not known or doesn’t exist
null is not a value, null is the absence of value
null is a state, not a value

Zero is a value, an empty string ” is a value. If we don’t know the answer then we use null to represent this. How many angels can dance on the head of a pin? null angels

You probably know this already, but this has many implicatations and some of them are subtle. For example, assume you have (a == b) in a case statement or a where clause. You’ve worked with several programming languages, you see what appears to be a boolean expression and can intuit what happens when the expression is either true or false. However, this isn’t a boolean expression, it’s a ternary expression. If you think it’s a boolean expression then you are missing 1/3 of the possibilities as the possible outcomes are true, false, and null.

Truth Tables

Assume we have two variables, a and b. Each variable may either be True, False, or null. What do we get when we combine them in different ways?

a	b	a OR b	a AND b	a = b
True	True	True	True	True
True	False	True	False	False
True	null	True	null	null
False	True	True	False	False
False	False	False	False	True
False	null	null	False	null
null	True	True	null	null
null	False	null	False	null
null	null	null	null	null

Notice that the presence of a null in either column can cause the result to be null. For equality tests, the presence of a null in either column results in a null. For OR you can get True results if one or both columns are True. But a False / null combination results in null, you can’t definitively say that the expression is either True or False. Likewise with AND a True/null results in null. The last line is especially interesting, a null/null equality check does not return True.

It’s important to understand the truth tables for AND, OR, and EQUALITY.

Examples

(2 = null) -> result is null.

A known value does not equal null.

(null and true) -> result is null.

An unknown value isn’t definitely true.

(null or true) -> result is true.

The null is meaningless in this expression.

(null = null) -> result is null.

The unknown doesn’t necessarily equal the unknown.

(null != null) -> result is null

The unknown doesn’t necessarily not equal the unknown.

Finally, 99.99% of the time you are making a mistake if you type these

some_val = null
some_val != null

OK I get it, but do you?

select

  case i

      when null then 'null'

        when 0 then 'zero'

        else 'non-zero'

  end as foo

from bar;

This will never return ‘null’, because i = null is always false. There are many instances where an equality test is being performed without an equality sign.

-- Correct syntax

select

  case i

      when i is null then 'null'

        when 0 then 'zero'

        else 'non-zero'

  end as foo

from bar;

How about joins?

select
  *
from
  foo
where
  bar = null;

The result set is zero rows, even where the bar column contains nulls.

select
  *
from
  foo
where
  (bar = 1) or
  (bar != 1);

versus

select
  *
from
  foo;

These are not the same, the first query will not return records where bar is null.

select
  *
from
  foo
  left join bar on foo.key = bar.key
where
  bar.another_var = 'value';

This is an inner join! If bar doesn’t have a matching record then all columns in bar are null. Therefore, they fail the equality test. This is a very common mistake!

select
  *
from
  foo
where
  some_val in (select some_val from bar);

Notes: The rows where some_val is null in either table will never be matched even if both tables have rows where some_val is null.

null in (null)

Notes: this is an unknown

What about math?

null * 2 = null

null / 0 = ??

NOTES: could be null, could give division by zero error (platform dependent. That’s right, you can’t count on null behaviour to be consistent across database platforms. Yay!)

What about counting?

These return the same count…

count()
count(1)
count('foo')

…but this is not when some_col contains null values.

count(some_col)

How about sums and averages?

sum(i) / count(*)

…is not the same as…

avg(i) when i contains nulls

Notes: avg(), sum() excludes nulls when calculating. This may not be what you want! Also, you can’t trust that all databases handle this case the same.

Grouping and sorting

(null = null) -> null

But for purposes of grouping and sorting, all nulls are equal!

For sorting, nulls will typically come first. Some SQL interpreters allow you to specify where nulls will appear.

order by some_val nulls last;

Dealing with nulls

Often the best way to handle nulls is to replace them with a known value or filter them out. The null functions that are available to you will vary by database platform. Most have some variation on the following:

The correct way to test for the presence of nulls, use in where clauses and case statements.

 case
   some_col is null
   some_col is not null

Useful functions

Null functions vary between database implementations, but you can almost always find some mix of the following.

coalesce(col1, col2, col3)

This returns the first variable encountered that contains a value

nvl(expression, value)

This returns value if expression is null. Very handy!

concetenate(col1, col2, col3)
concat(col1, col2, col3)

These return a concatenated string that ignores null values instead of returning null for the entire expression.
‘string’ || null will result in a null on many, but not all database platforms (sideeye Oracle).

It could be worse!

E.F. Codd suggested two values for null, “missing but applicable” and “missing but inapplicable”
“high null” and “low null” have also been suggested
Either would result in 4 value logic when most coders think it’s 2 value logic
Nulls are confusing in SQL because they break fundamental assumptions built into relational theory (closed universe)

Summary

The horse is dead and has been flogged appropriately. My recommendation is to know beforehand which columns have null and to plan accordingly.

Moving beyond basic SQL skills

Glen Pennington

Get in touch