days
0
-42
-1
hours
-2
-1
minutes
-4
-7
seconds
-3
-6
Check out the latest:
Use them more

# You probably don’t use SQL INTERSECT or EXCEPT often enough

SQL image via Shutterstock

Lukas Eder likes SQL a lot, so he’s here to show you the true meaning of Venn diagrams by describing its operations – UNION, EXCEPT and INTERSECT all play a starring role. You’ll even get a sneak peak of some of the points he’ll cover in his upcoming JAX London session.

This post was originally published over at jooq.org, a blog focusing on all things open source, Java and software development from the perspective of jOOQ.

When people talk about SQL JOIN, they often use Venn diagrams to illustrate inclusion and exclusion of the two joined sets:

While these Venn diagrams are certainly useful to understand (and remember) SQL JOIN syntax, they’re not entirely accurate, because SQL JOIN is a special type of a cartesian product, the CROSS JOIN.

Illustration by Wikipedia user Quartl

In a cartesian product between two sets A and B, the result is the multiplication of each set, meaning that each element `a ∈ A` is combined with each element `b ∈ B` to form a set of tuples `(a, b)`.

Ordinary SQL JOINs do precisely this. When you join BOOK to AUTHOR, you will probably get a combination of every `author ∈ AUTHOR` with each`book ∈ BOOK`, such that for each combination `(author, book)`, the author actually wrote the book.

### The true meaning of Venn diagrams

The true meaning of Venn diagrams is much better described by the operations

• `UNION`
• `INTERSECT`
• `EXCEPT` (or `MINUS` in Oracle)

In the following sections, we’ll see that these operations match exactly the semantics of operations that can be illustrated by Venn diagrams, even if you will be able to “abuse” `JOIN` operations to achieve the same result.

### UNION

The `UNION` operation is the most well-known among these set operations. It is often also referred to as “concatenation” of two sets of tuples, where the result is the concatenation of a set `B` to a set `A`.

In the following example, we’ll see that we might be interested in all the different people from our database, given their first and last names, regardless if they’re `customer` or `staff`:

The original Venn diagrams used `FULL OUTER JOIN` to model the “same” concept, although the two things are not strictly same. Consider the following query, which we’ll run against the Sakila database:

```SELECT first_name, last_name
FROM customer
UNION
SELECT first_name, last_name
FROM staff
ORDER BY 1, 2
```

The result looks like:

```first_name   last_name
------------------------------------
AARON        SELBY
AGNES        BISHOP
ALAN         KAHN
ALBERT       CROUSE
ALBERTO      HENNING
ALEX         GRESHAM
ALEXANDER    FENNELL
ALFRED       CASILLAS
ALICE        STEWART
ALICIA       MILLS
...
```

Now, run the following “equivalent” query:

```SELECT first_name, last_name
FROM customer
FULL OUTER JOIN staff
USING (first_name, last_name)
ORDER BY 1, 2
```

The result will again yield:

```first_name   last_name
------------------------------------
AARON        SELBY
AGNES        BISHOP
ALAN         KAHN
ALBERT       CROUSE
ALBERTO      HENNING
...
```

This only works because we’re using the `USING` clause, which not every database supports natively. If we did our JOIN with the more commonly used ON clause, we’d have to write the more tedious:

```SELECT
COALESCE(c.first_name, s.first_name) AS first_name,
COALESCE(c.last_name, s.last_name) AS last_name
FROM customer c
FULL OUTER JOIN staff s
ON (c.first_name, c.last_name)
=  (s.first_name, s.last_name)
ORDER BY 1, 2
```

In this case, most people probably default to using `UNION` already, as it is a much better known operation than `FULL OUTER JOIN`.

All of jOOQ’s currently supported RDBMS support `UNION` and `UNION ALL `(the latter doesn’t remove duplicates).

In the following, we’ll see that equivalent comparisons can be made with other set operations:

### INTERSECT

The `INTERSECT` operation is really useful when you want to keep only those tuples that are present in both sets that are combined using `INTERSECT`:

As you can see, we may want to retain only those customers that are also actors. Let’s run this query:

```SELECT first_name, last_name
FROM customer
INTERSECT
SELECT first_name, last_name
FROM actor
```
```first_name   last_name
------------------------------------
JENNIFER     DAVIS
```

One of our customers is also an actor. The same query could have been written with an `INNER JOIN` as such:

```SELECT first_name, last_name
FROM customer
INNER JOIN actor
USING (first_name, last_name)
```

… or with the `ON` syntax

```SELECT c.first_name, c.last_name
FROM customer c
INNER JOIN actor a
ON (c.first_name, c.last_name)
=  (a.first_name, a.last_name)
```

This time, no `COALESCE` is needed, as `INNER JOIN` retains only those tuples from the cartesian product, which are present on “both sides” of the `JOIN`, so we can pick any of the tables to prefix our columns.

You may even decide to use a semi-join instead, which would yield the same results:

```SELECT first_name, last_name
FROM customer
WHERE (first_name, last_name) IN (
SELECT first_name, last_name
FROM actor
)
```

or, using the more verbose, yet equivalent `EXISTS` predicate:

```SELECT first_name, last_name
FROM customer c
WHERE EXISTS (
SELECT 1
FROM actor a
WHERE (c.first_name, c.last_name)
= (a.first_name, a.last_name)
)
```

All of the above, again, yield:

```first_name   last_name
------------------------------------
JENNIFER     DAVIS
```

### EXCEPT

The `EXCEPT` operation is useful when you want to keep only those tuples that are present in one set, but not in another:

Running this query:

```SELECT first_name, last_name
FROM customer
EXCEPT
SELECT first_name, last_name
FROM staff
ORDER BY 1, 2
```

… will yield:

```first_name   last_name
------------------------------------
AARON        SELBY
AGNES        BISHOP
ALAN         KAHN
ALBERT       CROUSE
ALBERTO      HENNING
...
```

According to the original Venn diagrams, this can be tweaked using `LEFT JOIN` and a `IS NULL` predicate:

```SELECT first_name, last_name
FROM customer
LEFT JOIN staff
USING (first_name, last_name)
WHERE staff_id IS NULL
ORDER BY 1, 2
```

or with an `ON` clause:

```SELECT c.first_name, c.last_name
FROM customer c
LEFT JOIN staff s
ON (c.first_name, c.last_name)
=  (s.first_name, s.last_name)
WHERE staff_id IS NULL
ORDER BY 1, 2
```

This is completely unreadable and doesn’t communicate the fact that we’re removing tuples from a set `CUSTOMER`, given their presence in another set`STAFF`.

An equivalent version using anti-join might be more readable (watch out for `NULLs` in `NOT IN` predicates, though!):

```SELECT c.first_name, c.last_name
FROM customer c
WHERE (first_name, last_name) NOT IN (
SELECT first_name, last_name
FROM staff
)
ORDER BY 1, 2
```

… or, using `NOT EXISTS`:

```SELECT c.first_name, c.last_name
FROM customer c
WHERE NOT EXISTS (
SELECT 1
FROM staff s
WHERE (c.first_name, c.last_name)
= (s.first_name, s.last_name)
)
ORDER BY 1, 2
```

### Conclusion

`UNION`, `INTERSECT`, and `EXCEPT` are very simple, yet very useful operations that can add a lot of value every now and then in your daily SQL tasks. While`JOIN` operations are much more versatile, they are also more complex for the simple tasks that can be solved by `UNION`, `INTERSECT`, and `EXCEPT`.

Did you like this article? It’s part of the Data Geekery SQL Training – a 1-day workshop helping you to get the most out of the awesome SQL language.

### Best of Java 9-15: Cool New Java Features

Michael Inden (Java Developer)

### Reactive Microservice in Action

Emily Jiang (IBM)

### The Past, Present, and Future of Cloud Native API Gateways

Dr. Daniel Bryant (Big Picture Tech Ltd)

Author

Lukas Eder

Lukas is a Java and SQL aficionado. He’s the founder of Data Geekery GmbH, the company behind jOOQ, the best way to write SQL in Java.