9 Joining Data in SQL
https://learn.datacamp.com/courses/joining-data-in-postgresql
Main functions and concepts covered in this BP chapter:
inner join
left join
right join
full join
cross join
semi and anti join
case
union
intersect
except
- subqueries
Packages used in this chapter:
## Load all packages used in this chapter
library(RSQLite)
For the SQL chapters, you can use the “chinook.db” SQL database (which comes from here: https://www.sqlitetutorial.net/sqlite-sample-database/) and load it using the dbConnect
function (made available via the RSQLite
package):
<- dbConnect(SQLite(), "data/chinook.db") con
You can then create working examples using SQL code chunks, like this:
select FirstName, LastName FROM customers
FirstName | LastName |
---|---|
Luís | Gonçalves |
Leonie | Köhler |
François | Tremblay |
Bjørn | Hansen |
František | Wichterlová |
Helena | Holý |
Astrid | Gruber |
Daan | Peeters |
Kara | Nielsen |
Eduardo | Martins |
9.1 Introduction to joins
9.1.1 Inner Join
In this example, the INNER JOIN
clause matches each row from the tracks table with every row from the albums table based on the join condition (tracks.AlbumID = albums.AlbumID) specified after the ON keyword.
If the join condition evaluates to true (or 1), the columns of rows from both albums and tracks tables are included in the result set.
SELECT
Title,
NameFROM
tracksINNER JOIN albums
ON tracks.AlbumID = albums.AlbumID
Title | Name |
---|---|
For Those About To Rock We Salute You | For Those About To Rock (We Salute You) |
For Those About To Rock We Salute You | Put The Finger On You |
For Those About To Rock We Salute You | Let’s Get It Up |
For Those About To Rock We Salute You | Inject The Venom |
For Those About To Rock We Salute You | Snowballed |
For Those About To Rock We Salute You | Evil Walks |
For Those About To Rock We Salute You | C.O.D. |
For Those About To Rock We Salute You | Breaking The Rules |
For Those About To Rock We Salute You | Night Of The Long Knives |
For Those About To Rock We Salute You | Spellbound |
Alternative method using USING
where we sort the table in alphabetical order by title:
SELECT
Title,
NameFROM
tracksINNER JOIN albums USING(AlbumId)
ORDER BY Title
Title | Name |
---|---|
…And Justice For All | Blackened |
…And Justice For All | …And Justice For All |
…And Justice For All | Eye Of The Beholder |
…And Justice For All | One |
…And Justice For All | The Shortest Straw |
…And Justice For All | Harvester Of Sorrow |
…And Justice For All | The Frayed Ends Of Sanity |
…And Justice For All | To Live Is To Die |
…And Justice For All | Dyers Eve |
20th Century Masters - The Millennium Collection: The Best of Scorpions | Rock You Like a Hurricane |
You can join on two things using ON
and AND
. For example, we might want to join a country data set by country name, and then match up the years. If there is one year column for 2010 and one for 2015, we can do AND year1 = year1 - 5;
. This way, if the year is 2015, it will be matched with 2010.
9.2 Outer joins and cross joins
9.2.1 Left Join
The LEFT JOIN
clause selects data starting from the left table (artists) and matching rows in the right table (albums) based on the join condition (artists.ArtistId = albums.ArtistId) .
The left join returns all rows from the artists table (or left table) and the matching rows from the albums table (or right table).
If a row from the left table doesn’t have a matching row in the right table, SQLite includes columns of the rows in the left table and NULL
for the columns of the right table.
Note that RIGHT JOIN
does the exact same as left, but it returns all rows from the right table and the matching rows from the left table
If a row from the right table doesn’t have a matching row in the left table, SQLite includes columns of the rows in the right table and NULL
for the columns of the left table.
SELECT
Name,
TitleFROM
artistsLEFT JOIN albums ON
= albums.ArtistId
artists.ArtistId ORDER BY Name
Name | Title |
---|---|
A Cor Do Som | NA |
AC/DC | For Those About To Rock We Salute You |
AC/DC | Let There Be Rock |
Aaron Copland & London Symphony Orchestra | A Copland Celebration, Vol. I |
Aaron Goldberg | Worlds |
Academy of St. Martin in the Fields & Sir Neville Marriner | The World of Classical Favourites |
Academy of St. Martin in the Fields Chamber Ensemble & Sir Neville Marriner | Sir Neville Marriner: A Celebration |
Academy of St. Martin in the Fields, John Birch, Sir Neville Marriner & Sylvia McNair | Fauré: Requiem, Ravel: Pavane & Others |
Academy of St. Martin in the Fields, Sir Neville Marriner & Thurston Dart | Bach: Orchestral Suites Nos. 1 - 4 |
Academy of St. Martin in the Fields, Sir Neville Marriner & William Bennett | NA |
If you want to find artists who don’t have any albums, you can add a WHERE
clause as shown in the following query:
SELECT
Name,
TitleFROM
artistsLEFT JOIN albums ON
= albums.ArtistId
artists.ArtistId WHERE Title IS NULL
ORDER BY Name
Name | Title |
---|---|
A Cor Do Som | NA |
Academy of St. Martin in the Fields, Sir Neville Marriner & William Bennett | NA |
Aerosmith & Sierra Leone’s Refugee Allstars | NA |
Avril Lavigne | NA |
Azymuth | NA |
Baby Consuelo | NA |
Banda Black Rio | NA |
Barão Vermelho | NA |
Bebel Gilberto | NA |
Ben Harper | NA |
9.2.2 Full Join
In theory, the result of the FULL OUTER JOIN
is a combination of a LEFT JOIN
and a RIGHT JOIN
SELECT
Name,
TitleFROM
artistsFULL JOIN albums ON
= albums.ArtistId
artists.ArtistId ORDER BY Name
Name | Title |
---|---|
A Cor Do Som | NA |
AC/DC | For Those About To Rock We Salute You |
AC/DC | Let There Be Rock |
Aaron Copland & London Symphony Orchestra | A Copland Celebration, Vol. I |
Aaron Goldberg | Worlds |
Academy of St. Martin in the Fields & Sir Neville Marriner | The World of Classical Favourites |
Academy of St. Martin in the Fields Chamber Ensemble & Sir Neville Marriner | Sir Neville Marriner: A Celebration |
Academy of St. Martin in the Fields, John Birch, Sir Neville Marriner & Sylvia McNair | Fauré: Requiem, Ravel: Pavane & Others |
Academy of St. Martin in the Fields, Sir Neville Marriner & Thurston Dart | Bach: Orchestral Suites Nos. 1 - 4 |
Academy of St. Martin in the Fields, Sir Neville Marriner & William Bennett | NA |
9.2.3 Case
The CASE
expression evaluates a list of conditions and returns an expression based on the result of the evaluation. It is similar to the IF-THEN-ELSE statement in other programming languages.
In the following example, we look at if the customer’s country is the USA. If it is, we return the value “domestic.” If it is something else besides the USA, we return “foreign”. We call this new variable CustomerGroup.
SELECT customerid,
firstname,
lastname,CASE country
WHEN 'USA'
THEN 'Domestic'
ELSE 'Foreign'
END CustomerGroup
FROM
customersORDER BY
LastName, FirstName;
CustomerId | FirstName | LastName | CustomerGroup |
---|---|---|---|
12 | Roberto | Almeida | Foreign |
28 | Julia | Barnett | Domestic |
39 | Camille | Bernard | Foreign |
18 | Michelle | Brooks | Domestic |
29 | Robert | Brown | Foreign |
21 | Kathy | Chase | Domestic |
26 | Richard | Cunningham | Domestic |
41 | Marc | Dubois | Foreign |
34 | João | Fernandes | Foreign |
30 | Edward | Francis | Foreign |
9.2.4 Cross Join
If you use a LEFT JOIN
, INNER JOIN
, or CROSS JOIN
without the ON
or USING
clause, SQLite produces the Cartesian product of the involved tables. The number of rows in the Cartesian product is the product of the number of rows in each involved tables.
In the following example, we create a dataset that cross joins genres with media types, which produces a data set with each combination of media type and genre (i.e. MPEG-Rock, MPEG-Jazz, etc.)
create view mediatypes as
select Name as mediatype
from media_types
## Error: view mediatypes already exists
create view genres1 as
select Name as genre
from genres
## Error: view genres1 already exists
SELECT
mediatype, genreFROM
mediatypesCROSS JOIN genres1
mediatype | genre |
---|---|
MPEG audio file | Rock |
MPEG audio file | Jazz |
MPEG audio file | Metal |
MPEG audio file | Alternative & Punk |
MPEG audio file | Rock And Roll |
MPEG audio file | Blues |
MPEG audio file | Latin |
MPEG audio file | Reggae |
MPEG audio file | Pop |
MPEG audio file | Soundtrack |
9.3 Set theory clauses
9.3.1 Union
Sometimes, you need to combine data from multiple tables into a complete result set. It may be for tables with similar data within the same database or maybe you need to combine similar data from multiple databases.
To combine rows from two or more queries into a single result set, you use SQLite UNION
operator. You can also use UNION ALL
. UNION
and UNION ALL
operators combine rows from result sets into a single result set. The UNION
operator removes eliminate duplicate rows, whereas the UNION ALL
operator does not.
The following example combines the names of employees and customers into a single list:
SELECT FirstName, LastName, 'Employee' AS Type
FROM employees
UNION
SELECT FirstName, LastName, 'Customer'
FROM customers;
FirstName | LastName | Type |
---|---|---|
Aaron | Mitchell | Customer |
Alexandre | Rocha | Customer |
Andrew | Adams | Employee |
Astrid | Gruber | Customer |
Bjørn | Hansen | Customer |
Camille | Bernard | Customer |
Daan | Peeters | Customer |
Dan | Miller | Customer |
Diego | Gutiérrez | Customer |
Dominique | Lefebvre | Customer |
9.3.2 Intersect
SQLite INTERSECT
operator compares the result sets of two queries and returns distinct rows that are output by both queries.
The following example intersects the customers and invoices tables on CustomerID to find which customers appear in the invoices table.
select *
from customers
CustomerId | FirstName | LastName | Company | Address | City | State | Country | PostalCode | Phone | Fax | SupportRepId | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Luís | Gonçalves | Embraer - Empresa Brasileira de Aeronáutica S.A. | Av. Brigadeiro Faria Lima, 2170 | São José dos Campos | SP | Brazil | 12227-000 | +55 (12) 3923-5555 | +55 (12) 3923-5566 | luisg@embraer.com.br | 3 |
2 | Leonie | Köhler | NA | Theodor-Heuss-Straße 34 | Stuttgart | NA | Germany | 70174 | +49 0711 2842222 | NA | leonekohler@surfeu.de | 5 |
3 | François | Tremblay | NA | 1498 rue Bélanger | Montréal | QC | Canada | H2G 1A7 | +1 (514) 721-4711 | NA | ftremblay@gmail.com | 3 |
4 | Bjørn | Hansen | NA | Ullevålsveien 14 | Oslo | NA | Norway | 0171 | +47 22 44 22 22 | NA | bjorn.hansen@yahoo.no | 4 |
5 | František | Wichterlová | JetBrains s.r.o. | Klanova 9/506 | Prague | NA | Czech Republic | 14700 | +420 2 4172 5555 | +420 2 4172 5555 | frantisekw@jetbrains.com | 4 |
6 | Helena | Holý | NA | Rilská 3174/6 | Prague | NA | Czech Republic | 14300 | +420 2 4177 0449 | NA | hholy@gmail.com | 5 |
7 | Astrid | Gruber | NA | Rotenturmstraße 4, 1010 Innere Stadt | Vienne | NA | Austria | 1010 | +43 01 5134505 | NA | astrid.gruber@apple.at | 5 |
8 | Daan | Peeters | NA | Grétrystraat 63 | Brussels | NA | Belgium | 1000 | +32 02 219 03 03 | NA | daan_peeters@apple.be | 4 |
9 | Kara | Nielsen | NA | Sønder Boulevard 51 | Copenhagen | NA | Denmark | 1720 | +453 3331 9991 | NA | kara.nielsen@jubii.dk | 4 |
10 | Eduardo | Martins | Woodstock Discos | Rua Dr. Falcão Filho, 155 | São Paulo | SP | Brazil | 01007-010 | +55 (11) 3033-5446 | +55 (11) 3033-4564 | eduardo@woodstock.com.br | 4 |
select *
from invoices
InvoiceId | CustomerId | InvoiceDate | BillingAddress | BillingCity | BillingState | BillingCountry | BillingPostalCode | Total |
---|---|---|---|---|---|---|---|---|
1 | 2 | 2009-01-01 00:00:00 | Theodor-Heuss-Straße 34 | Stuttgart | NA | Germany | 70174 | 1.98 |
2 | 4 | 2009-01-02 00:00:00 | Ullevålsveien 14 | Oslo | NA | Norway | 0171 | 3.96 |
3 | 8 | 2009-01-03 00:00:00 | Grétrystraat 63 | Brussels | NA | Belgium | 1000 | 5.94 |
4 | 14 | 2009-01-06 00:00:00 | 8210 111 ST NW | Edmonton | AB | Canada | T6G 2C7 | 8.91 |
5 | 23 | 2009-01-11 00:00:00 | 69 Salem Street | Boston | MA | USA | 2113 | 13.86 |
6 | 37 | 2009-01-19 00:00:00 | Berger Straße 10 | Frankfurt | NA | Germany | 60316 | 0.99 |
7 | 38 | 2009-02-01 00:00:00 | Barbarossastraße 19 | Berlin | NA | Germany | 10779 | 1.98 |
8 | 40 | 2009-02-01 00:00:00 | 8, Rue Hanovre | Paris | NA | France | 75002 | 1.98 |
9 | 42 | 2009-02-02 00:00:00 | 9, Place Louis Barthou | Bordeaux | NA | France | 33000 | 3.96 |
10 | 46 | 2009-02-03 00:00:00 | 3 Chatham Street | Dublin | Dublin | Ireland | NA | 5.94 |
SELECT CustomerId
FROM customers
INTERSECT
SELECT CustomerId
FROM invoices
ORDER BY CustomerId;
CustomerId |
---|
1 |
2 |
3 |
4 |
5 |
6 |
7 |
8 |
9 |
10 |
9.3.3 Except
SQLite EXCEPT
operator compares the result sets of two queries and returns distinct rows from the left query that are not output by the right query.
The following example finds artist ids of artists who do not have any album in the albums table:
SELECT ArtistId
FROM artists
EXCEPT
SELECT ArtistId
FROM albums;
ArtistId |
---|
25 |
26 |
28 |
29 |
30 |
31 |
32 |
33 |
34 |
35 |
9.3.4 Semi and Anti Join
These joins determine which records to keep in the left table. In other words, you use these last two joins (semi-join and anti-join) in a way similar to a WHERE clause dependent on the values of a second table. The previous examples using intersect and except can function as semi and anti join respectively. Intersect kept only the customer IDs that were found in the invoice table. Except kept only the artist IDs that were not found in the album table.
9.4 Subqueries
A subquery is a SELECT
statement nested in another statement. See the following statement.
9.4.1 Where
You can use a simple subquery as a search condition. For example, the following statement returns all the tracks in the album with the title “Let There Be Rock.”
SELECT trackid,
name,
albumidFROM tracks
WHERE albumid = (
SELECT albumid
FROM albums
WHERE title = 'Let There Be Rock'
);
TrackId | Name | AlbumId |
---|---|---|
15 | Go Down | 4 |
16 | Dog Eat Dog | 4 |
17 | Let There Be Rock | 4 |
18 | Bad Boy Boogie | 4 |
19 | Problem Child | 4 |
20 | Overdose | 4 |
21 | Hell Ain’t A Bad Place To Be | 4 |
22 | Whole Lotta Rosie | 4 |
9.4.2 From
Sometimes you want to apply aggregate functions to a column multiple times. For example, first, you want to sum the size of an album and then calculate the average size of all albums. You may come up with the following query.
SELECT
AVG(album.size)
FROM
(SELECT
SUM(bytes) SIZE
FROM
tracksGROUP BY
albumidAS album; )
AVG(album.size) |
---|
338288920 |
We also might want the average milliseconds of
select *
from tracks
TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice |
---|---|---|---|---|---|---|---|---|
1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 |
2 | Balls to the Wall | 2 | 2 | 1 | NA | 342562 | 5510424 | 0.99 |
3 | Fast As a Shark | 3 | 2 | 1 | F. Baltes, S. Kaufman, U. Dirkscneider & W. Hoffman | 230619 | 3990994 | 0.99 |
4 | Restless and Wild | 3 | 2 | 1 | F. Baltes, R.A. Smith-Diesel, S. Kaufman, U. Dirkscneider & W. Hoffman | 252051 | 4331779 | 0.99 |
5 | Princess of the Dawn | 3 | 2 | 1 | Deaffy & R.A. Smith-Diesel | 375418 | 6290521 | 0.99 |
6 | Put The Finger On You | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 205662 | 6713451 | 0.99 |
7 | Let’s Get It Up | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 233926 | 7636561 | 0.99 |
8 | Inject The Venom | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 210834 | 6852860 | 0.99 |
9 | Snowballed | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 203102 | 6599424 | 0.99 |
10 | Evil Walks | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 263497 | 8611245 | 0.99 |
create view albumseconds as
SELECT
Title,
Name,
MillisecondsFROM
tracksINNER JOIN albums
ON tracks.AlbumID = albums.AlbumID
## Error: view albumseconds already exists
We also might want the average milliseconds of each album.
SELECT
AVG(album.length)
FROM
(SELECT
SUM(milliseconds) LENGTH
FROM
albumsecondsGROUP BY
TitleAS ALBUM; )
AVG(album.length) |
---|
3973424 |
9.4.4 Select
The following query uses a correlated subquery in the SELECT clause to return the number of tracks in an album.
SELECT albumid,
title,
(SELECT count(trackid)
FROM tracks
WHERE tracks.AlbumId = albums.AlbumId
)
tracks_countFROM albums
ORDER BY tracks_count DESC;
AlbumId | Title | tracks_count |
---|---|---|
141 | Greatest Hits | 57 |
23 | Minha Historia | 34 |
73 | Unplugged | 30 |
229 | Lost, Season 3 | 26 |
230 | Lost, Season 1 | 25 |
251 | The Office, Season 3 | 25 |
83 | My Way: The Best Of Frank Sinatra [Disc 1] | 24 |
231 | Lost, Season 2 | 24 |
253 | Battlestar Galactica (Classic), Season 1 | 24 |
24 | Afrociberdelia | 23 |