8 Introduction to SQL

https://learn.datacamp.com/courses/introduction-to-sql

Main functions and concepts covered in this BP chapter:

  1. select
  2. from
  3. where
  4. like
  5. order by

Packages used in this chapter:

## Load all packages used in this chapter
library(RSQLite)

For the SQL chapters, you can use the “chinook.db” SQL database (which comes from here: https://www.sqlitetutorial.net/sqlite-sample-database/) and load it using the dbConnect function (made available via the RSQLite package):

con <- dbConnect(SQLite(), "data/chinook.db")

You can then create working examples using SQL code chunks, like this:

select FirstName, LastName FROM customers
Table 8.1: Displaying records 1 - 10
FirstName LastName
Luís Gonçalves
Leonie Köhler
François Tremblay
Bjørn Hansen
František Wichterlová
Helena Holý
Astrid Gruber
Daan Peeters
Kara Nielsen
Eduardo Martins

8.1 Notes

8.1.1 Introductory Notes

A database stores data and information. This information is housed in objects called tables, with data organized into rows and columns. A relational database defines relationships between tables of data inside the database. The biggest advantage of a database is that many users can write queries to gather insights from the data at the same time. When a database is queried, the data stored inside the database does not change: rather, the database information is accessed and presented according to instructions in the query.

Tables are organized into rows and columns; in the world of databases, rows are often referred to as records and columns as fields. A table’s fields are limited to those set when the database was created, but the number of rows is unlimited.

Table names should be lowercase and should not include spaces - we use underscores in place of spaces. And ideally, a table name would refer to a collective group (like “inventory”) but it’s also okay for the table to have a plural name (such as “products”).

A record is a row in a table. It holds data on an individual observation. A field is a column in a table. It holds one piece of information about all observations in the table. Because field names must be typed out when querying a database with SQL, field naming is important. Generally, field names should be lowercase and should not involve spaces. A field name should be singular rather than plural because it refers to the information contained in that field for a single record.

When a table is created, a data type must be indicated for each field. The data type is chosen based on the type of data that the field will hold - a number, text, or a date for example. We use data types for several reasons. First, different types of data are stored differently and take up different amounts of storage space. Second, some operations only apply to certain data types. It makes sense to multiply a number by another number, but it does not make sense to multiply text by other text for example.

In programming, a “string” refers to a sequence of characters such as letters or punctuation. SQL’s VARCHAR data type is more flexible and can store small or large strings - up to tens of thousands of characters! Because of its flexibility, VARCHAR is very commonly used for storing strings.

Integer data types store whole numbers, such as the years in the member_year column of the patrons table. The NUMERIC data type can store floats which have up to 38 digits total - including those before and after the decimal point.

Schemas are often referred to as “blueprints” of databases. A schema shows a database’s design, such as what tables are included in the database and any relationships between its tables. A schema also lets the reader know what data type each field can hold.

8.1.2 SQL Flavors

SQL has a few different versions, or flavors. Some are free, while others have customer support and are made to complement major databases such as Microsoft’s SQL Server or Oracle Database, which are used by many companies. All SQL flavors are used with table-based relational databases like the ones we’ve seen, and the vast majority of keywords are shared between them. One of the two main versions is PostgreSQL, which is a free and open-source relational database system which was originally created at the University of California, Berkeley. The other is SQL Server, which is also a relational database system which comes in both free and enterprise versions. It was created by Microsoft, so it pairs well with other Microsoft products.

Here’s an example of a small difference between SQL Server and PostgreSQL: when we want to limit the number of records returned, we use the LIMIT keyword in PostgreSQL. Here, we limit the number of employee names and ids selected to only the first two records. The exact same results are achieved in SQL Server using the TOP keyword instead of LIMIT. Notice that this keyword is the only difference between the two queries. Limiting results is useful when testing code, since many result sets can have thousands of results! It’s best to write and test code using just a few results before removing the LIMIT for the final query.

8.2 Selecting columns

To select certain columns from a certain table, we can use SELECT and FROM. List the fields you wish to select after SELECT or use an * if you want to select all fields. After FROM list the table from which you want to pull the fields from.

SELECT *
FROM albums
Table 8.2: Displaying records 1 - 10
AlbumId Title ArtistId
1 For Those About To Rock We Salute You 1
2 Balls to the Wall 2
3 Restless and Wild 2
4 Let There Be Rock 1
5 Big Ones 3
6 Jagged Little Pill 4
7 Facelift 5
8 Warner 25 Anos 6
9 Plays Metallica By Four Cellos 7
10 Audioslave 8
SELECT EmployeeId, LastName
FROM employees
Table 8.3: 8 records
EmployeeId LastName
1 Adams
2 Edwards
3 Peacock
4 Park
5 Johnson
6 Mitchell
7 King
8 Callahan

You can also rename fields using SELECT ___ AS ___

SELECT Title AS album_title, AlbumID
FROM albums
Table 8.4: Displaying records 1 - 10
album_title AlbumId
For Those About To Rock We Salute You 1
Balls to the Wall 2
Restless and Wild 3
Let There Be Rock 4
Big Ones 5
Jagged Little Pill 6
Facelift 7
Warner 25 Anos 8
Plays Metallica By Four Cellos 9
Audioslave 10

We can also select distinct values in a field using SELECT DISTINCT.

SELECT DISTINCT Name
FROM playlists
Table 8.5: Displaying records 1 - 10
Name
Music
Movies
TV Shows
Audiobooks
90’s Music
Music Videos
Brazilian Music
Classical
Classical 101 - Deep Cuts
Classical 101 - Next Steps

We can combine distinct and as to get a new field with just the unique values of Name

SELECT DISTINCT Name AS unique_name
FROM playlists
Table 8.6: Displaying records 1 - 10
unique_name
Music
Movies
TV Shows
Audiobooks
90’s Music
Music Videos
Brazilian Music
Classical
Classical 101 - Deep Cuts
Classical 101 - Next Steps

We can store data in a view that we create using CREATE VIEW.

CREATE VIEW unique_names AS
SELECT DISTINCT Name AS unique_name
FROM playlists
## Error: view unique_names already exists
SELECT *
FROM unique_names
Table 8.7: Displaying records 1 - 10
unique_name
Music
Movies
TV Shows
Audiobooks
90’s Music
Music Videos
Brazilian Music
Classical
Classical 101 - Deep Cuts
Classical 101 - Next Steps

You can also limit the query to only the first x amount of rows.

SELECT *
FROM employees
LIMIT 4
Table 8.8: 4 records
EmployeeId LastName FirstName Title ReportsTo BirthDate HireDate Address City State Country PostalCode Phone Fax Email
1 Adams Andrew General Manager NA 1962-02-18 00:00:00 2002-08-14 00:00:00 11120 Jasper Ave NW Edmonton AB Canada T5K 2N1 +1 (780) 428-9482 +1 (780) 428-3457
2 Edwards Nancy Sales Manager 1 1958-12-08 00:00:00 2002-05-01 00:00:00 825 8 Ave SW Calgary AB Canada T2P 2T3 +1 (403) 262-3443 +1 (403) 262-3322
3 Peacock Jane Sales Support Agent 2 1973-08-29 00:00:00 2002-04-01 00:00:00 1111 6 Ave SW Calgary AB Canada T2P 5M5 +1 (403) 262-3443 +1 (403) 262-6712
4 Park Margaret Sales Support Agent 2 1947-09-19 00:00:00 2003-05-03 00:00:00 683 10 Street SW Calgary AB Canada T2P 5G3 +1 (403) 263-4423 +1 (403) 263-4289

8.3 Filtering rows

We can use WHERE to filter rows. This link goes over the different operators used to filter. https://www.sqlitetutorial.net/sqlite-where/

SELECT *
FROM employees
WHERE 
ReportsTo = 1
Table 8.9: 2 records
EmployeeId LastName FirstName Title ReportsTo BirthDate HireDate Address City State Country PostalCode Phone Fax Email
2 Edwards Nancy Sales Manager 1 1958-12-08 00:00:00 2002-05-01 00:00:00 825 8 Ave SW Calgary AB Canada T2P 2T3 +1 (403) 262-3443 +1 (403) 262-3322
6 Mitchell Michael IT Manager 1 1973-07-01 00:00:00 2003-10-17 00:00:00 5827 Bowness Road NW Calgary AB Canada T3B 0C5 +1 (403) 246-9887 +1 (403) 246-9899
SELECT *
FROM tracks
WHERE 
GenreId = 1
AND
Milliseconds < 200000
Table 8.10: Displaying records 1 - 10
TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice
11 C.O.D. 1 1 1 Angus Young, Malcolm Young, Brian Johnson 199836 6566314 0.99
40 Perfect 6 1 1 Alanis Morissette & Glenn Ballard 188133 6145404 0.99
42 Right Through You 6 1 1 Alanis Morissette & Glenn Ballard 176117 5793082 0.99
51 We Die Young 7 1 1 Jerry Cantrell 152084 4925362 0.99
59 Put You Down 7 1 1 Jerry Cantrell 196231 6420530 0.99
339 Communication Breakdown 30 1 1 Jimmy Page/John Bonham/John Paul Jones 192653 6287257 0.99
341 The Girl I Love She Got Long Black Wavy Hair 30 1 1 Jimmy Page/John Bonham/John Estes/John Paul Jones/Robert Plant 183327 5995686 0.99
343 Communication Breakdown(2) 30 1 1 Jimmy Page/John Bonham/John Paul Jones 161149 5261022 0.99
346 Somethin’ Else 30 1 1 Bob Cochran/Sharon Sheeley 127869 4165650 0.99
347 Communication Breakdown(3) 30 1 1 Jimmy Page/John Bonham/John Paul Jones 185260 6041133 0.99

Sometimes, you may not remember exactly the data that you want to search. In this case, you perform an inexact search using the LIKE operator.

SELECT *
FROM tracks
WHERE
Composer LIKE '%Johnson%'
Table 8.11: Displaying records 1 - 10
TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice
1 For Those About To Rock (We Salute You) 1 1 1 Angus Young, Malcolm Young, Brian Johnson 343719 11170334 0.99
6 Put The Finger On You 1 1 1 Angus Young, Malcolm Young, Brian Johnson 205662 6713451 0.99
7 Let’s Get It Up 1 1 1 Angus Young, Malcolm Young, Brian Johnson 233926 7636561 0.99
8 Inject The Venom 1 1 1 Angus Young, Malcolm Young, Brian Johnson 210834 6852860 0.99
9 Snowballed 1 1 1 Angus Young, Malcolm Young, Brian Johnson 203102 6599424 0.99
10 Evil Walks 1 1 1 Angus Young, Malcolm Young, Brian Johnson 263497 8611245 0.99
11 C.O.D. 1 1 1 Angus Young, Malcolm Young, Brian Johnson 199836 6566314 0.99
12 Breaking The Rules 1 1 1 Angus Young, Malcolm Young, Brian Johnson 263288 8596840 0.99
13 Night Of The Long Knives 1 1 1 Angus Young, Malcolm Young, Brian Johnson 205688 6706347 0.99
14 Spellbound 1 1 1 Angus Young, Malcolm Young, Brian Johnson 270863 8817038 0.99

The IN operator allows you to check whether a value is in a list of a comma-separated list of values.

SELECT *
FROM tracks
WHERE
AlbumId IN (1,2)
Table 8.12: Displaying records 1 - 10
TrackId Name AlbumId MediaTypeId GenreId Composer Milliseconds Bytes UnitPrice
1 For Those About To Rock (We Salute You) 1 1 1 Angus Young, Malcolm Young, Brian Johnson 343719 11170334 0.99
6 Put The Finger On You 1 1 1 Angus Young, Malcolm Young, Brian Johnson 205662 6713451 0.99
7 Let’s Get It Up 1 1 1 Angus Young, Malcolm Young, Brian Johnson 233926 7636561 0.99
8 Inject The Venom 1 1 1 Angus Young, Malcolm Young, Brian Johnson 210834 6852860 0.99
9 Snowballed 1 1 1 Angus Young, Malcolm Young, Brian Johnson 203102 6599424 0.99
10 Evil Walks 1 1 1 Angus Young, Malcolm Young, Brian Johnson 263497 8611245 0.99
11 C.O.D. 1 1 1 Angus Young, Malcolm Young, Brian Johnson 199836 6566314 0.99
12 Breaking The Rules 1 1 1 Angus Young, Malcolm Young, Brian Johnson 263288 8596840 0.99
13 Night Of The Long Knives 1 1 1 Angus Young, Malcolm Young, Brian Johnson 205688 6706347 0.99
14 Spellbound 1 1 1 Angus Young, Malcolm Young, Brian Johnson 270863 8817038 0.99