8 Introduction to SQL
https://learn.datacamp.com/courses/introduction-to-sql
Main functions and concepts covered in this BP chapter:
select
from
where
like
order by
Packages used in this chapter:
## Load all packages used in this chapter
library(RSQLite)
For the SQL chapters, you can use the “chinook.db” SQL database (which comes from here: https://www.sqlitetutorial.net/sqlite-sample-database/) and load it using the dbConnect
function (made available via the RSQLite
package):
<- dbConnect(SQLite(), "data/chinook.db") con
You can then create working examples using SQL code chunks, like this:
select FirstName, LastName FROM customers
FirstName | LastName |
---|---|
Luís | Gonçalves |
Leonie | Köhler |
François | Tremblay |
Bjørn | Hansen |
František | Wichterlová |
Helena | Holý |
Astrid | Gruber |
Daan | Peeters |
Kara | Nielsen |
Eduardo | Martins |
8.1 Notes
8.1.1 Introductory Notes
A database stores data and information. This information is housed in objects called tables, with data organized into rows and columns. A relational database defines relationships between tables of data inside the database. The biggest advantage of a database is that many users can write queries to gather insights from the data at the same time. When a database is queried, the data stored inside the database does not change: rather, the database information is accessed and presented according to instructions in the query.
Tables are organized into rows and columns; in the world of databases, rows are often referred to as records and columns as fields. A table’s fields are limited to those set when the database was created, but the number of rows is unlimited.
Table names should be lowercase and should not include spaces - we use underscores in place of spaces. And ideally, a table name would refer to a collective group (like “inventory”) but it’s also okay for the table to have a plural name (such as “products”).
A record is a row in a table. It holds data on an individual observation. A field is a column in a table. It holds one piece of information about all observations in the table. Because field names must be typed out when querying a database with SQL, field naming is important. Generally, field names should be lowercase and should not involve spaces. A field name should be singular rather than plural because it refers to the information contained in that field for a single record.
When a table is created, a data type must be indicated for each field. The data type is chosen based on the type of data that the field will hold - a number, text, or a date for example. We use data types for several reasons. First, different types of data are stored differently and take up different amounts of storage space. Second, some operations only apply to certain data types. It makes sense to multiply a number by another number, but it does not make sense to multiply text by other text for example.
In programming, a “string” refers to a sequence of characters such as letters or punctuation. SQL’s VARCHAR data type is more flexible and can store small or large strings - up to tens of thousands of characters! Because of its flexibility, VARCHAR is very commonly used for storing strings.
Integer data types store whole numbers, such as the years in the member_year column of the patrons table. The NUMERIC data type can store floats which have up to 38 digits total - including those before and after the decimal point.
Schemas are often referred to as “blueprints” of databases. A schema shows a database’s design, such as what tables are included in the database and any relationships between its tables. A schema also lets the reader know what data type each field can hold.
8.1.2 SQL Flavors
SQL has a few different versions, or flavors. Some are free, while others have customer support and are made to complement major databases such as Microsoft’s SQL Server or Oracle Database, which are used by many companies. All SQL flavors are used with table-based relational databases like the ones we’ve seen, and the vast majority of keywords are shared between them. One of the two main versions is PostgreSQL, which is a free and open-source relational database system which was originally created at the University of California, Berkeley. The other is SQL Server, which is also a relational database system which comes in both free and enterprise versions. It was created by Microsoft, so it pairs well with other Microsoft products.
Here’s an example of a small difference between SQL Server and PostgreSQL: when we want to limit the number of records returned, we use the LIMIT keyword in PostgreSQL. Here, we limit the number of employee names and ids selected to only the first two records. The exact same results are achieved in SQL Server using the TOP keyword instead of LIMIT. Notice that this keyword is the only difference between the two queries. Limiting results is useful when testing code, since many result sets can have thousands of results! It’s best to write and test code using just a few results before removing the LIMIT for the final query.
8.2 Selecting columns
To select certain columns from a certain table, we can use SELECT
and FROM
. List the fields you wish to select after SELECT
or use an * if you want to select all fields. After FROM
list the table from which you want to pull the fields from.
SELECT *
FROM albums
AlbumId | Title | ArtistId |
---|---|---|
1 | For Those About To Rock We Salute You | 1 |
2 | Balls to the Wall | 2 |
3 | Restless and Wild | 2 |
4 | Let There Be Rock | 1 |
5 | Big Ones | 3 |
6 | Jagged Little Pill | 4 |
7 | Facelift | 5 |
8 | Warner 25 Anos | 6 |
9 | Plays Metallica By Four Cellos | 7 |
10 | Audioslave | 8 |
SELECT EmployeeId, LastName
FROM employees
EmployeeId | LastName |
---|---|
1 | Adams |
2 | Edwards |
3 | Peacock |
4 | Park |
5 | Johnson |
6 | Mitchell |
7 | King |
8 | Callahan |
You can also rename fields using SELECT ___ AS ___
SELECT Title AS album_title, AlbumID
FROM albums
album_title | AlbumId |
---|---|
For Those About To Rock We Salute You | 1 |
Balls to the Wall | 2 |
Restless and Wild | 3 |
Let There Be Rock | 4 |
Big Ones | 5 |
Jagged Little Pill | 6 |
Facelift | 7 |
Warner 25 Anos | 8 |
Plays Metallica By Four Cellos | 9 |
Audioslave | 10 |
We can also select distinct values in a field using SELECT DISTINCT
.
SELECT DISTINCT Name
FROM playlists
Name |
---|
Music |
Movies |
TV Shows |
Audiobooks |
90’s Music |
Music Videos |
Brazilian Music |
Classical |
Classical 101 - Deep Cuts |
Classical 101 - Next Steps |
We can combine distinct and as to get a new field with just the unique values of Name
SELECT DISTINCT Name AS unique_name
FROM playlists
unique_name |
---|
Music |
Movies |
TV Shows |
Audiobooks |
90’s Music |
Music Videos |
Brazilian Music |
Classical |
Classical 101 - Deep Cuts |
Classical 101 - Next Steps |
We can store data in a view that we create using CREATE VIEW
.
CREATE VIEW unique_names AS
SELECT DISTINCT Name AS unique_name
FROM playlists
## Error: view unique_names already exists
SELECT *
FROM unique_names
unique_name |
---|
Music |
Movies |
TV Shows |
Audiobooks |
90’s Music |
Music Videos |
Brazilian Music |
Classical |
Classical 101 - Deep Cuts |
Classical 101 - Next Steps |
You can also limit the query to only the first x amount of rows.
SELECT *
FROM employees
LIMIT 4
EmployeeId | LastName | FirstName | Title | ReportsTo | BirthDate | HireDate | Address | City | State | Country | PostalCode | Phone | Fax | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
1 | Adams | Andrew | General Manager | NA | 1962-02-18 00:00:00 | 2002-08-14 00:00:00 | 11120 Jasper Ave NW | Edmonton | AB | Canada | T5K 2N1 | +1 (780) 428-9482 | +1 (780) 428-3457 | andrew@chinookcorp.com |
2 | Edwards | Nancy | Sales Manager | 1 | 1958-12-08 00:00:00 | 2002-05-01 00:00:00 | 825 8 Ave SW | Calgary | AB | Canada | T2P 2T3 | +1 (403) 262-3443 | +1 (403) 262-3322 | nancy@chinookcorp.com |
3 | Peacock | Jane | Sales Support Agent | 2 | 1973-08-29 00:00:00 | 2002-04-01 00:00:00 | 1111 6 Ave SW | Calgary | AB | Canada | T2P 5M5 | +1 (403) 262-3443 | +1 (403) 262-6712 | jane@chinookcorp.com |
4 | Park | Margaret | Sales Support Agent | 2 | 1947-09-19 00:00:00 | 2003-05-03 00:00:00 | 683 10 Street SW | Calgary | AB | Canada | T2P 5G3 | +1 (403) 263-4423 | +1 (403) 263-4289 | margaret@chinookcorp.com |
8.3 Filtering rows
We can use WHERE
to filter rows. This link goes over the different operators used to filter. https://www.sqlitetutorial.net/sqlite-where/
SELECT *
FROM employees
WHERE
= 1 ReportsTo
EmployeeId | LastName | FirstName | Title | ReportsTo | BirthDate | HireDate | Address | City | State | Country | PostalCode | Phone | Fax | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | Edwards | Nancy | Sales Manager | 1 | 1958-12-08 00:00:00 | 2002-05-01 00:00:00 | 825 8 Ave SW | Calgary | AB | Canada | T2P 2T3 | +1 (403) 262-3443 | +1 (403) 262-3322 | nancy@chinookcorp.com |
6 | Mitchell | Michael | IT Manager | 1 | 1973-07-01 00:00:00 | 2003-10-17 00:00:00 | 5827 Bowness Road NW | Calgary | AB | Canada | T3B 0C5 | +1 (403) 246-9887 | +1 (403) 246-9899 | michael@chinookcorp.com |
SELECT *
FROM tracks
WHERE
= 1
GenreId AND
< 200000 Milliseconds
TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice |
---|---|---|---|---|---|---|---|---|
11 | C.O.D. | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 199836 | 6566314 | 0.99 |
40 | Perfect | 6 | 1 | 1 | Alanis Morissette & Glenn Ballard | 188133 | 6145404 | 0.99 |
42 | Right Through You | 6 | 1 | 1 | Alanis Morissette & Glenn Ballard | 176117 | 5793082 | 0.99 |
51 | We Die Young | 7 | 1 | 1 | Jerry Cantrell | 152084 | 4925362 | 0.99 |
59 | Put You Down | 7 | 1 | 1 | Jerry Cantrell | 196231 | 6420530 | 0.99 |
339 | Communication Breakdown | 30 | 1 | 1 | Jimmy Page/John Bonham/John Paul Jones | 192653 | 6287257 | 0.99 |
341 | The Girl I Love She Got Long Black Wavy Hair | 30 | 1 | 1 | Jimmy Page/John Bonham/John Estes/John Paul Jones/Robert Plant | 183327 | 5995686 | 0.99 |
343 | Communication Breakdown(2) | 30 | 1 | 1 | Jimmy Page/John Bonham/John Paul Jones | 161149 | 5261022 | 0.99 |
346 | Somethin’ Else | 30 | 1 | 1 | Bob Cochran/Sharon Sheeley | 127869 | 4165650 | 0.99 |
347 | Communication Breakdown(3) | 30 | 1 | 1 | Jimmy Page/John Bonham/John Paul Jones | 185260 | 6041133 | 0.99 |
Sometimes, you may not remember exactly the data that you want to search. In this case, you perform an inexact search using the LIKE
operator.
SELECT *
FROM tracks
WHERE
LIKE '%Johnson%' Composer
TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice |
---|---|---|---|---|---|---|---|---|
1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 |
6 | Put The Finger On You | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 205662 | 6713451 | 0.99 |
7 | Let’s Get It Up | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 233926 | 7636561 | 0.99 |
8 | Inject The Venom | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 210834 | 6852860 | 0.99 |
9 | Snowballed | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 203102 | 6599424 | 0.99 |
10 | Evil Walks | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 263497 | 8611245 | 0.99 |
11 | C.O.D. | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 199836 | 6566314 | 0.99 |
12 | Breaking The Rules | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 263288 | 8596840 | 0.99 |
13 | Night Of The Long Knives | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 205688 | 6706347 | 0.99 |
14 | Spellbound | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 270863 | 8817038 | 0.99 |
The IN
operator allows you to check whether a value is in a list of a comma-separated list of values.
SELECT *
FROM tracks
WHERE
IN (1,2) AlbumId
TrackId | Name | AlbumId | MediaTypeId | GenreId | Composer | Milliseconds | Bytes | UnitPrice |
---|---|---|---|---|---|---|---|---|
1 | For Those About To Rock (We Salute You) | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 343719 | 11170334 | 0.99 |
6 | Put The Finger On You | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 205662 | 6713451 | 0.99 |
7 | Let’s Get It Up | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 233926 | 7636561 | 0.99 |
8 | Inject The Venom | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 210834 | 6852860 | 0.99 |
9 | Snowballed | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 203102 | 6599424 | 0.99 |
10 | Evil Walks | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 263497 | 8611245 | 0.99 |
11 | C.O.D. | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 199836 | 6566314 | 0.99 |
12 | Breaking The Rules | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 263288 | 8596840 | 0.99 |
13 | Night Of The Long Knives | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 205688 | 6706347 | 0.99 |
14 | Spellbound | 1 | 1 | 1 | Angus Young, Malcolm Young, Brian Johnson | 270863 | 8817038 | 0.99 |