The Big Picture:
-
SQL is a declarative language (cf. imperative). You say what information you want instead of scripting how to get it.
-
Queries are on tables and queries return tables. This means you use the output of one query as the input to another to easily create complex queries.
-
Common SQL in a Nutshell:
- querying data: SELECT, WHERE, =/IN, ORDER BY, ASC/DESC, LIMIT, joins
- changing data: INSERT, UPDATE
- aggregation: COUNT, AVG, MIN, MAX, SUM, GROUP BY
Example: Supervisors and Grants
Properties of our dataset:
- A supervisor has a name and belongs to exactly 1 research group
- A grants has a name and an amount
- Supervisors can have 0-many grants
Our schema:
supervisors
+------------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| supervisor | varchar(100) | YES | | NULL | |
| groupname | varchar(25) | YES | | NULL | |
+------------+--------------+------+-----+---------+-------+
grants
+--------+--------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+-------+
| id | int(11) | NO | PRI | NULL | |
| name | varchar(100) | YES | | NULL | |
| amount | int(11) | YES | | NULL | |
+--------+--------------+------+-----+---------+-------+
supervisors_grants
+---------------+---------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+---------------+---------+------+-----+---------+-------+
| supervisor_id | int(11) | NO | PRI | 0 | |
| grant_id | int(11) | NO | PRI | 0 | |
+---------------+---------+------+-----+---------+-------+
The dataset from this toy database can be found
here.
Querying Data
Give us all the information on every supervisor whose name ends in 'Miller':
SELECT * FROM supervisors WHERE supervisor LIKE '%Miller';
Give us a table containing only the name and amount for grants bigger than $10,000;
SELECT name, amount FROM grants WHERE amount>1000;
Give us all the information on supervisors in the EECS group with an ID under 50
SELECT * FROM supervisors WHERE groupname='EECS' AND id<50;
Give us the names of the 10 largest grants
SELECT name FROM grants ORDER BY amount DESC LIMIT 10;
Modifying Data
INSERT INTO table_name (column1, column2, column3,...)
VALUES (value1, value2, value3,...)
INSERT INTO supervisors VALUES (1, 'Rob Miller');
UPDATE table_name
SET column1=value, column2=value2,...
WHERE some_column=some_value
UPDATE grants SET value=5000 WHERE id=12;
DELETE FROM table_name
WHERE some_column=some_value
DELETE FROM supervisors WHERE name LIKE '%Miller';
Aggregation
Give us the average grant size and the total grant money in the grants table:
SELECT AVG(amount) FROM grants;
SELECT SUM(amount) FROM grants;
Give us the names and values of the largest and smallest grants:
SELECT name, MAX(amount) FROM grants;
SELECT name, MIN(amount) FROM grants;
Give us the number of supervisors in our table:
SELECT COUNT(*) FROM supervisors;
Joins
Use joins to connect tables so you can query across them. Without a
predicate on the join you get the Cartesian product of two
tables. With a predicate, some column value in the first table must
match a column value in the second table for that set of rows to make
it into the resulting temporary join table.
Joins have very efficient implementations in modern database
management systems and are much faster than correlated subqueries (a
SELECT nested inside a SELECT). Use them!
Without a qualifier you get the Cartesian product of two tables. You
can give tables temporary short names to make your query shorter and
facilitate unambiguous references to columns from a particular table:
SELECT * from supervisors, grants;
or
SELECT * from supervisors s, grants g;
Give us the name and amount for every grant on which Rob Miller was
the supervisor. Note how a predicate is used to link supervisors to
supervisor_grants, and then another predicate is used to link
supervisor_grants to grants so that queries can now go across all
three tables:
SELECT g.name, g.amount FROM supervisors s, supervisors_grants sg, grants g WHERE s.id=sg.supervisor_id AND sg.grant_id=g.id AND s.supervisor='Rob Miller';
Give us the name of every supervisor that has worked on a grant whose name includes the word 'wireless':
SELECT s.supervisor from supervisors s, supervisors_grants sg, grants g WHERE s.id=sg.supervisor_id AND sg.grant_id=g.id AND g.name LIKE '%wireless%';
Give us the name of every supervisor that has worked on a grant whose name includes the word 'wireless':
SELECT s.supervisor from supervisors s, supervisors_grants sg, grants g WHERE s.id=sg.supervisor_id AND sg.grant_id=g.id AND g.name LIKE '%wireless%';
Give us the total of the grants Rob Miller has:
SELECT SUM(g.amount) from supervisors s, supervisors_grants sg, grants g WHERE s.id=sg.supervisor_id AND sg.grant_id=g.id AND s.supervisor='Rob Miller';
Give us the total of the grants Rob Miller has. This query uses GROUP
BY, which is typically used in conjunction with an aggregation
function to combine data:
SELECT s.supervisor, SUM(g.amount) from supervisors s, supervisors_grants sg, grants g WHERE s.id=sg.supervisor_id AND sg.grant_id=g.id GROUP BY s.supervisor;
Give us the total of the grants for every supervisor, ordered by that
total. We give 'SUM(g.amount)' the name 'fatcash' so we can reference
it in the ORDER BY clause:
SELECT s.supervisor, SUM(g.amount) as fatcash from supervisors s, supervisors_grants sg, grants g WHERE s.id=sg.supervisor_id AND sg.grant_id=g.id GROUP BY s.supervisor ORDER by fatcash DESC;