6.470 SQL

The Big Picture:

SQL is a declarative language (cf. imperative). You say what information you want instead of scripting how to get it.
Queries are on tables and queries return tables. This means you use the output of one query as the input to another to easily create complex queries.
Common SQL in a Nutshell:
- querying data: SELECT, WHERE, =/IN, ORDER BY, ASC/DESC, LIMIT, joins
- changing data: INSERT, UPDATE
- aggregation: COUNT, AVG, MIN, MAX, SUM, GROUP BY

Example: Supervisors and Grants

Properties of our dataset:

A supervisor has a name and belongs to exactly 1 research group
A grants has a name and an amount
Supervisors can have 0-many grants

Our schema:

supervisors
+------------+--------------+------+-----+---------+-------+
| Field      | Type         | Null | Key | Default | Extra |
+------------+--------------+------+-----+---------+-------+
| id         | int(11)      | NO   | PRI | NULL    |       | 
| supervisor | varchar(100) | YES  |     | NULL    |       | 
| groupname  | varchar(25)  | YES  |     | NULL    |       | 
+------------+--------------+------+-----+---------+-------+

grants
+--------+--------------+------+-----+---------+-------+
| Field  | Type         | Null | Key | Default | Extra |
+--------+--------------+------+-----+---------+-------+
| id     | int(11)      | NO   | PRI | NULL    |       | 
| name   | varchar(100) | YES  |     | NULL    |       | 
| amount | int(11)      | YES  |     | NULL    |       | 
+--------+--------------+------+-----+---------+-------+

supervisors_grants
+---------------+---------+------+-----+---------+-------+
| Field         | Type    | Null | Key | Default | Extra |
+---------------+---------+------+-----+---------+-------+
| supervisor_id | int(11) | NO   | PRI | 0       |       | 
| grant_id      | int(11) | NO   | PRI | 0       |       | 
+---------------+---------+------+-----+---------+-------+

The dataset from this toy database can be found here.

Querying Data

Give us all the information on every supervisor whose name ends in 'Miller':

SELECT * FROM supervisors WHERE supervisor LIKE '%Miller';

Give us a table containing only the name and amount for grants bigger than $10,000;

SELECT name, amount FROM grants WHERE amount>1000;

Give us all the information on supervisors in the EECS group with an ID under 50

SELECT * FROM supervisors WHERE groupname='EECS' AND id<50;

Give us the names of the 10 largest grants

SELECT name FROM grants ORDER BY amount DESC LIMIT 10;

Modifying Data

INSERT INTO table_name (column1, column2, column3,...)
VALUES (value1, value2, value3,...)

INSERT INTO supervisors VALUES (1, 'Rob Miller');

UPDATE table_name
SET column1=value, column2=value2,...
WHERE some_column=some_value

UPDATE grants SET value=5000 WHERE id=12;

DELETE FROM table_name
WHERE some_column=some_value

DELETE FROM supervisors WHERE name LIKE '%Miller';

Aggregation

Give us the average grant size and the total grant money in the grants table:

SELECT AVG(amount) FROM grants; SELECT SUM(amount) FROM grants;

Give us the names and values of the largest and smallest grants:

SELECT name, MAX(amount) FROM grants; SELECT name, MIN(amount) FROM grants;

Give us the number of supervisors in our table:

SELECT COUNT(*) FROM supervisors;

Joins

Use joins to connect tables so you can query across them. Without a predicate on the join you get the Cartesian product of two tables. With a predicate, some column value in the first table must match a column value in the second table for that set of rows to make it into the resulting temporary join table.
Joins have very efficient implementations in modern database management systems and are much faster than correlated subqueries (a SELECT nested inside a SELECT). Use them!

Without a qualifier you get the Cartesian product of two tables. You can give tables temporary short names to make your query shorter and facilitate unambiguous references to columns from a particular table:

SELECT * from supervisors, grants; or SELECT * from supervisors s, grants g;

Give us the name and amount for every grant on which Rob Miller was the supervisor. Note how a predicate is used to link supervisors to supervisor_grants, and then another predicate is used to link supervisor_grants to grants so that queries can now go across all three tables:

SELECT g.name, g.amount FROM supervisors s, supervisors_grants sg, grants g WHERE s.id=sg.supervisor_id AND sg.grant_id=g.id AND s.supervisor='Rob Miller';

Give us the name of every supervisor that has worked on a grant whose name includes the word 'wireless':

SELECT s.supervisor from supervisors s, supervisors_grants sg, grants g WHERE s.id=sg.supervisor_id AND sg.grant_id=g.id AND g.name LIKE '%wireless%';

Give us the name of every supervisor that has worked on a grant whose name includes the word 'wireless':

SELECT s.supervisor from supervisors s, supervisors_grants sg, grants g WHERE s.id=sg.supervisor_id AND sg.grant_id=g.id AND g.name LIKE '%wireless%';

Give us the total of the grants Rob Miller has:

SELECT SUM(g.amount) from supervisors s, supervisors_grants sg, grants g WHERE s.id=sg.supervisor_id AND sg.grant_id=g.id AND s.supervisor='Rob Miller';

Give us the total of the grants Rob Miller has. This query uses GROUP BY, which is typically used in conjunction with an aggregation function to combine data:

SELECT s.supervisor, SUM(g.amount) from supervisors s, supervisors_grants sg, grants g WHERE s.id=sg.supervisor_id AND sg.grant_id=g.id GROUP BY s.supervisor;

Give us the total of the grants for every supervisor, ordered by that total. We give 'SUM(g.amount)' the name 'fatcash' so we can reference it in the ORDER BY clause:

SELECT s.supervisor, SUM(g.amount) as fatcash from supervisors s, supervisors_grants sg, grants g WHERE s.id=sg.supervisor_id AND sg.grant_id=g.id GROUP BY s.supervisor ORDER by fatcash DESC;