If you work with data then learning SQL is a must.
SQL (Structured Query Language) is a domain-specific language used for managing and querying data stored in RDMSes.
SQL is based on relational algebra and is often broken into multiple parts:
- DDL (Data Definition Language)
- DML (Data Manipulation Language)
The SQL DDL is used for defining and modifying tables and table schemas.
SQL supports many built-in types.
Some common types include:
char(n)An n-length character string.
varchar(n)A variable-length character string with maximum length n.
intAn integer (machine-dependant size).
smallintA small integer (machine-dependant size).
numeric(p, d)A fixed-point number with a sign bit, p digits, and d of the p digits to the right of the decimal point.
doubleA double precision floating point (machine-dependant size).
float(n)A float with at least n precision.
text(n)Holds a string with a maximum length of n or 65,535 bytes.
Each type may include a special null value.
Schemas can be defined using the create table command.
The create table command specifies column names as well as their type.
In addition, create table supports default values and optional integrity constraints:
create table employees ( employee_id int primary key, preferred_name varchar(250), job_id int not null, display_job_title varchar(250), salary int default 0, start_date datetime not null );
You can also define indexes:
create index ix_salary index on employees(salary); [1, P. 164].
Tables can be deleted with the drop table command:
drop table employees;.
The alter table statement can be used to add, delete, or modify columns to an existing table:
alter table employees add first_name varchar(100), add last_name varchar(100);
Constraints can be defined on columns. SQL prevents updates to the database that violate a constraint [1, P. 69].
The primary key constraint specifies that a column (or a set of columns) uniquely identifies the relation and is nonnull:
create table employees ( employee_id int primary key, -- ... );
The foreign key constraint specifies a column (or a set of columns) that must correspond to the primary key values of columns in another table:
create table referrals ( employee_id int, -- ... foreign key (employee_id) references employees(employee_id) );
The check clause can be used to ensure column values satisfy a predicate:
create table employees ( -- ... salary int check (salary >= 18000), );
Queries are performed using the select statement, which produces a table as a result (the result set).
select * from employees;
Select statements are made up of multiple clauses.
The select clause defines the columns that are selected. It also be used to derive new columns:
select employee_id as id, join_date, round(salary / 12) as monthly_salary from employees;
The from clause defines which table (rowset) to read the data from. If multiple tables are defined (e.g.
from employees, candidates) then the rowset is a Cartesian product of the tables.
The where clause takes a predicate and applies it to the result table:
select * from employees where salary > 100000;
The predicate can use logical connectives
The group by clause forms groups of rows based on the defined grouping columns:
select major_version, minor_version from app_releases group by major_version, minor_version
Aggregate functions take a collection of values as input and return a single value.
There are 5 builtin SQL aggregate functions:
The input to
sum must be numbers but the other functions can operate on collections of nonnumeric data types [1, P. 91].
The following query returns a single row with the average employee salary:
select avg(salary) as average_salary from employees;
In general, aggregate functions ignore null values in their input (except
count(*)). This behavior can be confusing and so it’s best to avoid nulls in your table [1, P. 96].
The having clause operates on groups (as opposed to the where clause which operates on individual rows before they are grouped).
select major_version, minor_version, count(patch_version) as patch_count from app_releases group by major_version, minor_version having patch_count > 1
Select results can be ordered with the order by clause.
order by C1, C2, ..., results are ordered first by C1, in case of ties they are ordered by C2, etc.
The sort order is ascending by default. The
desc keyword sets the sort order to descending:
select * from log_events order by event_time desc, event_name asc;
SQL supports nested subqueries, i.e. select statements nested within select statement.
In the where clause:
select * from employees where employee_id in ( select employee_id from referrals where hiring_decision = 'hire' );
In the from clause (supported by most SQL implementations):
select dept_name, avg_salary from ( select dept_name, avg(salary) as avg_salary from employees group by dept_name ) where avg_salary > 120000;
A correlated subquery uses values of its outer query. These can be slow.
Scalar subqueries are subqueries that result in a single column and a single row.
Scalar subqueries can be used wherever expressions returning a single value are permitted:
select * from employee where salary > ( select avg(salary) from employees );
Runtime errors can occur when using scalar subqueries, since it’s not always possible to determine whether a query will return a single row at compile time [1, P. 107].
A join clause allows you to combine rows from two or more tables.
There are different classes of join.
A cross join returns the Cartesian product of rows from tables in the join.
select * from employees cross join referrals;
Cross joins are performed implicitly in from statements with multiple tables:
select * from employee, referrals;
An inner join returns returns all rows from the Cartesian product of two tables that satisfy a given join predicate.
select employee_id, referral_id from employees inner join referrals on employees.employee_id = referrals.employee_id;
An equi-join is a type of join that uses only equality comparisons in the join predicate.
A natural join () is a special case of equi-join where the result is a combination of rows that have matching values for columns with the same names. The resulting table of a natural join contains a concatenation of the rows with matching values but with only one column for each of the shared columns [1, P. 127].
select employee_id, referral_id from employees natural join referrals;
Outer joins performs a join between two tables but preserves unmatched rows.
A full outer join preserves all non-matching rows in both tables.
select employee_id, referral_id from employees full outer join referrals;
A left outer join preserves rows in the table to the left of the join operator.
select employee_id, referral_id from employees left outer join referrals;
A right outer join preserves rows in the table to the right of the join operator.
select employee_id, referral_id from employees right outer join referrals;
SQL provides an interface for inserting, updating, and deleting rows.
The delete statement is used to delete rows:
delete from employees where employee_id = 12
The insert into statement is used to insert rows:
insert into employees (employee_id, job_id, start_date, salary) values (2, 1, NOW(), 90000);
The update statement is used to update rows:
update employees set salary = salary * 1.05 where salary < 500000;
A view is the result set of a stored query on the data.
create view high_earners as select employee_id from employees where salary > 1000000; select * from high_earners;
A materialized view is a cached view. The implementation is vendor-specific, as it’s not standardized in SQL [1, P. 140].
In some implementations, a materialized view is recalculated when one of its underlying dependencies is modified: either immediately or lazily when the view is accessed. An alternative approach is to update materialized views periodically [1, P. 140].
Transactions are a sequence of SQL statements that are either applied atomically or rolled back in the case of failure [1, P. 144].
An example PostgreSQL transaction:
begin; update accounts set balance = balance - 1000 where account_id = 1; update accounts set balance = balance + 1000 where account_id = 2; commit;
SQL supports authorization.
Authorization privileges includes:
Users can be authorized to have none, all, or a combination of privileges on specified parts of a database using the grant statement:
grant delete on employees to alice;
Roles are also supported:
create role accountant; grant accountant to bob; grant select on employees to accountant;
Privileges can be revoked with the revoke statement:
revoke grant option for select on referrals from bob;
-  A. Silberschatz, H. F. Korth, and S. Sudarshan, Database System Concepts, Seventh Edition. McGraw-Hill Book Company, 2020.