写在前面

前面学习了连接查询，这是SQL查询的重点也是难点内容，一定要灵活掌握。接下来学习另一个重点内容—子查询，这是衡量开发人员SQL水平的一个指标。

本文的主要内容如下：（1）子查询；（2）子查询分类；（3）子查询出现在select后面；（4）子查询出现在from后面；（5）子查询出现在where或者having后面；（6）子查询出现在where或者having后面—标量子查询；（7）子查询出现在where或者having后面—列子查询；（8）子查询出现在where或者having后面—行子查询；（9）子查询出现在exists后面(相关子查询)；（10）文章总结。

子查询

子查询是指一个查询语句嵌套在另一个查询语句内部的查询，这个特性从MySQL4.1开始引入。说白了就是出现在select语句中的select语句，这就是子查询或者说是内查询，外部的select查询语句则称为主查询或者外查询。

子查询分类

子查询按照结果集的行列数不同，可以分为4种：标量子查询、列子查询、行子查询和表子查询：
（1）标量子查询，结果集中只有一行一列；
（2）列子查询，结果集中只有一列多行；
（3）行子查询，结果集中只有一行多列；
（4）表子查询，结果集中一般是多行多列；

当然上面这些子查询在主查询中出现的位置是不同的，具体表现如下：
（1）select后面：只能是标量子查询；
（2）from后面：支持表子查询；
（3）where或者having后面：支持标量子查询（一行一列）、列子查询（一列多行）、行子查询（一行多列）；
（4）exists后面（也就是相关子查询）：支持表子查询（多行多列）。

在select子句中先计算子查询，子查询结果作为外层另一个查询的过滤条件，查询可以基于一个表或者多个表。

数据准备

为了后续学习的必要，这里先准备了五张表，这5张表在企业中用的较多，分别是：部门表(departments)、员工信息表(employees)、职位表(jobs)、位置表(locations)和薪资等级表(job_grades)。

第一步，创建对应的数据表：

drop database if exists `hellobook`;

create database `hellobook`;

USE `hellobook`;

/*部门表*/
DROP TABLE IF EXISTS `departments`;
CREATE TABLE `departments` (
  `department_id` int(4) NOT NULL AUTO_INCREMENT comment '部门id',
  `department_name` varchar(3) DEFAULT NULL comment '部门名称',
  `manager_id` int(6) DEFAULT NULL comment '管理者id',
  `location_id` int(4) DEFAULT NULL comment '部门位置id，来源于表locations中的location_id',
  PRIMARY KEY (`department_id`),
  KEY `loc_id_fk` (`location_id`)
) ENGINE=InnoDB AUTO_INCREMENT=271 comment '部门表';


/*员工表*/
DROP TABLE IF EXISTS `employees`;
CREATE TABLE `employees` (
  `employee_id` int(6) NOT NULL AUTO_INCREMENT comment '员工id',
  `first_name` varchar(20) DEFAULT NULL comment '名',
  `last_name` varchar(25) DEFAULT NULL comment '姓',
  `email` varchar(25) DEFAULT NULL comment '电子邮箱',
  `phone_number` varchar(20) DEFAULT NULL comment '手机',
  `job_id` varchar(10) DEFAULT NULL comment '职位id，来源于jobs表中的job_id',
  `salary` double(10,2) DEFAULT NULL comment '薪水',
  `commission_pct` double(4,2) DEFAULT NULL comment '佣金百分比',
  `manager_id` int(6) DEFAULT NULL comment '上级id',
  `department_id` int(4) DEFAULT NULL comment '所属部门id，来源于departments中的department_id',
  `hiredate` datetime DEFAULT NULL comment '入职日期',
  PRIMARY KEY (`employee_id`)
) ENGINE=InnoDB AUTO_INCREMENT=207 comment '员工表';


/*职位信息表*/
DROP TABLE IF EXISTS `jobs`;
CREATE TABLE `jobs` (
  `job_id` varchar(10) NOT NULL comment '职位id',
  `job_title` varchar(35) DEFAULT NULL comment '职位名称',
  `min_salary` int(6) DEFAULT NULL comment '薪资范围最小值',
  `max_salary` int(6) DEFAULT NULL comment '薪资范围最大值',
  PRIMARY KEY (`job_id`)
) ENGINE=InnoDB comment '职位id';


/*位置表*/
DROP TABLE IF EXISTS `locations`;
CREATE TABLE `locations` (
  `location_id` int(11) NOT NULL AUTO_INCREMENT comment '位置id',
  `street_address` varchar(40) DEFAULT NULL comment '街道地址',
  `postal_code` varchar(12) DEFAULT NULL comment '邮编',
  `city` varchar(30) DEFAULT NULL comment '城市名称',
  `state_province` varchar(25) DEFAULT NULL comment '省',
  `country_id` varchar(2) DEFAULT NULL comment '国家编号',
  PRIMARY KEY (`location_id`)
) ENGINE=InnoDB AUTO_INCREMENT=3201 comment '位置表';


/*薪资等级表*/
DROP TABLE IF EXISTS `job_grades`;
CREATE TABLE `job_grades`(
  `grade_level` varchar(3) comment '等级',
  `lowest_sal`  int comment '薪资最低值',
  `highest_sal` int comment '薪资最高值',
  PRIMARY KEY (`grade_level`)
) comment '薪资等级表';

第二步，插入数据，数据在Gitee上，因此这里就忽略。

子查询出现在select后面

前面说过，select后面只能出现标量子查询，即结果集中只有一行一列。

例子1：查询出每个部门的员工个数，显示department_id、department_name、manager_id、location_id、员工个数。其实这就是departments表中所有字段，外加员工个数字段，因此此时的SQL语句为：

select d.*,(
select count(*) from employees as e
where e.department_id = d.department_id
)as 员工个数
from departments as d;

执行结果如下所示（部分）：

例子2：查询员工号employee_id为102的部门的名称，即department_name的值，显示部门名称，此时的SQL语句为：

select (
  select d.department_name
  from departments as d,employees as e
  where e.department_id = d.department_id
  and e.employee_id=102
)as 部门名称;

执行结果如下所示：

子查询出现在from后面

此时就相当于将子查询的结果集充当一张表，但是这个表必须取别名否则找不到这个表，然后将真实的表和子查询的结果表进行连接查询。

例子3：查询每个部门平均工资的工资等级，显示department_id、平均工资和grade_level。可以按照如下思路进行分析：

第一步，查询每个部门的平均工资：

1	select e.department_id, avg(e.salary) as 平均工资 from employees as e group by e.department_id;

第二步，查询薪资登记表信息：

1	select * from job_grades;

第三步，将上述两个步骤的结果进行连接查询，刷选条件是：平均工资在lowest_sal和highest_sal之间：

select e.department_id, money as '平均工资',g.grade_level
from 
(
  select a.department_id,avg(a.salary) as money
  from employees as a
  group by a.department_id
)as e, job_grades as g
where e.money between g.lowest_sal and g.highest_sal;

执行结果如下所示：

子查询出现在where或者having后面

前面说过，出现在where或者having后面的子查询可以是：标量子查询（一行一列）、列子查询（一列多行）和行子查询（一行多列）。

不过这需要注意以下几点内容：
（1）子查询放在小括号内；
（2）子查询一般放在条件的右侧；
（3）标量子查询，一般搭配单行操作符使用；
（4）列子查询，一般搭配多行操作符使用，常用的多行操作符有：>、<、>=、<=、=、<>、!=；
（5）子查询在主查询之前执行，因为主查询的条件用到了子查询的结果。

接下来学习一下mysql中子查询常用的关键字，如in、not in、any、some、all等等：
（a）in：表示列表中的任意一个；not in表示不在列表中。in常用于where表达式中，其作用是查询某个范围内的数据；

（b）any和some：请注意any和some所表达的含义一致，通常是和子查询返回的某一个值进行比较，如a>some(1,2,3)，此时a=2是满足条件的，只要a大于子查询返回值中的任意一个即可，也就是a大于子查询中的最小值就行，其实就等同于a>min(1,2,3)。any和some可以和=、>、>=、<、<=、<>结合起来使用，分别表示等于、大于、大于等于、小于、小于等于、不等于其中的任何一个数据；

（c）all：all通常是和子查询返回的所有值进行比较，如a>all(1,2,3)，此时a=4是满足条件的，a必须大于子查询返回值中的全部值才行，也就是a大于子查询中的最大值就行，其实就等同于a>max(1,2,3)。同样all可以和=、>、>=、<、<=、<>结合起来使用，分别表示等于、大于、大于等于、小于、小于等于、不等于其中的任何一个数据；

标量子查询

接下来介绍出现在where或者having后面的标量子查询，它分为两种情况：一个标量子查询和多个标量子查询。

一个标量子查询

例子4：查询出工资比Abel高的员工信息，显示employees表中所有字段。

第一步，查询Abel的工资信息，这是标量子查询：

1	select salary from employees where last_name = 'Abel';

第二步，查询工资比Abel高的员工信息，即满足salary > 第一步执行结果：

select * 
from employees as e
where e.salary > (select salary 
                from employees 
                 where last_name = 'Abel');

执行结果如下所示（部分）：

多个标量子查询

例子5：查询job_id与employee_id为141员工相同，工资salary比employee_id为143员工高的员工的信息，显示姓名(last_name表示）、job_id和工资。

第一步，查询employee_id为141员工的job_id：

1
2
3

select job_id
from employees
where employee_id = 141;

第二步，查询employee_id为143员工的salary：

1
2
3

select salary
from employees
where employee_id = 143;

第三步，查询员工的信息，显示姓名(last_name表示）、job_id和工资，要求job_id = 第一步执行结果，并且job_id > 第二步执行结果：

select e.last_name as 姓名, e.job_id,e.salary as 工资
from employees as e
where e.job_id =(
                   select job_id
                   from employees
                   where employee_id = 141 
                 )
and  e.salary > (
                   select salary
                   from employees
                   where employee_id = 143
                );

执行结果如下所示（部分）：

子查询+分组函数

例子6：查询最低工资大于部门编号(department_id)为50的部门的最低工资的部门编号department_id和其最低工资salary，显示department_id和salary。

第一步，查询部门编号(department_id)为50的最低工资：

1	select min(salary) from employees where department_id = 50;

第二步，每个部门的最低工资：

1	select min(salary) from employees group by department_id;

第三步，查询部门编号department_id和工资salary，条件是部门的最低工资大于第一步的执行结果：

select min(e.salary) as minSalary, e.department_id
from employees as e
group by department_id
having min(e.salary) > (
                         select min(salary)
                         from employees
                         where department_id = 50
                       );

错误的标量子查询
前面多次说过标量子查询的结果集都是一行一列，为了验证这个论据，可以将例子6中子查询语句中的min(salary)修改为salary，此时代码执行如下：

select min(e.salary) as minSalary, e.department_id
from employees as e
group by department_id
having min(e.salary) > (
                         select salary
                         from employees
                         where department_id = 50
                       );

[Err] 1242 - Subquery returns more than 1 row

可以看到错误提示信息是子查询返回超过了一行记录，这也就说标量子查询只支持一行一列记录。

列子查询

接下来介绍出现在where或者having后面的列子查询，列子查询需要搭配多行操作符使用，如in、not in、any、some、all，同时为了提升效率，一般都会使用distinct关键字来进行去重。

例子7：查询location_id为1400或者1700部门中所有员工的姓名，显示员工姓名（使用last_name字段）。

第一步，查询location_id为1400或者1700的部门编号department_id
：

1	select distinct department_id from departments where location_id in (1400,1700);

第二步，查询员工姓名，要求部门编号在第一步执行结果中：

select e.last_name as 姓名
from employees as e 
where e.department_id 
in (
  select distinct department_id 
  from departments 
  where location_id 
  in (1400,1700)
);

当然也可以使用any关键字来实现，如下所示：

select e.last_name as 姓名
from employees as e 
where e.department_id 
= any (
  select distinct department_id 
  from departments 
  where location_id 
  in (1400,1700)
);

既然可以使用any，那么就可以使用<>all了，因为<>all等价于any：

select e.last_name as 姓名
from employees as e 
where e.department_id 
<> all (
  select distinct department_id 
  from departments 
  where location_id 
  in (1400,1700)
);

例子8：查询其他工种中，比job_id为IT_PROG部门任意工资低的员工的信息，显示编号、姓名、job_id和salary。

第一步，查询job_id为IT_PROG部门的任意工资：

1
2
3

select distinct salary
from employees
where job_id = 'IT_PROG';

第二步，查询员工的编号、姓名、job_id和salary，条件是salary<第一步中任意工资，同时job_id不为IT_PROG：

select e.employee_id as 编号,e.last_name as 姓名, job_id,salary
from employees as e
where e.salary < any (
select distinct salary
from employees
where job_id = 'IT_PROG'
)
and job_id != 'IT_PROG';

执行结果如下所示（部分）：

既然是小于其中的某一个，那么自然等价于小于其中的最大值，此时SQL可以修改为如下所示：

select e.employee_id as 编号,e.last_name as 姓名, job_id,salary
from employees as e
where e.salary <(
select max(salary)
from employees
where job_id = 'IT_PROG'
)
and job_id != 'IT_PROG';

例子9：查询其他工种中，比job_id为IT_PROG部门所有工资低的员工的信息，显示编号、姓名、job_id和salary。

第一步，查询job_id为IT_PROG部门的所有工资：

1
2
3

select distinct salary
from employees
where job_id = 'IT_PROG';

第二步，查询员工的编号、姓名、job_id和salary，条件是salary<第一步中所有工资，同时job_id不为IT_PROG：

select e.employee_id as 编号,e.last_name as 姓名, job_id,salary
from employees as e
where e.salary < all (
select distinct salary
from employees
where job_id = 'IT_PROG'
)
and job_id != 'IT_PROG';

既然是小于其中的所有，那么自然等价于小于其中的最小值，此时SQL可以修改为如下所示：

select e.employee_id as 编号,e.last_name as 姓名, job_id,salary
from employees as e
where e.salary <(
select min(salary)
from employees
where job_id = 'IT_PROG'
)
and job_id != 'IT_PROG';

行子查询

行子查询和列子查询非常相似，只是它的结果集中只会出现一行多列的情况。

例子10：查询员工编号(employee_id)最小，并且工资(salary)最高的员工信息，显示员工表中的所有字段信息。

第一步，查询员工(employee_id)最小的编号：

1 2	select min(e.employee_id) from employees as e;

第二步，查询员工的最高工资：

1 2	select max(e.salary) from employees as e;

第三步，查询员工的最高工资，但是条件是其员工编号employee_id为第一步的执行结果：

select *
from employees as e
where e.employee_id = (
                         select min(employee_id)
                         from employees
                      )
and e.salary = (
                select max(salary)
                from employees
               );

其实查询最高工资和最小员工编号使用的都是employees表，因此可以将这两个过程进行合并，此时SQL就变成了如下的结果：

select *
from employees as e
where (e.employee_id,e.salary) = 
(
  select min(employee_id),max(salary)
  from employees
);

同时由于最高工资和最小员工编号这两个条件限定，导致子查询最终返回的是一行多列，因此这里的=就可以换成in，此时SQL就变成了如下的结果：

select *
from employees as e
where (e.employee_id,e.salary) in 
(
  select min(employee_id),max(salary)
  from employees
);

无论使用上面三种方式中的哪种，最终结果如下所示：

显然第一种方式较为普遍，但是后面两种较为简洁，就是逻辑上不太好理解。

子查询出现在exists后面(相关子查询)

子查询出现在exists后面时，也称为相关子查询。exists查询结果只有1和0这两种，通常用于判断子查询的结果集中是否有值，而不显示具体的值，这在某些场合下非常有用。需要说明的是，在大多数情况下，能用exists的子查询，都能使用in来代替，因此exists用的不是很多。

当子查询出现在exists后面时，就不再满足之前说的执行顺序，变成了先执行主查询，然后主查询的结果再根据子查询进行过滤，子查询中涉及到的主查询中使用的字段，因此称为相关子查询。

接下来将通过几个例子来学习如何使用相关子查询。

例子11：判断是否存在工资为30000元的员工，存在返回1，反之为0：

select exists(
  select employee_id
  from employees
  where salary = 30000
)as '是否存在，1存在，0不存在'

执行结果如下所示：

例子12：查询所有员工的部门名称。注意这里有些员工是没有部门的，因此首先要进行判断其是否存在部门名称，然后再获取所有员工的部门名称。

如果不使用exists，可以使用in来实现，此时SQL如下所示：

select d.department_name
from departments as d
where d.department_id in (
  select department_id from employees
);

如果使用exists，那么此时SQL如下所示：

select d.department_name
from departments as d
where exists(
select 1
from employees as e
where e.department_id = d.department_id
);

例子13：查询没有员工的部门，显示部门表信息。

使用exists关键字，此时SQL如下所示：

select *
from departments as d
where not exists (
  select 1
  from employees as e
  where e.department_id = d.department_id
  and e.department_id is NOT NULL
);

同样使用not in语句，此时SQL如下所示：

select *
from departments as d
where d.department_id not in (
  select department_id
  from employees
  where employees.department_id is NOT NULL
);

请注意，在使用not in语句的时候，上面使用了is NOT NULL，这个可能会存在坑，不信往下看。

通过观察发现employees员工表中的每行记录都是存在employee_id，因此可以将上述子查询中的where employees.department_id is NOT NULL判断可以去除，此时SQL语句如下所示：

select *
from departments as d
where d.department_id not in (
  select department_id
  from employees
);

执行结果如下所示：

这个结果怎么会是空呢？也就是说在使用in的情况下，如果子查询中列的值为NULL时，那么外查询结果就是空。其原因在于，开发者在定义employees表时，允许employee_id字段为空，所以如果不想出现这个问题，可以在定义表的时候，指定列不允许为空。

但是如果开发者使用的是exists，不存在这个问题，不信的话可以将exists中的同样语句去除，再次运行，你会发现结果和之前的一致。

但是我也说过，exists用的不是很多，大部分可以使用in来代替，这是有条件的，就是这个列在定义的时候，不允许为空。

文章总结

本篇文章学习了SQL查询中的子查询，这是重点也是难度，需要灵活掌握。同时尤其注意当字段允许为空的时候，使用in查询会出现一些问题，建议开发者在定义表结构的时候，列默认都指定不为空，这样能避免一些不必要的问题发生。

那么这样关于子查询的学习就到此为止，后续学习其他知识。