字符串拆分,行转列

# 需求描述

业务中一个一对多的关系被记录在一行数据中,用逗号拼接。现在需要将其恢复成多行的一对多关系,然后与其他表作关联。

可以理解为将数据:

+---------------------+
| num                 |
+---------------------+
| 7654,7698,7782,7788 |
+---------------------+
1
2
3
4
5

处理为:

+------+
| num  |
+------+
| 7654 |
| 7698 |
| 7782 |
| 7788 |
+------+
1
2
3
4
5
6
7
8

# 实现SQL

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('7654,7698,7782,7788',',',help_topic_id+1),',',-1) AS num  
FROM mysql.help_topic 
WHERE help_topic_id < LENGTH('7654,7698,7782,7788')-LENGTH(  REPLACE('7654,7698,7782,7788',',',''))+1; 
1
2
3

# 测试数据

CREATE TABLE `company` (
`id` int(20) DEFAULT NULL,
`name` varchar(100) DEFAULT NULL,
`shareholder` varchar(100) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `company` VALUES ('1', '阿里巴巴', '马云');
INSERT INTO `company` VALUES ('2', '淘宝', '马云,孙正义');
1
2
3
4
5
6
7

sql

select id,SUBSTRING_INDEX(SUBSTRING_INDEX(shareholder,',',help_topic_id+1),',',-1) as shareholder
from company inner join mysql.help_topic
where help_topic_id < LENGTH(shareholder)- LENGTH(REPLACE(shareholder,",",""))+1;
1
2
3

结果:

+----+-------------+
| id | shareholder |
+----+-------------+
|  1 | 马云         |
|  2 | 马云         |
|  2 | 孙正义       |
+----+-------------+
1
2
3
4
5
6
7

# 原理分析

# SUBSTRING_INDEX(str, delim, count)

参数 说明
str 需要截取的字符串
delim 分隔符
count 当 count 为正数,取第 n 个分隔符之前的所有字符; 当 count 为负数,取倒数第 n 个分隔符之后的所有字符。

举例:

select substring_index('7654,7698,7782,7788',',',1) as str1 ,substring_index('7654,7698,7782,7788',',',2) as str2 ,substring_index('7654,7698,7782,7788',',',-1) as 'str-1';
1

结果:

+------+-----------+-------+
| str1 | str2      | str-1 |
+------+-----------+-------+
| 7654 | 7654,7698 | 7788  |
+------+-----------+-------+
1
2
3
4
5

# 辅助表

SQL中使用了mysql.help_topic表,该表一共659行数据,help_topic_id字段范围[0,658]。这里主要是需要一段递增id。也可以换成其他表,但需要保证连续递增,且跨度为1.

如果没有符合要求的表,也可以通过任意满足行数要求的表,自行构造递增序列。

这里假设使用一个辅助表temp_table,生成一段[1,5]的序列,那么需要保证辅助表至少有5行数据,如果辅助表数据过多,需要添加条件限制查询行数,避免耗时过多。

select @num:=@num+1 as id from temp_table,(select @num:=0) as t where @num<5 ;
1

结果:

+----+
| id |
+----+
|  1 |
|  2 |
|  3 |
|  4 |
|  5 |
+----+
1
2
3
4
5
6
7
8
9

那么相应的完整SQL为

SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('7654,7698,7782,7788',',',t1.id),',',-1) AS num  
FROM (select @num:=@num+1 as id from temp_table,(select @num:=0) as t where @num<5) as t1
WHERE t1.id <= LENGTH('7654,7698,7782,7788')-LENGTH(  REPLACE('7654,7698,7782,7788',',',''))+1; 
1
2
3

这里因为生成的序列是从1开始,所以临界条件也要做相应的修改。

如果不使用变量参数,还可以使用以下方式生成序列

SELECT 1 + a.N + b.N * 10 AS n
FROM (
    SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL 
    SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) AS a
CROSS JOIN (
    SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL 
    SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) AS b
1
2
3
4
5
6
7
8
9

# 展开SQL

要想分割字符串为n行,则对于递增的行号i,该行的字符为第i-1和第i个分隔符之间的字符串。

辅助表与目标表进行join,主要是为了实现将一行字符串变成多行,并配上一个自增序列,这样就可以通过这个自增序列来控制截取位置。即每次先截取第i个分隔符前的字符串,再取截取的字符串中最后一个分隔符(第i-1个)后的字符串。

select id,SUBSTRING_INDEX(shareholder,',',help_topic_id+1) as tmp_shareholder, SUBSTRING_INDEX(SUBSTRING_INDEX(shareholder,',',help_topic_id+1),',',-1) as shareholder
from company inner join mysql.help_topic
where help_topic_id < LENGTH(shareholder)- LENGTH(REPLACE(shareholder,",",""))+1;
1
2
3
+----+-----------------+-------------+
| id | tmp_shareholder | shareholder |
+----+-----------------+-------------+
|  1 | 马云            | 马云         |
|  2 | 马云            | 马云         |
|  2 | 马云,孙正义      | 孙正义       |
+----+-----------------+-------------+
1
2
3
4
5
6
7
上次更新: 2024/04/30, 08:49:24
最近更新
01
docker-compose笔记
01-12
02
MySQL数据迁移
11-27
03
Docker部署服务,避免PID=1
11-27
更多文章>