字符串拆分,行转列
# 需求描述
业务中一个一对多的关系被记录在一行数据中,用逗号拼接。现在需要将其恢复成多行的一对多关系,然后与其他表作关联。
可以理解为将数据:
+---------------------+
| num |
+---------------------+
| 7654,7698,7782,7788 |
+---------------------+
1
2
3
4
5
2
3
4
5
处理为:
+------+
| num |
+------+
| 7654 |
| 7698 |
| 7782 |
| 7788 |
+------+
1
2
3
4
5
6
7
8
2
3
4
5
6
7
8
# 实现SQL
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('7654,7698,7782,7788',',',help_topic_id+1),',',-1) AS num
FROM mysql.help_topic
WHERE help_topic_id < LENGTH('7654,7698,7782,7788')-LENGTH( REPLACE('7654,7698,7782,7788',',',''))+1;
1
2
3
2
3
# 测试数据
CREATE TABLE `company` (
`id` int(20) DEFAULT NULL,
`name` varchar(100) DEFAULT NULL,
`shareholder` varchar(100) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8;
INSERT INTO `company` VALUES ('1', '阿里巴巴', '马云');
INSERT INTO `company` VALUES ('2', '淘宝', '马云,孙正义');
1
2
3
4
5
6
7
2
3
4
5
6
7
sql
:
select id,SUBSTRING_INDEX(SUBSTRING_INDEX(shareholder,',',help_topic_id+1),',',-1) as shareholder
from company inner join mysql.help_topic
where help_topic_id < LENGTH(shareholder)- LENGTH(REPLACE(shareholder,",",""))+1;
1
2
3
2
3
结果:
+----+-------------+
| id | shareholder |
+----+-------------+
| 1 | 马云 |
| 2 | 马云 |
| 2 | 孙正义 |
+----+-------------+
1
2
3
4
5
6
7
2
3
4
5
6
7
# 原理分析
# SUBSTRING_INDEX(str, delim, count)
参数 | 说明 |
---|---|
str | 需要截取的字符串 |
delim | 分隔符 |
count | 当 count 为正数,取第 n 个分隔符之前的所有字符; 当 count 为负数,取倒数第 n 个分隔符之后的所有字符。 |
举例:
select substring_index('7654,7698,7782,7788',',',1) as str1 ,substring_index('7654,7698,7782,7788',',',2) as str2 ,substring_index('7654,7698,7782,7788',',',-1) as 'str-1';
1
结果:
+------+-----------+-------+
| str1 | str2 | str-1 |
+------+-----------+-------+
| 7654 | 7654,7698 | 7788 |
+------+-----------+-------+
1
2
3
4
5
2
3
4
5
# 辅助表
SQL
中使用了mysql.help_topic
表,该表一共659行数据,help_topic_id
字段范围[0,658]
。这里主要是需要一段递增id。也可以换成其他表,但需要保证连续递增,且跨度为1.
如果没有符合要求的表,也可以通过任意满足行数要求的表,自行构造递增序列。
这里假设使用一个辅助表temp_table
,生成一段[1,5]
的序列,那么需要保证辅助表至少有5行数据,如果辅助表数据过多,需要添加条件限制查询行数,避免耗时过多。
select @num:=@num+1 as id from temp_table,(select @num:=0) as t where @num<5 ;
1
结果:
+----+
| id |
+----+
| 1 |
| 2 |
| 3 |
| 4 |
| 5 |
+----+
1
2
3
4
5
6
7
8
9
2
3
4
5
6
7
8
9
那么相应的完整SQL为
SELECT SUBSTRING_INDEX(SUBSTRING_INDEX('7654,7698,7782,7788',',',t1.id),',',-1) AS num
FROM (select @num:=@num+1 as id from temp_table,(select @num:=0) as t where @num<5) as t1
WHERE t1.id <= LENGTH('7654,7698,7782,7788')-LENGTH( REPLACE('7654,7698,7782,7788',',',''))+1;
1
2
3
2
3
这里因为生成的序列是从1开始,所以临界条件也要做相应的修改。
如果不使用变量参数,还可以使用以下方式生成序列
SELECT 1 + a.N + b.N * 10 AS n
FROM (
SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) AS a
CROSS JOIN (
SELECT 0 AS N UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
) AS b
1
2
3
4
5
6
7
8
9
2
3
4
5
6
7
8
9
# 展开SQL
要想分割字符串为n
行,则对于递增的行号i
,该行的字符为第i-1
和第i
个分隔符之间的字符串。
辅助表与目标表进行join
,主要是为了实现将一行字符串变成多行,并配上一个自增序列,这样就可以通过这个自增序列来控制截取位置。即每次先截取第i
个分隔符前的字符串,再取截取的字符串中最后一个分隔符(第i-1
个)后的字符串。
select id,SUBSTRING_INDEX(shareholder,',',help_topic_id+1) as tmp_shareholder, SUBSTRING_INDEX(SUBSTRING_INDEX(shareholder,',',help_topic_id+1),',',-1) as shareholder
from company inner join mysql.help_topic
where help_topic_id < LENGTH(shareholder)- LENGTH(REPLACE(shareholder,",",""))+1;
1
2
3
2
3
+----+-----------------+-------------+
| id | tmp_shareholder | shareholder |
+----+-----------------+-------------+
| 1 | 马云 | 马云 |
| 2 | 马云 | 马云 |
| 2 | 马云,孙正义 | 孙正义 |
+----+-----------------+-------------+
1
2
3
4
5
6
7
2
3
4
5
6
7
编辑 (opens new window)
上次更新: 2024/04/30, 08:49:24