Emulating MySQL's substring_index() function in PGSQL

Question

13

I would like to find an elegant way to emulate the behavior of MySQL's substring_index() function in Postgres.

In MySQL, it is as easy as:

mysql> create temporary table test1(test varchar(200));
Query OK, 0 rows affected (0.01 sec)

mysql> insert into test1 values('apples||oranges'),('apples||grapes');
Query OK, 2 rows affected (0.00 sec)
Records: 2  Duplicates: 0  Warnings: 0

mysql> select * from test1;
+-----------------+
| test            |
+-----------------+
| apples||oranges |
| apples||grapes  |
+-----------------+
2 rows in set (0.00 sec)

mysql> select substring_index(test, '||', 1) as field1, substring_index(test, '||', -1) as field2 from test1;
+--------+---------+
| field1 | field2  |
+--------+---------+
| apples | oranges |
| apples | grapes  |
+--------+---------+
2 rows in set (0.00 sec)

But my current workaround in PGSQL is quite ugly:

hoth=# create temporary table test1(test text);
CREATE TABLE

hoth=# insert into test1 values('apples||oranges'),('apples||grapes');
INSERT 0 2

hoth=# select * from test1;
      test       
-----------------
 apples||oranges
 apples||grapes
(2 rows)

hoth=# select substring(test, 0, position('||' in test)) as field1,  substring(test, position('||' in test) + 2, char_length(test)) as field2  from test1;
 field1 | field2  
--------+---------
 apples | oranges
 apples | grapes
(2 rows)

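Maybe there is a more elegant solution using a regular expression, or by splitting the string into an array held in a variable, which might reduce overhead if the string comes from a subquery or similar. I would welcome any suggestions.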

- jesse_galley

2

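I think a very innovative solution would be to store your data in a way that better suits the queries you want to run (e.g. by normalising, or by using array types). I realise this is not always possible, but I wanted to suggest it, particularly because your MySQL example appears to be hard-coded to split into exactly two parts. - IMSoP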

2 Answers

9

Here is how I implement (or emulate) MySQL's substring_index() function in PostgreSQL:

CREATE OR REPLACE FUNCTION public.substring_index (
  str text,
  delim text,
  count integer = 1,
  out substring_index text
)
RETURNS text AS
$body$
BEGIN
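  IF count > 0 THEN
    -- Positive count: keep the first "count" fields, rejoined with the delimiter.
    substring_index = array_to_string((string_to_array(str, delim))[:count], delim);
  ELSE
    -- Zero or negative count: keep the last abs(count) fields instead.
    DECLARE
      _array TEXT[];
    BEGIN
      _array = string_to_array(str, delim);
      substring_index = array_to_string(_array[array_length(_array, 1) + count + 1:], delim);
    END;
  END IF;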
END;
$body$
LANGUAGE 'plpgsql'
IMMUTABLE
CALLED ON NULL INPUT
SECURITY INVOKER
COST 5;

Here are the examples from the MySQL documentation:

postgres=# SELECT substring_index('www.mysql.com', '.', 2);
 substring_index
-----------------
 www.mysql
(1 row)

postgres=# SELECT substring_index('www.mysql.com', '.', -2);
 substring_index
-----------------
 mysql.com
(1 row)
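
As a quick check against the question's data, the function above can be called the same way as the MySQL original. This is only a sketch: it assumes the temporary table test1 from the question still exists in the session, and that the server is recent enough (9.6 or later, as far as I know) to accept array slices with an omitted bound such as [:count].

```sql
-- Assumes test1(test text) holds 'apples||oranges' and 'apples||grapes',
-- and that public.substring_index() from this answer has been created.
SELECT substring_index(test, '||', 1)  AS field1,  -- text before the first '||'
       substring_index(test, '||', -1) AS field2   -- text after the last '||'
FROM test1;
-- field1 | field2
-- apples | oranges
-- apples | grapes
```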

- Sahap Asci

- Richard Huxton · Accepted Answer

21

Always take the time to skim the manuals.

http://www.postgresql.org/docs/current/static/functions-string.html

If split_part(string text, delimiter text, field int) does not do what you want (and more, if I understand your MySQL function), then you will need to explain where and why.
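
For the two-part strings in the question, the suggestion might look like the sketch below. The query is illustrative rather than part of the original answer, and it assumes the temporary table test1 from the question.

```sql
-- split_part() returns the n-th field (1-based) of a delimited string.
SELECT split_part(test, '||', 1) AS field1,  -- first field
       split_part(test, '||', 2) AS field2   -- second field
FROM test1;
-- field1 | field2
-- apples | oranges
-- apples | grapes
```

Note that split_part() returns a single field, so it matches substring_index() here only because each string contains exactly one delimiter.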

- Richard Huxton

1

Ah, the manual seems to separate "String Functions and Operators" from "Other String Functions", and I clearly overlooked the latter. Thanks. - jesse_galley

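@jesse_galley: "Some of these are used internally to implement the SQL-standard string functions listed in Table 9-5." [emphasis mine]. So the first list is meant to cover the functions specified by the SQL standard, and the second list is meant to cover the PostgreSQL extensions. - mu is too short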

3

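Although SPLIT_PART solves the example above, it is not the tool for emulating MySQL's SUBSTRING_INDEX, because SUBSTRING_INDEX returns the whole substring to the left or right of a given number of delimiter occurrences. How would you emulate that when the number of delimiter occurrences is arbitrary? For example, picking the domain name out of a longer host name: selecting 'domain.com' from both 'sports.adventures.hobbies.domain.com' and 'pets.domain.com'. Solving this in PostgreSQL probably requires a regular expression or a combination of several string functions. - Ruben Ramirez Padron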
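
A minimal sketch of both approaches mentioned in this comment, using the two host names from the example. The column names host, parts, via_array and via_regex are made up for illustration, and the inline VALUES list stands in for a real table.

```sql
-- Keep the last two dot-separated fields of each host name.
SELECT host,
       -- Array approach: slice the last two elements and rejoin them.
       array_to_string(
         parts[greatest(array_length(parts, 1) - 1, 1):array_length(parts, 1)],
         '.') AS via_array,
       -- Regex approach: two dot-free runs anchored at the end of the string.
       substring(host from '[^.]+\.[^.]+$') AS via_regex
FROM (
  SELECT host, string_to_array(host, '.') AS parts
  FROM (VALUES ('sports.adventures.hobbies.domain.com'),
               ('pets.domain.com')) AS t(host)
) AS s;
-- Both computed columns should be 'domain.com' for both rows.
```

The array version generalises to any negative count by changing the lower slice bound. The regex version returns NULL when the host contains no dot, whereas the array version returns the input unchanged.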

3

Another reason split_part cannot fully replace SUBSTRING_INDEX: field must be greater than zero. MySQL accepts negative values, which split relative to the end of the string. - zliw
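
One possible workaround for the "last field" case, sketched under the assumption that the test1 table from the question is available: reverse the string and the delimiter, take the first field, and reverse the result back.

```sql
-- Emulates substring_index(test, '||', -1) with functions that only count from the left.
SELECT reverse(split_part(reverse(test), reverse('||'), 1)) AS field2
FROM test1;
-- field2
-- oranges
-- grapes
```

This only covers a count of -1. For -2 and beyond, SUBSTRING_INDEX returns several fields joined by the delimiter, which needs the array-based function from the other answer. On newer PostgreSQL releases (14 and later, to my knowledge) split_part also accepts a negative field, so split_part(test, '||', -1) should give the same result there.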