有两种方式可以完成同样的事情。
例如:从人员数据库中获取不同的姓名。
第一种方式是:
SELECT name
FROM person
GROUP BY name
具有相同的结果:
SELECT DISTINCT name
FROM person
我很好奇,PostgreSQL SQL引擎处理命令的方式是否存在不同,并且哪种方式更快,或者它们执行的是相同的操作?
有两种方式可以完成同样的事情。
例如:从人员数据库中获取不同的姓名。
第一种方式是:
SELECT name
FROM person
GROUP BY name
具有相同的结果:
SELECT DISTINCT name
FROM person
我很好奇,PostgreSQL SQL引擎处理命令的方式是否存在不同,并且哪种方式更快,或者它们执行的是相同的操作?
create table Persons (
Name varchar(30)
)
INSERT INTO Persons (Name)
SELECT
arrays.firstnames[s.a % ARRAY_LENGTH(arrays.firstnames,1) + 1]
|| arrays.lastnames[s.a % ARRAY_LENGTH(arrays.lastnames,1) + 1] AS name
FROM generate_series(1,600000) AS s(a) -- number of names to generate
CROSS JOIN(
SELECT ARRAY[
'Adam','Bill','Bob','Calvin','Donald','Dwight','Frank','Fred','George','Howard',
'James','John','Jacob','Jack','Martin','Matthew','Max','Michael','Lukas',
'Paul','Peter','Phil','Roland','Ronald','Samuel','Steve','Theo','Warren','William',
'Abigail','Alice','Allison','Amanda','Anne','Barbara','Betty','Carol','Cleo','Donna',
'Jane','Jennifer','Julie','Martha','Mary','Melissa','Patty','Sarah','Simone','Susan'
] AS firstnames,
ARRAY[
'Matthews','Smith','Jones','Davis','Jacobson','Williams','Donaldson','Maxwell','Peterson','Storms','Stevens',
'Franklin','Washington','Jefferson','Adams','Jackson','Johnson','Lincoln','Grant','Fillmore','Harding','Taft',
'Truman','Nixon','Ford','Carter','Reagan','Bush','Clinton','Hancock'
] AS lastnames
) AS arrays
select count(*) from Persons
| count | | -----: | | 600000 |
explain analyse select distinct Name from Persons
| QUERY PLAN | | :------------------------------------------------------------------------------------------------------------------- | | HashAggregate (cost=6393.82..6395.82 rows=200 width=78) (actual time=194.609..194.757 rows=1470 loops=1) | | Group Key: name | | -> Seq Scan on persons (cost=0.00..5766.66 rows=250866 width=78) (actual time=0.030..61.243 rows=600000 loops=1) | | Planning time: 0.259 ms | | Execution time: 194.898 ms |
explain analyse select Name from Persons group by Name
| QUERY PLAN | | :---------------------------------------------------------------------------------------------------------------------------------------------- | | Group (cost=5623.88..5625.88 rows=200 width=78) (actual time=226.358..227.145 rows=1470 loops=1) | | Group Key: name | | -> Sort (cost=5623.88..5624.88 rows=400 width=78) (actual time=226.356..226.596 rows=4410 loops=1) | | Sort Key: name | | Sort Method: quicksort Memory: 403kB | | -> Gather (cost=5564.59..5606.59 rows=400 width=78) (actual time=206.700..219.546 rows=4410 loops=1) | | Workers Planned: 2 | | Workers Launched: 2 | | -> Partial HashAggregate (cost=4564.59..4566.59 rows=200 width=78) (actual time=196.862..197.072 rows=1470 loops=3) | | Group Key: name | | -> Parallel Seq Scan on persons (cost=0.00..4303.27 rows=104528 width=78) (actual time=0.039..66.876 rows=200000 loops=3) | | Planning time: 0.069 ms | | Execution time: 227.301 ms |
db<>fiddle 这里
所以在这个例子中,使用DISTINCT仍然更快。
但是由于GROUP BY开始并行工作,这也可能取决于托管postgresql的服务器。
SELECT name FROM person UNION SELECT name FROM person
。但我会选择SELECT DISTINCT
。 - jarlh