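Which column does DISTINCT work with in JPA, and is it possible to change it?
Here is an example JPA query that uses DISTINCT:
select DISTINCT c from Customer c
This doesn't make much sense - which column is the distinct based on? Is it specified on the entity as an annotation? I couldn't find one.
I would like to specify the column to make the distinction on, something like:
select DISTINCT(c.name) c from Customer c
I'm using MySQL and Hibernate.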
You're close.
select DISTINCT(c.name) from Customer c
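As a rough plain-Java analogy of what this projection returns (no JPA involved; the `Customer` record and the sample data are made up for illustration), DISTINCT applies to the selected `name` values, so the result is a list of names rather than a list of entities:

```java
import java.util.List;

public class DistinctNameSketch {

    // Hypothetical stand-in for the Customer entity; not a real JPA mapping.
    record Customer(long id, String name) {}

    // In-memory analogy of: select DISTINCT(c.name) from Customer c
    static List<String> distinctNames(List<Customer> customers) {
        return customers.stream()
                .map(Customer::name) // project the single selected column
                .distinct()          // de-duplicate the projected values
                .toList();
    }

    public static void main(String[] args) {
        List<Customer> customers = List.of(
                new Customer(1, "Alice"),
                new Customer(2, "Bob"),
                new Customer(3, "Alice"));
        // Two customers share a name, so only two names come back.
        System.out.println(distinctNames(customers));
    }
}
```

With a real query, the corresponding result type would be `String` (or whatever type `name` has), not `Customer`.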
select c from Customer c where id in (select min(d.id) from Customer d group by d.name)
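... but this is situational, since you need to pick one entity based on the attributes available. - Jules
Depending on the underlying JPQL or Criteria API query type, DISTINCT has two meanings in JPA.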
For scalar queries, which return a scalar projection, like the following query:
List<Integer> publicationYears = entityManager.createQuery("""
select distinct year(p.createdOn)
from Post p
order by year(p.createdOn)
""", Integer.class)
.getResultList();
LOGGER.info("Publication years: {}", publicationYears);
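the DISTINCT keyword needs to be passed to the underlying SQL statement, because we want the DB engine to filter out duplicates before returning the result set: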
SELECT DISTINCT
extract(YEAR FROM p.created_on) AS col_0_0_
FROM
post p
ORDER BY
extract(YEAR FROM p.created_on)
-- Publication years: [2016, 2018]
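Hibernate 6 can deduplicate parent entity references automatically, so you don't need to use the DISTINCT keyword as you did in Hibernate 5.
So, when running the following query: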
List<Post> posts = entityManager.createQuery("""
select p
from Post p
left join fetch p.comments
where p.title = :title
""", Post.class)
.setParameter(
"title",
"High-Performance Java Persistence eBook has been released!"
)
.getResultList();
assertEquals(1, posts.size());
assertEquals(2, posts.get(0).getComments().size());
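we can see that only a single Post entity was fetched, even though it has two associated PostComment child entities.
In JPA, for entity queries, DISTINCT has a different meaning (the behavior described below applies to Hibernate 5).
Without DISTINCT, a query like the following one: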
List<Post> posts = entityManager.createQuery("""
select p
from Post p
left join fetch p.comments
where p.title = :title
""", Post.class)
.setParameter(
"title",
"High-Performance Java Persistence eBook has been released!"
)
.getResultList();
LOGGER.info(
"Fetched the following Post entity identifiers: {}",
posts.stream().map(Post::getId).collect(Collectors.toList())
);
is going to JOIN the post and the post_comment tables like this:
SELECT p.id AS id1_0_0_,
pc.id AS id1_1_1_,
p.created_on AS created_2_0_0_,
p.title AS title3_0_0_,
pc.post_id AS post_id3_1_1_,
pc.review AS review2_1_1_,
pc.post_id AS post_id3_1_0__
FROM post p
LEFT OUTER JOIN
post_comment pc ON p.id=pc.post_id
WHERE
p.title='High-Performance Java Persistence eBook has been released!'
-- Fetched the following Post entity identifiers: [1, 1]
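But the parent post record is duplicated in the result set for each associated post_comment row. For this reason, the List of Post entities will contain duplicate Post entity references.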
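A minimal plain-Java sketch of why this happens, assuming a simplified model in which each joined row yields one list element and the same parent object is reused per identifier. The `Post`, `Row`, `hydrate` and `distinctReferences` names are hypothetical and do not reflect Hibernate's actual internals:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class JoinFetchDuplicatesSketch {

    // Hypothetical stand-in for the Post entity (identity-based equality).
    static final class Post {
        final long id;
        final List<Long> commentIds = new ArrayList<>();
        Post(long id) { this.id = id; }
    }

    // One row of the joined result set: (post.id, post_comment.id).
    record Row(long postId, long commentId) {}

    // Adds one list element per row, reusing the same Post instance per id,
    // which is why the list ends up holding repeated references.
    static List<Post> hydrate(List<Row> rows) {
        Map<Long, Post> byId = new HashMap<>();
        List<Post> result = new ArrayList<>();
        for (Row row : rows) {
            Post post = byId.computeIfAbsent(row.postId(), id -> new Post(id));
            post.commentIds.add(row.commentId());
            result.add(post);
        }
        return result;
    }

    // In-memory de-duplication of the references, keeping first-seen order.
    static List<Post> distinctReferences(List<Post> posts) {
        return new ArrayList<>(new LinkedHashSet<>(posts));
    }

    public static void main(String[] args) {
        List<Post> posts = hydrate(List.of(new Row(1, 10), new Row(1, 11)));
        System.out.println(posts.size());                     // one element per joined row
        System.out.println(distinctReferences(posts).size()); // one element per parent
    }
}
```

The de-duplication here happens entirely on the Java side, after the rows have been read, and needs nothing from the database.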
To eliminate the duplicate Post entity references, we need to use DISTINCT:
List<Post> posts = entityManager.createQuery("""
select distinct p
from Post p
left join fetch p.comments
where p.title = :title
""", Post.class)
.setParameter(
"title",
"High-Performance Java Persistence eBook has been released!"
)
.getResultList();
LOGGER.info(
"Fetched the following Post entity identifiers: {}",
posts.stream().map(Post::getId).collect(Collectors.toList())
);
However, DISTINCT is also passed to the SQL query, which is not desirable:
SELECT DISTINCT
p.id AS id1_0_0_,
pc.id AS id1_1_1_,
p.created_on AS created_2_0_0_,
p.title AS title3_0_0_,
pc.post_id AS post_id3_1_1_,
pc.review AS review2_1_1_,
pc.post_id AS post_id3_1_0__
FROM post p
LEFT OUTER JOIN
post_comment pc ON p.id=pc.post_id
WHERE
p.title='High-Performance Java Persistence eBook has been released!'
-- Fetched the following Post entity identifiers: [1]
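Because DISTINCT is passed to the SQL query, the execution plan includes an extra Sort phase. That adds overhead without bringing any value, since the child PK column makes every parent-child combination in the result set unique already: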
Unique (cost=23.71..23.72 rows=1 width=1068) (actual time=0.131..0.132 rows=2 loops=1)
-> Sort (cost=23.71..23.71 rows=1 width=1068) (actual time=0.131..0.131 rows=2 loops=1)
Sort Key: p.id, pc.id, p.created_on, pc.post_id, pc.review
Sort Method: quicksort Memory: 25kB
-> Hash Right Join (cost=11.76..23.70 rows=1 width=1068) (actual time=0.054..0.058 rows=2 loops=1)
Hash Cond: (pc.post_id = p.id)
-> Seq Scan on post_comment pc (cost=0.00..11.40 rows=140 width=532) (actual time=0.010..0.010 rows=2 loops=1)
-> Hash (cost=11.75..11.75 rows=1 width=528) (actual time=0.027..0.027 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on post p (cost=0.00..11.75 rows=1 width=528) (actual time=0.017..0.018 rows=1 loops=1)
Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
Rows Removed by Filter: 3
Planning time: 0.227 ms
Execution time: 0.179 ms
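Why the SQL-level DISTINCT cannot remove anything here can be illustrated with a small plain-Java analogy. The `Row` record is a hypothetical stand-in for a joined result-set row, reduced to the two primary key columns:

```java
import java.util.List;

public class SqlDistinctNoOpSketch {

    // Hypothetical joined row: (post.id, post_comment.id).
    record Row(long postId, long commentId) {}

    // Analogy of SQL DISTINCT, which compares whole selected rows.
    static List<Row> distinctRows(List<Row> rows) {
        return rows.stream().distinct().toList();
    }

    // Distinct parent identifiers, which is what the caller actually wants.
    static List<Long> distinctPostIds(List<Row> rows) {
        return rows.stream().map(Row::postId).distinct().toList();
    }

    public static void main(String[] args) {
        List<Row> rows = List.of(new Row(1, 10), new Row(1, 11));
        // Every row differs in commentId, so row-level DISTINCT keeps both rows.
        System.out.println(distinctRows(rows).size());
        // Only de-duplicating on the parent side collapses them to one.
        System.out.println(distinctPostIds(rows).size());
    }
}
```

The database still has to sort or hash the rows to discover that none of them are duplicates, which is the cost visible in the plan above.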
To eliminate the Sort phase from the execution plan, we need to use the HINT_PASS_DISTINCT_THROUGH JPA query hint:
List<Post> posts = entityManager.createQuery("""
select distinct p
from Post p
left join fetch p.comments
where p.title = :title
""", Post.class)
.setParameter(
"title",
"High-Performance Java Persistence eBook has been released!"
)
.setHint(QueryHints.HINT_PASS_DISTINCT_THROUGH, false)
.getResultList();
LOGGER.info(
"Fetched the following Post entity identifiers: {}",
posts.stream().map(Post::getId).collect(Collectors.toList())
);
Now the SQL query no longer contains DISTINCT, but the duplicate Post entity references are still removed:
SELECT
p.id AS id1_0_0_,
pc.id AS id1_1_1_,
p.created_on AS created_2_0_0_,
p.title AS title3_0_0_,
pc.post_id AS post_id3_1_1_,
pc.review AS review2_1_1_,
pc.post_id AS post_id3_1_0__
FROM post p
LEFT OUTER JOIN
post_comment pc ON p.id=pc.post_id
WHERE
p.title='High-Performance Java Persistence eBook has been released!'
-- Fetched the following Post entity identifiers: [1]
And the execution plan confirms that this time there is no extra Sort phase:
Hash Right Join (cost=11.76..23.70 rows=1 width=1068) (actual time=0.066..0.069 rows=2 loops=1)
Hash Cond: (pc.post_id = p.id)
-> Seq Scan on post_comment pc (cost=0.00..11.40 rows=140 width=532) (actual time=0.011..0.011 rows=2 loops=1)
-> Hash (cost=11.75..11.75 rows=1 width=528) (actual time=0.041..0.041 rows=1 loops=1)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Seq Scan on post p (cost=0.00..11.75 rows=1 width=528) (actual time=0.036..0.037 rows=1 loops=1)
Filter: ((title)::text = 'High-Performance Java Persistence eBook has been released!'::text)
Rows Removed by Filter: 3
Planning time: 1.184 ms
Execution time: 0.160 ms
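If you're using Hibernate 6, QueryHints.HINT_PASS_DISTINCT_THROUGH is no longer needed and should be removed, since the feature has been removed from the framework.
@QueryHints(@QueryHint(name = "hibernate.query.passDistinctThrough", value = "false")). - dk7
PASS_DISTINCT_THROUGH was implemented by HHH-10965 and is available since Hibernate ORM 5.2.2. Spring Boot 1.5.9 is very old and uses Hibernate ORM 5.0.12. So you need to upgrade your dependencies if you want to benefit from these features. - Vlad Mihalcea
@Entity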
@NamedQuery(name = "Customer.listUniqueNames",
query = "SELECT DISTINCT c.name FROM Customer c")
public class Customer {
...
private String name;
public static List<String> listUniqueNames() {
return getEntityManager().createNamedQuery(
"Customer.listUniqueNames", String.class)
.getResultList();
}
}
I agree with kazanaki's answer, and it helped me. I wanted to select the whole entity, so I used
select DISTINCT(c) from Customer c
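In my case I have a many-to-many relationship, and I want to load the entities together with their collections in a single query.
SELECT DISTINCT new com.mypackage.MyNameType(c.name) from Customer c
I'm adding a slightly more specific answer, in case someone runs into the same problem I did and finds this question.
I use JPQL with query annotations (no query building).
I needed to get the distinct values of an entity embedded in another entity, where the relationship is declared through a ManyToOne annotation.
I have two database tables:
In the Java Spring code, this results in three classes:
LinkEntity: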
@Entity
@Immutable
@Table(name="link_entity")
public class LinkEntity implements Entity {
@EmbeddedId
private LinkEntityPK pk;
// ... Getter, setter, toString()
}
LinkEntityPK:
@Embeddable
public class LinkEntityPK implements Entity, Serializable {
/** The main entity we want to have distinct values of */
@ManyToOne
@JoinColumn(name = "code_entity")
private MainEntity mainEntity;
/** */
@Column(name = "code_pk2")
private String codeOperation;
/** */
@Column(name = "code_pk3")
private String codeFonction;
// ... Getters, setters
}
MainEntity:
@Entity
@Immutable
@Table(name = "main_entity")
public class MainEntity implements Entity {
/** We use this for LinkEntity*/
@Id
@Column(name="code_entity")
private String codeEntity;
private String name;
// And other attributes, getters and setters
}
So the final query to get the distinct values of the main entity is:
@Repository
public interface EntityRepository extends JpaRepository<LinkEntity, String> {
@Query(
"Select " +
"Distinct linkEntity.pk.mainEntity " +
"From " +
"LinkEntity as linkEntity " +
"Join MainEntity as mainEntity On " +
"mainEntity = linkEntity.pk.mainEntity ")
List<MainEntity> getMainEntityList();
}