How to do bulk (multi-row) inserts with JpaRepository?

Question

hibernate spring-boot kotlin spring-data-jpa cockroachdb

108

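When I call the saveAll method of my JpaRepository with a long List<Entity> from the service layer, Hibernate's trace logging shows a single SQL statement being issued per entity.

Can I force it to do a bulk insert (i.e. multi-row) without manually fiddling with EntityManager, transactions etc., or even raw SQL statement strings?

By multi-row insert I mean not just transitioning from: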

start transaction
INSERT INTO table VALUES (1, 2)
end transaction
start transaction
INSERT INTO table VALUES (3, 4)
end transaction
start transaction
INSERT INTO table VALUES (5, 6)
end transaction

to:

start transaction
INSERT INTO table VALUES (1, 2)
INSERT INTO table VALUES (3, 4)
INSERT INTO table VALUES (5, 6)
end transaction

but instead to:

start transaction
INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)
end transaction

In production I'm using CockroachDB, and the difference in performance is significant.
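
For illustration, the target statement shape can be generated for any number of rows. The following is a minimal sketch in plain Java, not something Spring Data or Hibernate provides: the class name MultiRowInsertSql and its build method are hypothetical, and it emits JDBC-style ? placeholders instead of literal values.

```java
import java.util.Collections;

// Hypothetical helper, for illustration only: builds the text of a multi-row
// INSERT with JDBC "?" placeholders. The table name is assumed to be trusted input.
final class MultiRowInsertSql {

    private MultiRowInsertSql() {
    }

    static String build(String table, int columnsPerRow, int rows) {
        if (columnsPerRow < 1 || rows < 1) {
            throw new IllegalArgumentException("columnsPerRow and rows must be >= 1");
        }
        // One tuple such as "(?, ?)".
        String tuple = "(" + String.join(", ", Collections.nCopies(columnsPerRow, "?")) + ")";
        // Repeat the tuple once per row: "(?, ?), (?, ?), (?, ?)".
        String values = String.join(", ", Collections.nCopies(rows, tuple));
        return "INSERT INTO " + table + " VALUES " + values;
    }
}
```

MultiRowInsertSql.build("table", 2, 3) yields INSERT INTO table VALUES (?, ?), (?, ?), (?, ?); the six parameters would then be bound in row order on a java.sql.PreparedStatement.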

Below is a minimal example that reproduces the problem (using H2 for simplicity).

./src/main/kotlin/ThingService.kt:

package things

import org.springframework.boot.autoconfigure.SpringBootApplication
import org.springframework.boot.runApplication
import org.springframework.web.bind.annotation.RestController
import org.springframework.web.bind.annotation.GetMapping
import org.springframework.data.jpa.repository.JpaRepository
import javax.persistence.Entity
import javax.persistence.Id
import javax.persistence.GeneratedValue

interface ThingRepository : JpaRepository<Thing, Long> {
}

@RestController
class ThingController(private val repository: ThingRepository) {
    @GetMapping("/test_trigger")
    fun trigger() {
        val things: MutableList<Thing> = mutableListOf()
        for (i in 3000..3013) {
            things.add(Thing(i))
        }
        repository.saveAll(things)
    }
}

@Entity
data class Thing (
    var value: Int,
    @Id
    @GeneratedValue
    var id: Long = -1
)

@SpringBootApplication
class Application {
}

fun main(args: Array<String>) {
    runApplication<Application>(*args)
}

./src/main/resources/application.properties:

jdbc.driverClassName = org.h2.Driver
jdbc.url = jdbc:h2:mem:db
jdbc.username = sa
jdbc.password = sa

hibernate.dialect=org.hibernate.dialect.H2Dialect
hibernate.hbm2ddl.auto=create

spring.jpa.generate-ddl = true
spring.jpa.show-sql = true

spring.jpa.properties.hibernate.jdbc.batch_size = 10
spring.jpa.properties.hibernate.order_inserts = true
spring.jpa.properties.hibernate.order_updates = true
spring.jpa.properties.hibernate.jdbc.batch_versioned_data = true

./build.gradle.kts:

import org.jetbrains.kotlin.gradle.tasks.KotlinCompile

plugins {
    val kotlinVersion = "1.2.30"
    id("org.springframework.boot") version "2.0.2.RELEASE"
    id("org.jetbrains.kotlin.jvm") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.spring") version kotlinVersion
    id("org.jetbrains.kotlin.plugin.jpa") version kotlinVersion
    id("io.spring.dependency-management") version "1.0.5.RELEASE"
}

version = "1.0.0-SNAPSHOT"

tasks.withType<KotlinCompile> {
    kotlinOptions {
        jvmTarget = "1.8"
        freeCompilerArgs = listOf("-Xjsr305=strict")
    }
}

repositories {
    mavenCentral()
}

dependencies {
    compile("org.springframework.boot:spring-boot-starter-web")
    compile("org.springframework.boot:spring-boot-starter-data-jpa")
    compile("org.jetbrains.kotlin:kotlin-stdlib-jdk8")
    compile("org.jetbrains.kotlin:kotlin-reflect")
    compile("org.hibernate:hibernate-core")
    compile("com.h2database:h2")
}

Run:

./gradlew bootRun

Trigger the DB inserts:

curl http://localhost:8080/test_trigger

Log output:

Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: select thing0_.id as id1_0_0_, thing0_.value as value2_0_0_ from thing thing0_ where thing0_.id=?
Hibernate: call next value for hibernate_sequence
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)
Hibernate: insert into thing (value, id) values (?, ?)

- Tobias Hermann

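@Cepr0 Thanks, but I'm already doing this (accumulating into a list and calling saveAll). I just added a minimal code example to reproduce the problem. - Tobias Hermann

Did you set the hibernate.jdbc.batch_size property? - Cepr0

@Cepr0 Yes. (see above) - Tobias Hermann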

3

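Wrong, it must be in this form: spring.jpa.properties.hibernate.jdbc.batch_size. - Cepr0

What you have shown is a batch insert. A bulk (multi-row) insert is a faster technique, but it is database-specific and is not supported by JPA. - Razvan P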

6 Answers

20

The underlying issue is the following code in SimpleJpaRepository:

@Transactional
public <S extends T> S save(S entity) {
    if (entityInformation.isNew(entity)) {
        em.persist(entity);
        return entity;
    } else {
        return em.merge(entity);
    }
}

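In addition to setting the batch size property, you have to make sure that SimpleJpaRepository calls persist and not merge. There are a few ways to resolve this. Use an @Id generator that does not query a sequence, such as: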

@Id
@GeneratedValue(generator = "uuid2")
@GenericGenerator(name = "uuid2", strategy = "uuid2")
var id: UUID? = null  // uuid2 produces java.util.UUID values, so the field cannot be a Long

Or force persist to treat the records as new by having your entity implement Persistable and overriding the isNew() call:

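@Entity
public class Thing implements Persistable<Long> {
    @Id
    @GeneratedValue
    private Long id;

    private int value;

    // Not persisted; stays true until the entity has been saved or loaded.
    @Transient
    private boolean isNew = true;

    // JPA lifecycle callback: after an insert or a load the row exists.
    @PostPersist
    @PostLoad
    void markNotNew() {
        this.isNew = false;
    }

    @Override
    public Long getId() {
        return id;
    }

    // Consulted by SimpleJpaRepository.save() to choose persist over merge.
    @Override
    public boolean isNew() {
        return isNew;
    }
}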

Or override save(List) and use the entity manager to call persist():

@Repository
public class ThingRepository extends SimpleJpaRepository<Thing, Long> {
    private EntityManager entityManager;
    public ThingRepository(EntityManager entityManager) {
        super(Thing.class, entityManager);
        this.entityManager=entityManager;
    }

    @Transactional
    public List<Thing> save(List<Thing> things) {
        things.forEach(thing -> entityManager.persist(thing));
        return things;
    }
}

The above code is based on the following links:

- Jean Marois

1

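Thanks Jean for sharing the useful links. However, there is still a problem with the Persistable approach when the @Id value is generated with @GeneratedValue. Batching only happens when I set the id field manually with my own logic. If I rely on @GeneratedValue for the Long id attribute, the statements are not run in batches. None of the links you shared combine a @GeneratedValue strategy with the Persistable approach. I even checked the GitHub code linked from the second link, but it also assigns the id attribute manually. - iamharish15

I think this reply wasn't really understood (or appreciated enough). I ran into the same saveAll issue myself. To rephrase the problem: if you have working batching, your entities do not use generated IDs, and you use SimpleJpaRepository with saveAll, then: 1. saveAll calls save in a loop. 2. save calls entityInformation.isNew(entity), which returns false for every call. 3. merge is called for each entity. 4. If I understand correctly, each merge call issues a select first, and those selects cannot be batched, so the saveAll implementation creates an N+1 problem. - Martin Mucha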

2

Batching with Spring and JPA: https://medium.com/@clydecroix/batching-database-writes-in-spring-479bee626fbf?sk=8ee224e83a830a6cce92fa4e3e76967e - Clyde D'Cruz

Hi, could you please take a look at my question about a many-to-many problem? https://stackoverflow.com/questions/77257277/spring-updating-many-to-manay-reletionship/77259130?noredirect=1#comment136208510_77259130 - undefined

9

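You can configure Hibernate to do bulk DML. Have a look at Spring Data JPA - concurrent Bulk inserts/updates. I think section 2 of that answer could solve your problem:

Enabling batching support for DML statements reduces the number of round trips to the database needed to insert or update the same number of records.

Quoting from batch INSERT and UPDATE statements: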

hibernate.jdbc.batch_size = 50
hibernate.order_inserts = true
hibernate.order_updates = true
hibernate.jdbc.batch_versioned_data = true

UPDATE: You have to set the Hibernate properties differently in your application.properties file. They live under the namespace spring.jpa.properties.*. For example:

spring.jpa.properties.hibernate.jdbc.batch_size = 50
spring.jpa.properties.hibernate.order_inserts = true
....

- rieckpil

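Thanks for the suggestion. I tried it, but without success. I added a minimal code example to my question that reproduces the problem even with the settings you provided. - Tobias Hermann

Thanks, I adjusted my configuration (and updated my question accordingly), but still no luck. - Tobias Hermann

Did you try a different database, or are you bound to H2? @TobiasHermann I would suggest trying a MySQL database next. Not all database drivers implement JDBC batch inserts/updates correctly. - rieckpil

I tried with CockroachDB 2.0.2. It supports multi-row inserts, and it is about 10 times faster when I manually create the needed java.sql.PreparedStatement in my application and send it over a raw java.sql.Connection obtained from the javax.sql.DataSource. - Tobias Hermann

What does this mean: spring.jpa.properties.hibernate.order_inserts? - java dev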

3

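All of the mentioned approaches work, but they are slow, especially when the source of the inserted data lies in another table. First, even with batch_size>1 the insert operation is still executed as multiple SQL queries. Second, if the source data lies in another table, you need to fetch it with other queries (in the worst case loading all of it into memory) and convert it into static bulk inserts. Third, a separate persist() call for each entity, even with batching enabled, bloats the entity manager's first-level cache with all of these entity instances.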
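
A common way to limit the third problem is to persist in chunks and call EntityManager.flush() followed by EntityManager.clear() after each chunk, so that the persistence context never holds more than one chunk. The sketch below shows only the chunking step in plain Java. The Chunks class is a hypothetical helper, not part of JPA or Spring Data, and matching the chunk size to hibernate.jdbc.batch_size is an assumption rather than a requirement.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, for illustration only: splits a list into consecutive
// chunks of at most chunkSize elements, preserving order.
final class Chunks {

    private Chunks() {
    }

    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize < 1) {
            throw new IllegalArgumentException("chunkSize must be >= 1");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the sub-list so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }
}
```

Inside a transaction, each chunk would be passed to persist() element by element and then followed by flush() and clear().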

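But there is another option with Hibernate. If you use Hibernate as your JPA provider, you can fall back to HQL, which natively supports bulk inserts with a subselect from another table. An example: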

Session session = entityManager.unwrap(Session.class);
session.createQuery("insert into Entity (field1, field2) select [...] from [...]")
  .executeUpdate();

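Whether this works depends on your ID generation strategy. If Entity.id is generated by the database (for example MySQL auto increment), the query executes successfully. If Entity.id is generated by your code (which is especially common with UUID generators), it fails with an "unsupported id generation method" exception.

In the latter case, however, the problem can be overcome with a custom SQL function. For example, in PostgreSQL I use the uuid-ossp extension, which provides the uuid_generate_v4() function, and I register that function in my custom dialect: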

import org.hibernate.dialect.PostgreSQL10Dialect;
import org.hibernate.dialect.function.StandardSQLFunction;
import org.hibernate.type.PostgresUUIDType;

public class MyPostgresDialect extends PostgreSQL10Dialect {

    public MyPostgresDialect() {
        registerFunction( "uuid_generate_v4", 
            new StandardSQLFunction("uuid_generate_v4", PostgresUUIDType.INSTANCE));
    }
}

Then I register this class as the Hibernate dialect:

hibernate.dialect=MyPostgresDialect

Finally, I can use this function in a bulk insert query:

Session session = entityManager.unwrap(Session.class);
session.createQuery("insert into Entity (id, field1, field2) "+
  "select uuid_generate_v4(), [...] from [...]")
  .executeUpdate();

Most important is the underlying SQL that Hibernate generates for this operation, which is just a single query:

insert into entity ( id, [...] ) select uuid_generate_v4(), [...] from [...]

- Lukasz Frankowski

2

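Hibernate automatically batches inserts, updates, and deletes using its transactional write-behind strategy.

But setting the property spring.jpa.properties.hibernate.jdbc.batch_size=100 alone is not enough. We also have to set the ID generator to @GeneratedValue(strategy = GenerationType.SEQUENCE, generator = "seq_generator"). If the entity uses GenerationType.AUTO or GenerationType.IDENTITY, batch inserts and updates will not work. In that case Hibernate does not know the ID value to insert, because it is generated at the DB level, so it disables batch inserts and issues single inserts.

So, in order to use batch inserts and updates, our entity should use a sequence as its ID generator.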

- Pankaj Singh

1

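I had the same problem: I could not see my Hibernate queries being batched, and then I realized that the logged queries do not reflect what is actually sent to the database. To make sure the operations are batched, you can enable statistics with spring.jpa.properties.hibernate.generate_statistics=true, and then you will see:

When you add spring.jpa.properties.hibernate.jdbc.batch_size=100, you will start to see some differences, such as fewer JDBC statements and more JDBC batches: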

- Guilherme Alencar

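Before coming across this, I spent a long time trying to work out what was wrong with my configuration, until I checked the Hibernate statistics. Thanks for letting me know I'm not the only one who didn't look beyond Hibernate's initial logs. - void

I've been there @void, thanks for the feedback. - Guilherme Alencar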

- Cepr0 · Accepted Answer

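To get a bulk insert with Spring Boot and Spring Data JPA you need only two things:

Set the option spring.jpa.properties.hibernate.jdbc.batch_size to an appropriate value (for example: 20).
Use the saveAll() method of your repo with the list of entities prepared for inserting.

A working example is here.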

As for transforming the insert statements into something like this:

INSERT INTO table VALUES (1, 2), (3, 4), (5, 6)

this is available in PostgreSQL: you can set the option reWriteBatchedInserts to true in the JDBC connection string:

jdbc:postgresql://localhost:5432/db?reWriteBatchedInserts=true

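Then the JDBC driver will do this transformation.

Additional info about batching can be found here.

UPDATE

Demo project in Kotlin: sb-kotlin-batch-insert-demo

UPDATE

Hibernate transparently disables insert batching at the JDBC level if you use an IDENTITY identifier generator.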