Using partitioning to implement multi-schema multi-tenancy with dynamic tenants

Question

Tags: spring, dynamic, eclipselink, partitioning, multi-tenant

I'm writing a web application that must support multi-tenancy. I'm using JPA as the persistence layer and am evaluating EclipseLink.

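The multi-tenancy strategy I'd like to use is one schema per customer. Hibernate supports this strategy (http://docs.jboss.org/hibernate/orm/4.2/devguide/en-US/html/ch16.html#d5e4771), and I have used it successfully. As far as I know, however, it is only supported through the native Hibernate API, and I want to use JPA.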

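EclipseLink, on the other hand, supports the single-table and table-per-tenant multi-tenancy strategies. It also supports partitioning, though, and a simple custom partitioning policy would let me set up one partition per customer.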

The first question, then, is whether partitioning is appropriate for this scenario.

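The main problem, however, is that the customer base may (hopefully) grow over time. I therefore need EclipseLink to "learn about" new customers dynamically, without restarting the webapp. As far as I understand, partitioning in EclipseLink requires configuring the persistence unit with several "connection pools" (or "nodes"). Each node has its own configured data source and a name, and the partitioning policy picks the node to use by that name. So far so good, but I plan to set up my persistence unit with Spring's LocalContainerEntityManagerFactoryBean. When the LocalContainerEntityManagerFactoryBean is processed, I could discover the customers dynamically at startup and pass in the properties for all nodes/customers at that point. But what happens if a new customer is added afterwards? I don't think changing the persistence unit properties at runtime will have any effect on the EntityManagerFactory singleton that has already been built. I'm afraid EclipseLink will complain if I request a partition whose node was not known when the EntityManagerFactory was created. Please correct me if I'm wrong.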

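I think declaring the LocalContainerEntityManagerFactoryBean as a "prototype"-scoped bean would be a very bad idea, and I doubt it would work at all. Customer interactions are bound to a specific HTTP session, so I could instead take a "middle" approach and give the LocalContainerEntityManagerFactoryBean "session" scope. In that case, though, I expect I would have to deal with increased memory consumption. I would also need to coordinate the shared cache across multiple EntityManagerFactories, one for each customer using the application at a given time.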

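If I can't make this strategy work, I think I'll have to abandon partitioning altogether and go back to the "dynamic data source routing" approach. In that case, though, I'm worried about the consistency of the EclipseLink shared cache. I think I would have to disable it completely, which would be a real disadvantage.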
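For comparison, the "dynamic data source routing" alternative comes down to a single shared entry point. That entry point picks a per-tenant target from a thread-bound key. In Spring, this is what `AbstractRoutingDataSource` provides through `determineCurrentLookupKey()`.

The sketch below is a stdlib-only model of that idea. `TenantRouter` is a hypothetical name, and the generic `T` stands in for `javax.sql.DataSource`. It also hints at the cache worry: EclipseLink would sit above such a router and see only one data source. Its shared cache therefore cannot tell which tenant a cached row came from.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical, stdlib-only model of dynamic data source routing: one shared entry
// point that resolves a per-tenant target from a tenant key bound to the current thread.
// T stands in for javax.sql.DataSource in a real application.
final class TenantRouter<T> {
    private static final ThreadLocal<String> CURRENT_TENANT = new ThreadLocal<>();
    private final Map<String, T> targets = new ConcurrentHashMap<>();

    // Bind/unbind the tenant for the current request thread (e.g. from a servlet filter).
    static void setCurrentTenant(String tenantId) { CURRENT_TENANT.set(tenantId); }
    static void clearCurrentTenant() { CURRENT_TENANT.remove(); }

    // New tenants can be registered at any time without rebuilding anything.
    void register(String tenantId, T target) { targets.put(tenantId, target); }

    // Similar in spirit to determineCurrentLookupKey() followed by a target lookup.
    T resolve() {
        String tenant = CURRENT_TENANT.get();
        if (tenant == null) {
            throw new IllegalStateException("no tenant bound to the current thread");
        }
        T target = targets.get(tenant);
        if (target == null) {
            throw new IllegalStateException("unknown tenant: " + tenant);
        }
        return target;
    }
}
```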

Thanks in advance for any feedback.

- Mauro Molinari

2 Answers

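Take a look at the refreshMetadata method on the EclipseLink EntityManagerFactory class, described at http://wiki.eclipse.org/EclipseLink/DesignDocs/340192#EntityManagerFactory. It makes the singleton reload its configuration data. EntityManager instances that are already running are not affected, but any EntityManager obtained afterwards uses the new configuration. That seems to match what you need.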

You need to unwrap the EntityManagerFactory to get access to the http://javadox.com/org.eclipse.persistence/eclipselink/2.5.0/org/eclipse/persistence/jpa/JpaEntityManagerFactory.html interface:

JpaHelper.getEntityManagerFactory(em).refreshMetadata(properties);
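To illustrate the described semantics, here is a hypothetical, stdlib-only model. It is not EclipseLink code. The factory holds an immutable configuration snapshot, and each new session captures the snapshot that is current when it is created. A refresh swaps in a new snapshot without touching sessions already handed out.

Whether the real `refreshMetadata` merges the passed properties with the existing ones or replaces them should be checked against the design document. The merge below is only an assumption of this model, and the `pool.*` property names in the usage are made up.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical model of "a refresh affects only sessions created afterwards"; NOT EclipseLink code.
final class RefreshableFactory {
    private final AtomicReference<Map<String, String>> config;

    RefreshableFactory(Map<String, String> initial) {
        config = new AtomicReference<>(Collections.unmodifiableMap(new HashMap<>(initial)));
    }

    // Each session captures the configuration snapshot that is current when it is created.
    Session createSession() {
        return new Session(config.get());
    }

    // Builds a new immutable snapshot (merging in the new properties, an assumption of this model);
    // sessions created earlier keep referencing the old snapshot.
    void refresh(Map<String, String> properties) {
        config.updateAndGet(old -> {
            Map<String, String> next = new HashMap<>(old);
            next.putAll(properties);
            return Collections.unmodifiableMap(next);
        });
    }

    static final class Session {
        final Map<String, String> config;
        Session(Map<String, String> config) { this.config = config; }
    }
}
```

In this model, a session created before `refresh(...)` never sees the new `pool.globex` entry, while every later session does.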

- Chris

- Mauro Molinari · Accepted Answer

To be honest, I didn't try Chris's suggestion. I went for a more fine-grained solution instead, described below.

- In my case, tenant = customer. Each customer's data lives in its own database schema, possibly in a dedicated DBMS instance (from any vendor). In other words, I have a different data source for each customer.
- Since I use partitioning, every customer has its own partition, identified by the corresponding unique customer id.
- Every user who logs into the application belongs to exactly one customer. I use Spring Security for authentication and authorization, so I can retrieve information about the user (including the owning customer) from the SecurityContextHolder.
- I defined my own EclipseLink PartitioningPolicy. It determines the customer of the currently logged-in user as described in the previous point, then returns a list containing a single Accessor that identifies that customer's partition.
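A minimal, stdlib-only model of the policy described in the list above follows. In EclipseLink, the real class would extend `PartitioningPolicy` and override `getConnectionsForQuery(...)`, returning a `List<Accessor>`. To stay self-contained, this sketch stops at the connection pool name. `CustomerPrincipal` and `CurrentUser` are hypothetical stand-ins for the Spring Security principal and `SecurityContextHolder`.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical principal carrying the owning customer's id.
final class CustomerPrincipal {
    final String username;
    final String customerId;
    CustomerPrincipal(String username, String customerId) {
        this.username = username;
        this.customerId = customerId;
    }
}

// Hypothetical stand-in for SecurityContextHolder: one authenticated principal per thread.
final class CurrentUser {
    private static final ThreadLocal<CustomerPrincipal> HOLDER = new ThreadLocal<>();
    static void set(CustomerPrincipal principal) { HOLDER.set(principal); }
    static void clear() { HOLDER.remove(); }
    static CustomerPrincipal get() { return HOLDER.get(); }
}

// Model of the customer partitioning policy: every query targets exactly one
// partition, whose name is the logged-in user's customer id.
final class CustomerPartitioningPolicy {
    List<String> poolsForQuery() {
        CustomerPrincipal principal = CurrentUser.get();
        if (principal == null) {
            throw new IllegalStateException("no authenticated user on this thread");
        }
        return Collections.singletonList(principal.customerId);
    }
}
```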

All my tables must be partitioned, and I didn't want to specify that on EVERY entity with annotations. So at startup I registered this partitioning policy with EclipseLink and set it as the default one. In short:

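import org.eclipse.persistence.jpa.JpaEntityManagerFactory;
import org.eclipse.persistence.sessions.server.ServerSession;

// Unwrap to EclipseLink's native ServerSession, register the policy and make it the default
JpaEntityManagerFactory jpaEmf = entityManagerFactory.unwrap(JpaEntityManagerFactory.class);
ServerSession serverSession = jpaEmf.getServerSession();
serverSession.getProject().addPartitioningPolicy(myCustomerPolicy);
serverSession.setPartitioningPolicy(myCustomerPolicy);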

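Next, I add data sources (called "connection pools" in EclipseLink terms) to EclipseLink dynamically. The goal is that the customer id returned by the policy above always matches a connection pool EclipseLink knows about. I do this as follows: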

- A listener intercepts every successful user login.

- This listener asks EclipseLink whether it already knows a connection pool identified by the user's customer id. If it does, we're done: EclipseLink can handle the partition correctly. Otherwise, a new connection pool is created and added to EclipseLink. Proof of concept:

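import javax.sql.DataSource;
import org.eclipse.persistence.jpa.JpaEntityManagerFactory;
import org.eclipse.persistence.platform.database.DatabasePlatform;
import org.eclipse.persistence.sessions.DatabaseLogin;
import org.eclipse.persistence.sessions.JNDIConnector;
import org.eclipse.persistence.sessions.server.ConnectionPool;
import org.eclipse.persistence.sessions.server.ExternalConnectionPool;
import org.eclipse.persistence.sessions.server.ServerSession;

// Runs on successful login; the pool name must match the customer id returned by the partitioning policy
String customerId = principal.getCustomerId();
JpaEntityManagerFactory jpaEmf = entityManagerFactory.unwrap(JpaEntityManagerFactory.class);
ServerSession serverSession = jpaEmf.getServerSession();
// Check-then-add is not atomic: serialize this block if logins for a new customer can race
if (!serverSession.getConnectionPools().containsKey(customerId)) {
  // createDataSourceForCustomer and determineDbVendorPlatform are application-specific helpers
  DataSource customerDataSource = createDataSourceForCustomer(customerId);
  DatabaseLogin login = new DatabaseLogin();
  login.useDataSource(customerId);
  login.setConnector(new JNDIConnector(customerDataSource));
  Class<? extends DatabasePlatform> databasePlatformClass = determineDbVendorPlatform(customerId);
  login.usePlatform(databasePlatformClass.newInstance());
  ConnectionPool connectionPool = new ExternalConnectionPool(customerId, login, serverSession);
  connectionPool.startUp();
  serverSession.addConnectionPool(connectionPool);
}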
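The proof of concept above uses check-then-add, which is not atomic. If two users of a brand-new customer log in at the same moment, both listeners may try to create and add the pool.

The sketch below shows one way to make the registration idempotent, using stdlib types only. `CustomerPoolRegistry` is hypothetical, and its `poolFactory` function stands in for the code above that builds the `DataSource`, `DatabaseLogin` and `ExternalConnectionPool`, starts the pool and adds it to the `ServerSession`.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

// Hypothetical registry: "create a pool for a customer the first time one of its users logs in".
// P stands in for whatever the factory produces (e.g. an EclipseLink ConnectionPool).
final class CustomerPoolRegistry<P> {
    private final Map<String, P> pools = new ConcurrentHashMap<>();
    private final Function<String, P> poolFactory;

    CustomerPoolRegistry(Function<String, P> poolFactory) {
        this.poolFactory = poolFactory;
    }

    // Called from the login listener. ConcurrentHashMap.computeIfAbsent applies the
    // factory at most once per customer id, even under concurrent logins.
    P onLogin(String customerId) {
        return pools.computeIfAbsent(customerId, poolFactory);
    }

    boolean knows(String customerId) {
        return pools.containsKey(customerId);
    }
}
```

The design choice here is to let one map own the "does this customer already have a pool?" decision. The slow pool construction then happens exactly once per customer, and later logins only pay for a map lookup.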

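The user login itself is of course performed against a central database (or any other authentication source). The code above therefore runs before any customer-specific JPA query is executed. As a result, the customer's connection pool is added to EclipseLink before the partitioning policy references it for the first time.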

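There is one important aspect to consider, though. In EclipseLink, data partitioning means that an identifiable piece of data (i.e. an entity instance) exists either in exactly one partition or identically replicated across several partitions. An entity instance's identity (i.e. its primary key) determines its uniqueness. So two instances of entity type E with the same id=x must never belong to different customers/tenants T1 and T2. Otherwise EclipseLink may consider them the very same entity instance, and data from different customers could get mixed up when read or written during a single JPA session => disaster. Possible solutions: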

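1. Here, the partition to use is determined by the currently logged-in user, so every query executed within one HTTP session goes to the same partition. I use transaction-scoped entity managers, whose lifetime is at most the duration of a request, and a request in turn falls within the HTTP session. Simply disabling the EclipseLink shared cache would therefore be enough to avoid mixing data from different customers, but this is still not ideal.
2. The best option I could find is to have EclipseLink generate all IDs (primary keys) centrally, across customers, so that id=x for a given entity type is assigned to exactly one entity instance of one customer. In effect, this means "partitioning" the ID allocation sequence among customers. It also rules out MySQL auto-increment columns (a.k.a. the IDENTITY generation type). So I chose the TABLE generation type for entity identifiers and placed that table in the central database that stores user and customer information.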
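To make the identity problem concrete, here is a simplified, stdlib-only model of an identity-keyed shared cache. It is not EclipseLink's actual cache implementation. The key is (entity type, primary key), so the tenant is not part of the identity.

If two customers each have an `Invoice` (a hypothetical entity) with id 1, whichever row is loaded first is served to both. Option 1 above avoids this by disabling the cache, and option 2 avoids it by never reusing an id across customers.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.function.Supplier;

// Simplified model of an identity-keyed shared cache (NOT EclipseLink's implementation):
// the key is (entity type, primary key), so the tenant is not part of an entity's identity.
final class SharedIdentityCache {
    private static final class Key {
        final Class<?> type;
        final long id;
        Key(Class<?> type, long id) { this.type = type; this.id = id; }
        @Override public boolean equals(Object o) {
            if (!(o instanceof Key)) return false;
            Key k = (Key) o;
            return type.equals(k.type) && id == k.id;
        }
        @Override public int hashCode() { return Objects.hash(type, id); }
    }

    private final Map<Key, Object> cache = new HashMap<>();

    // Returns the cached instance for (type, id) if present; otherwise loads it from
    // the current tenant's partition and caches it for every tenant.
    Object find(Class<?> type, long id, Supplier<Object> loadFromCurrentPartition) {
        return cache.computeIfAbsent(new Key(type, id), k -> loadFromCurrentPartition.get());
    }
}

// Hypothetical entity that only records which customer's partition the row came from.
final class Invoice {
    final String customerId;
    Invoice(String customerId) { this.customerId = customerId; }
}
```

In this model, a "globex" user asking for invoice 1 after an "acme" user has loaded it receives acme's instance.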

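One last small problem remains when implementing option 2. The EclipseLink documentation says that the eclipselink.connection-pool.sequence configuration option can specify a connection pool (= data source) dedicated to table sequencing. However, this option seems to be ignored when a default partitioning policy is set as described above. In fact, my customer partitioning policy is invoked for every query, including those used for ID allocation. The policy must therefore intercept these queries and route them to the central data source.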

I couldn't find a clean way around this, but the best options I could think of are:

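1. If the query's SQL string starts with "UPDATE SEQUENCE", it is an ID-allocation query. This assumes the table dedicated to sequence allocation is called SEQUENCE, which is the default.
2. If you adopt the convention of adding a SEQUENCE suffix to generator names, a query whose name ends with "SEQUENCE" is an ID-allocation query.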
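Here is a sketch of how the two heuristics above and the routing decision from the previous paragraph could fit together, kept stdlib-only. The query name and SQL string are passed in as plain strings; in EclipseLink they would presumably come from the query being partitioned. `SequenceQueryDetector`, `PartitionRouter` and the pool name `"central"` are hypothetical. Passing the detector in as a constructor argument keeps the heuristic switchable.

```java
import java.util.Locale;

// Hypothetical: the two heuristics for recognizing ID-allocation queries.
enum SequenceQueryDetector {
    // Option 1: the SQL updates the default SEQUENCE table.
    SQL_PREFIX {
        @Override
        boolean isSequenceQuery(String queryName, String sql) {
            return sql != null && sql.trim().toUpperCase(Locale.ROOT).startsWith("UPDATE SEQUENCE");
        }
    },
    // Option 2: generator names follow the "..._SEQUENCE" naming convention.
    NAME_SUFFIX {
        @Override
        boolean isSequenceQuery(String queryName, String sql) {
            return queryName != null && queryName.endsWith("SEQUENCE");
        }
    };

    abstract boolean isSequenceQuery(String queryName, String sql);
}

// Routing decision: ID-allocation queries go to the central pool, everything else
// to the logged-in user's customer pool.
final class PartitionRouter {
    private final SequenceQueryDetector detector; // swappable in case a heuristic stops working
    private final String centralPool;

    PartitionRouter(SequenceQueryDetector detector, String centralPool) {
        this.detector = detector;
        this.centralPool = centralPool;
    }

    String poolFor(String queryName, String sql, String customerId) {
        return detector.isSequenceQuery(queryName, sql) ? centralPool : customerId;
    }
}
```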

I chose option 2 and defined my ID-generation mappings accordingly:

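import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.TableGenerator;

@Entity
public class MyEntity {
  @Id
  // Generator name ends with "SEQUENCE" so the partitioning policy can recognize the allocation query
  @TableGenerator(name = "MyEntity_SEQUENCE", allocationSize = 10)
  @GeneratedValue(strategy = GenerationType.TABLE, generator = "MyEntity_SEQUENCE")
  private Long id;
}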

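This makes EclipseLink use a table named SEQUENCE containing a row whose SEQ_NAME column value is MyEntity_SEQUENCE. The query that updates this ID-allocation sequence is named MyEntity_SEQUENCE, and we're done. Still, I made my partitioning policy configurable, so I can switch from one sequence-query detection strategy to the other at any time. That way, I'm covered if some change in the EclipseLink implementation breaks this "heuristic".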
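The interplay between the central SEQUENCE table and `allocationSize = 10` can be modelled with the stdlib-only sketch below. `CentralSequenceTable` and `BlockIdAllocator` are hypothetical names. The SQL in the comment is only illustrative, not the exact statement EclipseLink issues.

Each allocation reserves a block of 10 ids from the single central counter. Ids are then handed out from that block regardless of which customer's partition the new row will land in, so the same id is never given to two customers.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical model of the TABLE sequence kept in the central database:
// one counter (SEQ_COUNT) per sequence name (SEQ_NAME).
final class CentralSequenceTable {
    private final Map<String, Long> seqCount = new HashMap<>();

    // Reserves `size` ids and returns the new upper bound, in the spirit of
    // "UPDATE SEQUENCE SET SEQ_COUNT = SEQ_COUNT + ? WHERE SEQ_NAME = ?" plus a read-back.
    // synchronized plays the role of the row lock taken by that update.
    synchronized long reserve(String seqName, int size) {
        long upper = seqCount.getOrDefault(seqName, 0L) + size;
        seqCount.put(seqName, upper);
        return upper;
    }
}

// Hands out ids from the most recently reserved block, reserving a new block when it runs out.
final class BlockIdAllocator {
    private final CentralSequenceTable table;
    private final String seqName;
    private final int allocationSize;
    private long next = 1;
    private long limit = 0; // next > limit means "no block reserved yet"

    BlockIdAllocator(CentralSequenceTable table, String seqName, int allocationSize) {
        this.table = table;
        this.seqName = seqName;
        this.allocationSize = allocationSize;
    }

    synchronized long nextId() {
        if (next > limit) {
            limit = table.reserve(seqName, allocationSize);
            next = limit - allocationSize + 1;
        }
        return next++;
    }
}
```

A second allocator sharing the same table, for example on another application node, reserves the next free block instead of reusing any id.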

That's basically the whole picture. So far it has been working well. Feedback, improvements and suggestions are welcome.