慢的Hibernate/C3P0如何处理非慢速的Postgres SELECT?

3
针对Postgres数据库的某个索引SELECT查询需要的时间非常不确定 - 从50毫秒到几秒钟,甚至在最轻载时偶尔要花费分钟级别的时间。
我们的Postgres查询日志记录超过10毫秒的所有内容,但从未记录任何这些内容。EXPLAIN输出表明该查询并不特别高效,但在这个小型数据库(数千条记录)中不应该如此缓慢,并且我们相信Postgres日志。
通过设置我们的应用程序日志来报告所有的Hibernate、C3P0和Spring/Spring Data日志(请参见末尾的版本号),证据表明这非常可能是Hibernate/C3P0的问题,然而,日志中的所有证据都表明池大小和利用率目前都很好。不幸的是,我们无法进一步深入研究。
你能为26秒的间隔提供一个解释吗?
10:19:29.149 DEBUG org.hibernate.SQL [I=9534] - select eventrepor0_.consortium_id as consorti1_3_3_, eventrepor0_.customer_resource_id as customer6_3_3_, eventrepor0_.item_type_id as item2_3_3_, eventrepor0_.reporting_date as reportin3_3_3_, eventrepor0_.event_subtype as event4_3_3_, eventrepor0_.event_count as event5_3_3_, customerre1_.id as id1_2_0_, customerre1_.customer_id as customer2_2_0_, customerre1_.resource_id as resource3_2_0_, resource2_.id as id1_8_1_, resource2_.data_type_id as data2_8_1_, resource2_.platform_id as platform5_8_1_, resource2_.prop_id as prop3_8_1_, resource2_.title as title4_8_1_, resource2_1_.doi as doi1_6_1_, resource2_1_.isbn as isbn2_6_1_, resource2_1_.online_issn as online3_6_1_, resource2_1_.print_issn as print4_6_1_, resource2_1_.publisher as publishe5_6_1_, resource2_1_.yop as yop6_6_1_, case when resource2_1_.id is not null then 1 when resource2_.id is not null then 0 end as clazz_1_, platform3_.id as id1_4_2_, platform3_.api_key as api2_4_2_, platform3_.platform_name as platform3_4_2_, hostnames4_.platform_id as platform1_4_5_, hostnames4_.hostname as hostname2_5_5_ from event_report eventrepor0_ inner join customer_resource customerre1_ on eventrepor0_.customer_resource_id=customerre1_.id left outer join resource resource2_ on customerre1_.resource_id=resource2_.id left outer join published_resource resource2_1_ on resource2_.id=resource2_1_.id left outer join platform platform3_ on resource2_.platform_id=platform3_.id left outer join platform_hostnames hostnames4_ on platform3_.id=hostnames4_.platform_id where eventrepor0_.consortium_id=? and eventrepor0_.customer_resource_id=? and eventrepor0_.item_type_id=? and eventrepor0_.reporting_date=? and eventrepor0_.event_subtype=?
10:19:29.149 DEBUG c.m.v.a.ThreadPoolAsynchronousRunner [I=9534] - com.mchange.v2.async.ThreadPoolAsynchronousRunner@4ffa2724: Adding task to queue -- com.mchange.v2.c3p0.stmt.GooGooStatementCache$1StmtAcquireTask@31e6b320
10:19:29.149 DEBUG c.m.v.c3p0.stmt.GooGooStatementCache [I=9534] - CULLING: update event_report set event_count=event_count+1 where customer_resource_id=? and item_type_id=? and event_subtype=? and reporting_date=? and consortium_id=?
10:19:29.149 DEBUG c.m.v.a.ThreadPoolAsynchronousRunner [I=9534] - com.mchange.v2.async.ThreadPoolAsynchronousRunner@4ffa2724: Adding task to queue -- com.mchange.v2.c3p0.stmt.GooGooStatementCache$StatementDestructionManager$1UncheckedStatementCloseTask@20fa1378
10:19:29.149 DEBUG c.m.v.c3p0.stmt.GooGooStatementCache [I=9534] - cxnStmtMgr.statementSet( org.postgresql.jdbc4.Jdbc4Connection@38e040d2 ).size(): 5
10:19:29.150 DEBUG c.m.v.c3p0.stmt.GooGooStatementCache [I=9534] - checkoutStatement: com.mchange.v2.c3p0.stmt.GlobalMaxOnlyStatementCache stats -- total size: 20; checked out: 5; num connections: 6; num keys: 20
10:19:29.150 TRACE o.h.e.j.internal.JdbcCoordinatorImpl [I=9534] - Registering statement [com.mchange.v2.c3p0.impl.NewProxyPreparedStatement@7cc20161]
10:19:29.150 TRACE o.h.e.j.internal.JdbcCoordinatorImpl [I=9534] - Registering last query statement [com.mchange.v2.c3p0.impl.NewProxyPreparedStatement@7cc20161]
10:19:29.150 TRACE o.h.type.descriptor.sql.BasicBinder [I=9534] - binding parameter [1] as [VARCHAR] - 
10:19:29.150 TRACE o.h.type.descriptor.sql.BasicBinder [I=9534] - binding parameter [2] as [BIGINT] - 47
10:19:29.150 TRACE org.hibernate.type.EnumType [I=9534] - Binding [SEARCH_REG] to parameter: [3]
10:19:29.150 TRACE o.h.type.descriptor.sql.BasicBinder [I=9534] - binding parameter [4] as [TIMESTAMP] - Tue Jul 16 00:00:00 BST 2013
10:19:29.151 TRACE o.h.type.descriptor.sql.BasicBinder [I=9534] - binding parameter [5] as [VARCHAR] - 
10:19:29.151 TRACE org.hibernate.loader.Loader [I=9534] - Bound [6] parameters total
[... massive gap ...]
10:19:55.644 TRACE o.h.e.j.internal.JdbcCoordinatorImpl [I=9534] - Registering result set [com.mchange.v2.c3p0.impl.NewProxyResultSet@fa7b109]

关于并发性的注意事项:即使只有一个请求,也存在很大的变化范围:端到端时间为50-300毫秒,但当一个用户提交大约100个这样的查询批处理(可能有10-20个并发运行时),很有可能会有几个需要5-10秒钟。然而,C3P0统计数据从来没有更糟:
com.mchange.v2.c3p0.stmt.GlobalMaxOnlyStatementCache stats -- total size: 20; checked out: 6; num connections: 6; num keys: 20

这些服务器非常强大,因此没有明显的磁盘、网络或CPU活动。我们使用NewRelic进行监控。

我们的数据源设置如下:

ComboPooledDataSource dataSource = new com.mchange.v2.c3p0.ComboPooledDataSource();
dataSource.setInitialPoolSize(5);
dataSource.setMaxPoolSize(20);
dataSource.setMinPoolSize(5);
dataSource.setMaxStatements(20);
dataSource.setIdleConnectionTestPeriod(3600);
dataSource.setTestConnectionOnCheckin( Boolean.TRUE.toString() );
dataSource.setPreferredTestQuery("select 1");

JPA属性:

props.put("hibernate.dialect", "org.hibernate.dialect.PostgreSQL82Dialect");
props.put("hibernate.show_sql", "false");
props.put("generate_statistics", "false");
props.put("javax.persistence.sharedCache.mode", "ENABLE_SELECTIVE");
props.put("javax.persistence.validation.mode", "NONE");
props.put("hibernate.cache.use_second_level_cache", "false");
props.put("hibernate.cache.region.factory_class", "org.hibernate.cache.impl.NoCachingRegionFactory");
props.put("hibernate.hbm2ddl.auto", "false");

版本:Postgres 9.1.7,使用最新的9.2 JDBC驱动程序; Hibernate 4.2.3.Final; C3P0 0.9.2.1; Spring 3.2.2.RELEASE; Spring Data JPA 1.1.0; Tomcat 7; JDK 1.7
更新 - 我们目前正在使用的C3P0属性(切换到使用maxStatementsPerConnection后)
c.m.v.c.i.AbstractPoolBackedDataSource [] - 初始化c3p0池... com.mchange.v2.c3p0.ComboPooledDataSource [ acquireIncrement -> 3, acquireRetryAttempts -> 30, acquireRetryDelay -> 1000, autoCommitOnClose -> false, automaticTestTable -> null, breakAfterAcquireFailure -> false, checkoutTimeout -> 0, connectionCustomizerClassName -> null, connectionTesterClassName -> com.mchange.v2.c3p0.impl.DefaultConnectionTester, dataSourceName -> 2s05p58v1s6oref13lw967|538ab4bc, debugUnreturnedConnectionStackTraces -> false, description -> null, driverClass -> org.postgresql.Driver, factoryClassLocation -> null, forceIgnoreUnresolvedTransactions -> false, identityToken -> 2s05p58v1s6oref13lw967|538ab4bc, idleConnectionTestPeriod -> 1800, initialPoolSize -> 5, jdbcUrl -> jdbc:postgresql://******, maxAdministrativeTaskTime -> 0, maxConnectionAge -> 0, maxIdleTime -> 0, maxIdleTimeExcessConnections -> 0, maxPoolSize -> 20, maxStatements -> 0, maxStatementsPerConnection -> 20, minPoolSize -> 5, numHelperThreads -> 3, preferredTestQuery -> select 1, properties -> {user=******, password=******}, propertyCycle -> 0, statementCacheNumDeferredCloseThreads -> 0, testConnectionOnCheckin -> true, testConnectionOnCheckout -> false, unreturnedConnectionTimeout -> 0, userOverrides -> {}, usesTraditionalReflectiveProxies -> false ]

尝试切换连接池,例如Commons-DBCP或Tomcat连接池,以查看这是否是池或Hibernate的问题。如果您的机器数量较少且性能强大,则还可以尝试将池大小增加到200。 - Chris Vest
垃圾回收日志中有任何异常情况吗? - DaveH
如果是数据库端锁定的话,系统日志中会看到长时间的持续时间;log_min_duration_statement输出包括等待锁定的时间。所以我会去查看客户端。 - Craig Ringer
谢谢大家。我已经增加了池大小,这似乎有所帮助,尽管我不愿意在这个阶段改变池实现方式。垃圾回收方面没有太多进展。 - Andrew Regan
1个回答

5
我不能确定这是否是你的问题原因,但我认为有很大的可能性!
你将maxStatements设置为一个远远低于你所承载负荷的值。尝试将maxStatements设置为零(关闭语句缓存),或者尝试将maxStatementsPerConnection设置为20,这可能是你想要的。目前,你已经设置了全局最大值为20个准备好的语句,由最多20个连接共享。这不太可能产生良好的性能。

谢谢您的建议,我相信您对maxStatements的误用是正确的。目前更改为使用maxStatementsPerConnection并将MaxPoolSize更新为100似乎正在起作用,我只偶尔看到超过1秒的时间。我会再给它一个小时或两个小时来适应... - Andrew Regan
情况肯定有所改善,我们已经走上了正确的轨道,所以我认为这个问题已经解决了 - 谢谢! - Andrew Regan

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接