Cassandra Datastax driver throws a write timeout exception

41

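I get a timeout exception while bulk loading a large amount of data, incrementing counters based on log data. I am using the Datastax 2.0-rc2 Java driver.

Is this caused by the server being unable to keep up (i.e. a server-side configuration issue), or by the client getting tired of waiting for the server to respond? Either way, is there a simple configuration change that would fix it?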

Exception in thread "main" com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
    at com.datastax.driver.core.ResultSetFuture.extractCauseFromExecutionException(ResultSetFuture.java:271)
    at com.datastax.driver.core.ResultSetFuture.getUninterruptibly(ResultSetFuture.java:187)
    at com.datastax.driver.core.Session.execute(Session.java:126)
    at jason.Stats.analyseLogMessages(Stats.java:91)
    at jason.Stats.main(Stats.java:48)
Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.exceptions.WriteTimeoutException.copy(WriteTimeoutException.java:54)
    at com.datastax.driver.core.Responses$Error.asException(Responses.java:92)
    at com.datastax.driver.core.ResultSetFuture$ResponseCallback.onSet(ResultSetFuture.java:122)
    at com.datastax.driver.core.RequestHandler.setFinalResult(RequestHandler.java:224)
    at com.datastax.driver.core.RequestHandler.onSet(RequestHandler.java:373)
    at com.datastax.driver.core.Connection$Dispatcher.messageReceived(Connection.java:510)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline$DefaultChannelHandlerContext.sendUpstream(DefaultChannelPipeline.java:791)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:296)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.unfoldAndFireMessageReceived(FrameDecoder.java:462)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.callDecode(FrameDecoder.java:443)
    at org.jboss.netty.handler.codec.frame.FrameDecoder.messageReceived(FrameDecoder.java:303)
    at org.jboss.netty.channel.SimpleChannelUpstreamHandler.handleUpstream(SimpleChannelUpstreamHandler.java:70)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:564)
    at org.jboss.netty.channel.DefaultChannelPipeline.sendUpstream(DefaultChannelPipeline.java:559)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:268)
    at org.jboss.netty.channel.Channels.fireMessageReceived(Channels.java:255)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:88)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at org.jboss.netty.util.ThreadRenamingRunnable.run(ThreadRenamingRunnable.java:108)
    at org.jboss.netty.util.internal.DeadLockProofWorker$1.run(DeadLockProofWorker.java:42)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:744)
Caused by: com.datastax.driver.core.exceptions.WriteTimeoutException: Cassandra timeout during write query at consistency ONE (1 replica were required but only 0 acknowledged the write)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:53)
    at com.datastax.driver.core.Responses$Error$1.decode(Responses.java:33)
    at com.datastax.driver.core.Message$ProtocolDecoder.decode(Message.java:165)
    at org.jboss.netty.handler.codec.oneone.OneToOneDecoder.handleUpstream(OneToOneDecoder.java:66)
    ... 21 more

One of the nodes reported this at around the time it happened:

ERROR [Native-Transport-Requests:12539] 2014-02-16 23:37:22,191 ErrorMessage.java (line 222) Unexpected exception during request
java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
    at sun.nio.ch.SocketDispatcher.read(Unknown Source)
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(Unknown Source)
    at sun.nio.ch.IOUtil.read(Unknown Source)
    at sun.nio.ch.SocketChannelImpl.read(Unknown Source)
    at org.jboss.netty.channel.socket.nio.NioWorker.read(NioWorker.java:64)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.process(AbstractNioWorker.java:109)
    at org.jboss.netty.channel.socket.nio.AbstractNioSelector.run(AbstractNioSelector.java:312)
    at org.jboss.netty.channel.socket.nio.AbstractNioWorker.run(AbstractNioWorker.java:90)
    at org.jboss.netty.channel.socket.nio.NioWorker.run(NioWorker.java:178)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)
4 Answers

44

Although I don't understand the root cause of this problem, I resolved it by increasing the timeout value in conf/cassandra.yaml:

write_request_timeout_in_ms: 20000

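I had the same problem. I was using BatchStatement to write data to Cassandra, with a batch size of 10000. After reducing the batch size I no longer got this exception. So maybe you are trying to load too much data into Cassandra in a single request. - abi_pat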
1
This is actually a very bad choice. Did you perhaps find out why this happens? I am now facing the same error. - iMajna
7
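@Superbrain_bug, thanks for sharing your view on this workaround. I'm sure some people will find your judgment interesting. If you find another workaround, be sure to let everyone know. - Jay
One of the reasons could be that Cassandra is running memory-intensive internal processes such as compaction or repair, and you do not have enough memory to complete the write within 2 seconds. This happens to me a lot during development: it works fine for 10-15 minutes, then the errors start and I have to restart it. Very annoying. - walv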
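
To illustrate abi_pat's point about batch size, here is a minimal sketch of splitting a large list of pending writes into small chunks before sending them. It uses only the Java standard library; `BatchChunker`, `partition` and the chunk size are hypothetical names and values chosen for illustration, and the step that sends each chunk to Cassandra is deliberately left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper (not part of the Datastax driver): splits a large
// list of pending writes into chunks of at most chunkSize elements.
final class BatchChunker {

    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int start = 0; start < items.size(); start += chunkSize) {
            int end = Math.min(start + chunkSize, items.size());
            // Copy the sublist so each chunk is independent of the source list.
            chunks.add(new ArrayList<>(items.subList(start, end)));
        }
        return chunks;
    }

    public static void main(String[] args) {
        List<Integer> writes = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            writes.add(i);
        }
        // Each chunk would be sent as its own small request instead of
        // one request carrying all the writes.
        for (List<Integer> chunk : partition(writes, 4)) {
            System.out.println(chunk);
        }
    }
}
```

Which chunk size works depends on the cluster, so it is worth starting small and increasing it while watching for timeouts.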

31

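We hit a similar problem on a single node in an ESX cluster with SAN storage attached (which Datastax does not recommend, but we have no other option at the moment).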

Note: the settings below may significantly reduce the maximum performance Cassandra can achieve, but we chose a stable system over high performance.

While running iostat -xmt 1, we saw high w_await times coinciding with the WriteTimeoutExceptions. It turned out that the memtable could not be written to disk within the default write_request_timeout_in_ms: 2000 setting.

We significantly reduced the memtable size, from 512 MB (the default is 25% of heap space, which was 2 GB in our case) to 32 MB:

# Total permitted memory to use for memtables. Cassandra will stop
# accepting writes when the limit is exceeded until a flush completes,
# and will trigger a flush based on memtable_cleanup_threshold
# If omitted, Cassandra will set both to 1/4 the size of the heap.
# memtable_heap_space_in_mb: 2048
memtable_offheap_space_in_mb: 32

We also increased the write timeout slightly, to 3 seconds:

write_request_timeout_in_ms: 3000

If you are seeing high IO wait times, also make sure writes are synced to disk periodically:

#commitlog_sync: batch
#commitlog_sync_batch_window_in_ms: 2
#
# the other option is "periodic" where writes may be acked immediately
# and the CommitLog is simply synced every commitlog_sync_period_in_ms
# milliseconds.
commitlog_sync: periodic
commitlog_sync_period_in_ms: 10000

These settings keep the memtable small and flush it often. All the exceptions went away and we passed our system stress tests.


2

This is the coordinator (i.e. the server) timing out while waiting for write acknowledgements.


Hi Chris, how can I debug this further to find out why the ACK did not arrive? I have a similar issue and am trying to find the root cause... Thanks. - opstalj

1

For Cassandra, it can be worth checking your GC settings.

In my case, I was using a semaphore to throttle async writes, but still (sometimes) got timeouts.

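It turned out that the GC settings I was using were unsuitable: I had been using cassandra-unit for convenience, which left Cassandra running with default VM settings. As a result we would eventually trigger stop-the-world GCs, which caused write timeouts. Applying the same GC settings as my running Cassandra Docker image fixed the problem.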

This may be an uncommon cause, but it helped me, so it seems worth recording here.
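
For readers wondering what throttling async writes with a semaphore looks like, here is a rough sketch of the idea, not the code I actually used. It assumes the async write can be represented as a `CompletableFuture`; the driver returns its own future type, so `ThrottledWriter`, `submit` and the `Supplier` wrapper are hypothetical stand-ins that would need adapting.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical helper: caps the number of async writes in flight.
// CompletableFuture stands in for whatever future the driver returns.
final class ThrottledWriter {

    private final Semaphore permits;

    ThrottledWriter(int maxInFlight) {
        this.permits = new Semaphore(maxInFlight);
    }

    // Blocks the caller until a permit is free, starts the write, and
    // returns the permit when the write completes or fails.
    <T> CompletableFuture<T> submit(Supplier<CompletableFuture<T>> asyncWrite) {
        permits.acquireUninterruptibly();
        try {
            return asyncWrite.get().whenComplete((result, error) -> permits.release());
        } catch (RuntimeException e) {
            // The write never started, so give the permit back here.
            permits.release();
            throw e;
        }
    }

    int availablePermits() {
        return permits.availablePermits();
    }

    public static void main(String[] args) {
        ThrottledWriter writer = new ThrottledWriter(2);
        CompletableFuture<String> pending = new CompletableFuture<>();
        CompletableFuture<String> result = writer.submit(() -> pending);
        System.out.println("permits while in flight: " + writer.availablePermits());
        pending.complete("ok");
        System.out.println(result.join() + ", permits after: " + writer.availablePermits());
    }
}
```

Throttling like this only limits client-side pressure. As described above, long GC pauses on the server can still cause timeouts even with few writes in flight.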

