Netty EventExecutorGroup breaks pipeline

Question

java, netty

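Situation: I have a proxy application that uses Netty 4.0.17.Final (FYI: I already had the issue with versions 4.0.13.Final and 4.0.9.Final) and that is based on the proxy from the Netty examples.

The main difference between my code and the example is that my code does not connect to the backend server when the channel becomes active. It connects only on the first read, because that read must first run some checks on the input before connecting and forwarding the message to the backend server.

I have unit tested and load tested my application for hours, and everything works fine.

Problem: Because some blocking operations need to be performed, I tried to use a separate EventExecutorGroup for the handler that performs them (so that the IO threads are not blocked):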

private static final EventExecutorGroup handlersExecutor = new DefaultEventExecutorGroup(10);
...
pipeline.addLast(handlersExecutor, "authenticationHandler", new FrontendHandler(outboundAddress));

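This (= the only change I made!) breaks the application during load tests. What breaks? XXX of the 3500 client connections report that YY of the 500 messages for those clients received no reply from the proxy (each request should get one response). Here is an excerpt from the client logs: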

2014-02-14 00:39:56.146 [id: 0x34cb2c60] ERROR (com.nsn.ucpsimulator.common.UcpDecoder) - Idle connection (/127.0.0.1:7201). PDUs received: 13
2014-02-14 00:39:56.146 [id: 0xf0955993] ERROR (com.nsn.ucpsimulator.common.UcpDecoder) - Idle connection (/127.0.0.1:7201). PDUs received: 13
2014-02-14 00:39:56.147 [id: 0x9a911fa3] ERROR (com.nsn.ucpsimulator.common.UcpDecoder) - Idle connection (/127.0.0.1:7201). PDUs received: 13
2014-02-14 00:39:56.149 [id: 0x811bbadf] ERROR (com.nsn.ucpsimulator.common.UcpDecoder) - Idle connection (/127.0.0.1:7201). PDUs received: 13
2014-02-14 00:39:56.150 [id: 0x0c4d4c5a] ERROR (com.nsn.ucpsimulator.common.UcpDecoder) - Idle connection (/127.0.0.1:7201). PDUs received: 13

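The proxy application reports that it received and forwarded 500 messages, but that only 13 replies were received and forwarded back to the client: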

2014-02-14 00:39:57.683 [id: 0x39af563b] ERROR (be.demmel.fun.UcpDecoder) - Idle connection (/127.0.0.1:49359). PDUs received: 500
2014-02-14 00:39:57.683 [id: 0x82056d39] ERROR (be.demmel.fun.FrontendHandler) - Idle connection (/127.0.0.1:52004). Closing it, PDUs forwarded: 500. Success: 500
2014-02-14 00:40:00.717 [id: 0xcdca8f66] ERROR (be.demmel.fun.UcpDecoder) - Idle connection (/127.0.0.1:7900). PDUs received: 13
2014-02-14 00:40:00.718 [id: 0xcdca8f66] ERROR (be.demmel.fun.BackendHandler) - Idle connection (/127.0.0.1:7900). PDUs forwarded: 13. Success: 13

The server reports that all is well:

2014-02-14 00:40:02.855 [id: 0x4980be2c] ERROR (com.nsn.ucpsimulator.common.UcpDecoder) - Idle connection (/127.0.0.1:37944). PDUs received: 500
2014-02-14 00:40:02.856 [id: 0x4980be2c] ERROR (com.nsn.ucpsimulator.server.TestUcpHandler) - Idle connection (/127.0.0.1:37944). PDUs sent back: 500

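Does anyone know what could cause this?

Additional information:

Note that everything works fine when I do not use a separate EventExecutorGroup for the blocking handler.

Every time XX clients block, they all block at the same number of replies forwarded to the client.

I have uploaded the Netty code (including the proxy, server, and client applications, plus a README): https://github.com/AndrewBourgeois/ucp-proxy/tree/master/src/main/java/be/demmel/fun

When the proxy application is killed, this error appears on the server side: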

java.io.IOException: Connection reset by peer
    at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[na:1.7.0_45]
    at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[na:1.7.0_45]
    at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[na:1.7.0_45]
    at sun.nio.ch.IOUtil.read(IOUtil.java:192) ~[na:1.7.0_45]
    at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) ~[na:1.7.0_45]
    at io.netty.buffer.UnpooledUnsafeDirectByteBuf.setBytes(UnpooledUnsafeDirectByteBuf.java:401) ~[netty-all-4.0.9.Final.jar:na]
    at io.netty.buffer.AbstractByteBuf.writeBytes(AbstractByteBuf.java:869) ~[netty-all-4.0.9.Final.jar:na]
    at io.netty.channel.socket.nio.NioSocketChannel.doReadBytes(NioSocketChannel.java:208) ~[netty-all-4.0.9.Final.jar:na]
    at io.netty.channel.nio.AbstractNioByteChannel$NioByteUnsafe.read(AbstractNioByteChannel.java:87) ~[netty-all-4.0.9.Final.jar:na]
    at io.netty.channel.nio.NioEventLoop.processSelectedKey(NioEventLoop.java:478) ~[netty-all-4.0.9.Final.jar:na]
    at io.netty.channel.nio.NioEventLoop.processSelectedKeysOptimized(NioEventLoop.java:447) ~[netty-all-4.0.9.Final.jar:na]
    at io.netty.channel.nio.NioEventLoop.run(NioEventLoop.java:341) ~[netty-all-4.0.9.Final.jar:na]
    at io.netty.util.concurrent.SingleThreadEventExecutor$2.run(SingleThreadEventExecutor.java:101) [netty-all-4.0.9.Final.jar:na]
    at java.lang.Thread.run(Thread.java:744) [na:1.7.0_45]

I believe this error indicates that my Netty handlers are not processing the server's replies.

- AndrewBourgeois

Did you ever solve this? I am running into a similar problem. - Programmer9000

2 Answers

I think your problem is that you create a new DefaultEventExecutorGroup(10) every time you add the handler. You should create it only once and pass the instance in.

- Norman Maurer

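Sorry, I made a mistake while simplifying the code for this post, that is all. I have fixed it in my question. The test project on GitHub has the correct code. If I give you the client and server code (you only need mvn exec:java), could you take 2 minutes to reproduce the problem? - AndrewBourgeois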

Sure... give me the link and I will check. - Norman Maurer

https://github.com/AndrewBourgeois/ucp-proxy (I added a small README). PS: this code also frequently reproduces https://github.com/netty/netty/issues/2086 (fixed but not yet released) ;) - AndrewBourgeois

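I removed the "UCP" code from the GitHub project to simplify it further. Please let me know if you can reproduce it (I have checked that it still reproduces without the "UCP" code). Thanks! - AndrewBourgeois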

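@AndrewBourgeois I ran tcpdump while running the test; your sample program generates a large number of packets (3745442) in a short time. I noticed many TCP packet losses/retransmissions from time to time (using the tcp.analysis.* filters in Wireshark). Could it be that, after adding the separate executor, the remote side writes too fast and some clients lose packets? - Jestan Nirojan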

- Derek Troy-West · Accepted Answer

Having looked at your GitHub project, your execution looks a bit like:

--> serve request
  --> authenticate (blocking db call)
    --> forward request
    <-- receive response
<-- serve response

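Without the separate EventExecutorGroup, all execution runs inside the NioEventLoopGroup, which should only be used for non-blocking work. Each served request is decoded and then immediately blocks on the DB call, so your server is effectively limited by the number of threads in the NioEventLoopGroup.

You have added a DefaultEventExecutorGroup (DEEG) around the ChannelHandler that does the authentication, so serving requests and authenticating are now partially decoupled: each request is decoded and then execution is passed to the DEEG, leaving the NioEventLoopGroup free to decode more requests.

Except that your bootstrap that connects to the DB is configured to use the same NioEventLoopGroup as the initial channel: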

b.group(inboundChannel.eventLoop())

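This means you are still blocking the main Netty worker threads with the blocking DB connection.

I am not sure what happens after that, but perhaps you are serving a bunch of requests (effectively queuing them all up until the DEEG is available) and then they time out because they are all waiting on the blocking DB call, whose ability to execute is starved by contention with the server's decoding work.

For example (assuming you have plenty of concurrent clients):

[original, 2-thread NioEventLoopGroup, no EventExecutorGroup]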

nio-thread-1: serve-request 1 and authenticate (block)
nio-thread-2: serve-request 2 and authenticate (block)

(db calls complete)

nio-thread-1: forward-request 1 (non-blocking)
nio-thread-2: forward-request 2 (non-blocking)

nio-thread-1: serve-request 3 and authenticate (block)
nio-thread-2: serve-request 4 and authenticate (block)

(db calls complete)

nio-thread-1: forward-request 3 (non-blocking)
nio-thread-2: forward-request 4 (non-blocking)

nio-thread-1: either serve-response 1/2 or serve-request 5 (and block)
nio-thread-2: either serve-response 1/2 or serve-request 6 (and block)

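It is not pretty, but you are limited to serving roughly n*2 requests at a time (n being the number of NioEventLoopGroup threads), assuming server requests and server responses are handled with equal urgency.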
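The bound described above can be modelled without Netty. The following is only a sketch using plain JDK executors: a fixed thread pool stands in for the NioEventLoopGroup and `Thread.sleep` stands in for the blocking DB call. The class name `InlineBlockingDemo`, the method name, and the parameters are hypothetical and do not come from the project under discussion.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-JDK model (assumption: no Netty involved); the pool plays the role of the IO threads.
final class InlineBlockingDemo {

    // Runs `requests` tasks on `ioThreads` threads. Each task blocks for
    // `blockMillis` (the stand-in for the DB call). Returns the highest number
    // of tasks that were inside the blocking call at the same time.
    static int maxConcurrentAuthentications(int ioThreads, int requests, long blockMillis) {
        ExecutorService ioPool = Executors.newFixedThreadPool(ioThreads);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxSeen = new AtomicInteger();
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                futures.add(ioPool.submit(() -> {
                    int now = inFlight.incrementAndGet();
                    maxSeen.accumulateAndGet(now, Math::max);
                    try {
                        Thread.sleep(blockMillis); // simulated blocking DB call
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                    }
                }));
            }
            for (Future<?> f : futures) {
                f.get(); // wait for every request to finish
            }
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            ioPool.shutdownNow();
        }
        return maxSeen.get();
    }

    public static void main(String[] args) {
        System.out.println("max concurrent authentications: "
                + maxConcurrentAuthentications(2, 8, 20));
    }
}
```

With two threads in the pool, the reported maximum can never exceed two, however many requests are queued, which is the kind of ceiling the trace above illustrates.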

[2-thread NioEventLoopGroup, 2-thread DefaultEventExecutorGroup]

nio-thread-1: serve-request 1 and pass to DEEG
nio-thread-2: serve-request 2 and pass to DEEG
nio-thread-1: serve-request 3 and pass to DEEG
nio-thread-2: serve-request 4 and pass to DEEG
nio-thread-1: serve-request 5 and pass to DEEG
nio-thread-2: serve-request 6 and pass to DEEG
nio-thread-1: serve-request 7 and pass to DEEG
nio-thread-2: serve-request 8 and pass to DEEG

def-evt-eg-1: try to authenticate, pass execution back to nio-thread-x
def-evt-eg-2: try to authenticate, pass execution back to nio-thread-x

nio-thread-1: serve-request 9 and pass to DEEG
nio-thread-2: serve-request 10 and pass to DEEG
nio-thread-1: serve-request 11 and pass to DEEG
nio-thread-2: serve-request 12 and pass to DEEG
nio-thread-1: authenticate against DB (block)
nio-thread-2: serve-request 13 and pass to DEEG
nio-thread-2: serve-request 14 and pass to DEEG
nio-thread-2: serve-request 15 and pass to DEEG
nio-thread-2: serve-request 16 and pass to DEEG
nio-thread-2: authenticate against DB (block)

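Now you can consume many more requests, but the rate at which you make DB calls and the total latency through your server will depend on the number of concurrent clients, the ratio of DEEG threads to NioEventLoop threads, context switching, and so on.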
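The hand-off can also be sketched with plain JDK executors. This is a simplified model and not Netty code: `ioPool` stands in for the NioEventLoopGroup, `workerPool` for the DEEG, and a latch named `dbGate` stands in for a DB that has not answered yet. All of these names, and the class `HandoffDemo`, are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Plain-JDK model (assumption: no Netty involved) of "serve-request and pass to DEEG".
final class HandoffDemo {

    // Returns {accepted while the DB was silent, authenticated while the DB was
    // silent, authenticated once the DB answered}.
    static int[] run(int ioThreads, int workerThreads, int requests) {
        ExecutorService ioPool = Executors.newFixedThreadPool(ioThreads);
        ExecutorService workerPool = Executors.newFixedThreadPool(workerThreads);
        CountDownLatch dbGate = new CountDownLatch(1);        // opened when the "DB" answers
        CountDownLatch allDone = new CountDownLatch(requests);
        AtomicInteger accepted = new AtomicInteger();
        AtomicInteger authenticated = new AtomicInteger();
        try {
            List<Future<?>> decodes = new ArrayList<>();
            for (int i = 0; i < requests; i++) {
                decodes.add(ioPool.submit(() -> {
                    accepted.incrementAndGet();               // decode only, never blocks
                    workerPool.execute(() -> {                // hand off to the worker pool
                        try {
                            dbGate.await();                   // simulated blocking DB call
                            authenticated.incrementAndGet();
                        } catch (InterruptedException e) {
                            Thread.currentThread().interrupt();
                        } finally {
                            allDone.countDown();
                        }
                    });
                }));
            }
            for (Future<?> f : decodes) {
                f.get();                                      // every request has been decoded
            }
            int acceptedEarly = accepted.get();
            int authenticatedEarly = authenticated.get();
            dbGate.countDown();                               // the "DB" finally answers
            if (!allDone.await(10, TimeUnit.SECONDS)) {
                throw new IllegalStateException("workers did not finish in time");
            }
            return new int[] {acceptedEarly, authenticatedEarly, authenticated.get()};
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            ioPool.shutdownNow();
            workerPool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(run(2, 2, 8)));
    }
}
```

In this model the IO threads accept every request while no authentication has completed, so the backlog simply moves into the worker pool's queue. If the blocking call were instead made on the IO threads themselves, as the answer suspects happens with the shared event loop, the accepting side would stall as well.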

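You can probably visualize this by printing some basic thread diagnostics while the application runs. I may be completely wrong, since I have not had a chance to run it and see for myself; this is only a guess.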
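One possible form of such diagnostics, sketched with the JDK only: take a snapshot of every live thread's name and state and print it periodically during the load test. The class name `ThreadSnapshot` is hypothetical. The sketch also assumes thread names are unique, because threads sharing a name overwrite each other in the map.

```java
import java.util.Map;
import java.util.TreeMap;

// Minimal JDK-only diagnostic helper (hypothetical, not part of the project).
final class ThreadSnapshot {

    // Returns thread name -> state for every live thread, sorted by name.
    static Map<String, Thread.State> capture() {
        Map<String, Thread.State> snapshot = new TreeMap<>();
        for (Thread t : Thread.getAllStackTraces().keySet()) {
            snapshot.put(t.getName(), t.getState());
        }
        return snapshot;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, Thread.State> e : capture().entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Event loop threads that sit in WAITING or TIMED_WAITING for long stretches during the test would be consistent with the blocking described above. Netty's default thread names typically start with the group type, which should make the two groups easy to tell apart, although the exact names depend on how the groups were constructed.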