在Coldfusion 9中使用cflock进行排队

3
自从我们从ColdFusion 8 Enterprise升级到ColdFusion 9 Enterprise以来,我们一直在处理与事件网关相关的问题。
我们设置了一个事件网关来建立与第三方的连接。他们每10秒至少发送更新,有时甚至每秒钟发送多次更新。
我们配置了一个Java类作为事件网关监听器,并将事件和数据推送到CFC函数。在该函数中,我们实际上使用了一个命名的,以确保请求按顺序处理,并且在此处请求将排队等待对命名锁的独占访问。此锁具有30秒的超时时间。
我还在这个函数中添加了很多调试信息,并且注意到了一些事情:
- 请求将在标记之前排队等待锁,这个队列可以超过40个事件。 - 所有请求在6秒内获取锁后都会被处理。平均处理时间实际上是1.5至2秒。
因此,问题在于,偶尔会发生锁定超时,而且超过了30秒。首先记录是否正在等待锁的请求。它看起来像这样:
"Information","Thread-23","06/23/10","15:45:18","APP1","F4B42A5A-F34D-C614-DE01B150E6906F78 (1277304318606) : PRE LOCK"

接着在日志中往下看,我看到同一请求的记录如下:

"Error","Thread-23","06/23/10","15:45:48","APP1","F4B42A5A-F34D-C614-DE01B150E6906F78 (1277304348607) : LOCK ERROR: A timeout occurred while attempting to lock lock_ResponseDispatcher."

它们之间有30秒的时间差。请求和任何与之相关的事件数据在此时丢失。这对我来说不好。

所以我想看看队列是否被足够快地处理。我不确定<cflock>如何排队事件。是否有硬性限制?

无论如何,在这个特定的运行中,我看到了:

  • 当已经有6个请求在队列中时,该请求进入队列,因此它是队列中的第7个

  • 在接下来的30秒内,约有17个请求从队列中移除

  • 大约有相同数量的请求添加到队列中

  • 在此期间,该请求未被处理,并在30秒后超时

我简直不敢相信自己的眼睛!就好像<cflock>队列不是按先进先出(FIFO)而是按后进先出(FILO)的顺序处理一样!

这种情况可能吗?有没有人见过这种行为?

提前感谢任何有想法的人。

Ciaran

2个回答

2

我认为关键在于我使用了event gateways这种异步方式。实际上,在进行实验后,我的问题原因似乎非常明显:)

我在CF管理员中指定了可以用于处理事件网关请求的线程数(请参见事件网关->设置)。在CF Dev Edition中,此设置保持为1,但在Enterprise Edition中可以增加。我将其增加到了5以进行此实验。只有当它增加时才会出现这种奇怪的行为。

因此,我的事件网关代码非常简单,它只创建一个UUID(以便我可以在日志中跟踪请求),然后锁定线程5秒钟以模拟真实处理。此sleep发生在cflock调用内部,因此一次只能处理一个线程。我们需要这样做来避免在实际代码中处理重复项。

以下是完整的CFC:

component {

    public void function onIncomingMessage (required struct msg) {

        var sys = createObject("java", "java.lang.System");
        var tag = createUUID();
        var logFile = "test\gatewaytest";

        writelog (file=logFile, text="#tag# - about to queue");

        try {

            lock name="myTestLock" timeout="120" {
                writelog (file=logFile, text="#tag# - got lock");

                thread action="sleep" duration="5000"; //ms 
            }

            writelog (file=logFile, text="#tag# - released lock");

        } catch (any e) {
            writelog (file=logFile, text="#tag# - ERROR - #e.message#");
        }
    }
}

请注意锁的超长超时时间(2分钟)。这是为了解决事件网关异步处理所带来的问题。
事件网关是一个简单的内置CFML类型,其ID为“TestGW”,我将其链接到上面提到的CFC。
我设置了一个简单的脚本来发送事件到事件网关,以下是完整代码:
<cfset msg = {mymessage = "hello gateway"} />
<cfset sendGatewayMessage("TestGW", msg) />

场景1 - 单线程:
如果事件网关处理线程数设置为1,并且我向网关发送大量请求,将会看到以下日志输出:
"Information","Thread-17","06/25/10","10:32:09",,"50805BB4-1C23-9073-67A70A86CA6F8E54 - about to queue"
"Information","Thread-17","06/25/10","10:32:09",,"50805BB4-1C23-9073-67A70A86CA6F8E54 - got lock"
"Information","Thread-17","06/25/10","10:32:14",,"50805BB4-1C23-9073-67A70A86CA6F8E54 - released lock"
"Information","Thread-17","06/25/10","10:32:14",,"50811F1A-1C23-9073-67AD3E9C0BF2000C - about to queue"
"Information","Thread-17","06/25/10","10:32:14",,"50811F1A-1C23-9073-67AD3E9C0BF2000C - got lock"
"Information","Thread-17","06/25/10","10:32:19",,"50811F1A-1C23-9073-67AD3E9C0BF2000C - released lock"
"Information","Thread-17","06/25/10","10:32:19",,"5081E27F-1C23-9073-67B5D2EF6AED8426 - about to queue"
"Information","Thread-17","06/25/10","10:32:19",,"5081E27F-1C23-9073-67B5D2EF6AED8426 - got lock"
"Information","Thread-17","06/25/10","10:32:24",,"5081E27F-1C23-9073-67B5D2EF6AED8426 - released lock"
"Information","Thread-17","06/25/10","10:32:24",,"5082A5E1-1C23-9073-674E9467F395686F - about to queue"
"Information","Thread-17","06/25/10","10:32:24",,"5082A5E1-1C23-9073-674E9467F395686F - got lock"
"Information","Thread-17","06/25/10","10:32:29",,"5082A5E1-1C23-9073-674E9467F395686F - released lock"

重点要注意的是,它是单线程的。它只是一个接一个地排队事件,一切都按顺序发生。
场景2 - 更多线程: 如果将事件网关处理线程数增加到5,并且我猛击网关,会看到以下日志输出:
"Information","Thread-18","06/25/10","11:26:01",,"526CC05B-C9E1-FADE-73CE3426BC0A3F92 - about to queue"
"Information","Thread-18","06/25/10","11:26:01",,"526CC05B-C9E1-FADE-73CE3426BC0A3F92 - got lock"
"Information","Thread-27","06/25/10","11:26:01",,"526CD0EB-049E-D382-2C3A7E3C0DBF8ED3 - about to queue"
"Information","Thread-21","06/25/10","11:26:02",,"526CDEED-C2B3-3C92-0F57CFA317AC02F8 - about to queue"
"Information","Thread-20","06/25/10","11:26:02",,"526CEE25-F25B-890C-F7501B5489C6BB21 - about to queue"
"Information","Thread-25","06/25/10","11:26:02",,"526CFD3C-EAFD-40E7-EBA2BE59B87D5936 - about to queue"
"Information","Thread-24","06/25/10","11:26:03",,"526D0FC5-E5E2-642E-452636C8838ADE33 - about to queue"
"Information","Thread-26","06/25/10","11:26:03",,"526D1096-C82E-535B-36D57D3A431D1436 - about to queue"
"Information","Thread-23","06/25/10","11:26:03",,"526D1F9C-9A9C-FA84-E153A944123E77BE - about to queue"
"Information","Thread-19","06/25/10","11:26:04",,"526D2EDC-EA54-4D83-3F6BB681A5CCAA89 - about to queue"
"Information","Thread-22","06/25/10","11:26:04",,"526D3F09-073F-2B0C-E94652D1C95B09CB - about to queue"
"Information","Thread-18","06/25/10","11:26:06",,"526CC05B-C9E1-FADE-73CE3426BC0A3F92 - released lock"
"Information","Thread-22","06/25/10","11:26:06",,"526D3F09-073F-2B0C-E94652D1C95B09CB - got lock"
"Information","Thread-22","06/25/10","11:26:11",,"526D3F09-073F-2B0C-E94652D1C95B09CB - released lock"
"Information","Thread-27","06/25/10","11:26:11",,"526CD0EB-049E-D382-2C3A7E3C0DBF8ED3 - got lock"
"Information","Thread-27","06/25/10","11:26:16",,"526CD0EB-049E-D382-2C3A7E3C0DBF8ED3 - released lock"
"Information","Thread-19","06/25/10","11:26:16",,"526D2EDC-EA54-4D83-3F6BB681A5CCAA89 - got lock"
"Information","Thread-19","06/25/10","11:26:21",,"526D2EDC-EA54-4D83-3F6BB681A5CCAA89 - released lock"

请特别注意具有UUID 526D3F09-073F-2B0C-E94652D1C95B09CB的请求。它是最后一个记录的请求,因此在队列的末尾。然而,一旦锁可用,它会跳过并获取锁 - 而不是先前存在的526CD0EB-049E-D382-2C3A7E3C0DBF8ED3请求。 结论: 由于当使用事件网关等待cflock时,在使用多个线程时无法保证处理线程的顺序。我们需要确保锁的超时值足够高,以便在繁忙时可以完全处理事件队列,然后再处理任何一个请求超过锁定超时时间。
我想这可能会对使用多线程事件网关的人有所帮助!
祝好,Ciaran.

1

我不确定如何处理你的FIFO vs LIFO问题,但我可以为此提供建议:

它们之间有30秒的时间差。请求和与其相关的任何事件数据在此时丢失。对我来说不好。

CFLock标签有一个名为throwOnTimeout的属性,默认值为true。如果将其设置为false,则在超时事件发生时,不会抛出异常,而是跳过锁定的代码块并正常继续处理。您可以利用这一点,例如:

<cfset made_it_through_lock = false />
<cflock name="single_threaded_lock_name" throwOnTimeout="false">
    <!--- ... do stuff ... --->
    <cfset made_it_through_lock = true />
</cflock>
<cfif made_it_through_lock eq false>
    <!--- 
          log the event data that you don't want to lose, then abort, 
          setting the necessary http status code & headers
    --->
</cfif>

我实际上是通过在锁定的代码中使用try/catch来捕获异常的。但是对于那个建议加一分。 - Ciaran Archer
是的,那是另一个选项,但抛出和捕获异常是很昂贵的。如果这是一种预期的行为,您可能会发现不使用它们可以获得更好的性能。我在想——您能否将FIFO与LIFO问题简化为一个仅执行锁定和超时的简单示例,以显示它确实是LIFO? - Adam Tuttle
这并不是预期行为。然而,我在尝试想出一种简单地演示问题的方法。我怀疑关键问题在于,这个锁所在的 CFC 是通过异步事件网关调用的。正常情况下调用该 CFC 只会导致请求被 CF 排队(并显示在 cfstat 中)。通过事件网关调用任何调用甚至都没有在 cfstat 上注册。我会进行一些实验并回到你这里。 - Ciaran Archer
我想我已经弄清楚了 - 请看下面!谢谢。 - Ciaran Archer

网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接