HBase batch get and SocketTimeoutException

Question

I am using Java and want to fetch data in a batch, along the lines of the following snippet:

```
final List<Get> gets = uids.stream()
                .map(uid -> new Get(toBytes(uid)))
                .collect(Collectors.toList());

Configuration configuration = HBaseConfiguration.create();

// Point the client at the ZooKeeper ensemble used by the cluster.
configuration.set("hbase.zookeeper.quorum", quorum);
configuration.set("hbase.zookeeper.property.clientPort", properties.getString("HBASE_CONFIGURATION_ZOOKEEPER_CLIENTPORT"));
configuration.set("zookeeper.znode.parent", properties.getString("HBASE_CONFIGURATION_ZOOKEEPER_ZNODE_PARENT"));

HTable table = new HTable(configuration, tableName);
return table.get(gets);
```

When the list contains 10,000 gets, everything works fine.

When I try to run 100,000 gets in a single batch, I get an exception:

java.lang.RuntimeException: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 100000 actions: SocketTimeoutException: 100000 times, 
Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedWithDetailsException: Failed 100000 actions: SocketTimeoutException: 100000 times, 
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.makeException(AsyncProcess.java:203) ~[hbase-query-layer-r575958b.jar:?]
        at org.apache.hadoop.hbase.client.AsyncProcess$BatchErrors.access$500(AsyncProcess.java:187) ~[hbase-query-layer-r575958b.jar:?]
        at org.apache.hadoop.hbase.client.AsyncProcess.getErrors(AsyncProcess.java:922) ~[hbase-query-layer-r575958b.jar:?]
        at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.processBatchCallback(HConnectionManager.java:2402) ~[hbase-query-layer-r575958b.jar:?]
        at org.apache.hadoop.hbase.client.HTable.batchCallback(HTable.java:868) ~[hbase-query-layer-r575958b.jar:?]
        at org.apache.hadoop.hbase.client.HTable.batchCallback(HTable.java:883) ~[hbase-query-layer-r575958b.jar:?]
        at org.apache.hadoop.hbase.client.HTable.batch(HTable.java:858) ~[hbase-query-layer-r575958b.jar:?]
        at org.apache.hadoop.hbase.client.HTable.get(HTable.java:825) ~[hbase-query-layer-r575958b.jar:?]
        at hbase_query_layer.hbase.HbaseConnector.get(HbaseConnector.java:89) ~[hbase-query-layer-r575958b.jar:?]
        ... 15 more

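What is going wrong?

In addition, in the web UI I see the request count for the region server that hosts the table growing continuously. With a batch size of 100K, the count reached 700K after a few minutes and was still growing, even though my client is the only one writing to that table.

Also, on the HBase region server, I see the following in the hbase-hbase-regionserver.out file: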

Exception in thread "RpcServer.handler=25,port=60020" java.lang.StackOverflowError
        at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:203)
        at org.apache.hadoop.hbase.CellUtil$1.advance(CellUtil.java:203)

How can I solve this?

- Piotr Sobolewski

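Basically, your 100K batch is far too large. Try to keep it around 1K. Besides, a batch that large brings no benefit. I have never heard of anyone using batches of 100K. - Anil Gupta

Then why can't the server send me something like "frame too large" or "batch too large"? I only get the socket timeout exception after more than an hour. - Piotr Sobolewski

@AnilGupta I pull some messages from Kafka and put them into HBase in batches. Before that, I need to check whether the rows exist, so I need n gets and (at most) one put. After that, I need n increments. - Piotr Sobolewski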

1 Answer

- Piotr Sobolewski · Accepted Answer

I found the issue: https://issues.apache.org/jira/browse/HBASE-11813 Unfortunately, my HBase version is 0.98.0.2.1.1.0-385-hadoop2, so I need to split the operations into chunks, like this:

final List<List<Increment>> batchesToExecute = chopped(increments, conf.getBatchIncrementSize());


static <T> List<List<T>> chopped(List<T> list, final int L) {
    List<List<T>> parts = new ArrayList<>();
    final int N = list.size();
    for (int i = 0; i < N; i += L) {
        parts.add(new ArrayList<>(list.subList(i, Math.min(N, i + L))));
    }
    return parts;
}
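
The same chunking idea can be applied to the get list from the question. Below is a minimal sketch, not the code actually used in production: `fetchInChunks` and the class name `ChunkedBatch` are hypothetical, and the `batchCall` function is a stand-in for a call such as `table.get(chunk)`. Since `HTable.get(List<Get>)` returns a `Result[]`, a real caller would wrap it, for example with `Arrays.asList(...)`. The chunk size of 2 is only for the demo; something around 1K, as suggested in the comments, would be the starting point to try.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class ChunkedBatch {

    // Runs batchCall on consecutive slices of at most chunkSize items and
    // concatenates the results in the original key order.
    // Hypothetical helper: batchCall stands in for the real HBase batch call.
    static <K, V> List<V> fetchInChunks(List<K> keys, int chunkSize,
                                        Function<List<K>, List<V>> batchCall) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<V> results = new ArrayList<>(keys.size());
        for (int start = 0; start < keys.size(); start += chunkSize) {
            int end = Math.min(keys.size(), start + chunkSize);
            // subList returns a view; it is only read during this call.
            results.addAll(batchCall.apply(keys.subList(start, end)));
        }
        return results;
    }

    public static void main(String[] args) {
        List<Integer> uids = Arrays.asList(1, 2, 3, 4, 5);
        List<Integer> batchSizes = new ArrayList<>();

        // Fake batch call that records the size of each request it receives.
        List<String> rows = fetchInChunks(uids, 2, chunk -> {
            batchSizes.add(chunk.size());
            List<String> out = new ArrayList<>();
            for (Integer uid : chunk) {
                out.add("row-" + uid);
            }
            return out;
        });

        System.out.println(rows);       // [row-1, row-2, row-3, row-4, row-5]
        System.out.println(batchSizes); // [2, 2, 1]
    }
}
```

Unlike `chopped`, this version does not copy each slice into a new list, which is fine as long as the batch call does not keep a reference to the slice after it returns.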