ClickHouse: CREATE DATABASE ON CLUSTER ends with a timeout

3

I have a cluster of two ClickHouse nodes. Both instances run in Docker containers. All communication between the hosts has been checked successfully: ping, telnet and wget all work. In ZooKeeper I can see my query under the ddl branch.

Every time I execute a CREATE DATABASE ... ON CLUSTER statement, it ends with a timeout. Where is the problem? Does anyone have any ideas?
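For reference, the exact statement (as it appears in the log further below) was:

    CREATE DATABASE event_history ON CLUSTER history_cluster;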

Here are the relevant fragments of the configuration file.

Ver 20.10.3.30

<remote_servers>
    <history_cluster>
        <shard>
            <replica>
                <host>10.3.194.104</host>
                <port>9000</port>
            </replica>
            <replica>
                <host>10.3.194.105</host>
                <port>9000</port>
            </replica>
        </shard>
    </history_cluster>
</remote_servers>
<zookeeper>
    <node index="1">
        <host>10.3.194.106</host>
        <port>2181</port>
    </node>
</zookeeper>

"宏"部分。
    <macros incl="macros" optional="true" />
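The incl="macros" attribute means the actual values are pulled from an external include file (by default /etc/metrika.xml, or whatever <include_from> points to). As an illustration only, not taken from the question, such an included section usually looks something like this, with per-node placeholder values:

    <macros>
        <shard>01</shard>
        <replica>replica_1</replica>
    </macros>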

A fragment of the log:
2020.11.20 22:38:44.104001 [ 90 ] {68062325-a6cf-4ac3-a355-c2159c66ae8b} <Error> executeQuery: Code: 159, e.displayText() = DB::Exception: Watching task /clickhouse/task_queue/ddl/query-0000000013 is executing longer than distributed_ddl_task_timeout (=180) seconds. There are 2 unfinished hosts (0 of them are currently active), they are going to execute the query in background (version 20.10.3.30 (official build)) (from 172.17.0.1:51272) (in query: create database event_history on cluster history_cluster;), Stack trace (when copying this message, always include the lines below):

0. DB::Exception::Exception<std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, long&, unsigned long&, unsigned long&>(int, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> > const&, std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char> >&, long&, unsigned long&, unsigned long&) @ 0xd8dcc75 in /usr/bin/clickhouse
1. DB::DDLQueryStatusInputStream::readImpl() @ 0xd8dc84d in /usr/bin/clickhouse
2. DB::IBlockInputStream::read() @ 0xd71b1a5 in /usr/bin/clickhouse
3. DB::AsynchronousBlockInputStream::calculate() @ 0xd71761d in /usr/bin/clickhouse
4. ? @ 0xd717db8 in /usr/bin/clickhouse
5. ThreadPoolImpl<ThreadFromGlobalPool>::worker(std::__1::__list_iterator<ThreadFromGlobalPool, void*>) @ 0x7b8c17d in /usr/bin/clickhouse
6. std::__1::__function::__func<ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()&&...)::'lambda'(), std::__1::allocator<ThreadFromGlobalPool::ThreadFromGlobalPool<void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()>(void&&, void ThreadPoolImpl<ThreadFromGlobalPool>::scheduleImpl<void>(std::__1::function<void ()>, int, std::__1::optional<unsigned long>)::'lambda1'()&&...)::'lambda'()>, void ()>::operator()() @ 0x7b8e67a in /usr/bin/clickhouse
7. ThreadPoolImpl<std::__1::thread>::worker(std::__1::__list_iterator<std::__1::thread, void*>) @ 0x7b8963d in /usr/bin/clickhouse
8. ? @ 0x7b8d153 in /usr/bin/clickhouse
9. start_thread @ 0x9609 in /usr/lib/x86_64-linux-gnu/libpthread-2.31.so
10. clone @ 0x122293 in /usr/lib/x86_64-linux-gnu/libc-2.31.so


Could you provide the macros section of the xml-config for both nodes? - vladimir
Please check the log on each node: /var/log/clickhouse-server/clickhouse-server.log - vladimir
1 Answer

5
Most likely the problem is the Docker-internal IP addresses / hostnames of the nodes.
The initiator node (the one that runs the ON CLUSTER statement) puts tasks into ZooKeeper addressed to 10.3.194.104 and 10.3.194.105. All nodes continuously poll the task queue and pick up their own tasks. If a node's own IP address / hostname is 127.0.0.1 / localhost, it can never find its tasks, because 10.3.194.104 != 127.0.0.1.
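A quick way to see how a node resolves the cluster definition is to query system.clusters on that node; the is_local column shows whether ClickHouse recognizes a configured address as its own. In the situation described above one would expect is_local = 0 for every row, for example:

    SELECT host_name, host_address, is_local
    FROM system.clusters
    WHERE cluster = 'history_cluster';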

Thanks, your guess was spot on. I restarted the containers with the --network host option and the statement executed successfully. - CaptainVoronin
@CaptainVoronin, could you elaborate on how you did that? Because if you mean adding network_mode: host in Docker, that is not possible together with published ports. - Shubham
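For readers who want to reproduce the fix, a minimal sketch of starting one of the containers with host networking (image tag, container name and mount paths are assumptions, not taken from the question). With --network host there are no -p port mappings at all; the server ports (9000, 8123, 9009) are exposed directly on the host:

    # Host networking: the container shares the host's network stack,
    # so the node sees the real 10.3.194.x address instead of a Docker-internal one.
    docker run -d --name clickhouse-node1 \
        --network host \
        -v /opt/clickhouse/config.d:/etc/clickhouse-server/config.d \
        -v /opt/clickhouse/data:/var/lib/clickhouse \
        yandex/clickhouse-server:20.10.3.30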
