Kafka integration tests run with Gradle fail in GitHub Actions

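Our company has been migrating applications from CircleCI to GitHub Actions, and we have run into a strange situation.
Nothing changed in the project code, but our Kafka integration tests started failing on the GH Actions machines. Everything works fine on CircleCI and locally (macOS and Fedora Linux machines).
Both the CircleCI and GH Actions machines run Ubuntu (tested with versions 18.04 and 20.04). We did not test on macOS in GH Actions because Docker is not available there.
Here are the docker-compose and workflow files used for the build and the integration tests:
- docker-compose.yml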
version: '2.1'

services:
  postgres:
    container_name: listings-postgres
    image: postgres:10-alpine
    mem_limit: 500m
    networks:
      - listings-stack
    ports:
      - "5432:5432"
    environment:
      POSTGRES_DB: listings
      POSTGRES_PASSWORD: listings
      POSTGRES_USER: listings
      PGUSER: listings
    healthcheck:
      test: ["CMD", "pg_isready"]
      interval: 1s
      timeout: 3s
      retries: 30

  listings-zookeeper:
    container_name: listings-zookeeper
    image: confluentinc/cp-zookeeper:6.2.0
    environment:
      ZOOKEEPER_CLIENT_PORT: 2181
      ZOOKEEPER_TICK_TIME: 2000
    networks:
      - listings-stack
    ports:
      - "2181:2181"
    healthcheck:
      test: nc -z localhost 2181 || exit -1
      interval: 10s
      timeout: 5s
      retries: 10

  listings-kafka:
    container_name: listings-kafka
    image: confluentinc/cp-kafka:6.2.0
    depends_on:
      listings-zookeeper:
        condition: service_healthy
    environment:
      KAFKA_BROKER_ID: 1
      KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://listings-kafka:9092,PLAINTEXT_HOST://localhost:29092
      KAFKA_LISTENER_SECURITY_PROTOCOL_MAP: PLAINTEXT:PLAINTEXT,PLAINTEXT_HOST:PLAINTEXT
      KAFKA_INTER_BROKER_LISTENER_NAME: PLAINTEXT
      KAFKA_OFFSETS_TOPIC_REPLICATION_FACTOR: 1
      KAFKA_ZOOKEEPER_CONNECT: listings-zookeeper:2181
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    networks:
      - listings-stack
    ports:
      - "29092:29092"
    healthcheck:
      test: kafka-topics --bootstrap-server 127.0.0.1:9092 --list
      interval: 10s
      timeout: 10s
      retries: 50

networks: {listings-stack: {}}
- build.yml
name: Build

on: [ pull_request ]

env:
  AWS_ACCESS_KEY_ID: ${{ secrets.TUNNEL_AWS_ACCESS_KEY_ID }}
  AWS_SECRET_ACCESS_KEY: ${{ secrets.TUNNEL_AWS_SECRET_ACCESS_KEY }}
  AWS_DEFAULT_REGION: 'us-east-1'
  CIRCLECI_KEY_TUNNEL: ${{ secrets.ID_RSA_CIRCLECI_TUNNEL }}

jobs:
  build:
    name: Listings-API Build
    runs-on: [ self-hosted, zap ]

    steps:
      - uses: actions/checkout@v2
        with:
          token: ${{ secrets.GH_OLXBR_PAT }}
          submodules: recursive
          path: ./repo
          fetch-depth: 0

      - name: Set up JDK 11
        uses: actions/setup-java@v2
        with:
          distribution: 'adopt'
          java-version: '11'
          architecture: x64
          cache: 'gradle'

      - name: Docker up
        working-directory: ./repo
        run: docker-compose up -d

      - name: Build with Gradle
        working-directory: ./repo
        run: ./gradlew build -Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2 -x integrationTest

      - name: Integration tests with Gradle
        working-directory: ./repo
        run: ./gradlew integrationTest -Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2

      - name: Sonarqube
        working-directory: ./repo
        env:
          GITHUB_TOKEN: ${{ secrets.GH_OLXBR_PAT }}
          SONAR_TOKEN: ${{ secrets.SONAR_TOKEN }}
        run: ./gradlew sonarqube --info -Dhttps.protocols=TLSv1,TLSv1.1,TLSv1.2

      - name: Docker down
        if: always()
        working-directory: ./repo
        run: docker-compose down --remove-orphans

      - name: Cleanup Gradle Cache
        # Remove some files from the Gradle cache, so they aren't cached by GitHub Actions.
        # Restoring these files from a GitHub Actions cache might cause problems for future builds.
        run: |
          rm -f "$HOME/.gradle/caches/modules-2/modules-2.lock"
          rm -f "$HOME/.gradle/caches/modules-2/gc.properties"

The integration tests are written with the Spock framework, and these are the parts where the error occurs:

  boolean compareRecordSend(String topicName, int expected) {
    def condition = new PollingConditions()
    condition.within(kafkaProperties.listener.pollTimeout.getSeconds() * 5) {
      assert expected == getRecordSendTotal(topicName)
    }
    return true
  }

  int getRecordSendTotal(String topicName) {
    kafkaTemplate.flush()
    return kafkaTemplate.metrics().find {
      it.key.name() == "record-send-total" && it.key.tags().get("topic") == topicName
    }?.value?.metricValue() ?: 0
  }

The error we get is:
Condition not satisfied after 50.00 seconds and 496 attempts
    at spock.util.concurrent.PollingConditions.within(PollingConditions.java:185)
    at com.company.listings.KafkaAwareBaseSpec.compareRecordSend(KafkaAwareBaseSpec.groovy:31)
    at com.company.listings.application.worker.listener.notifier.ListingNotifierITSpec.should notify listings(ListingNotifierITSpec.groovy:44)

    Caused by:
    Condition not satisfied:

    expected == getRecordSendTotal(topicName)
    |        |  |                  |
    10       |  0                  v4
                false

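We debugged the GH Actions machine (over SSH) and ran things manually. The error still occurs, but if the integration tests are run a second time (and on every run after that), everything works perfectly.
We also tried creating all the necessary topics and sending some messages to them beforehand, but the behavior is the same.
Our questions are:
1. Is there any known issue with running Kafka in Docker on Ubuntu machines (the error also happened on a co-worker's Ubuntu machine)?
2. Do you have any idea why this is happening?
Edit
application.yml (Kafka-related configuration)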
spring:
  kafka:
    bootstrap-servers: localhost:29092
    producer:
      batch-size: 262144
      buffer-memory: 536870912
      retries: 1
      key-serializer: org.apache.kafka.common.serialization.StringSerializer
      value-serializer: org.apache.kafka.common.serialization.ByteArraySerializer
      acks: all
      properties:
        linger.ms: 0

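@OneCricketeer I changed Kafka's advertised listeners configuration to KAFKA_ADVERTISED_LISTENERS: PLAINTEXT://listings-kafka:9092,PLAINTEXT_HOST://localhost:29092, but I still get the same behavior. I tried using Testcontainers for this particular test, but this application is a bit old, and a full move from docker-compose to Testcontainers is too big a change for now. - BParolini
@OneCricketeer, I have edited the post with the changes I made to docker-compose.yml and added the Kafka-related configuration. - BParolini
The missing bootstrap-servers was a copy/paste error. I have fixed it. The tests run on the host, which is why the configuration uses localhost as the bootstrap server. - BParolini
Do you happen to have other test cases that connect to other local containers and work? - OneCricketeer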
All tests that connect to the PostgreSQL container work fine. The tests run smoothly locally and on CircleCI, but they fail in GH Actions. - BParolini
1 Answer


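We found that there were some test order dependencies among the Kafka tests.

We updated Gradle to version 7.3-rc-3, which scans for tests in a more deterministic way. The update "solved" our problem for now, while we work on removing the dependencies between the tests.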
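
The answer does not say what the dependency actually was. As an illustration only, one way a check like `expected == getRecordSendTotal(topicName)` can become order dependent is that a producer's per-topic `record-send-total` is a cumulative count, so its absolute value depends on whatever earlier tests sent through the same producer. The sketch below is plain Java with no Kafka involved: `SendCounter` and `sentSince` are hypothetical names standing in for the shared producer metric, and the topic name `v4` is taken from the failure output above. It shows the idea of asserting on the change since a baseline instead of on the absolute total.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch: assert on the change in a cumulative counter, not its absolute value.
 * SendCounter is a hypothetical stand-in for a producer's per-topic send total,
 * which keeps growing for as long as the producer instance lives.
 */
public class SendDeltaSketch {

    /** Hypothetical stand-in for a long-lived producer shared by several tests. */
    static final class SendCounter {
        private final Map<String, Long> totals = new HashMap<>();

        /** Adds the given number of records to the running total for the topic. */
        void send(String topic, int records) {
            totals.merge(topic, (long) records, Long::sum);
        }

        /** Cumulative number of records sent to the topic so far (0 if none). */
        long total(String topic) {
            return totals.getOrDefault(topic, 0L);
        }
    }

    /** Records sent to the topic since the baseline was captured. */
    static long sentSince(SendCounter counter, String topic, long baseline) {
        return counter.total(topic) - baseline;
    }

    public static void main(String[] args) {
        SendCounter producer = new SendCounter();

        // An earlier test already sent 3 records to the same topic.
        producer.send("v4", 3);

        // The test of interest captures a baseline first, then sends 10 records.
        long baseline = producer.total("v4");
        producer.send("v4", 10);

        // The absolute total depends on test order; the delta does not.
        System.out.println("absolute total = " + producer.total("v4"));               // 13
        System.out.println("delta          = " + sentSince(producer, "v4", baseline)); // 10
    }
}
```

In the Spock helper from the question, the equivalent change would be to read the metric once before the code under test runs and compare `getRecordSendTotal(topicName) - baseline` with `expected`. Whether that removes the dependency in this project is an assumption, since the answer leaves the root cause unstated.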

