AWS blue/green deployment with CodeDeploy to ECS: Install lifecycle event times out

4
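As the title says, the blue/green deployment to ECS never completes, because the Install lifecycle event never finishes and eventually times out.
A screenshot in the original post shows the stuck deployment. AppSpec file: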
version: 0.0 
Resources: 
  - TargetService: 
      Type: AWS::ECS::Service 
      Properties: 
        TaskDefinition: <TASK_DEFINITION> 
        LoadBalancerInfo: 
          ContainerName: "WordpressContainer" 
          ContainerPort: 80 

Task definition file:

{ 
    "executionRoleArn": "arn:aws:iam::336636872471:role/WordpressPipelineExecutionRole", 
    "containerDefinitions": [ 
        { 
            "name": "WordpressContainer", 
            "image": "<IMAGE1_NAME>", 
            "essential": true, 
            "portMappings": [ 
                { 
                    "hostPort": 80, 
                    "protocol": "tcp", 
                    "containerPort": 80 
                } 
            ] 
        } 
    ], 
    "requiresCompatibilities": [ 
        "FARGATE" 
    ], 
    "networkMode": "awsvpc", 
    "cpu": "256", 
    "memory": "512", 
    "family": "wordpress" 
} 

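I am pushing a very basic WordPress Docker image to ECR, which triggers a pipeline, but the pipeline gets stuck at CodeDeploy.

Any idea what is going on? How can I debug this?

P.S. It times out after 60 minutes with the following message:

The deployment timed out while waiting for the replacement task set to become healthy. This time out period is 60 minutes.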

2 Answers

0

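I ran into a similar problem. I looked through the cloud logs but found nothing obviously wrong. I built my architecture on https://pypi.org/project/cloudcomponents.cdk-blue-green-container-deployment/.

Here are links to images of the result:

CodeDeploy failure

Lifecycle events

Target group 1

Target group 2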

appspec.yaml

version: 0.0
Resources:
  - TargetService:
      Type: AWS::ECS::Service
      Properties:
        TaskDefinition: <TASK_DEFINITION>
        LoadBalancerInfo:
          ContainerName: "sample-website"
          ContainerPort: 80

buildspec.yml

version: 0.2

phases:
  pre_build:
    commands:
      - echo Logging in to Amazon ECR...
      - aws --version
      - aws ecr get-login-password | docker login --username AWS --password-stdin $REPOSITORY_URI
      - COMMIT_HASH=$(echo $CODEBUILD_RESOLVED_SOURCE_VERSION | cut -c 1-7)
      - IMAGE_TAG=${COMMIT_HASH:=latest}
      - echo Setting environment variables...
      - echo $EXECUTION_ROLE_ARN
      - echo $FAMILY
      - sed -i "s|SED_REPLACE_EXECUTION_ROLE_ARN|$EXECUTION_ROLE_ARN|g" taskdef.json
      - sed -i "s|SED_REPLACE_FAMILY|$FAMILY|g" taskdef.json
      - cat taskdef.json
  build:
    commands:
      - echo Docker build and tagging started on `date`
      - docker build -t $REPOSITORY_URI:latest -t $REPOSITORY_URI:$IMAGE_TAG -f Dockerfile .
      - echo Docker build and tagging completed on `date`
  post_build:
    commands:
      - echo Pushing the Docker images to container registry...
      - docker push $REPOSITORY_URI:latest
      - docker push $REPOSITORY_URI:$IMAGE_TAG
      - echo Writing image definitions file...
      - printf '{"ImageURI":"%s"}' $REPOSITORY_URI:$IMAGE_TAG > imageDetail.json
      - echo Build completed on `date`
artifacts:
  files:
    - "appspec.yaml"
    - "taskdef.json"
  secondary-artifacts:
    ManifestArtifact:
      files:
        - appspec.yaml
        - taskdef.json
    ImageArtifact:
      files:
        - imageDetail.json

taskdef.json

{
  "executionRoleArn": "SED_REPLACE_EXECUTION_ROLE_ARN",
  "containerDefinitions": [
    {
      "name": "sample-website",
      "image": "<IMAGE1_NAME>",
      "essential": true,
      "portMappings": [
        {
          "hostPort": 80,
          "protocol": "tcp",
          "containerPort": 80
        }
      ]
    }
  ],
  "requiresCompatibilities": ["FARGATE"],
  "networkMode": "awsvpc",
  "cpu": "256",
  "memory": "512",
  "family": "SED_REPLACE_FAMILY"
}

CDK stack (Python)

from aws_cdk import (
    aws_ec2 as ec2,
    Stack,
    aws_ecs as ecs,
    aws_ecs_patterns as ecs_patterns,
    aws_codecommit as codecommit,
    aws_ecr as ecr,
    aws_codepipeline as codepipeline,
    aws_codepipeline_actions as pipeline_actions,
    aws_codebuild as codebuild,
    aws_codedeploy as codedeploy,
    aws_elasticloadbalancingv2 as ecb,
    Duration
)

from cloudcomponents.cdk_blue_green_container_deployment import (
    EcsDeploymentGroup,
    EcsService,
    DummyTaskDefinition
)
from os import path
from constructs import Construct

class Ab3ArchitectureStack(Stack):

    def __init__(self, scope: Construct, construct_id: str, **kwargs) -> None:
        super().__init__(scope, construct_id, **kwargs)

        source_artifact = codepipeline.Artifact(artifact_name='SourceArtifact')
        image_artifact = codepipeline.Artifact(artifact_name='ImageArtifact')
        manifest_artifact = codepipeline.Artifact(artifact_name='ManifestArtifact')

        ab3_code_repo = codecommit.Repository(self, 'AB3_Code_Repository',
            repository_name="AB3_NginxApp", #this name is what's seen in CodeCommit
            #code=codecommit.Code.from_directory(path.abspath( "/home/ec2-user/environment/AB3_NginxApp"))
        )

        ab3_ecr_repo = ecr.Repository(self, 'AB3_ECR_Repository',
            image_scan_on_push=True
        )

        #Ran into a build error so created a policy(AB3ElasticContainerRegistry) and attached it to Ab3ArchitectureStack-AB3CodeBuildRole
        #Fixed token issue, but then needed to attach policy to Ab3ArchitectureStack-AB3PipelineBuildAB3BuildCode
        ab3_build = codebuild.Project(self, 'AB3_Code_Build',
            source=codebuild.Source.code_commit(
                repository=ab3_code_repo
            ),
            environment=codebuild.BuildEnvironment(
                build_image=codebuild.LinuxBuildImage.STANDARD_5_0,
                compute_type=codebuild.ComputeType.SMALL,
                privileged=True,
                environment_variables={
                    'REPOSITORY_URI' : codebuild.BuildEnvironmentVariable(value=ab3_ecr_repo.repository_uri),
                    'FAMILY' : codebuild.BuildEnvironmentVariable(value='AB3-blue-green-family'),
                    #The below is a temporary fix and won't work in other environments
                    'EXECUTION_ROLE_ARN' : codebuild.BuildEnvironmentVariable(value='arn:aws:iam::460994089204:role/Admin')
                }
            )
        )

        ab3_vpc = ec2.Vpc(self, 'AB3Vpc', max_azs=2)     # default is all AZs in region

        ab3_cluster = ecs.Cluster(self, 'AB3Cluster', vpc=ab3_vpc)

        ab3_load_balancer = ecb.ApplicationLoadBalancer(self, 'AB3LoadBalancer',
            vpc=ab3_vpc,
            internet_facing=True
        )

        ab3_prod_listener = ab3_load_balancer.add_listener('ProductionListener',
            port=80
        )

        ab3_test_listener = ab3_load_balancer.add_listener('TestListener',
            port=8080
        )

        ab3_prod_tgt_group = ecb.ApplicationTargetGroup(self, 'ProdTargetGoup',
            port=80,
            target_type= ecb.TargetType.IP,
            vpc=ab3_vpc
        )

        ab3_prod_listener.add_target_groups('AddProdTgtGroup',
            target_groups=[ab3_prod_tgt_group]
        )

        ab3_test_tgt_group = ecb.ApplicationTargetGroup(self, 'TestTargetGoup',
            port=8080,
            target_type= ecb.TargetType.IP,
            vpc=ab3_vpc
        )

        ab3_test_listener.add_target_groups('AddTestTgtGroup',
            target_groups=[ab3_test_tgt_group]
        )

        ab3_ecs_service = EcsService(self, "AB3ECSService",
            cluster=ab3_cluster,
            service_name='AB3-Service',
            desired_count=2,
            task_definition= DummyTaskDefinition(self, 'DummyTaskDef', #should be replaced by CodeDeploy in CodePipeline
                image='nginx',
                family='AB3-blue-green-family'
            ),
            test_target_group=ab3_test_tgt_group,
            prod_target_group=ab3_prod_tgt_group
        )

        ab3_ecs_service.connections.allow_from(ab3_load_balancer, ec2.Port.tcp(80))
        ab3_ecs_service.connections.allow_from(ab3_load_balancer, ec2.Port.tcp(8080))

        #ab3_deployment_group = codedeploy.IEcsDeploymentGroup() TypeError Protocols can't be instantiated
        ab3_deployment_group = EcsDeploymentGroup(self, "AB3DeployGroup",
            application_name='AB3-Application',
            deployment_group_name='AB3-Deployment-Group',
            ecs_services=[ab3_ecs_service],
            target_groups=[ab3_prod_tgt_group,ab3_test_tgt_group],
            prod_traffic_listener= ab3_prod_listener,
            test_traffic_listener=ab3_test_listener,
            termination_wait_time=Duration.minutes(15)
        )

        pipeline = codepipeline.Pipeline(self, "AB3Pipeline",
        stages=[
            codepipeline.StageProps(
                stage_name="Source",
                actions=[
                    pipeline_actions.CodeCommitSourceAction(
                        repository=ab3_code_repo,
                        branch='main',
                        output=source_artifact,
                        action_name='AB3Source'
                    )
                ] 
            ),
            codepipeline.StageProps(
                stage_name="Build",
                actions=[
                    pipeline_actions.CodeBuildAction(
                        input=source_artifact,
                        project=ab3_build,
                        action_name='AB3Build',
                        outputs=[image_artifact, manifest_artifact] #second output needs to be specified in buildspec
                        #outputs=[image_artifact]
                    )
                ]
            ),
            codepipeline.StageProps(
                stage_name="Deploy",
                actions=[
                    pipeline_actions.CodeDeployEcsDeployAction(
                        action_name="AB3Deploy",
                        deployment_group=ab3_deployment_group,
                        #app_spec_template_input=image_artifact,
                        #task_definition_template_input=image_artifact
                        app_spec_template_input=manifest_artifact,
                        task_definition_template_input=manifest_artifact,
                        container_image_inputs=[pipeline_actions.CodeDeployEcsContainerImageInput(
                            input=image_artifact,
                            task_definition_placeholder='IMAGE1_NAME'
                        )]
                    )
                ]
            )
        ]
        ) 

        '''
        ab3_ecs_application = codedeploy.EcsApplication(self, "AB3_Application",
            application_name="AB3_NGINX_Application"
        )

        ab3

        ab3_task_definition = ecs.FargateTaskDefinition(self, 'AB3_Task_Definition',
            cpu=256,
            memory_limit_mib=1024
            #task_role= ???,
            #execution_role= ????
        )

        ab3_task_definition.add_container('AB3_Container', 
            image= ecs.ContainerImage.from_ecr_repository(ab3_ecr_repo)

        )

        # Provide a Stage when creating a pipeline

        ecs_patterns.ApplicationLoadBalancedFargateService(self, "AB3Service",
            cluster=ab3_cluster,            # Required
            cpu=512,                    # Default is 256
            desired_count=6,            # Default is 1
            task_image_options=ecs_patterns.ApplicationLoadBalancedTaskImageOptions(
                image=ecs.ContainerImage.from_registry("amazon/amazon-ecs-sample")),
                #image=ecs.ContainerImage.from_ecr_repository(ab3_ecr_repo)),
            memory_limit_mib=2048,      # Default is 512
            public_load_balancer=True  # Default is True
        )
'''

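This reads more like a comment than an answer. Consider moving it to a comment. If you like the question and want it to get more attention, you can upvote it. - Register Sole
Unfortunately, it looks like I lack the reputation points to comment on someone else's answer. - wsa225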

0

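I would check the health of the target groups, since the deployment is waiting for the replacement tasks to become healthy. Are your current ECS targets healthy? If not, the containers will keep being bounced (stopped and replaced) in an attempt to get them to pass the health check. Also, does your CodeDeploy have permission to deploy to ECR?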
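The target-group check suggested here can be scripted. The following is only a sketch: it assumes the response dictionary has the shape returned by the boto3 `elbv2` call `describe_target_health(TargetGroupArn=...)`, and `SAMPLE_RESPONSE`, its IP addresses, and the helper name `summarize_target_health` are made up for illustration.

```python
def summarize_target_health(response: dict) -> dict:
    """Group targets by health state.

    `response` is assumed to look like the result of
    boto3.client("elbv2").describe_target_health(TargetGroupArn=...).
    """
    summary: dict = {}
    for desc in response.get("TargetHealthDescriptions", []):
        target = desc.get("Target", {})
        health = desc.get("TargetHealth", {})
        state = health.get("State", "unknown")
        summary.setdefault(state, []).append({
            "id": target.get("Id"),            # task IP for target_type=ip
            "port": target.get("Port"),
            "reason": health.get("Reason"),    # absent for healthy targets
            "description": health.get("Description"),
        })
    return summary


# Hand-written sample (hypothetical values), not output from a real account.
SAMPLE_RESPONSE = {
    "TargetHealthDescriptions": [
        {"Target": {"Id": "10.0.1.15", "Port": 80},
         "TargetHealth": {"State": "healthy"}},
        {"Target": {"Id": "10.0.2.27", "Port": 80},
         "TargetHealth": {"State": "unhealthy",
                          "Reason": "Target.Timeout",
                          "Description": "Request timed out"}},
    ]
}

if __name__ == "__main__":
    for state, targets in summarize_target_health(SAMPLE_RESPONSE).items():
        for t in targets:
            print(state, t["id"], t["port"], t["reason"], t["description"])
```

Against a real account, the same function could be fed the live response for both the production and the test target group; the reason and description fields usually narrow the problem down to a timeout, a wrong port, or an unexpected status code.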


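There can be many reasons why your targets are unhealthy. Make sure your ALB security group allows outbound traffic to the targets, and that your ECS security group allows inbound traffic from the ALB. If you have not already done so, you may want to enable direct access to the targets so you can see the health check response. Can you increase your application's logging? Do you see anything relevant in the CloudWatch logs for the target instances? - Chris Goldman
Good suggestions, @ChrisGoldman. I would also check the cluster logs or the CloudWatch logs to see what happens when the deployment is attempted, and whether any command is hanging. - Cloud W.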
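The security-group check described in this comment might be sketched as follows. It assumes each security group is a dictionary shaped like one entry of `SecurityGroups` in the boto3 `ec2` `describe_security_groups` response. The group IDs `sg-alb` and `sg-svc` and all helper names are hypothetical, and the check is deliberately simplified: it only recognises rules that reference the peer group directly or that are open to `0.0.0.0/0`.

```python
def _covers_port(permission: dict, port: int) -> bool:
    """True if the rule applies to TCP traffic on `port`."""
    protocol = permission.get("IpProtocol")
    if protocol == "-1":  # "-1" means all protocols and all ports
        return True
    if protocol != "tcp":
        return False
    return permission.get("FromPort", 0) <= port <= permission.get("ToPort", 65535)


def _matches_peer(permission: dict, peer_group_id: str) -> bool:
    """True if the rule names the peer group or is open to any IPv4 address."""
    if any(p.get("GroupId") == peer_group_id
           for p in permission.get("UserIdGroupPairs", [])):
        return True
    return any(r.get("CidrIp") == "0.0.0.0/0"
               for r in permission.get("IpRanges", []))


def rules_allow(permissions: list, port: int, peer_group_id: str) -> bool:
    return any(_covers_port(p, port) and _matches_peer(p, peer_group_id)
               for p in permissions)


def check_alb_to_service_path(alb_sg: dict, service_sg: dict, port: int) -> dict:
    """Check ALB egress and service ingress for one container port."""
    return {
        "alb_egress_ok": rules_allow(
            alb_sg.get("IpPermissionsEgress", []), port, service_sg["GroupId"]),
        "service_ingress_ok": rules_allow(
            service_sg.get("IpPermissions", []), port, alb_sg["GroupId"]),
    }


# Hypothetical groups: the service only accepts 8080 although the container listens on 80.
ALB_SG = {
    "GroupId": "sg-alb",
    "IpPermissionsEgress": [
        {"IpProtocol": "tcp", "FromPort": 80, "ToPort": 80,
         "UserIdGroupPairs": [{"GroupId": "sg-svc"}], "IpRanges": []},
    ],
}
SERVICE_SG = {
    "GroupId": "sg-svc",
    "IpPermissions": [
        {"IpProtocol": "tcp", "FromPort": 8080, "ToPort": 8080,
         "UserIdGroupPairs": [{"GroupId": "sg-alb"}], "IpRanges": []},
    ],
}

if __name__ == "__main__":
    print(check_alb_to_service_path(ALB_SG, SERVICE_SG, 80))
```

A `False` on either side for the container port used by the health check would be consistent with targets that never become healthy, although it is only one of several possible causes.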
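To see what happened to the replacement tasks, one option is to read the stop reasons of recently stopped tasks and the service event log. This is a sketch under assumptions: `ecs_client` is expected to behave like `boto3.client("ecs")` (using its `list_tasks`, `describe_tasks`, and `describe_services` calls), and the helper names are invented for this example. The cluster name in the usage note is a placeholder; `AB3-Service` is the service name from the CDK stack above.

```python
def collect_stop_reasons(ecs_client, cluster: str, service: str) -> list:
    """Return the stop reason and container exit details of stopped tasks."""
    task_arns = ecs_client.list_tasks(
        cluster=cluster, serviceName=service, desiredStatus="STOPPED"
    ).get("taskArns", [])
    if not task_arns:
        return []
    tasks = ecs_client.describe_tasks(cluster=cluster, tasks=task_arns).get("tasks", [])
    return [
        {
            "task": task.get("taskArn"),
            "stopped_reason": task.get("stoppedReason"),
            "containers": [
                {
                    "name": c.get("name"),
                    "exit_code": c.get("exitCode"),
                    "reason": c.get("reason"),
                }
                for c in task.get("containers", [])
            ],
        }
        for task in tasks
    ]


def service_event_messages(ecs_client, cluster: str, service: str, limit: int = 10) -> list:
    """Return up to `limit` event messages in the order the API returns them."""
    services = ecs_client.describe_services(
        cluster=cluster, services=[service]
    ).get("services", [])
    if not services:
        return []
    return [event.get("message") for event in services[0].get("events", [])[:limit]]


# Usage sketch (requires boto3 and AWS credentials; the cluster name is a placeholder):
#   import boto3
#   ecs = boto3.client("ecs")
#   for item in collect_stop_reasons(ecs, "my-cluster", "AB3-Service"):
#       print(item)
#   for message in service_event_messages(ecs, "my-cluster", "AB3-Service"):
#       print(message)
```

The stop reason typically distinguishes a task that failed load balancer health checks from one that could not pull its image or exited on its own, which helps decide whether to look at networking, IAM, or the container itself.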
