Kinesis Firehose - What is the S3 extended destination configuration?

Question

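What is the S3 extended destination configuration, and where in the AWS documentation is its purpose clearly explained?

As the name suggests, it must be related to the S3 destination. However, it is not mentioned in the S3 destination section of the AWS documentation.

If there is an article or blog post that explains it clearly, please point me to it.

I have been looking for clues in the documentation below, but as is often the case, the AWS documentation is not clear. It appears to be partly related to input record transformation or record processing.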

resource "aws_kinesis_firehose_delivery_stream" "extended_s3_stream" {
  name        = "terraform-kinesis-firehose-extended-s3-test-stream"
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn   = "${aws_iam_role.firehose_role.arn}"
    bucket_arn = "${aws_s3_bucket.bucket.arn}"

    processing_configuration {
      enabled = "true"

      processors {
        type = "Lambda"

        parameters {
          parameter_name  = "LambdaArn"
          parameter_value = "${aws_lambda_function.lambda_processor.arn}:$LATEST"
        }
      }
    }
  }
}
4 Answers

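I'm afraid the Kinesis Firehose documentation is so poorly written that I don't know how anyone could figure out how to use Firehose from the documentation alone.
At first, it seems, Firehose simply relayed data into an S3 bucket with no built-in transformation mechanism, and the S3 destination configuration had no processing configuration, as in AWS::KinesisFirehose::DeliveryStream S3DestinationConfiguration.
Then, as described in Amazon Kinesis Firehose Data Transformation with AWS Lambda, a record transformation mechanism appears to have been introduced in early 2017, and AWS::KinesisFirehose::DeliveryStream ExtendedS3DestinationConfiguration was added to support it.
Evidently, people have a hard time finding out how to configure it:

Well, it took me a lot of time and documentation digging to figure it out.

Can anyone figure it out just by reading the AWS documentation?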

Firehose extended S3 configuration for Lambda transformation

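This cannot be worked out from the AWS documentation, but judging from actual implementations found on the internet, the required configuration looks like the one shown in the screenshot in the original post.

[Screenshot not preserved]
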
Update

Following Kevin Eid's suggestion:

s3_configuration - (Optional) Required for non-S3 destinations. For S3 destination, use extended_s3_configuration instead.

The extended_s3_configuration object supports the same fields from s3_configuration as well as the following:

    data_format_conversion_configuration - (Optional) Nested argument for the serializer, deserializer, and schema for converting data from the JSON format to the Parquet or ORC format before writing it to Amazon S3. More details given below.
    error_output_prefix - (Optional) Prefix added to failed records before writing them to S3. This prefix appears immediately following the bucket name.
    processing_configuration - (Optional) The data processing configuration. More details are given below.
    s3_backup_mode - (Optional) The Amazon S3 backup mode. Valid values are Disabled and Enabled. Default value is Disabled.
    s3_backup_configuration - (Optional) The configuration for backup in Amazon S3. Required if s3_backup_mode is Enabled. Supports the same fields as s3_configuration object.

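s3_configuration probably still exists for compatibility or legacy reasons, so for an S3 destination you only need extended_s3_configuration, but the AWS documentation does not explain this well. It is a real pity that the AWS documentation cannot be relied on as a source of truth.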
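
As a rough illustration of the fields listed above, here is a minimal sketch of a delivery stream that uses only extended_s3_configuration. It reuses the aws_iam_role.firehose_role, aws_s3_bucket.bucket and aws_lambda_function.lambda_processor resources from the question. The stream name, the prefix values and aws_s3_bucket.backup are assumptions made up for this example and are not part of the original post.

```hcl
resource "aws_kinesis_firehose_delivery_stream" "extended_s3_with_backup" {
  name        = "example-extended-s3-stream" # hypothetical name
  destination = "extended_s3"

  extended_s3_configuration {
    # Fields shared with s3_configuration
    role_arn   = aws_iam_role.firehose_role.arn
    bucket_arn = aws_s3_bucket.bucket.arn
    prefix     = "transformed/" # hypothetical prefix

    # Fields that exist only on the extended configuration
    error_output_prefix = "errors/" # hypothetical prefix for failed records

    # Run each batch of records through a Lambda function before delivery
    processing_configuration {
      enabled = true

      processors {
        type = "Lambda"

        parameters {
          parameter_name  = "LambdaArn"
          parameter_value = "${aws_lambda_function.lambda_processor.arn}:$LATEST"
        }
      }
    }

    # Keep a copy of the untransformed source records in a second bucket
    s3_backup_mode = "Enabled"

    s3_backup_configuration {
      role_arn   = aws_iam_role.firehose_role.arn
      bucket_arn = aws_s3_bucket.backup.arn # hypothetical backup bucket
    }
  }
}
```

The backup block is only needed when s3_backup_mode is Enabled; with the default of Disabled it can be dropped entirely.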

This small screenshot shows the components that are new in ExtendedS3DestinationConfiguration compared with S3DestinationConfiguration:

[Screenshot not preserved]

In addition, how to define the extended S3 configuration and what each part means is described in detail in the API documentation.
{
  "RoleARN": "string",
  "BucketARN": "string",
  "Prefix": "string",
  "ErrorOutputPrefix": "string",
  "BufferingHints": {
    "SizeInMBs": integer,
    "IntervalInSeconds": integer
  },
  "CompressionFormat": "UNCOMPRESSED"|"GZIP"|"ZIP"|"Snappy",
  "EncryptionConfiguration": {
    "NoEncryptionConfig": "NoEncryption",
    "KMSEncryptionConfig": {
      "AWSKMSKeyARN": "string"
    }
  },
  "CloudWatchLoggingOptions": {
    "Enabled": true|false,
    "LogGroupName": "string",
    "LogStreamName": "string"
  },
  "ProcessingConfiguration": {
    "Enabled": true|false,
    "Processors": [
      {
        "Type": "Lambda",
        "Parameters": [
          {
            "ParameterName": "LambdaArn"|"NumberOfRetries"|"RoleArn"|"BufferSizeInMBs"|"BufferIntervalInSeconds",
            "ParameterValue": "string"
          }
          ...
        ]
      }
      ...
    ]
  },
  "S3BackupMode": "Disabled"|"Enabled",
  "S3BackupUpdate": {
    "RoleARN": "string",
    "BucketARN": "string",
    "Prefix": "string",
    "ErrorOutputPrefix": "string",
    "BufferingHints": {
      "SizeInMBs": integer,
      "IntervalInSeconds": integer
    },
    "CompressionFormat": "UNCOMPRESSED"|"GZIP"|"ZIP"|"Snappy",
    "EncryptionConfiguration": {
      "NoEncryptionConfig": "NoEncryption",
      "KMSEncryptionConfig": {
        "AWSKMSKeyARN": "string"
      }
    },
    "CloudWatchLoggingOptions": {
      "Enabled": true|false,
      "LogGroupName": "string",
      "LogStreamName": "string"
    }
  },
  "DataFormatConversionConfiguration": {
    "SchemaConfiguration": {
      "RoleARN": "string",
      "CatalogId": "string",
      "DatabaseName": "string",
      "TableName": "string",
      "Region": "string",
      "VersionId": "string"
    },
    "InputFormatConfiguration": {
      "Deserializer": {
        "OpenXJsonSerDe": {
          "ConvertDotsInJsonKeysToUnderscores": true|false,
          "CaseInsensitive": true|false,
          "ColumnToJsonKeyMappings": {"string": "string"
            ...}
        },
        "HiveJsonSerDe": {
          "TimestampFormats": ["string", ...]
        }
      }
    },
    "OutputFormatConfiguration": {
      "Serializer": {
        "ParquetSerDe": {
          "BlockSizeBytes": integer,
          "PageSizeBytes": integer,
          "Compression": "UNCOMPRESSED"|"GZIP"|"SNAPPY",
          "EnableDictionaryCompression": true|false,
          "MaxPaddingBytes": integer,
          "WriterVersion": "V1"|"V2"
        },
        "OrcSerDe": {
          "StripeSizeBytes": integer,
          "BlockSizeBytes": integer,
          "RowIndexStride": integer,
          "EnablePadding": true|false,
          "PaddingTolerance": double,
          "Compression": "NONE"|"ZLIB"|"SNAPPY",
          "BloomFilterColumns": ["string", ...],
          "BloomFilterFalsePositiveProbability": double,
          "DictionaryKeyThreshold": double,
          "FormatVersion": "V0_11"|"V0_12"
        }
      }
    },
    "Enabled": true|false
  }
}
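
For readers working in Terraform rather than calling the API directly, the DataFormatConversionConfiguration part of the structure above maps to a nested data_format_conversion_configuration block. The sketch below is an illustration only: aws_glue_catalog_table.events and the stream name are hypothetical, and the buffering remark in the comment reflects my understanding of the service limits rather than anything stated in the original answer.

```hcl
resource "aws_kinesis_firehose_delivery_stream" "json_to_parquet" {
  name        = "example-json-to-parquet-stream" # hypothetical name
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn   = aws_iam_role.firehose_role.arn
    bucket_arn = aws_s3_bucket.bucket.arn

    # Format conversion is understood to need a buffer size of at least 64 MB.
    # The argument that sets it has been renamed between AWS provider major
    # versions, so check the documentation for the version you use.

    data_format_conversion_configuration {
      # How incoming JSON records are parsed
      input_format_configuration {
        deserializer {
          open_x_json_ser_de {}
        }
      }

      # How records are written to S3
      output_format_configuration {
        serializer {
          parquet_ser_de {}
        }
      }

      # The schema comes from a Glue Data Catalog table (hypothetical here)
      schema_configuration {
        database_name = aws_glue_catalog_table.events.database_name
        table_name    = aws_glue_catalog_table.events.name
        role_arn      = aws_iam_role.firehose_role.arn
      }
    }
  }
}
```

Swapping parquet_ser_de for orc_ser_de, or open_x_json_ser_de for hive_json_ser_de, selects the other serializer and deserializer options shown in the JSON structure.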

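Thanks for the follow-up. But who needs to use this, and what is it for? - mon
@mon It gives you many options, such as compression, encryption, an S3 backup bucket, logging, and so on. For example, you can aggregate all of your stream data in a compressed format to save on S3 storage costs. You don't have to use all of these options, but they are there. - Marcin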
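
To make the options mentioned in the comment above concrete, here is a small sketch, again in Terraform, that turns on compression, KMS encryption and CloudWatch logging for the S3 destination. aws_kms_key.firehose, aws_cloudwatch_log_group.firehose and aws_cloudwatch_log_stream.firehose are hypothetical resources assumed to be defined elsewhere, and the stream name is made up.

```hcl
resource "aws_kinesis_firehose_delivery_stream" "compressed_stream" {
  name        = "example-compressed-stream" # hypothetical name
  destination = "extended_s3"

  extended_s3_configuration {
    role_arn   = aws_iam_role.firehose_role.arn
    bucket_arn = aws_s3_bucket.bucket.arn

    # Compress delivered objects to reduce S3 storage cost
    compression_format = "GZIP"

    # Encrypt delivered objects with a customer managed KMS key (hypothetical)
    kms_key_arn = aws_kms_key.firehose.arn

    # Send delivery errors to CloudWatch Logs (hypothetical group and stream)
    cloudwatch_logging_options {
      enabled         = true
      log_group_name  = aws_cloudwatch_log_group.firehose.name
      log_stream_name = aws_cloudwatch_log_stream.firehose.name
    }
  }
}
```

Each of these settings is optional and independent of the others, so any of them can be removed without affecting the rest.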
