I have a DataFrame that looks like this:
+---------------------------------------------------------------------+
|value |
+---------------------------------------------------------------------+
|[WrappedArray(LineItem_organizationId, LineItem_lineItemId)] |
|[WrappedArray(OrganizationId, LineItemId, SegmentSequence_segmentId)]|
+---------------------------------------------------------------------+
From the two rows above, I want to build strings in the following format:
"LineItem_organizationId", "LineItem_lineItemId"
"OrganizationId", "LineItemId", "SegmentSequence_segmentId"
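I want this to be dynamic: if a third value exists in the first column, the resulting string should contain one more comma-separated column value.
How can I achieve this in Scala?
Here is how I create the DataFrame: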
val xmlFiles = "C://Users//u6034690//Desktop//SPARK//trfsmallfffile//XML"
val discriptorFileLOcation = "C://Users//u6034690//Desktop//SPARK//trfsmallfffile//FinancialLineItem//REFXML"
import sqlContext.implicits._
import org.apache.spark.sql.functions.explode // explode is used below and must be in scope
val dfDiscriptor = sqlContext.read.format("com.databricks.spark.xml").option("rowTag", "FlatFileDescriptor").load(discriptorFileLOcation)
dfDiscriptor.printSchema()
val firstColumn = dfDiscriptor.select($"FFFileType.FFRecord.FFField").as("FFField")
val FirstColumnOfHeaderFile = firstColumn.select(explode($"FFField")).as("ColumnsDetails").select(explode($"col")).first.get(0).toString().split(",")(5)
println(FirstColumnOfHeaderFile)
//dfDiscriptor.printSchema()
val primaryKeyColumnsFinancialLineItem = dfDiscriptor.select(explode($"FFFileType.FFRecord.FFPrimKey.FFPrimKeyCol"))
primaryKeyColumnsFinancialLineItem.show(false)
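To make the target format concrete, here is a minimal sketch of the string-building step in plain Scala. The object name `QuoteColumns` and the helper `quoteJoin` are hypothetical, and the sample rows are hard-coded stand-ins for the collected DataFrame contents. It assumes each row holds a single array of strings, which may not match the actual layout.

```scala
object QuoteColumns {
  // Hypothetical helper: wrap each column name in double quotes and join with ", ".
  // Works for any number of values, so a third (or fourth) value is handled the same way.
  def quoteJoin(cols: Seq[String]): String =
    cols.map(c => "\"" + c + "\"").mkString(", ")

  def main(args: Array[String]): Unit = {
    // Stand-in for the collected rows. With Spark, and assuming the first column is
    // an array of strings, these might be obtained with something like:
    //   primaryKeyColumnsFinancialLineItem.collect().map(_.getSeq[String](0))
    val rows: Seq[Seq[String]] = Seq(
      Seq("LineItem_organizationId", "LineItem_lineItemId"),
      Seq("OrganizationId", "LineItemId", "SegmentSequence_segmentId")
    )

    // Prints one quoted, comma-separated line per row.
    rows.map(quoteJoin).foreach(println)
  }
}
```

Because `mkString` simply joins whatever elements are present, the output grows by one quoted value whenever a row carries an extra element.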
Adding the full schema:
root
|-- FFColumnDelimiter: string (nullable = true)
|-- FFContentItem: struct (nullable = true)
| |-- _VALUE: string (nullable = true)
| |-- _ffMajVers: long (nullable = true)
| |-- _ffMinVers: double (nullable = true)
|-- FFFileEncoding: string (nullable = true)
|-- FFFileType: array (nullable = true)
| |-- element: struct (containsNull = true)
| | |-- FFPhysicalFile: array (nullable = true)
| | | |-- element: struct (containsNull = true)
| | | | |-- FFFileName: string (nullable = true)
| | | | |-- FFRowCount: long (nullable = true)
| | |-- FFRecord: struct (nullable = true)
| | | |-- FFField: array (nullable = true)
| | | | |-- element: struct (containsNull = true)
| | | | | |-- FFColumnNumber: long (nullable = true)
| | | | | |-- FFDataType: string (nullable = true)
| | | | | |-- FFFacets: struct (nullable = true)
| | | | | | |-- FFMaxLength: long (nullable = true)
| | | | | | |-- FFTotalDigits: long (nullable = true)
| | | | | |-- FFFieldIsOptional: boolean (nullable = true)
| | | | | |-- FFFieldName: string (nullable = true)
| | | | | |-- FFForKey: struct (nullable = true)
| | | | | | |-- FFForKeyCol: string (nullable = true)
| | | | | | |-- FFForKeyRecord: string (nullable = true)
| | | |-- FFPrimKey: struct (nullable = true)
| | | | |-- FFPrimKeyCol: array (nullable = true)
| | | | | |-- element: string (containsNull = true)
| | | |-- FFRecordType: string (nullable = true)
|-- FFHeaderRow: boolean (nullable = true)
|-- FFId: string (nullable = true)
|-- FFRowDelimiter: string (nullable = true)
|-- FFTimeStamp: string (nullable = true)
|-- _env: string (nullable = true)
|-- _ffMajVers: long (nullable = true)
|-- _ffMinVers: double (nullable = true)
|-- _ffPubstyle: string (nullable = true)
|-- _schemaLocation: string (nullable = true)
|-- _sr: string (nullable = true)
|-- _xmlns: string (nullable = true)
|-- _xsi: string (nullable = true)