rdd = sc.textFile("<path to json or xml file>")  # placeholder path
rdd.collect()
[u'{', u' "glossary": {', u' "title": "example glossary",', u'\t\t"GlossDiv": {', u' "title": "S",', u'\t\t\t"GlossList": {', u' "GlossEntry": {', u' "ID": "SGML",', u'\t\t\t\t\t"SortAs": "SGML",', u'\t\t\t\t\t"GlossTerm": "Standard Generalized Markup Language",', u'\t\t\t\t\t"Acronym": "SGML",', u'\t\t\t\t\t"Abbrev": "ISO 8879:1986",', u'\t\t\t\t\t"GlossDef": {', u' "para": "A meta-markup language, used to create markup languages such as DocBook.",', u'\t\t\t\t\t\t"GlossSeeAlso": ["GML", "XML"]', u' },', u'\t\t\t\t\t"GlossSee": "markup"', u' }', u' }', u' }', u' }', u'}', u'']
But my output should have everything on a single line:
{"glossary": {"title": "example glossary","GlossDiv": {"title": "S","GlossList":.....}}