Loading XML data into a Hive table: org.apache.hadoop.hive.ql.metadata.HiveException


I am trying to load XML data into Hive, but I am running into this error:

java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":""}

The XML file I am using is:

<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>

The Hive queries I have used are:

1) CREATE TABLE xmltable(xmldata string) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/user/xmlfile.xml' OVERWRITE INTO TABLE xmltable;

2) CREATE VIEW xmlview (id,genre,price)
AS SELECT
xpath(xmldata, '/catalog[1]/book[1]/id'),
xpath(xmldata, '/catalog[1]/book[1]/genre'),
xpath(xmldata, '/catalog[1]/book[1]/price')
FROM xmltable;

3) CREATE TABLE xmlfinal AS SELECT * FROM xmlview;

4) SELECT * FROM xmlfinal WHERE id = '11';

Everything is fine up to the second query, but when I execute the third one it gives me an error. Here is the error message:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
    at org.apache.hadoop.hive.ql.exec.ExecMapper.map(ExecMapper.java:159)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:417)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:332)
    at org.apache.hadoop.mapred.Child$4.run(Child.java:268)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:415)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1438)
    at org.apache.hadoop.mapred.Child.main(Child.java:262)
 Caused by: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error    while processing row {"xmldata":"<?xml version=\"1.0\" encoding=\"UTF-8\"?>"}
    at org.apache.hadoop.hive.ql.exec.MapOperator.process(MapOperator.java:675)
    at org.apache.hadoop.hive.ql.exec

FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.MapRedTask

So what is going wrong? I am using a valid XML file as well. Thanks, Shree.

Any update on the above post? - shree11
I expect you got a [Fatal Error] :n:nn: Premature end of file message on the Hive terminal. - vijay kumar
6 Answers

Cause of the error: 1) Case 1 (your case): the XML content is fed to Hive line by line. Input XML:
<?xml version="1.0" encoding="UTF-8"?>
<catalog>
<book>
  <id>11</id>
  <genre>Computer</genre>
  <price>44</price>
</book>
<book>
  <id>44</id>
  <genre>Fantasy</genre>
  <price>5</price>
</book>
</catalog>  

Check in Hive:

select count(*) from xmltable;  -- returns 13 rows: each line of the file lands in its own row of column xmldata

Reason for the error:

The XML is read as 13 disjoint fragments, so each row contains invalid XML.

2) Case 2: the XML content should be supplied to Hive as a single string, so that the xpath UDFs can be applied. Syntax reference: all of the functions follow the form xpath_*(xml_string, xpath_expression_string). (source)

input.xml

<?xml version="1.0" encoding="UTF-8"?><catalog><book><id>11</id><genre>Computer</genre><price>44</price></book><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog>

Check in Hive:

select count(*) from xmltable; -- returns 1 row: the XML is read as one complete document

Meaning:

xmldata   = <?xml version="1.0" encoding="UTF-8"?><catalog><book> ...... </catalog>

Then apply your xpath UDF like this:
select xpath(xmldata, 'xpath_expression_string') from xmltable;
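The difference between the two cases can be sketched outside Hive. Below is a rough Python illustration (standard-library only, not the Hive UDF itself): a single-line document parses and answers XPath-style queries, while one line of the pretty-printed file is just a fragment and fails to parse at all, which is why the line-per-row load produces the runtime error.

```python
import xml.etree.ElementTree as ET

# Case 2: the whole document as a single string, as stored in one Hive row.
single_line = ('<?xml version="1.0" encoding="UTF-8"?><catalog>'
               '<book><id>11</id><genre>Computer</genre><price>44</price></book>'
               '<book><id>44</id><genre>Fantasy</genre><price>5</price></book>'
               '</catalog>')

root = ET.fromstring(single_line)  # parses fine: one complete document
ids = [e.text for e in root.findall('book/id')]
print(ids)  # ['11', '44']

# Case 1: a single line of the pretty-printed file is only a fragment,
# not well-formed XML on its own, so any XML parser rejects it.
fragment = '<book>'
try:
    ET.fromstring(fragment)
except ET.ParseError as err:
    print('fragment is not well-formed XML:', err)
```

`ElementTree.findall` supports only a subset of XPath, but it is enough to show the principle: xpath evaluation needs a complete document per row.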

Hi, you are right, I should supply xmldata as a single string, and now I can create xmlview without any error. But I am not getting the right results. I used the same queries I posted above. When I run the fourth query, i.e. SELECT * FROM xmlfinal, the result is [] [] []. - shree11
If I use xpath_string instead of xpath, I only get the first row of output, i.e. 11 Computer 44. But I want both rows returned. Why is XPATH not returning any results? - shree11
Do you want output like this? Row 1: 11 Computer 44, Row 2: 44 Fantasy 5 - vijay kumar
Yes, I want the output as row 1, row 2... 11 Computer 44, 44 Fantasy 5. But how do I achieve it? - shree11
Hi Vijay, I have a doubt. You suggested adding a jar, but I did not follow that part: do I need to build the jar file myself, or add an existing one? Can you tell me how to do it, and where (which location) I should add that jar? - shree11
Download https://github.com/klout/brickhouse/archive/master.zip, unzip it, cd into the brickhouse directory, and run mvn package; it will create brickhouse-0.7.0-SNAPSHOT.jar under brickhouse/target/. Place this jar on the Hive machine (I put it under /home/vijay/), then add the jar in the Hive terminal as shown in the answer. - vijay kumar

Find the jar file here --> Brickhouse
Sample code here --> a similar example on Stack Overflow. Solution:
--Load xml data to table
DROP table xmltable;
CREATE TABLE xmltable(xmldata string) STORED AS TEXTFILE;
LOAD DATA LOCAL INPATH '/home/vijay/data-input.xml' OVERWRITE INTO TABLE xmltable;

-- check contents
SELECT * from xmltable;

-- create view
Drop view  MyxmlView;
CREATE VIEW MyxmlView(id, genre, price) AS
SELECT
 xpath(xmldata, 'catalog/book/id/text()'),
 xpath(xmldata, 'catalog/book/genre/text()'),
 xpath(xmldata, 'catalog/book/price/text()')
FROM xmltable;

-- check view
SELECT id, genre,price FROM MyxmlView;


ADD JAR /home/vijay/brickhouse-0.7.0-SNAPSHOT.jar;  -- add the Brickhouse jar

CREATE TEMPORARY FUNCTION array_index AS 'brickhouse.udf.collect.ArrayIndexUDF';
CREATE TEMPORARY FUNCTION numeric_range AS 'brickhouse.udf.collect.NumericRange';

SELECT 
   array_index( id, n ) as my_id,
   array_index( genre, n ) as my_genre,
   array_index( price, n ) as my_price
from MyxmlView
lateral view numeric_range( size( id )) MyxmlView as n;

Output:

hive > SELECT
     >    array_index( id, n ) as my_id,
     >    array_index( genre, n ) as my_genre,
     >    array_index( price, n ) as my_price
     > from MyxmlView
     > lateral view numeric_range( size( id )) MyxmlView as n;
Automatically selecting local only mode for query
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Execution log at: /tmp/vijay/.log
Job running in-process (local Hadoop)
Hadoop job information for null: number of mappers: 0; number of reducers: 0
2014-07-09 05:36:45,220 null map = 0%,  reduce = 0%
2014-07-09 05:36:48,226 null map = 100%,  reduce = 0%
Ended Job = job_local_0001
Execution completed successfully
Mapred Local Task Succeeded . Convert the Join into MapJoin
OK
my_id      my_genre      my_price
11      Computer        44
44      Fantasy 5

Time taken: 8.541 seconds, Fetched: 2 row(s)
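To see what the lateral view is doing, here is a rough Python analogue (my sketch, not the Brickhouse implementation): xpath() returns three parallel string arrays per row, numeric_range(size(id)) generates an index n for each element, and array_index(col, n) picks the n-th element, so one output row is emitted per book.

```python
# The three parallel arrays that xpath() returns for the single-row table.
ids    = ['11', '44']             # xpath(xmldata, 'catalog/book/id/text()')
genres = ['Computer', 'Fantasy']  # xpath(xmldata, 'catalog/book/genre/text()')
prices = ['44', '5']              # xpath(xmldata, 'catalog/book/price/text()')

# numeric_range(size(id)) yields n = 0 .. len(ids)-1;
# array_index(col, n) selects the n-th element of each array.
rows = [(ids[n], genres[n], prices[n]) for n in range(len(ids))]
for my_id, my_genre, my_price in rows:
    print(my_id, my_genre, my_price)
# prints:
# 11 Computer 44
# 44 Fantasy 5
```

This is exactly the "row 1, row 2" shape shree11 asked for in the comments above.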

Adding more information as requested by the question owner:

[two screenshots in the original post]


I added the jar from the snapshot you specified, and when I run SELECT array_index( id, n ) as my_id, array_index( genre, n ) as my_genre, array_index( price, n ) as my_price from xmlView lateral view numeric_range( size( id )) xmlView as n; I get this error: FAILED: SemanticException [Error 10016]: Line 6:34 Argument type mismatch 'id': "map" or "list" is expected at function SIZE, but "int" is found. When creating the VIEW I used XPATH_int for id and XPATH_string for genre and price. I also tried plain 'XPATH', but the same error remains. - shree11
The official Hive documentation states that the xpath() function always returns an array of strings, whereas xpath_string() returns the text of the first matching node - but we need all the books, not just one. First make sure the MyxmlView view is fine. Can you paste the result of this query: SELECT id, genre, price FROM MyxmlView; - vijay kumar
I used XPATH, and when I run SELECT id, genre, price from xmlview the output is [] [] []. After adding the jar and creating the 2 temporary functions, when I run SELECT array_index( id, n ) as my_id, array_index( genre, n ) as my_genre, array_index( price, n ) as my_price from xmlView lateral view numeric_range( size( id )) xmlView as n; the query executes fine but produces no output. Job 0: Map: 1 Cumulative CPU: 2.86 sec HDFS Read: 402 HDFS Write: 0 SUCCESS Total MapReduce CPU Time Spent: 2 seconds 860 msec OK Time taken: 19.751 seconds - shree11
Please check the snapshots. Something seems wrong in your data-loading step or in the view. - vijay kumar
Is SELECT * from xmltable; empty as well? - vijay kumar


Then follow the steps below to get the solution you want; just change the source data so that each record is a complete document on one line:

 <catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog>
<catalog><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog> 

Now try the following:

select xpath(xmldata, '/catalog/book/id/text()') as id,
xpath(xmldata, '/catalog/book/genre/text()') as genre,
xpath(xmldata, '/catalog/book/price/text()') as price FROM xmltable;

Now you will get answers like this:
["11"] ["Computer"] ["44"]
["44"] ["Fantasy"] ["5"]
If you use the xpath_string and xpath_int UDFs instead, you will get answers like this:
11 computer 44
44 Fantasy 5
Thanks.
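As a quick sanity check of this reshaped layout, here is a standard-library Python sketch (not Hive) showing that each line is now a complete document, so per-row xpath evaluation can succeed:

```python
import xml.etree.ElementTree as ET

# One complete <catalog> document per table row, as in the reshaped source data.
lines = [
    '<catalog><book><id>11</id><genre>Computer</genre><price>44</price></book></catalog>',
    '<catalog><book><id>44</id><genre>Fantasy</genre><price>5</price></book></catalog>',
]
for line in lines:
    root = ET.fromstring(line)  # each row parses on its own
    print([e.text for e in root.findall('book/id')],
          [e.text for e in root.findall('book/genre')],
          [e.text for e in root.findall('book/price')])
# prints:
# ['11'] ['Computer'] ['44']
# ['44'] ['Fantasy'] ['5']
```

Note the parallel with Hive's output above: xpath() returns an array per row, which is why the answers print as ["11"] ["Computer"] ["44"].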


Also make sure the XML file has no whitespace after the final closing tag. In my case the source file had a trailing space, and whenever I loaded the file into Hive the resulting table contained NULLs. So whenever I applied the xpath functions, the results contained some [] [] [] [] [] []
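One defensive preprocessing step (my suggestion, not from the answer itself) is to strip trailing whitespace and blank lines before handing the file to LOAD DATA, so no empty rows end up as NULLs:

```python
# Hypothetical helper: normalize an XML source file before LOAD DATA by
# removing whitespace/blank lines after the last closing tag.
def clean_xml_file(path_in, path_out):
    with open(path_in, encoding='utf-8') as f:
        content = f.read().rstrip()  # drop spaces/newlines after </catalog>
    with open(path_out, 'w', encoding='utf-8') as f:
        f.write(content + '\n')      # end with exactly one newline
```

The function names and paths here are illustrative; any equivalent cleanup (even `sed`) works as long as nothing follows the last tag.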

Although the xpath_string function worked, the xpath_double and xpath_int functions never did. They kept throwing this exception:

Diagnostic Messages for this Task:
java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: Hive Runtime Error while processing row {"line":""}


First try loading the file with ADD FILE path-to-file; that should solve your problem, since it solved it in my case.


Just what I thought. - Hafiz Muhammad Shafiq

