Hive表存储为ORC格式,如何删除列?

9

我有一个如下所示的Hive表:

create table alpha001(id int, name string) clustered by (id) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true')

我想要删除其中一列,比如“name”列。我尝试了以下方法:

ALTER TABLE alpha001 REPLACE COLUMNS (id int);

这将导致以下结果。
Exception thrown: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. Replace columns is not supported for table default.alpha001. SerDe may be incompatible.

并且以下

ALTER TABLE alpha001 DROP name;

Exception thrown : FAILED: ParseException line 1:26 mismatched input 'name' expecting PARTITION near 'DROP' in drop partition statement

取决于您使用的Hive版本。最新的Hive版本支持删除列。 - V Sree Harissh
我正在使用Hive 0.14。 - Ann
4个回答

8

很遗憾,你不能删除现有表中的列,唯一的方法是使用REPLACE COLUMNS关键字。但是这仅适用于具有本地SerDe(DynamicSerDe、MetadataTypedColumnsetSerDe、LazySimpleSerDe和ColumnarSerDe)的表。

最好的选择是重新创建模式,请按照以下步骤操作。

  1. Check if the table is external. If it isn't, use the following statement to make it external.

    alter table alpha001 set tblproperties('EXTERNAL'='TRUE');
    
  2. Drop the table. Since the table is an external table, you can drop it without dropping the actual table.

  3. Recreate the table with the new schema. You should be able to access the table with new schema.

下面是一个快速示例。

create table alpha001(id int, name string) clustered by (id) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');

--assuming your table is not EXTERNAL already
alter table alpha001 set tblproperties('EXTERNAL'='TRUE');

insert into alpha001 values(1,"A");

select * from alpha001;
OK
1       A
drop table alpha001;

create table alpha001(id int) clustered by (id) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true');

select * from alpha001;
OK
1
Time tak
希望这可以帮助到您!

2
这太不可思议了。仅仅为了删除一列就这么麻烦。 - Naitree

0

您正在错误地使用替换命令,需要在使用替换命令时省略您想要删除的列。

describe formatted bucketed_table;
OK 
col_name        data_type       comment
# col_name              data_type               comment

id                      int
firstname               string
last_name               string

hive>alter table bucketed_table drop column id;

MismatchedTokenException(58!=196)
        at org.antlr.runtime.BaseRecognizer.recoverFromMismatchedToken(BaseRecognizer.java:61                                                                                                7)
        at org.antlr.runtime.BaseRecognizer.match(BaseRecognizer.java:115)
        at org.apache.hadoop.hive.ql.parse.HiveParser_IdentifiersParser.dropPartitionSpec(Hiv                                                                                                eParser_IdentifiersParser.java:10839)
        at org.apache.hadoop.hive.ql.parse.HiveParser.dropPartitionSpec(HiveParser.java:49854                                                                                                )
        at org.apache.hadoop.hive.ql.parse.HiveParser.alterStatementSuffixDropPartitions(Hive                                                                                                Parser.java:13140)
        at org.apache.hadoop.hive.ql.parse.HiveParser.alterTableStatementSuffix(HiveParser.ja                                                                                                va:9370)
        at org.apache.hadoop.hive.ql.parse.HiveParser.alterStatement(HiveParser.java:8803)
        at org.apache.hadoop.hive.ql.parse.HiveParser.ddlStatement(HiveParser.java:4057)
        at org.apache.hadoop.hive.ql.parse.HiveParser.execStatement(HiveParser.java:1786)
        at org.apache.hadoop.hive.ql.parse.HiveParser.statement(HiveParser.java:1152)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:211)
        at org.apache.hadoop.hive.ql.parse.ParseDriver.parse(ParseDriver.java:171)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:438)
        at org.apache.hadoop.hive.ql.Driver.compile(Driver.java:321)
        at org.apache.hadoop.hive.ql.Driver.compileInternal(Driver.java:1224)
        at org.apache.hadoop.hive.ql.Driver.runInternal(Driver.java:1265)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1161)
        at org.apache.hadoop.hive.ql.Driver.run(Driver.java:1151)
        at org.apache.hadoop.hive.cli.CliDriver.processLocalCmd(CliDriver.java:217)
        at org.apache.hadoop.hive.cli.CliDriver.processCmd(CliDriver.java:169)
        at org.apache.hadoop.hive.cli.CliDriver.processLine(CliDriver.java:380)
        at org.apache.hadoop.hive.cli.CliDriver.executeDriver(CliDriver.java:740)
        at org.apache.hadoop.hive.cli.CliDriver.run(CliDriver.java:685)
        at org.apache.hadoop.hive.cli.CliDriver.main(CliDriver.java:625)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:                                                                                                43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at org.apache.hadoop.util.RunJar.run(RunJar.java:233)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:148)
FAILED: ParseException line 1:32 mismatched input 'column' expecting PARTITION near 'drop' in                                                                                                 drop partition statement


hive> alter table bucketed_table replace columns(firstname string , lastname string);
OK
Time taken: 0.164 seconds


hive> select * from bucketed_table;
OK
bucketed_table.firstname        bucketed_table.lastname
3       nikhil
3       nikhil
1       tutu
1       avi
4       tanvi
10      vivek
1       tutu
1       avi
4       tanvi
10      vivek
5       akshay
5       akshay
Time taken: 0.138 seconds, Fetched: 12 row(s)


hive> describe bucketed_table;
OK
col_name        data_type       comment
firstname               string
lastname                string

enter image description here

所以,即使您替换了列,如果对该列进行了分桶,则它仍将保留,但如果该列没有分桶,则它将从元数据和HDFS中删除。


0

在Hive表中,您无法删除列。 使用“ALTER TABLE alpha001 DROP name”会出错,因为在Hive中不支持删除列,但我们可以删除分区。因此,在这里,它会搜索附近的分区名称,而不是“name”列。

因此,要删除列,您必须执行以下操作: 1. 删除表并重新创建不包含该列的表(但如果它是内部表,则可能会导致数据丢失) 2. 您可以使用REPLACE命令(请参阅https://cwiki.apache.org/confluence/display/Hive/LanguageManual+DDL#LanguageManualDDL-AlterTable/Partition/Column


REPLACE命令只在我们创建表时才起作用,例如:create table xyz(id int, name string)。 - Ann
1
但是,如果我们创建的表格如下:create table alpha001(id int, name string) clustered by (id) into 2 buckets stored as orc TBLPROPERTIES ('transactional'='true'),那么我们就不能使用ALTER TABLE alpha001 REPLACE COLUMNS (id int)。它会抛出FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask。默认情况下,不支持替换列。SerDe可能不兼容。 - Ann

0

我刚刚发现我们无法删除存储为ORC的表中的列。唯一的方法是使用“创建表作为选择”选项创建一个具有所需列的新表。


网页内容由stack overflow 提供, 点击上面的
可以查看英文原文,
原文链接