Tag Archives: Hive SemanticException:Expression not in GROUP BY

Hive SemanticException:Expression not in GROUP BY

With multi-dimensional model as the core, let the factory digital transformation and upgrading “within reach”>>>

Problem description

Execute the following SQL in hive:

select av_seq, cp_seq  from dw.smbrandcp group by av_seq;

Prompt error:

FAILED: SemanticException [Error 10025]: Line 1:15 Expression not in GROUP BY key 'cp_seq'

Error analysis

Obviously, the error message has told us that this is a syntax error. When using the group by clause, the select statement can only contain fields in the group by clause. Of course, there can be multiple aggregate functions (such as count) in the select statement

In this example, the smbrandcp table is roughly as follows:

av_ seq cp_ seq
944076 CC200940
944076 CC200743
944076 CC200777

When executing the following query:

select av_seq, cp_seq  from dw.smbrandcp group by av_seq;

Because it’s only in AV_ If you do aggregation on the SEQ field, then for the aggregation operation AV_ seq = 944076, cp_ What value should SEQ take?This creates a variety of possible options. Hive’s way to deal with this kind of SQL whose field is in the select statement but not in the group by clause is to treat it as an invalid SQL, because what HQL wants is precision

References

Hive query and group by clause