Getting redundant data in window function in hive


#1

Table:Stocks

exchange String,
stock_symbol String,
stock_date String,
stock_price_open double,
stock_price_high double,
stock_price_low double,
stock_price_close double,
stock_volume double,
stock_price_adj_close double

select exchange_data,symbol_stock,count(*) over (partition by symbol_stock order by symbol_stock desc) as stock_cnt from static_db.stock_base;

Incorrect Output:

I am getting the redundant data in the output.

NYSE QXM 698
NYSE QXM 698
NYSE QXM 698
NYSE QXM 698
NYSE QXM 698
NYSE QXM 698
NYSE QXM 698
NYSE QXM 698
NYSE QXM 698
NYSE QXM 698
NYSE QXM 698
NYSE QXM 698
NYSE QXM 698
NYSE QXM 698
NYSE QXM 698
NYSE QXM 698
NYSE QXM 698
Time taken: 27.442 seconds, Fetched: 3598 row(s)

I am expecting below output.

NYSE QRR 839
NYSE QTM 784
NYSE QXM 698


#2

Its not redundant, count(*) calculated for symbol_stock partition you mentioned for that window.

Either you have to apply distinct() or you have to modify the query as group by .

Window function really dont restrict the row count it will apply the function for that window you have specified in over partition.

select exchange_data,symbol_stock,count(*) as stock_cnt from static_db.stock_base
group by exchange_data, symbol_stock ;

or

select distinct exchange_data,symbol_stock,count(*) over (partition by symbol_stock order by symbol_stock desc) as stock_cnt from static_db.stock_base;