Sorting on a Column Defined as String

Hi,

I have created a table in Hive, where I have defined Order_id field as String. But it has Integer values.

Now when I run the following SQL, I get incorrect results.
select * from orders_string2 sort by order_id limit 10;
1 2013-07-25 00:00:00.0 11599 CLOSED
10 2013-07-25 00:00:00.0 5648 PENDING_PAYMENT
100 2013-07-25 00:00:00.0 12131 PROCESSING
1000 2013-07-30 00:00:00.0 2321 CLOSED
10000 2013-09-25 00:00:00.0 8983 PROCESSING
10001 2013-09-25 00:00:00.0 316 PENDING_PAYMENT
10002 2013-09-25 00:00:00.0 1530 COMPLETE
10003 2013-09-25 00:00:00.0 8099 COMPLETE
10004 2013-09-25 00:00:00.0 7768 CLOSED
10005 2013-09-25 00:00:00.0 541 COMPLETE

Following SQL gives correct results:
select * from orders_string2 sort by int(order_id) limit 10;
1 2013-07-25 00:00:00.0 11599 CLOSED
2 2013-07-25 00:00:00.0 256 PENDING_PAYMENT
3 2013-07-25 00:00:00.0 12111 COMPLETE
4 2013-07-25 00:00:00.0 8827 CLOSED
5 2013-07-25 00:00:00.0 11318 COMPLETE
6 2013-07-25 00:00:00.0 7130 COMPLETE
7 2013-07-25 00:00:00.0 4530 COMPLETE
8 2013-07-25 00:00:00.0 2911 PROCESSING
9 2013-07-25 00:00:00.0 5657 PENDING_PAYMENT
10 2013-07-25 00:00:00.0 5648 PENDING_PAYMENT

Now when I ran following SQL, i expected an incorrect result. But it worked fine.
select * from orders_string2 where order_id > 100 limit 10;
101 2013-07-25 00:00:00.0 5116 CLOSED
102 2013-07-25 00:00:00.0 8027 COMPLETE
103 2013-07-25 00:00:00.0 12256 PROCESSING
104 2013-07-25 00:00:00.0 7790 PENDING_PAYMENT
105 2013-07-26 00:00:00.0 8220 COMPLETE
106 2013-07-26 00:00:00.0 395 PROCESSING
107 2013-07-26 00:00:00.0 1845 COMPLETE
108 2013-07-26 00:00:00.0 12149 PROCESSING
109 2013-07-26 00:00:00.0 9345 PENDING_PAYMENT
110 2013-07-26 00:00:00.0 2746 COMPLETE

select count(*) from orders_string2 where order_id > 100
68783

Can someone plz shed light on this? Why WHERE and SORT BY clause is working differently?

You should use ‘order by’ by casting to int instead of using sort by. Sort by sorts only with in partition not over all input data

That’s right. Order by is the right clause but why WHERE clause does not need casting?

Are both of the following queries correct?

select * from orders_string2 order by int(order_id) limit 10;
select * from orders_string2 order by cast(order_id as int) limit 10;