Exercise 05 - Get revenue for each department per day

hive
#1

Problem:

  • Get only completed orders (hint: order_state = ‘COMPLETE’)
  • Get revenue for each department per day
  • Insert output into a new table

Hint: Join all tables except customers

Please provide following as output:

  • Number of records
  • Sample output of 10 records (It should have order_date, department_name and revenue for each department)
1 Like

#2

select o.order_date , d.department_name , sum(oi.order_item_subtotal) revenue_per_day
from orders o join order_items oi
on o.order_id=oi.order_item_order_id
join products p
on oi.order_item_product_id = p.product_id
join categories c
on p.product_category_id = c.category_id
join departments d
on c.category_department_id= d.department_id
where o.order_status='COMPLETE’
group by o.order_date, d.department_name limit 10;

Number of records:

Time taken: 27.273 seconds, Fetched: 2116 row(s)

2013-07-25 00:00:00.0 Apparel 3279.5701179504395
2013-07-25 00:00:00.0 Fan Shop 9798.69021987915
2013-07-25 00:00:00.0 Fitness 394.92999267578125
2013-07-25 00:00:00.0 Footwear 3899.6100540161133
2013-07-25 00:00:00.0 Golf 2029.720012664795
2013-07-25 00:00:00.0 Outdoors 627.799991607666
2013-07-26 00:00:00.0 Apparel 8828.750286102295
2013-07-26 00:00:00.0 Fan Shop 20847.68049621582
2013-07-26 00:00:00.0 Fitness 183.9800033569336
2013-07-26 00:00:00.0 Footwear 5129.419956207275
Time taken: 26.159 seconds, Fetched: 10 row(s)

0 Likes

#3

select o.order_date , d.department_name , sum(oi.order_item_subtotal) revenue_per_day
from orders o join order_items oi
on o.order_id=oi.order_item_order_id
join products p
on oi.order_item_product_id = p.product_id
join categories c
on p.product_category_id = c.category_id
join departments d
on c.category_department_id= d.department_id
where o.order_status='COMPLETE’
group by o.order_date, d.department_name limit 10;

Number of records:

Time taken: 27.273 seconds, Fetched: 2116 row(s)


o.order_date d.department_name revenue_per_day
2013-07-25 00:00:00.0 Apparel 3279.5701179504395
2013-07-25 00:00:00.0 Fan Shop 9798.69021987915
2013-07-25 00:00:00.0 Fitness 394.92999267578125
2013-07-25 00:00:00.0 Footwear 3899.6100540161133
2013-07-25 00:00:00.0 Golf 2029.720012664795
2013-07-25 00:00:00.0 Outdoors 627.799991607666
2013-07-26 00:00:00.0 Apparel 8828.750286102295
2013-07-26 00:00:00.0 Fan Shop 20847.68049621582
2013-07-26 00:00:00.0 Fitness 183.9800033569336
2013-07-26 00:00:00.0 Footwear 5129.419956207275
Time taken: 26.152 seconds, Fetched: 10 row(s)

0 Likes

#4

#Query

select o.order_date,d.department_name,sum(oi.order_item_subtotal) revene_per_day
from orders o join order_items oi
on oi.order_item_order_id=o.order_id
join products p
on p.product_id=oi.order_item_product_id
join categories c
on c.category_id=p.product_category_id
join departments d
on d.department_id=c.category_dept_id
where order_status='COMPLETE’
group by o.order_date,d.department_name
LIMIT 10;

#Number of records

Time taken: 30.378 seconds, Fetched: 2116 row(s)

#Sample output of 10 records (It should have order_date, department_name and revenue for each department)

o.order_date d.department_name revene_per_day
2013-07-25 00:00:00.0 Apparel 3279.5701179504395
2013-07-25 00:00:00.0 Fan Shop 9798.69021987915
2013-07-25 00:00:00.0 Fitness 394.92999267578125
2013-07-25 00:00:00.0 Footwear 3899.6100540161133
2013-07-25 00:00:00.0 Golf 2029.720012664795
2013-07-25 00:00:00.0 Outdoors 627.799991607666
2013-07-26 00:00:00.0 Apparel 8828.750286102295
2013-07-26 00:00:00.0 Fan Shop 20847.68049621582
2013-07-26 00:00:00.0 Fitness 183.9800033569336
2013-07-26 00:00:00.0 Footwear 5129.419956207275

0 Likes

#5

Sample output:

Total MapReduce CPU Time Spent: 9 seconds 260 msec
OK
2013-07-25 00:00:00.0 Apparel 3279.5699999999997
2013-07-25 00:00:00.0 Fan Shop 9798.689999999997
2013-07-25 00:00:00.0 Fitness 394.92999999999995
2013-07-25 00:00:00.0 Footwear 3899.609999999999
2013-07-25 00:00:00.0 Golf 2029.72
2013-07-25 00:00:00.0 Outdoors 627.8
2013-07-26 00:00:00.0 Apparel 8828.749999999993
2013-07-26 00:00:00.0 Fan Shop 20847.67999999999
2013-07-26 00:00:00.0 Fitness 183.98000000000002
2013-07-26 00:00:00.0 Footwear 5129.419999999998
Time taken: 29.452 seconds, Fetched: 10 row(s)

===================================================

hive> create table ecomm_join as select o.order_date , d.department_name , sum(oi.order_item_subtotal) revenue_per_day
from orders o join order_items oi
on o.order_id=oi.order_item_order_id
join products p
on oi.order_item_product_id = p.product_id
join categories c
on p.product_category_id = c.category_id
join departments d
on c.category_department_id= d.department_id
where o.order_status='COMPLETE’
group by o.order_date, d.department_name limit 10;

Table nagellarajashyam.ecomm_join stats: [numFiles=1, numRows=10, totalSize=463, rawDataSize=453]

0 Likes

#6

#Number of records:
All records with Revenue for each department per day is stored in table dep_rev_per_day :

CREATE TABLE **dep_rev_per_day** row format delimited fields terminated by ',' STORED AS avro 
AS select o.order_date as order_date, d.departments_name as department_name, sum(oi.order_item_subtotal) as revenue_per_day
     from orders o join order_items oi
     on o.order_id = oi.order_item_order_id
     join products p
     on oi.order_item_product_id = p.product_id
     join categories c
     on p.product_category_id = c.category_id
     join departments d
     on c.category_departhment_id = d. departments_id
     where o.order_status = 'COMPLETE'
     group by o.order_date, d.departments_name ;

So total number of records are :

    hive> select count(*) from dep_rev_per_day; 
    OK
    2116
    Time taken: 0.409 seconds, Fetched: 1 row(s)

#Sample output of 10 records (It should have order_date, department_name and revenue for each department):
hive> describe dep_rev_per_day;
OK
order_date string
department_name string
revenue_per_day double
Time taken: 0.355 seconds, Fetched: 3 row(s)

hive> select * from dep_rev_per_day limit 10;
OK
2013-07-25 00:00:00.0	Apparel	3279.5701179504395
2013-07-25 00:00:00.0	Fan Shop	9798.69021987915
2013-07-25 00:00:00.0	Fitness	394.92999267578125
2013-07-25 00:00:00.0	Footwear	3899.6100540161133
2013-07-25 00:00:00.0	Golf	2029.720012664795
2013-07-25 00:00:00.0	Outdoors	627.799991607666
2013-07-26 00:00:00.0	Apparel	8828.750286102295
2013-07-26 00:00:00.0	Fan Shop	20847.68049621582
2013-07-26 00:00:00.0	Fitness	183.9800033569336
2013-07-26 00:00:00.0	Footwear	5129.419956207275
Time taken: 0.182 seconds, Fetched: 10 row(s)
0 Likes

#7

Output is inserted into table ‘dept_revenue’:

create table dept_revenue row format delimited fields terminated by ','
as select od.order_date , d.department_name , sum(ori.order_item_subtotal) revenue_per_day
from orders od join order_items ori
on od.order_id=ori.order_item_order_id
join products p
on ori.order_item_product_id = p.product_id
join categories c
on p.product_category_id = c.category_id
join departments d
on c.category_department_id= d.department_id
where od.order_status='COMPLETE’
group by od.order_date, d.department_name;

Number of records:

hive> select count(*) from dept_revenue;
OK
2116
Time taken: 0.285 seconds, Fetched: 1 row(s)

Sample output of 10 records:

hive> select * from dept_revenue limit 10;
OK
dept_revenue.order_date dept_revenue.department_name dept_revenue.revenue_per_day
2013-07-25 00:00:00.0 Apparel 3279.5701179504395
2013-07-25 00:00:00.0 Fan Shop 9798.69021987915
2013-07-25 00:00:00.0 Fitness 394.92999267578125
2013-07-25 00:00:00.0 Footwear 3899.6100540161133
2013-07-25 00:00:00.0 Golf 2029.720012664795
2013-07-25 00:00:00.0 Outdoors 627.799991607666
2013-07-26 00:00:00.0 Apparel 8828.750286102295
2013-07-26 00:00:00.0 Fan Shop 20847.68049621582
2013-07-26 00:00:00.0 Fitness 183.9800033569336
2013-07-26 00:00:00.0 Footwear 5129.419956207275
Time taken: 0.19 seconds, Fetched: 10 row(s)

0 Likes

#8

--------HIVE QUERY
drop table IF EXISTS dep_day_rev;
create table dep_day_rev
row format delimited
fields terminated by ','
stored as textfile
as select
o.order_date as out_date,
d.department_name as out_dept,
sum (oi.order_item_subtotal) as out_revenue
from orders o join order_items oi
on o.order_id = oi.order_item_order_id
join products p
on oi.order_item_product_id = p.product_id
join categories c
on c.catagory_id = product_catagory_id
join departments d
on d.department_id = c.catagory_department_id
where o.order_status = 'COMPLETE’
group by o.order_date, d.department_name;

------ 10 records

hive> select * from dep_day_rev limit 10;
OK
2013-07-25 00:00:00 Apparel 3279.5701179504395
2013-07-25 00:00:00 Fan Shop 9798.69021987915
2013-07-25 00:00:00 Fitness 394.92999267578125
2013-07-25 00:00:00 Footwear 3899.6100540161133
2013-07-25 00:00:00 Golf 2029.720012664795
2013-07-25 00:00:00 Outdoors 627.799991607666
2013-07-26 00:00:00 Apparel 8828.750286102295
2013-07-26 00:00:00 Fan Shop 20847.68049621582
2013-07-26 00:00:00 Fitness 183.9800033569336
2013-07-26 00:00:00 Footwear 5129.419956207275
Time taken: 0.142 seconds, Fetched: 10 row(s)

– Total number of records :: 2116

0 Likes

#9

select count(*) from fact_rev_per_day;
2116

select * from fact_rev_per_day limit 10;
fact_rev_per_day.order_date fact_rev_per_day.department_name fact_rev_per_day.revenue_per_day
2013-07-25 00:00:00.0 Apparel 3279.5701179504395
2013-07-25 00:00:00.0 Fan Shop 6798.890110015869
2013-07-25 00:00:00.0 Fitness 394.92999267578125
2013-07-26 00:00:00.0 Apparel 8828.750286102295
2013-07-26 00:00:00.0 Fan Shop 15598.030303955078
2013-07-26 00:00:00.0 Fitness 94.0
2013-07-27 00:00:00.0 Apparel 5489.260177612305
2013-07-27 00:00:00.0 Fan Shop 12798.240238189697
2013-07-27 00:00:00.0 Fitness 234.9499969482422
2013-07-27 00:00:00.0 Footwear 549.989990234375

0 Likes

#10

create table date_dept_revenue row format delimited fields terminated by ','
as select od.order_date , d.department_name , sum(ori.order_item_subtotal) revenue_per_day
from orders od join order_items ori
on od.order_id=ori.order_item_order_id
join products p
on ori.order_item_product_id = p.product_id
join categories c
on p.product_category_id = c.category_id
join department d
on c.category_department_id= d.department_id
where od.order_status='COMPLETE’
group by od.order_date, d.department_name;

hive> select count(*) from date_dept_revenue;
OK
2116
Time taken: 0.232 seconds, Fetched: 1 row(s)
hive> select * from date_dept_revenue limit 5;
OK
2013-07-25 00:00:00.0 Apparel 3279.5701179504395
2013-07-25 00:00:00.0 Fan Shop 9798.69021987915
2013-07-25 00:00:00.0 Fitness 394.92999267578125
2013-07-25 00:00:00.0 Footwear 3899.6100540161133
2013-07-25 00:00:00.0 Golf 2029.720012664795
Time taken: 0.206 seconds, Fetched: 5 row(s)

0 Likes

#11

Query:

hive>
> CREATE TABLE IF NOT EXISTS rev_dep_per_day
> ROW FORMAT DELIMITED FIELDS TERMINATED BY “,” STORED AS textfile
> AS SELECT o.order_date, d.department_name, DOUBLE(sum(oi.order_item_subtotal)) AS revenue
> FROM orders o JOIN order_items oi ON o.order_id = oi.order_item_order_id
> JOIN products p ON oi.order_item_product_id = p.product_id
> JOIN categories c ON c.category_id = p.product_category_id
> JOIN departments d ON d.department_id = c.category_department_id
> WHERE o.order_status = ‘COMPLETE’
> GROUP BY o.order_date, d.department_name;

Number of records:
hive> select count(*) from rev_dep_per_day;
OK
2116

Sample output of 10 records:
hive> select * from rev_dep_per_day limit 10;
OK
2013-07-25 00:00:00.0 Apparel 3279.57
2013-07-25 00:00:00.0 Fan Shop 9798.69
2013-07-25 00:00:00.0 Fitness 394.93
2013-07-25 00:00:00.0 Footwear 3899.61
2013-07-25 00:00:00.0 Golf 2029.72
2013-07-25 00:00:00.0 Outdoors 627.8
2013-07-26 00:00:00.0 Apparel 8828.75
2013-07-26 00:00:00.0 Fan Shop 20847.68
2013-07-26 00:00:00.0 Fitness 183.98001
2013-07-26 00:00:00.0 Footwear 5129.42

0 Likes

#12

hive> select o.order_date , d.department_name , sum(oi.order_item_subtotal) as revenue_per_day
> from orders o
> join order_items oi on o.order_id=oi.order_item_order_id
> join products p on oi.order_item_product_id = p.product_id
> join categories c on p.product_category_id = c.category_id
> join departments d on c.category_department_id= d.department_id
> where o.order_status=‘COMPLETE’ group by o.order_date, d.department_name limit 25;
Query ID = chethan121212_20161214231611_a7d74651-34f6-4c1f-9600-cbbfd1236998
Total jobs = 1
Execution log at: /tmp/chethan121212/chethan121212_20161214231611_a7d74651-34f6-4c1f-9600-cbbfd1236998.log
2016-12-14 23:16:18 Starting to launch local task to process map join; maximum memory = 1046478848
2016-12-14 23:16:19 Dump the side-table for tag: 1 with group count: 6 into file: file:/tmp/chethan121212/b6f83526-a844-4f82-8234-3831245f7fae/hive_2016-12-14_23-16-11_632_5045578855139585968-1/-local-10008/HashTable-Stage-5/MapJoin-mapfile01–.hashtable
2016-12-14 23:16:20 Uploaded 1 File to: file:/tmp/chethan121212/b6f83526-a844-4f82-8234-3831245f7fae/hive_2016-12-14_23-16-11_632_5045578855139585968-1/-local-10008/HashTable-Stage-5/MapJoin-mapfile01–.hashtable (422 bytes)
2016-12-14 23:16:20 Dump the side-table for tag: 1 with group count: 58 into file: file:/tmp/chethan121212/b6f83526-a844-4f82-8234-3831245f7fae/hive_2016-12-14_23-16-11_632_5045578855139585968-1/-local-10008/HashTable-Stage-5/MapJoin-mapfile11–.hashtable
2016-12-14 23:16:20 Uploaded 1 File to: file:/tmp/chethan121212/b6f83526-a844-4f82-8234-3831245f7fae/hive_2016-12-14_23-16-11_632_5045578855139585968-1/-local-10008/HashTable-Stage-5/MapJoin-mapfile11–.hashtable (1425 bytes)
2016-12-14 23:16:20 Dump the side-table for tag: 1 with group count: 1345 into file: file:/tmp/chethan121212/b6f83526-a844-4f82-8234-3831245f7fae/hive_2016-12-14_23-16-11_632_5045578855139585968-1/-local-10008/HashTable-Stage-5/MapJoin-mapfile21–.hashtable
2016-12-14 23:16:20 Uploaded 1 File to: file:/tmp/chethan121212/b6f83526-a844-4f82-8234-3831245f7fae/hive_2016-12-14_23-16-11_632_5045578855139585968-1/-local-10008/HashTable-Stage-5/MapJoin-mapfile21–.hashtable (29611 bytes)
2016-12-14 23:16:20 Dump the side-table for tag: 0 with group count: 22899 into file: file:/tmp/chethan121212/b6f83526-a844-4f82-8234-3831245f7fae/hive_2016-12-14_23-16-11_632_5045578855139585968-1/-local-10008/HashTable-Stage-5/MapJoin-mapfile30–.hashtable
2016-12-14 23:16:20 Uploaded 1 File to: file:/tmp/chethan121212/b6f83526-a844-4f82-8234-3831245f7fae/hive_2016-12-14_23-16-11_632_5045578855139585968-1/-local-10008/HashTable-Stage-5/MapJoin-mapfile30–.hashtable (990780 bytes)
2016-12-14 23:16:20 End of local task; Time Taken: 1.763 sec.
Execution completed successfully
MapredLocal task succeeded
Launching Job 1 out of 1
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
Starting Job = job_1480307771710_2865, Tracking URL = http://rm01.itversity.com:8088/proxy/application_1480307771710_2865/
Kill Command = /usr/hdp/2.5.0.0-1245/hadoop/bin/hadoop job -kill job_1480307771710_2865
Hadoop job information for Stage-5: number of mappers: 1; number of reducers: 1
2016-12-14 23:16:29,427 Stage-5 map = 0%, reduce = 0%
2016-12-14 23:16:34,658 Stage-5 map = 100%, reduce = 0%, Cumulative CPU 6.4 sec
2016-12-14 23:16:40,869 Stage-5 map = 100%, reduce = 100%, Cumulative CPU 9.6 sec
MapReduce Total cumulative CPU time: 9 seconds 600 msec
Ended Job = job_1480307771710_2865
MapReduce Jobs Launched:
Stage-Stage-5: Map: 1 Reduce: 1 Cumulative CPU: 9.6 sec HDFS Read: 5433568 HDFS Write: 1203 SUCCESS
Total MapReduce CPU Time Spent: 9 seconds 600 msec
OK
2013-07-25 00:00:00.0 Apparel 3279.5701179504395
2013-07-25 00:00:00.0 Fan Shop 9798.69021987915
2013-07-25 00:00:00.0 Fitness 394.92999267578125
2013-07-25 00:00:00.0 Footwear 3899.6100540161133
2013-07-25 00:00:00.0 Golf 2029.720012664795
2013-07-25 00:00:00.0 Outdoors 627.799991607666
2013-07-26 00:00:00.0 Apparel 8828.750286102295
2013-07-26 00:00:00.0 Fan Shop 20847.68049621582
2013-07-26 00:00:00.0 Fitness 183.9800033569336
2013-07-26 00:00:00.0 Footwear 5129.419956207275
2013-07-26 00:00:00.0 Golf 6189.370037078857
2013-07-26 00:00:00.0 Outdoors 986.6800136566162
2013-07-27 00:00:00.0 Apparel 5489.260177612305
2013-07-27 00:00:00.0 Fan Shop 16697.980381011963
2013-07-27 00:00:00.0 Fitness 234.9499969482422
2013-07-27 00:00:00.0 Footwear 4477.589975357056
2013-07-27 00:00:00.0 Golf 5149.690017700195
2013-07-27 00:00:00.0 Outdoors 1106.740005493164
2013-07-28 00:00:00.0 Apparel 6589.170246124268
2013-07-28 00:00:00.0 Fan Shop 12298.490280151367
2013-07-28 00:00:00.0 Fitness 59.9900016784668
2013-07-28 00:00:00.0 Footwear 3249.5900115966797
2013-07-28 00:00:00.0 Golf 4114.810012817383
2013-07-28 00:00:00.0 Outdoors 700.8600044250488
2013-07-29 00:00:00.0 Apparel 8168.860248565674
Time taken: 31.699 seconds, Fetched: 25 row(s)

0 Likes