Data Not Loading into orders_stage

#1

As part of the certification video, I am trying to load data from the orders.psv file into the orders_stage table in the retail_stage schema.
Doing a 'cat' on the file shows the data properly, but when I run the following commands:

hive>
    >
    > create table orders_stage (
    > order_id int,
    > order_date string,
    > order_customer_id int,
    > order_status string
    > )
    > ROW FORMAT DELIMITED FIELDS TERMINATED BY '|'
    > STORED AS TEXTFILE;
OK
Time taken: 0.075 seconds
hive>
    >
    > load data local inpath '/tmp/orders.psv' overwrite into table orders_stage;
Loading data to table retail_stage.orders_stage
Table retail_stage.orders_stage stats: [numFiles=1, numRows=0, totalSize=2862178, rawDataSize=0]
OK
Time taken: 0.513 seconds
hive>
    > select * from orders_stage;
OK
1       2013-07-25 00:00:00     11599   CLOSEDn2
Time taken: 0.072 seconds, Fetched: 1 row(s)

I don't know why it is behaving like this. Any help on this?
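A quick way to confirm what Hive is seeing is to check whether the file actually contains newline record terminators (standard shell tools; the path is the one from the load command above):

[cloudera@quickstart tmp]$ wc -l /tmp/orders.psv          # counts '\n' characters; 0 means no real line breaks at all
[cloudera@quickstart tmp]$ head -c 120 /tmp/orders.psv | od -c   # dumps raw bytes; a literal 'n' between records will show up

Hive's TEXTFILE format splits records on '\n', so a file without newlines is one giant record: the first four '|'-delimited fields become the single row you see (hence CLOSEDn2, the status of row 1 glued to the order_id of row 2), and the remaining fields are ignored.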


#2

Also, when I do a cat on the .psv files that we created from MySQL, the data looks continuous rather than one record per line:

[cloudera@quickstart tmp]$ cat customers.psv | head -10
1|Richard|Her\na\ndez|XXXXXXXXX|XXXXXXXXX|6303 Heather Plaza|Brow\nsville|TX|78521n2|Mary|Barrett|XXXXXXXXX|XXXXXXXXX|9526 Noble Embers Ridge|Littleto\n|CO|80126n3|A\n\n|Smith|XXXXXXXXX|XXXXXXXXX|3422 Blue Pio\neer Be\nd|Caguas|PR|00725n4|Mary|Jo\nes|XXXXXXXXX|XXXXXXXXX|8324 Little Commo\n|Sa\n Marcos|CA|92069n5|Robert|Hudso\n|XXXXXXXXX|XXXXXXXXX|10 Crystal River Mall |Caguas|PR|00725n6|Mary|Smith|XXXXXXXXX|XXXXXXXXX|3151 Sleepy Quail Prome\nade|Passaic|NJ|07055n7|Melissa|Wilcox|XXXXXXXXX|XXXXXXXXX|9453 High Co\ncessio\n|Caguas|PR|00725n8|Mega\n|Smith|XXXXXXXXX|XXXXXXXXX|3047 Foggy Forest Plaza|Lawre\nce|MA|01841n9|Mary|Perez|XXXXXXXXX|XXXXXXXXX|3616 Quaki\ng Street|Caguas|PR|00725n10|Melissa|Smith|XXXXXXXXX|XXXXXXXXX|8598 Harvest Beaco\n Plaza|Stafford|VA|22554n11|Mary|Huffma\n|XXXXXXXXX|XXXXXXXXX|3169 Sto\ny Woods|Caguas|PR|00725n12|Christopher|Smith|XXXXXXXXX|XXXXXXXXX|5594 Jagged Embers By-pass|Sa\n A\nto\nio|TX|78227n13|Mary|Baldwi\n|XXXXXXXXX|XXXXXXXXX|7922 Iro\n Oak Garde\ns|Caguas|PR|00725n14|Katheri\ne|Smith|XXXXXXXXX|XXXXXXXXX|5666 Hazy Po\ny Square|Pico Rivera|CA|90660n15|Ja\ne|Lu\na|XXXXXXXXX|XXXXXXXXX|673 Bur\ni\ng Gle\n|Fo\nta\na|CA|92336n16|Tiffa\ny|Smith|XXXXXXXXX|XXXXXXXXX|6651 Iro\n Port|Caguas|PR|00725n17|Mary|Robi\nso\n|XXXXXXXXX|XXXXXXXXX|1325 Noble Pike|Taylor|MI|48180n18|Robert|Smith|XXXXXXXXX|XXXXXXXXX|2734 Hazy Butterfly Circle|Marti\nez|CA|94553n19|Stepha\nie|Mitchell|XXXXXXXXX|XXXXXXXXX|3543 Red Treasure Bay|Caguas|PR|00725n20|Mary|Ellis|XXXXXXXXX|XXXXXXXXX|4703 Old Route|West New York|NJ|07093n21|William|Zimmerma\n|XXXXXXXXX|XXXXXXXXX|3323 Old Willow Mall |Caguas|PR|00725n22|Joseph|Smith|XXXXXXXXX|XXXXXXXXX|7740 Broad Fox Villas|North Berge\n|NJ|07047n23|Be\njami\n|Duarte|XXXXXXXXX|
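The pattern above is telling: every 'n' inside the data appears as the two characters '\n' (Her\na\ndez for Hernandez), and rows are joined by a bare 'n' (…78521n2|Mary…). That is exactly what MySQL's SELECT ... INTO OUTFILE produces when the line terminator is written as 'n' instead of '\n': MySQL ends each row with the terminator character and backslash-escapes every occurrence of that character inside the data. A hypothetical reconstruction of such a broken export:

SELECT * FROM customers
INTO OUTFILE '/tmp/customers.psv'
FIELDS TERMINATED BY '|'
LINES TERMINATED BY 'n';   -- bug: missing backslash, should be '\n'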

#3

Please answer the questions below:
Where did you extract the '.psv' files from, and how?
If you extracted that data from MySQL to '.psv', you can instead import it directly into Hive, which avoids this kind of schema issue; see the sketch below.
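For example, with Sqoop (a sketch only; the JDBC URL, credentials, and database names below are the usual Cloudera quickstart values and are assumptions, not taken from this thread):

sqoop import \
  --connect jdbc:mysql://quickstart.cloudera:3306/retail_db \
  --username retail_dba --password cloudera \
  --table customers \
  --hive-import --hive-overwrite \
  --hive-database retail_ods --hive-table customers \
  --fields-terminated-by '|'

Sqoop writes the record terminators itself, so there is no hand-written export step to get wrong.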


#4

I tried again from the start, but I think there is something really wrong in the way we are exporting the data from MySQL.
See the query and output below:

hive>
> load data local inpath '/tmp/customers.psv' overwrite into table retail_ods.customers;
Loading data to table retail_ods.customers
Table retail_ods.customers stats: [numFiles=1, numRows=0, totalSize=978453, rawDataSize=0]
OK
Time taken: 0.47 seconds
hive>
>
> select count(*) from customers;
Query ID = cloudera_20161229214444_ef967adf-f3b3-4a5a-9d56-849423e7c5ae
Total jobs = 1
Launching Job 1 out of 1
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=
In order to set a constant number of reducers:
set mapreduce.job.reduces=
Starting Job = job_1482966494656_0021, Tracking URL = http://quickstart.cloudera:8088/proxy/application_1482966494656_0021/
Kill Command = /usr/lib/hadoop/bin/hadoop job -kill job_1482966494656_0021
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 1
2016-12-29 21:45:01,802 Stage-1 map = 0%, reduce = 0%
2016-12-29 21:45:06,011 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.02 sec
2016-12-29 21:45:12,256 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 2.27 sec
MapReduce Total cumulative CPU time: 2 seconds 270 msec
Ended Job = job_1482966494656_0021
MapReduce Jobs Launched:
Stage-Stage-1: Map: 1 Reduce: 1 Cumulative CPU: 2.27 sec HDFS Read: 986211 HDFS Write: 2 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 270 msec
OK
1
Time taken: 17.477 seconds, Fetched: 1 row(s)
hive> select * from customers;
OK
1 Richard Her\na\ndez XXXXXXXXX XXXXXXXXX 6303 Heather Plaza Brow\nsville TX 78521n2
Time taken: 0.074 seconds, Fetched: 1 row(s)
hive>
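The count of 1 confirms that Hive sees the whole file as a single record. If the export was done with SELECT ... INTO OUTFILE, re-exporting with a real newline terminator should fix both the row count and the mangled names (a sketch; INTO OUTFILE refuses to overwrite an existing file, so remove the old /tmp/customers.psv first):

-- in MySQL: note the backslash in '\n'
SELECT * FROM customers
INTO OUTFILE '/tmp/customers.psv'
FIELDS TERMINATED BY '|'
LINES TERMINATED BY '\n';

-- back in Hive
load data local inpath '/tmp/customers.psv' overwrite into table retail_ods.customers;
select count(*) from retail_ods.customers;   -- should now match the MySQL row count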


#5

Yes, I did it from MySQL, the same as shown in the videos…


#6

Give the database name as a prefix.

e.g.: SELECT * FROM retail_stage.orders_stage;
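Alternatively, switch the current database once and query without the prefix:

USE retail_stage;
SELECT * FROM orders_stage;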
