Pyspark transpose rdd

pyspark
#1

I have a small doubt in pyspark.

I have a table emp in the testing database of Hive. I want to transpose the table in pyspark.

The table looks like below:

1 ram 2000.0 101 market
2 shyam 3000.0 102 IT
3 sam 4000.0 103 finance
4 remo 1000.0 103 finance

I tried the below code in the pyspark shell:

test = sqlContext.sql("select * from testing.emp")
data = test.flatMap(lambda row: [Row(id=row['id'], name=row['name'], column_name=col, column_val=row[col]) for col in ('sal', 'dno', 'dname')])
emp = sqlContext.createDataFrame(data)
emp.registerTempTable('mytempTable')
sqlContext.sql('create table testing.test(id int,name string,column_name string,column_val int) row format delimited fields terminated by ","')
sqlContext.sql('INSERT INTO TABLE testing.test select * from mytempTable')

The expected output is:

1 ram sal 2000
1 ram dno 101
1 ram dname market
2 shyam sal 3000
2 shyam dno 102
2 shyam dname IT
3 sam sal 4000
3 sam dno 103
3 sam dname finance
4 remo sal 1000
4 remo dno 103
4 remo dname finance
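The row explosion this code is aiming for can be sanity-checked in plain Python before involving Spark. Below is a minimal sketch with two made-up sample rows; the `unpivot` helper and the sample data are hypothetical, added only to illustrate the intended transform:

```python
# Simulate the flatMap-based unpivot ("transpose") from the question
# using plain Python dicts. Each input row should become one output
# row per pivoted column (sal, dno, dname).

rows = [
    {"id": 1, "name": "ram",   "sal": 2000.0, "dno": 101, "dname": "market"},
    {"id": 2, "name": "shyam", "sal": 3000.0, "dno": 102, "dname": "IT"},
]

def unpivot(row, cols=("sal", "dno", "dname")):
    # One output record per column being transposed.
    return [
        {"id": row["id"], "name": row["name"],
         "column_name": c, "column_val": row[c]}
        for c in cols
    ]

# Flatten: the plain-Python equivalent of rdd.flatMap(unpivot).
long_rows = [r for row in rows for r in unpivot(row)]
for r in long_rows:
    print(r["id"], r["name"], r["column_name"], r["column_val"])
```

If the logic is right, each input row yields three output rows in the long format shown above.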

But the output I get is:

NULL 2000.0 1 NULL
NULL NULL 1 NULL
NULL NULL 1 NULL
NULL 3000.0 2 NULL
NULL NULL 2 NULL
NULL NULL 2 NULL
NULL 4000.0 3 NULL
NULL NULL 3 NULL
NULL NULL 3 NULL
NULL 1000.0 4 NULL
NULL NULL 4 NULL
NULL NULL 4 NULL

Could you please explain and correct my mistakes?

Thank you.


#2

Try executing it step by step and you will probably find the problem.


#3

Hi, I actually did run it step by step; I only put the code together for posting here.


#4

I do not understand why you are doing it that way; you can insert data into testing.test directly from testing.emp.

Going through an intermediate temp table is not an efficient way of loading from one Hive table to another.
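If the transpose itself can be pushed down into Hive, the direct load suggested here could be a single statement. A sketch using Hive's built-in stack() UDTF follows; note the casts, and note this assumes column_val in testing.test is declared as string (a single column has to hold both numbers and department names, which the int declaration in the question cannot):

```sql
-- Sketch: unpivot testing.emp directly in Hive with the stack() UDTF.
-- Assumes testing.test was created with column_val as string.
INSERT INTO TABLE testing.test
SELECT id, name, t.column_name, t.column_val
FROM testing.emp
LATERAL VIEW stack(
  3,
  'sal',   cast(sal AS string),
  'dno',   cast(dno AS string),
  'dname', dname
) t AS column_name, column_val;
```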


#5

I want to transpose the columns and rows in Spark; my real table is much bigger than this one.
