Data frame operations

pyspark

#1

Hi Team,

I have a question. I have been given a list and need to find the number of occurrences of each number; for example, 1 is repeated 6 times. I am trying to use a DataFrame for this, but I have not been able to get it working. I am attaching the code and the error below. Please help me resolve this issue.

b=sc.parallelize([1,2,3,4,5,6,7,8,2,4,2,1,1,1,1,1])
from pyspark.sql import Row
from pyspark.sql.types import StructType
from pyspark.sql.types import StructField
from pyspark.sql.types import StringType
from pyspark.sql.types import IntegerType
schema=StructType([StructField("name",IntegerType(),True)])
sqlContext.createDataFrame(b,schema).registerTempTable('example')

Error: ValueError: Unexpected tuple '1' with StructType
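The error comes from the shape of the records, not from the schema itself. When an explicit StructType is passed to createDataFrame, each RDD element is expected to be row-like (a tuple, list, Row, or dict) holding a value for each field, but here every element is a bare integer. The plain-Python sketch below (no Spark needed) illustrates the mismatch using a hypothetical helper, `matches_schema`, which only mimics the idea of that check and is not Spark's actual validation code.

```python
# Hypothetical helper (NOT part of PySpark): mimics the idea that, with an
# explicit StructType, each record must be a sequence with one value per field.
def matches_schema(record, field_names):
    """Return True if record is a tuple/list with exactly one value per field."""
    return isinstance(record, (tuple, list)) and len(record) == len(field_names)


data = [1, 2, 3, 4, 5, 6, 7, 8, 2, 4, 2, 1, 1, 1, 1, 1]
fields = ["name"]  # the single field declared in the question's schema

# Bare integers do not have the shape of a one-field row...
print(all(matches_schema(x, fields) for x in data))  # False

# ...but one-element tuples, like those produced by b.map(lambda x: (x,)), do.
rows = [(x,) for x in data]
print(all(matches_schema(r, fields) for r in rows))  # True
```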


#2

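@vibhoroffice Try the following code to get the expected output. The key change is wrapping each number in a one-element tuple with map(lambda x: (x,)), so that every record matches the single-field schema: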

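# sc and sqlContext are provided by the pyspark shell
b=sc.parallelize([1,2,3,4,5,6,7,8,2,4,2,1,1,1,1,1])
# Wrap each number in a one-element tuple so it matches the one-field schema
b_map=b.map(lambda x: (x,))
from pyspark.sql.types import StructType
from pyspark.sql.types import StructField
from pyspark.sql.types import IntegerType
schema=StructType([StructField("name",IntegerType(),True)])
# Count the rows for each distinct value of "name"
df=sqlContext.createDataFrame(b_map,schema).groupBy('name').count()
# registerTempTable() returns None, so register the DataFrame in a separate step
df.registerTempTable('example')
sqlContext.sql("select * from example").show()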
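To sanity-check what the query should return, the same group-and-count can be sketched in plain Python with collections.Counter (no Spark involved). This only illustrates the expected counts; Spark does not guarantee the row order of a groupBy result, so the sketch sorts by value. If a DataFrame is not required, the RDD method b.countByValue() also returns a value-to-count mapping directly.

```python
from collections import Counter

data = [1, 2, 3, 4, 5, 6, 7, 8, 2, 4, 2, 1, 1, 1, 1, 1]

# Same shape as b.map(lambda x: (x,)): one-element rows holding the "name" value
rows = [(x,) for x in data]

# Plain-Python analogue of groupBy('name').count(): rows per distinct value
counts = Counter(name for (name,) in rows)

# Print in ascending order of value (Spark's own output order is not guaranteed)
for name, count in sorted(counts.items()):
    print(name, count)

print(counts[1])  # 6
```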
