Joins without schema in Pig

Is there any way in Pig to join two data sets without a schema?
I have two data sets, each with more than 30 columns. Is there any way I can join them without defining a schema?
I got this question in the HDPCD exam.

Yes, you can join them.
First, generate names for the columns specified in the question, referring to each one by its position.

E.g., if the question gives 4 columns by their positions in the original table, such as the 2nd, 6th, 8th and 11th columns, you have to generate those.
Assuming it is a comma-separated file:

table = LOAD '/location/file.txt' USING PigStorage(',');

dataneeded = FOREACH table GENERATE $1 AS xyz, $5 AS ABC, $7 AS BGR, $10 AS PRJ;

Always remember to subtract 1 from the column position given in the question, because Pig's positional references start at $0 (e.g. the 4th position means $3).

Both data sets will have a common column, and that becomes the join key.

Find that common key and join the two relations on it.

Example: joined = JOIN A BY key_column, B BY key_column;
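
Putting the steps together, here is a minimal sketch of the whole flow. The file paths, the relation names and the choice of $0 as the shared key are assumptions for illustration only; use the positions given in the question. Since no schema is declared, every field is loaded as a bytearray, which is fine for a join key.

```pig
-- Hypothetical input paths; both files are assumed to be comma-separated.
A_raw = LOAD '/location/file_a.txt' USING PigStorage(',');
B_raw = LOAD '/location/file_b.txt' USING PigStorage(',');

-- Project only the needed columns by position and give them names.
-- $0 is assumed to hold the common key in both files.
A = FOREACH A_raw GENERATE $0 AS id, $1 AS xyz, $5 AS abc;
B = FOREACH B_raw GENERATE $0 AS id, $7 AS bgr, $10 AS prj;

-- Inner join on the named key column.
joined = JOIN A BY id, B BY id;

-- After a join, fields are disambiguated with the relation prefix (A::, B::).
result = FOREACH joined GENERATE A::id AS id, A::xyz, A::abc, B::bgr, B::prj;

DUMP result;
```

Projecting before the join keeps only the columns you actually need, and it gives the joined relation a known layout, which makes the final FOREACH easier to write than working with 30+ unnamed columns.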

Hope this helps.

