Create dataframe from parallel array in parquet file

dataframes

#1

I extracted 2 array columns from a parquet file and would like to combine them so they are an array of tuples.
data.show
+-------------------+-------------------+-------+
|           latitude|          longitude|     id|
+-------------------+-------------------+-------+
| [47.5450190827668…| [-122.31598925411…|   ID44|
| [45.5936572135771…| [-122.61848815929…|   ID19|
| [19.8322560460109…| [-156.23200851665…|   ID63|
| [44.8814967820617…| [-93.214937130311…|   ID91|


Each row holds two arrays that belong to one id. I want to pull out all the lat/long points and get a single flat list of [(lat,long), (lat,long)…]. Within each row, the latitude and longitude fields are WrappedArrays of the same size.

Any ideas?
Thanks


#2

I figured out a little. It's not pretty to get an array out.
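import scala.collection.mutable
val row1 = pointsdf.take(1)(0)
// getAs takes the target type in brackets and a column name (or index);
// the column names here are taken from the data.show output above
val latRow1 = row1.getAs[mutable.WrappedArray[Double]]("latitude")
for (lat <- latRow1) println(lat)

val longRow1 = row1.getAs[mutable.WrappedArray[Double]]("longitude")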

So latRow1 is a WrappedArray[Double], which can be used like a regular array of doubles.
Can anyone help in getting the two into a tuple as in (lat, long) => (47.545,-122.315),(47.333,-123.444)…?

Edit, in case anyone else runs into this:
Here is the code to take two WrappedArrays and make tuples out of them:
val tuples = latRow1 zip longRow1
// That's it!
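
For the original question (one flat list across all rows, not just row 1), the same zip can be applied per row and the results flattened. Below is a minimal sketch using plain Scala collections rather than Spark, so it runs on its own. The object name LatLong, its two helper methods and the sample values are made up for illustration; they are not from the parquet file or from any Spark API.

```scala
object LatLong {
  // Pair up one row's latitudes and longitudes by position.
  // Note: zip stops at the shorter input, so unequal lengths drop points silently.
  def zipRow(lats: Seq[Double], longs: Seq[Double]): Seq[(Double, Double)] =
    lats zip longs

  // Zip every row, then flatten all rows into a single list of points.
  def flattenPoints(rows: Seq[(Seq[Double], Seq[Double])]): Seq[(Double, Double)] =
    rows.flatMap { case (lats, longs) => zipRow(lats, longs) }

  def main(args: Array[String]): Unit = {
    // Hypothetical stand-in for the (latitude, longitude) arrays of two rows.
    val rows = Seq(
      (Seq(47.545, 47.333), Seq(-122.315, -123.444)),
      (Seq(45.593), Seq(-122.618))
    )
    flattenPoints(rows).foreach(println)
  }
}
```

If the rows are first collected to the driver as pairs of sequences, the same flatMap should apply unchanged. For large data it is probably better to keep the work inside the DataFrame, which this sketch does not cover.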