Why does Spark SQL hash() return the same hash value for different keys in some cases?



Hello All,

I am calculating the hash value of a few columns to determine whether a record is an Insert/Delete/Update, but I found a scenario that is a little weird: some records return the same hash value even though the keys are totally different.

For instance:

scala> spark.sql("select hash('40514XXXXX'),hash('41751XXXX')").show()
| 976573657| 976573657|

scala> spark.sql("select hash('14589'),hash('40004XXXX')").show()
| 777096871| 777096871|
I do understand that hash() returns an integer. Have these values reached the maximum, so the hashes wrap around and collide?
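For what it's worth, this behavior is expected rather than a bug: Spark's hash() is a 32-bit Murmur3 hash, so there are only 2^32 possible outputs and distinct keys must eventually collide (the pigeonhole principle). By the birthday bound, collisions become likely with surprisingly few keys. A minimal sketch, in plain Scala (no Spark required), of the approximate collision probability for an ideal 32-bit hash — the object and function names here are my own, for illustration:

```scala
// Birthday-bound estimate for an ideal 32-bit hash.
// Spark's hash() is 32-bit Murmur3, so this is the right order of magnitude
// for how quickly collisions among distinct keys become likely.
object BirthdayBound {
  val space: Double = math.pow(2, 32) // 2^32 possible hash values

  // Approximate probability that n distinct keys produce at least one
  // collision: p ≈ 1 - exp(-n(n-1) / (2 * 2^32))
  def collisionProb(n: Long): Double =
    1.0 - math.exp(-n.toDouble * (n - 1) / (2.0 * space))

  def main(args: Array[String]): Unit = {
    // Around ~77,000 keys the chance of at least one collision passes 50%
    println(f"n=77163:   p=${collisionProb(77163L)}%.3f")
    println(f"n=1000000: p=${collisionProb(1000000L)}%.3f")
  }
}
```

So with a table of even a million distinct keys, at least one hash collision is all but certain; any change-detection logic should compare the key columns as a tiebreaker rather than rely on the hash alone (or use a wider hash such as sha2()).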
