In this article we will load nested JSON data with an array into a Spark Dataframe using Scala.
The nested json file with an array is provided below for reference.
The SBT library dependencies are shown below for reference.
The Scala program is provided below.
The nested json file with an array is provided below for reference.
The SBT library dependencies are shown below for reference.
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
The Scala program is provided below.
Here is the output after running - df.show().import org.apache.spark.sql.SparkSession import org.apache.spark.sql.functions._ object NestedJsonReader extends App { System.setProperty("hadoop.home.dir","C:\\intellij.winutils") val spark = SparkSession.builder() .master("local") .appName("JsonFileReader") .getOrCreate() val df = spark.read .format("json") .option("multiline","true") .load("C:\\data\\nested-data.json") //show dataframe with truncated data values if they exceed default column display width df.show() //don't truncate the value, instead expand the column automatically so all the entire value can be viewed df.show(false) //explode the Employees array into rows df.withColumn("Employees", explode(col("Employees"))).show(false) //select each column from the exploded data frame df.withColumn("Employees", explode(col("Employees"))) .select("Department", "Employees.Id", "Employees.Age", "Employees.Name", "Employees.Name.FirstName", "Employees.Name.FirstName").show(false) }
Here is the output after running - df.show(false).
Here is the output after running - df.withColumn("Employees", explode(col("Employees"))).show(false).
Here is the output after running - df.withColumn("Employees", explode(col("Employees")))
.select("Department", "Employees.Id", "Employees.Age", "Employees.Name", "Employees.Name.FirstName", "Employees.Name.FirstName").show(false).
.select("Department", "Employees.Id", "Employees.Age", "Employees.Name", "Employees.Name.FirstName", "Employees.Name.FirstName").show(false).
Thanks. That is all for now!
No comments:
Post a Comment