In this article we will load nested JSON data with an array into a Spark Dataframe using Scala.
The nested json file with an array is provided below for reference.

The SBT library dependencies are shown below for reference.
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
The Scala program is provided below.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._
object NestedJsonReader extends App {
System.setProperty("hadoop.home.dir","C:\\intellij.winutils")
val spark = SparkSession.builder()
.master("local")
.appName("JsonFileReader")
.getOrCreate()
val df = spark.read
.format("json")
.option("multiline","true")
.load("C:\\data\\nested-data.json")
//show dataframe with truncated data values if they exceed default column display width df.show()
//don't truncate the value, instead expand the column automatically so all the entire value can be viewed df.show(false)
//explode the Employees array into rows df.withColumn("Employees", explode(col("Employees"))).show(false)
//select each column from the exploded data frame df.withColumn("Employees", explode(col("Employees")))
.select("Department", "Employees.Id", "Employees.Age", "Employees.Name", "Employees.Name.FirstName", "Employees.Name.FirstName").show(false)
}
Here is the output after running -
df.show().
Thanks. That is all for now!