The nested XML file is provided below for reference.
The SBT library dependencies are shown below for reference.
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
libraryDependencies += "com.databricks" %% "spark-xml" % "0.6.0"
The Scala program is provided below.
import org.apache.spark.sql.SparkSession

object NestedXMLReader extends App {
  System.setProperty("hadoop.home.dir", "C:\\intellij.winutils")

  val spark = SparkSession.builder()
    .master("local")
    .appName("XMLFileReader")
    .getOrCreate()

  val df = spark.read
    .format("xml")
    .option("rowTag", "person")
    .load("C:\\data\\nested-data.xml")

  df.select("Id", "Age", "Name.FirstName", "Name.LastName").show()
}
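As a variation on NestedXMLReader, the sketch below writes a small sample file to a temporary location and flattens the nested Name element into top-level columns with aliases. The sample XML is hypothetical: its element names are only inferred from the columns the program selects (Id, Age, Name.FirstName, Name.LastName), and the values are made up. The object name NestedXMLSketch and the helper flatten are also assumptions introduced for illustration. It relies on the same spark-xml dependency listed in the SBT settings.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.Files

import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object NestedXMLSketch {
  // Hypothetical sample data: element names mirror the selected columns,
  // the values are invented for illustration only.
  val sampleXml: String =
    """<persons>
      |  <person>
      |    <Id>1</Id>
      |    <Age>30</Age>
      |    <Name>
      |      <FirstName>Jane</FirstName>
      |      <LastName>Doe</LastName>
      |    </Name>
      |  </person>
      |</persons>""".stripMargin

  // Reads the sample file with spark-xml and returns a flat DataFrame.
  def flatten(spark: SparkSession): DataFrame = {
    // Write the sample to a temporary file so the sketch does not depend on a fixed path.
    val path = Files.createTempFile("nested-data", ".xml")
    Files.write(path, sampleXml.getBytes(StandardCharsets.UTF_8))

    // Each <person> element becomes one row; <Name> is read as a nested struct.
    val df = spark.read
      .format("xml")
      .option("rowTag", "person")
      .load(path.toString)

    // Dot notation reaches into the struct; alias gives each field a flat column name.
    df.select(
      col("Id"),
      col("Age"),
      col("Name.FirstName").alias("FirstName"),
      col("Name.LastName").alias("LastName")
    )
  }
}
```

Calling NestedXMLSketch.flatten(spark).show() with the SparkSession from the program above should print the flattened rows, and calling printSchema() on the unflattened DataFrame is a quick way to confirm that Name was read as a struct.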
Here is the output after running the NestedXMLReader program.
Thanks. That is all for now!