In this article we will load AVRO data into a Spark DataFrame using Scala.
The AVRO file can be downloaded from: https://github.com/Teradata/kylo/blob/master/samples/sample-data/avro/userdata1.avro
Rename the downloaded file to "data.avro" before using it with the code below.
The SBT library dependencies are shown below for reference.
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
libraryDependencies += "com.databricks" %% "spark-avro" % "3.2.0"
The Scala program is provided below.

import org.apache.spark.sql.SparkSession

object AVROReader extends App {

  // Point Hadoop at a local winutils installation (needed on Windows)
  System.setProperty("hadoop.home.dir", "C:\\intellij.winutils")

  val spark = SparkSession.builder()
    .master("local")
    .appName("AVROReader")
    .getOrCreate()

  import spark.implicits._

  // Load the AVRO file into a DataFrame using the spark-avro data source
  val df = spark.read
    .format("com.databricks.spark.avro")
    .load("C:\\data\\data.avro")

  df.show()
}

Running the program prints the first 20 rows of the DataFrame to the console.
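Once the data is loaded, the DataFrame can be queried like any other Spark DataFrame. The sketch below assumes the schema of the sample userdata1.avro file (column names such as "first_name" and "country" are assumptions — confirm them with df.printSchema() before relying on them):

```scala
// Inspect the inferred schema of the loaded AVRO data
df.printSchema()

// Select a couple of columns (names assumed from the sample file)
df.select("first_name", "country").show(5)

// Register the DataFrame as a temp view so it can be queried with SQL
df.createOrReplaceTempView("users")
spark.sql("SELECT country, COUNT(*) AS cnt FROM users GROUP BY country ORDER BY cnt DESC").show(5)
```

This is a minimal sketch; the same DataFrame supports the full DataFrame API (filter, groupBy, join, and so on).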
Thanks. That is all for now!