The ORC file is available here -> https://github.com/Teradata/kylo/blob/master/samples/sample-data/orc/userdata1_orc
After downloading the ORC file, rename it to "data.orc".
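If you would rather do the rename in code, here is a minimal Scala sketch using java.nio. The paths are assumptions: they assume the file was downloaded into C:\data, matching the path used later in the program.

import java.nio.file.{Files, Paths, StandardCopyOption}

object RenameOrc extends App {
  // Assumed download location; adjust to wherever the file actually landed
  val source = Paths.get("C:\\data\\userdata1_orc")
  val target = Paths.get("C:\\data\\data.orc")

  // Rename the downloaded file, overwriting any existing data.orc
  Files.move(source, target, StandardCopyOption.REPLACE_EXISTING)
}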
The SBT library dependencies are shown below for reference.
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
The Scala program is provided below.
import org.apache.spark.sql.SparkSession

object ORCReader extends App {

  val spark = SparkSession.builder()
    .master("local")
    .appName("ORCFileReader")
    // Use the native, vectorized ORC reader introduced in Spark 2.3
    .config("spark.sql.orc.impl", "native")
    .getOrCreate()

  // ORC files embed their own schema, so no header/inferSchema options
  // (those are CSV options) are needed here
  val df = spark
    .read
    .format("orc")
    .load("C:\\data\\data.orc")

  df.show()
}

Running the program prints the first 20 rows of the file as a DataFrame via df.show().
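To confirm that Spark picked up the schema embedded in the ORC file, and to query the data with SQL, a short sketch like the following could be appended to the program (the view name "users" is an arbitrary choice, not part of the sample data):

// Print the schema that Spark read from the ORC file's metadata
df.printSchema()

// Register a temporary view and run a schema-agnostic SQL query against it
df.createOrReplaceTempView("users")
spark.sql("SELECT COUNT(*) FROM users").show()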
Thanks. That is all for now!