The Parquet file is available here - https://github.com/Teradata/kylo/blob/master/samples/sample-data/parquet/userdata1.parquet
After downloading the Parquet file, simply rename it to "data.parquet" and place it in "c:\data".
The SBT library dependencies are shown below for reference.
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
The Scala program is provided below.
Remember to use JDK 1.8 with Spark 2.3.0.
import org.apache.spark.sql.SparkSession

object ParquetReader extends App {

  // Start a local Spark session
  val spark = SparkSession.builder()
    .master("local")
    .appName("ParquetFileReader")
    .getOrCreate()

  import spark.implicits._

  // Read the Parquet file into a DataFrame.
  // Note: "header" and "inferSchema" are CSV reader options and are ignored
  // here, because Parquet files carry their schema in the file footer.
  val df = spark
    .read
    .format("parquet")
    .option("header", "true")
    .option("inferSchema", "true")
    .load("C:\\data\\data.parquet")

  // Print the first 20 rows
  df.show()
}
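If you want to verify what was read before looking at the output, a few optional follow-up calls can be appended to the program. This is just a sketch; the temporary view name "userdata" is arbitrary and not part of the original program.

  // Show the schema that Spark read from the Parquet footer
  df.printSchema()

  // Count rows without collecting the data to the driver
  println(s"Row count: ${df.count()}")

  // Register a temporary view and query it with Spark SQL
  df.createOrReplaceTempView("userdata")
  spark.sql("SELECT COUNT(*) FROM userdata").show()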
Here is the output after running the program.
That's all!