A sample input JSON file is shown below.
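The exact contents of the file don't matter; since the program enables Spark's multiline option, any pretty-printed JSON array will work. The records below are hypothetical sample data, not the original post's file:

[
  {
    "id": 1,
    "name": "alpha"
  },
  {
    "id": 2,
    "name": "beta"
  }
]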
The sbt library dependencies (build.sbt) are shown below for reference.
scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
The Scala program is provided below.
import org.apache.spark.sql.{SaveMode, SparkSession}

object JSONToParquetConverter extends App {

  val spark = SparkSession.builder()
    .master("local")
    .appName("JSONToParquetConverter")
    .getOrCreate()

  val inputFile = "C:\\data\\data.json"
  val outputFile = "C:\\data\\out_data_json2parquet"

  // Read the JSON input; the multiline option lets Spark parse
  // records that span several lines (pretty-printed JSON).
  val df = spark
    .read
    .format("json")
    .option("multiline", "true")
    .load(inputFile)

  // Write the DataFrame as Parquet, overwriting any previous run.
  // Parquet files are self-describing, so no header option is needed.
  df
    .write
    .mode(SaveMode.Overwrite)
    .parquet(outputFile)
}

Running the program writes the converted Parquet output to C:\data\out_data_json2parquet.
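To sanity-check the result, the Parquet directory can be read straight back into a DataFrame. This snippet is a minimal sketch rather than part of the original program; parquetDf is a name introduced here, and it assumes the same spark session and outputFile value as above:

val parquetDf = spark.read.parquet(outputFile)
parquetDf.printSchema()  // Parquet carries the schema inferred from the JSON
parquetDf.show()         // display the converted rows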
That's all!