Saturday, August 1, 2020

Convert AVRO to ORC using Scala

In this article we will see how to convert an AVRO file to an ORC file using a Spark DataFrame in Scala.

The input AVRO file has been taken from:

The SBT library dependencies are shown below for reference.

scalaVersion := "2.11.12"
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.3.0"
libraryDependencies += "org.apache.spark" %% "spark-sql" % "2.3.0"
libraryDependencies += "com.databricks" %% "spark-avro" % "3.2.0"

The Scala program is provided below.

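import org.apache.spark.sql.{SaveMode, SparkSession}

object AvroToORCConverter extends App {

  // The master URL and application name are assumed values for a local run.
  val spark = SparkSession.builder()
    .master("local[*]")
    .appName("AvroToORCConverter")
    .config("spark.sql.orc.impl", "native")
    .getOrCreate()

  val inputFile = "C:\\data\\data.avro"
  val outputFile = "C:\\data\\out_data_avro2orc"
  import spark.implicits._

  // Read the AVRO file through the Databricks spark-avro data source.
  val df = spark.read
    .format("com.databricks.spark.avro")
    .load(inputFile)

  // Write the DataFrame as ORC; Overwrite mode is assumed from the SaveMode import.
  df.write
    .mode(SaveMode.Overwrite)
    .orc(outputFile)

  spark.stop()
}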

The converted ORC file is shown below.
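
One way to check the converted output is to read the ORC directory back into a DataFrame and inspect it. The snippet below is a minimal sketch, not part of the original program: the object name VerifyOrcOutput, the helper readOrc, and the local master setting are assumptions made for illustration, and the path is the output path used in the converter.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper object, not part of the original program.
object VerifyOrcOutput {

  // Reads an ORC directory into a DataFrame.
  def readOrc(spark: SparkSession, path: String): DataFrame =
    spark.read.orc(path)

  def main(args: Array[String]): Unit = {
    // Local master and application name are assumptions for a standalone check.
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("VerifyOrcOutput")
      .config("spark.sql.orc.impl", "native")
      .getOrCreate()

    // Same output path as in the converter program.
    val orcDf = readOrc(spark, "C:\\data\\out_data_avro2orc")

    orcDf.printSchema()                      // column names and types carried over from AVRO
    orcDf.show(5, truncate = false)          // first few rows
    println(s"Row count: ${orcDf.count()}")  // should match the row count of the AVRO input

    spark.stop()
  }
}
```

If the schema and the row count match the AVRO input, the conversion most likely went through as expected.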

That's all!

