Tag: DataSet


While working in Apache Spark with Scala, we often need to convert an RDD to a DataFrame or Dataset, as these offer several advantages over RDDs. For instance, a DataFrame is a distributed collection of data organised into named columns, similar to Database Read more…


A Spark RDD can be created in several ways using Scala and PySpark. For example, it can be created by using sparkContext.parallelize(), from a text file, or from another RDD, DataFrame, or Dataset. Resilient Distributed Datasets (RDD) is the fundamental data structure of Read more…