In this article, we discuss the union, distinct, intersection, and subtract transformations on Spark RDDs.

**Union:**

- Merges two or more RDDs.
- rdd1.union(rdd2) returns an RDD that contains the data from both sources.
- If the input RDDs contain duplicate elements, the RDD produced by union also contains them; union does not deduplicate.
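The duplicate-preserving behaviour described in the list above can be sketched on small in-memory RDDs. This is a minimal illustration, not the article's dataset: the values and the names left, right, merged, and deduped are hypothetical, and the local session is only an assumption for running outside a notebook.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration. In a Databricks notebook or
// spark-shell a session already exists and getOrCreate() returns it.
val spark = SparkSession.builder().master("local[*]").appName("union-sketch").getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample data: the value 3 appears in both RDDs.
val left = sc.parallelize(Seq(1, 2, 3))
val right = sc.parallelize(Seq(3, 4, 5))

// union keeps every element from both inputs, duplicates included.
val merged = left.union(right)
val mergedSorted = merged.collect().sorted

// Chaining distinct() gives set-like behaviour when that is what we want.
val deduped = merged.distinct().collect().sorted

println(mergedSorted.mkString(", ")) // prints 1, 2, 3, 3, 4, 5
println(deduped.mkString(", "))      // prints 1, 2, 3, 4, 5
```

Sorting after collect() is only there to make the printed order predictable.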

**Q-1 We have one dataset, students_marks.csv, and a second dataset that also contains student marks. How can we union them? How can we perform a union on more than two RDDs at once?**

val students_marks=spark.sparkContext.textFile("/FileStore/tables/student_marks.csv")

val students_rest_marks=spark.sparkContext.textFile("/FileStore/tables/student_rest_marks.csv")

val students_union=students_marks.union(students_rest_marks)

students_union.collect().foreach(println)

// We can also union multiple RDDs in a single call by passing a sequence to SparkContext.union.

val sc = spark.sparkContext

val rdd1 = sc.parallelize(Seq(1, 2, 3))

val rdd2 = sc.parallelize(Seq(4, 5, 6))

val rdd3 = sc.parallelize(Seq(7, 8, 9))

val rdd = sc.union(Seq(rdd1, rdd2, rdd3))

rdd.collect()

// Output

// Array(1, 2, 3, 4, 5, 6, 7, 8, 9)

**Distinct:** When we need only the unique elements of an RDD, the distinct function returns a new RDD with the duplicates removed.

**Q-2 Find the distinct student names in the students_marks.csv dataset.**

val students_marks=spark.sparkContext.textFile("/FileStore/tables/student_marks.csv")

val students_name=students_marks.map(x=>(x.split(",")(0)))

val student_distinct=students_name.distinct()

student_distinct.collect().foreach(println)

//Output

//Tina

//Harry

//Jimmy

//Rocky

//Stephanie

//Ron

//Williamson

//Joseph

//Bob

**Subtract:** This transformation returns an RDD containing the elements that exist in the first RDD but not in the second.
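A minimal sketch of subtract on in-memory data may make the direction of the operation clearer. The values and the names all, seen, and remaining are hypothetical, and the local session is an assumption for running outside a notebook.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration. In a Databricks notebook or
// spark-shell a session already exists and getOrCreate() returns it.
val spark = SparkSession.builder().master("local[*]").appName("subtract-sketch").getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample data.
val all = sc.parallelize(Seq("a", "b", "c", "d"))
val seen = sc.parallelize(Seq("a", "d", "z"))

// Keep the elements of `all` that do not appear in `seen`.
// "z" exists only in `seen`, so it has no effect on the result.
val remaining = all.subtract(seen).collect().sorted

println(remaining.mkString(", ")) // prints b, c
```

The operation is not symmetric: seen.subtract(all) would instead keep only "z".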

**Q-3 Find the student details that exist in the first RDD but not in the second RDD.**

val student_rest=students_union.subtract(students_marks)

student_rest.collect().foreach(println)

**Intersection:** Returns the elements that exist in both RDDs. The result contains only unique elements: duplicates are removed, even when they appear in both inputs.
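The deduplication can be checked with a small sketch. The values and the names first, second, and common are hypothetical, and the local session is an assumption for running outside a notebook.

```scala
import org.apache.spark.sql.SparkSession

// Assumption: a local session for illustration. In a Databricks notebook or
// spark-shell a session already exists and getOrCreate() returns it.
val spark = SparkSession.builder().master("local[*]").appName("intersection-sketch").getOrCreate()
val sc = spark.sparkContext

// Hypothetical sample data: 2 is duplicated in both RDDs.
val first = sc.parallelize(Seq(1, 2, 2, 3, 4))
val second = sc.parallelize(Seq(2, 2, 3, 5))

// Only values present in both inputs survive, each of them once.
val common = first.intersection(second).collect().sorted

println(common.mkString(", ")) // prints 2, 3
```

Unlike union, no extra distinct() call is needed here to get unique elements.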

**Q-4 Find the elements that exist in both RDDs.**

val student_intersect=students_union.intersection(student_rest)

student_intersect.collect().foreach(println)

//Output

//Daniel,Hindi,99,100

//Daniel,Chemistry,98,100

//Daniel,Computer Science,99,100

//Daniel,English,50,100

//Daniel,Maths,98,100

//Daniel,Physics,98,100
