Spark Tutorial
Author: 时海 风自在
fold

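fold takes an initial (zero) value. This value is applied once in every partition and once more when the per-partition results are merged, so the result depends on the number of partitions unless the zero value is the identity element of the operation (for example, 0 for addition).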

1. With the number of partitions set to 2:

scala> var rdd1=sc.parallelize(Seq(1,2,3,4,5,6,7,8,9),2)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[4] at parallelize at <console>:24

scala> rdd1.fold(1)((x,y)=>x+y)
res2: Int = 48

2. When the number of partitions changes to 3:

scala> var rdd1=sc.parallelize(Seq(1,2,3,4,5,6,7,8,9),3)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[5] at parallelize at <console>:24

scala> rdd1.fold(1)((x,y)=>x+y)
res3: Int = 49
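
The two results above can be reproduced without a cluster. The following is a minimal plain-Scala sketch, not Spark's actual implementation: `FoldSketch`, `simulateFold`, and the contiguous slicing scheme are hypothetical names and assumptions introduced only for illustration. It folds each slice starting from the zero value, then merges the partial results starting from the zero value again.

```scala
// Plain-Scala sketch (no Spark dependency) of how fold applies the zero value.
// FoldSketch and simulateFold are hypothetical helpers, not part of the Spark API.
object FoldSketch {
  def simulateFold(data: Seq[Int], numPartitions: Int, zero: Int)(op: (Int, Int) => Int): Int = {
    // Assumption: split the data into numPartitions contiguous slices.
    val partitions = (0 until numPartitions).map { i =>
      val start = i * data.length / numPartitions
      val end = (i + 1) * data.length / numPartitions
      data.slice(start, end)
    }
    // Each partition is folded starting from the zero value.
    val partials = partitions.map(_.foldLeft(zero)(op))
    // The partial results are merged, again starting from the zero value.
    partials.foldLeft(zero)(op)
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(1, 2, 3, 4, 5, 6, 7, 8, 9)
    println(simulateFold(data, 2, 1)(_ + _)) // 45 + 2 * 1 + 1 = 48
    println(simulateFold(data, 3, 1)(_ + _)) // 45 + 3 * 1 + 1 = 49
    println(simulateFold(data, 3, 0)(_ + _)) // 45, independent of partition count
  }
}
```

With addition, the result is the sum of the elements (45) plus the zero value once per partition plus once for the final merge. This gives 48 for 2 partitions and 49 for 3 partitions. A zero value of 0 gives 45 regardless of the partition count.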
