Spark Tutorial
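fold requires an initial value (the zero value). It is applied once in every partition and once more when the per-partition results are merged, so the result can change with the number of partitions.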
1. With the number of partitions set to 2:
scala> var rdd1=sc.parallelize(Seq(1,2,3,4,5,6,7,8,9),2)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[4] at parallelize at <console>:24

scala> rdd1.fold(1)((x,y)=>x+y)
res2: Int = 48

2. When the number of partitions is changed to 3:
scala> var rdd1=sc.parallelize(Seq(1,2,3,4,5,6,7,8,9),3)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[5] at parallelize at <console>:24

scala> rdd1.fold(1)((x,y)=>x+y)
res3: Int = 49
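
The two results follow from how often the initial value is added. The elements 1 to 9 sum to 45, and the initial value 1 is added once per partition plus once in the final merge: 45 + (2 + 1) * 1 = 48 for two partitions, and 45 + (3 + 1) * 1 = 49 for three. The sketch below reproduces this arithmetic on plain Scala collections, without Spark. The names simulateFold and splitEvenly are hypothetical helpers introduced only for illustration, and the even, position-based split is an assumption about how the data is partitioned. For a sum, only the number of partitions affects the result, not where the boundaries fall.

```scala
// Local simulation of how fold applies the initial value; no Spark required.
// simulateFold and splitEvenly are hypothetical names, not part of the Spark API.
object FoldSimulation {

  // Split data into numSlices contiguous chunks by position
  // (assumed to resemble how parallelize slices a local Seq).
  def splitEvenly[T](data: Seq[T], numSlices: Int): Seq[Seq[T]] =
    (0 until numSlices).map { i =>
      val start = ((i * data.length.toLong) / numSlices).toInt
      val end = (((i + 1) * data.length.toLong) / numSlices).toInt
      data.slice(start, end)
    }

  def simulateFold(data: Seq[Int], numSlices: Int, zero: Int)(op: (Int, Int) => Int): Int = {
    // Step 1: fold each partition, starting from the initial value.
    val perPartition = splitEvenly(data, numSlices).map(_.foldLeft(zero)(op))
    // Step 2: merge the partition results, starting from the initial value again.
    perPartition.foldLeft(zero)(op)
  }

  def main(args: Array[String]): Unit = {
    val data = Seq(1, 2, 3, 4, 5, 6, 7, 8, 9)
    println(simulateFold(data, 2, 1)(_ + _)) // 48: 45 + 3 * 1
    println(simulateFold(data, 3, 1)(_ + _)) // 49: 45 + 4 * 1
    println(simulateFold(data, 3, 0)(_ + _)) // 45: 0 is neutral for addition
  }
}
```

With an initial value that is neutral for the operation (0 for addition), the result no longer depends on the number of partitions.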