Spark Tutorial
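Here aggregate computes the sum and the number of elements of an RDD in a single pass; dividing the sum by the count then gives the average. The call takes a zero value, (0,0), a function that folds each element into a partition's (sum, count) accumulator, and a function that merges the accumulators of different partitions.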
scala> var rdd1=sc.parallelize(Seq(1,2,3,4,5,6,7,8,9),3)
rdd1: org.apache.spark.rdd.RDD[Int] = ParallelCollectionRDD[6] at parallelize at <console>:24

scala> rdd1.aggregate((0,0))((x,y)=>(x._1+y,x._2+1),(x,y)=>(x._1+y._1,x._2+y._2))
res5: (Int, Int) = (45,9)
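
To make the two functions passed to aggregate easier to follow, here is a minimal sketch in plain Scala that imitates the same computation without Spark. It is an illustration only, not Spark's implementation: the names AggregateSketch, seqOp, combOp and sumAndCount are hypothetical, and the way the data is split into chunks is an assumption that need not match how Spark partitions an RDD.

```scala
// Plain Scala sketch (no Spark required): imitate aggregate over several partitions.
object AggregateSketch {
  // seqOp: fold one element into a partition's (sum, count) accumulator.
  def seqOp(acc: (Int, Int), x: Int): (Int, Int) = (acc._1 + x, acc._2 + 1)

  // combOp: merge the accumulators of two partitions.
  def combOp(a: (Int, Int), b: (Int, Int)): (Int, Int) = (a._1 + b._1, a._2 + b._2)

  def sumAndCount(data: Seq[Int], numPartitions: Int): (Int, Int) = {
    // Assumed splitting scheme: consecutive chunks of roughly equal size.
    val chunkSize = math.max(1, math.ceil(data.size.toDouble / numPartitions).toInt)
    val partitions = data.grouped(chunkSize).toSeq
    // Each partition starts from the zero value (0, 0) and applies seqOp.
    val partials = partitions.map(_.foldLeft((0, 0))(seqOp))
    // The partial results are merged with combOp, again starting from (0, 0).
    partials.foldLeft((0, 0))(combOp)
  }

  def main(args: Array[String]): Unit = {
    val (sum, count) = sumAndCount(Seq(1, 2, 3, 4, 5, 6, 7, 8, 9), 3)
    // Convert to Double before dividing to avoid integer division.
    val avg = sum.toDouble / count
    println(s"sum=$sum count=$count avg=$avg") // prints: sum=45 count=9 avg=5.0
  }
}
```

Because the zero value is used once per partition and once more when the partial results are merged, it should be neutral for both functions, as (0,0) is here. In spark-shell, the average can be obtained from the result above in the same way, for example by dividing the first tuple field, converted to Double, by the second.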