A Java-friendly version of PowerIterationClustering.run.
Run the PIC algorithm.
an RDD of (i, j, sij) tuples representing the affinity matrix, which is the matrix A in the PIC paper. Each similarity sij must be nonnegative. The matrix is symmetric, so sij = sji. For any (i, j) with nonzero similarity, there should be either (i, j, sij) or (j, i, sji) in the input. Tuples with i = j are ignored, because the diagonal is assumed to be zero (sii = 0.0).
a PowerIterationClusteringModel that contains the clustering result
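The input rules above can be illustrated with a small in-memory sketch. It is not part of the library: `AffinitySketch`, `Entry`, and `toSymmetricMatrix` are hypothetical names, and a dense array stands in for the distributed RDD. The sketch assumes each unordered pair appears only once in the input.

```java
import java.util.List;

// Illustrative only: a dense, in-memory stand-in for the (i, j, sij) input.
final class AffinitySketch {
    /** One (i, j, sij) tuple; a hypothetical helper type, not a Spark class. */
    static final class Entry {
        final int i;
        final int j;
        final double s;

        Entry(int i, int j, double s) {
            this.i = i;
            this.j = j;
            this.s = s;
        }
    }

    /** Builds the symmetric n x n affinity matrix A from one-sided tuples. */
    static double[][] toSymmetricMatrix(int n, List<Entry> entries) {
        double[][] a = new double[n][n];
        for (Entry e : entries) {
            if (e.s < 0.0) {
                throw new IllegalArgumentException("similarity must be nonnegative");
            }
            if (e.i == e.j) {
                continue; // tuples with i = j are ignored: the diagonal stays 0.0
            }
            // Only one of (i, j) or (j, i) is given, so mirror it here.
            a[e.i][e.j] = e.s;
            a[e.j][e.i] = e.s;
        }
        return a;
    }
}
```

For example, the tuples (0, 1, 0.5), (2, 2, 9.0), and (2, 1, 0.25) yield a 3 x 3 matrix whose diagonal is zero and whose entries satisfy A[0][1] = A[1][0] = 0.5 and A[1][2] = A[2][1] = 0.25.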
Run the PIC algorithm on a graph.
an affinity matrix represented as a graph, which is the matrix A in the PIC paper. The similarity sij, represented as the edge between vertices (i, j), must be nonnegative. The matrix is symmetric, so sij = sji. For any (i, j) with nonzero similarity, there should be either (i, j, sij) or (j, i, sji) in the input. Tuples with i = j are ignored, because the diagonal is assumed to be zero (sii = 0.0).
a PowerIterationClusteringModel that contains the clustering result
Set the initialization mode. This can be either "random" to use a random vector as vertex properties, or "degree" to use the normalized sum of similarities. Default: random.
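The two initialization modes can be sketched on a dense matrix as follows. This is a minimal illustration, not the library's code: `InitSketch`, `degreeInit`, and `randomInit` are hypothetical names, and the choice of Gaussian draws scaled to unit L1 norm for the random start is an assumption made for the example.

```java
import java.util.Random;

// Illustrative only: starting vectors for the power iteration.
final class InitSketch {
    /** "degree"-style start: each row sum divided by the sum of all entries. */
    static double[] degreeInit(double[][] a) {
        int n = a.length;
        double[] v = new double[n];
        double total = 0.0;
        for (int i = 0; i < n; i++) {
            for (int j = 0; j < n; j++) {
                v[i] += a[i][j];
            }
            total += v[i];
        }
        if (total == 0.0) {
            throw new IllegalArgumentException("affinity matrix has no nonzero entries");
        }
        for (int i = 0; i < n; i++) {
            v[i] /= total;
        }
        return v;
    }

    /** "random"-style start (assumed form): Gaussian draws scaled to unit L1 norm. */
    static double[] randomInit(int n, long seed) {
        Random rng = new Random(seed);
        double[] v = new double[n];
        double norm = 0.0;
        for (int i = 0; i < n; i++) {
            v[i] = rng.nextGaussian();
            norm += Math.abs(v[i]);
        }
        for (int i = 0; i < n; i++) {
            v[i] /= norm;
        }
        return v;
    }
}
```

With the degree start, a vertex with row sum 2 in a matrix whose entries total 4 begins at 0.5, so well-connected vertices start with larger values.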
Set the number of clusters.
Set the maximum number of iterations of the power iteration loop.
Power Iteration Clustering (PIC), a scalable graph clustering algorithm developed by Lin and Cohen. From the abstract: PIC finds a very low-dimensional embedding of a dataset using truncated power iteration on a normalized pair-wise similarity matrix of the data.
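The truncated power iteration described in the abstract can be sketched as below. This is a dense, single-machine illustration under stated assumptions, not the library's distributed implementation: `PowerIterationSketch`, `embed`, and the `eps` parameter are hypothetical, and the stopping rule (stop once the change in the update size falls below `eps`, or after `maxIterations`) follows the general description in the paper.

```java
// Illustrative only: one-dimensional PIC embedding on a dense matrix.
final class PowerIterationSketch {
    /** Iterates v <- W v / ||W v||_1, where W is the row-normalized affinity matrix. */
    static double[] embed(double[][] a, double[] v0, int maxIterations, double eps) {
        int n = a.length;
        // Row-normalize A so that every row of W sums to 1.
        double[][] w = new double[n][n];
        for (int i = 0; i < n; i++) {
            double rowSum = 0.0;
            for (int j = 0; j < n; j++) {
                rowSum += a[i][j];
            }
            if (rowSum <= 0.0) {
                throw new IllegalArgumentException("row " + i + " has no positive similarity");
            }
            for (int j = 0; j < n; j++) {
                w[i][j] = a[i][j] / rowSum;
            }
        }
        double[] v = v0.clone();
        double prevDelta = Double.MAX_VALUE;
        for (int iter = 0; iter < maxIterations; iter++) {
            double[] next = new double[n];
            double norm = 0.0;
            for (int i = 0; i < n; i++) {
                for (int j = 0; j < n; j++) {
                    next[i] += w[i][j] * v[j];
                }
                norm += Math.abs(next[i]);
            }
            if (norm == 0.0) {
                throw new IllegalStateException("iterate collapsed to the zero vector");
            }
            double delta = 0.0;
            for (int i = 0; i < n; i++) {
                next[i] /= norm; // keep the iterate at unit L1 norm
                delta += Math.abs(next[i] - v[i]);
            }
            v = next;
            // Truncate early: stop when the update size itself has stopped changing.
            if (Math.abs(delta - prevDelta) < eps) {
                break;
            }
            prevDelta = delta;
        }
        return v;
    }
}
```

The returned vector is the low-dimensional (here one-dimensional) embedding; a k-means step over its entries would then assign each vertex to one of the k clusters. Stopping early matters because the fully converged iterate is a constant vector that carries no cluster information.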
Spectral clustering (Wikipedia)