Partitioner (Spark 2.2.3 JavaDoc)


Object
- org.apache.spark.Partitioner

All Implemented Interfaces:

java.io.Serializable

Direct Known Subclasses:

HashPartitioner, RangePartitioner
```
public abstract class Partitioner
extends Object
implements scala.Serializable
```
An object that defines how the elements in a key-value pair RDD are partitioned by key. It maps each key to a partition ID, from 0 to `numPartitions - 1`.
Note that a partitioner must be deterministic, i.e., it must return the same partition ID given the same partition key.
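To make this contract concrete, here is a minimal standalone sketch of a hash-based partitioner in Java. The class name `ModuloPartitionerSketch` is hypothetical, and the class does not extend `org.apache.spark.Partitioner`, so it compiles without Spark on the classpath. It follows the same idea as `HashPartitioner`: take the key's hash code modulo the number of partitions, then fold negative remainders back into the range 0 to `numPartitions - 1`. The result depends only on `key.hashCode()`, so it is deterministic as long as the key type's `hashCode` is. Sending `null` keys to partition 0 is a choice made for this sketch.

```java
/**
 * Hypothetical standalone sketch of the Partitioner contract; a real Spark
 * job would put this logic in a subclass of org.apache.spark.Partitioner.
 */
public class ModuloPartitionerSketch {
    private final int numPartitions;

    public ModuloPartitionerSketch(int numPartitions) {
        if (numPartitions <= 0) {
            throw new IllegalArgumentException("numPartitions must be positive: " + numPartitions);
        }
        this.numPartitions = numPartitions;
    }

    /** Number of partitions; valid IDs are 0 to numPartitions - 1. */
    public int numPartitions() {
        return numPartitions;
    }

    /** Maps a key to a partition ID; the same key always yields the same ID. */
    public int getPartition(Object key) {
        if (key == null) {
            return 0; // this sketch sends null keys to partition 0
        }
        int mod = key.hashCode() % numPartitions; // negative when hashCode() is negative
        return mod < 0 ? mod + numPartitions : mod; // fold into [0, numPartitions)
    }
}
```

For example, with four partitions `getPartition(-7)` returns 1: in Java `-7 % 4` is `-3`, and adding 4 brings it into range. In a Spark application the same logic would go into a subclass that overrides `numPartitions()` and `getPartition(Object)`, and usually `equals` and `hashCode` as well, as `HashPartitioner` does.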

See Also:

Serialized Form

Constructor Summary

Constructors

| Constructor and Description |
|---|
| `Partitioner()` |

Method Summary

| Modifier and Type | Method and Description |
|---|---|
| `static Partitioner` | `defaultPartitioner(RDD<?> rdd, scala.collection.Seq<RDD<?>> others)` Choose a partitioner to use for a cogroup-like operation between a number of RDDs. |
| `abstract int` | `getPartition(Object key)` Returns the partition ID, from 0 to `numPartitions() - 1`, for the given key. |
| `abstract int` | `numPartitions()` Returns the number of partitions. |

Methods inherited from class Object
equals, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
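The two abstract methods are meant to be used together: `numPartitions()` fixes how many buckets exist, and `getPartition(Object)` picks one of them for each key. The Java sketch below illustrates that relationship by grouping key-value pairs into buckets. The `KeyPartitioner` interface and the `bucketize` method are hypothetical stand-ins so the example runs without Spark; this is conceptually similar to routing records by key, not Spark's actual shuffle code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class PartitionBucketsSketch {

    /** Hypothetical minimal interface mirroring Partitioner's two abstract methods. */
    interface KeyPartitioner {
        int numPartitions();

        int getPartition(Object key);
    }

    /** Groups key-value pairs into numPartitions() buckets, one per partition ID. */
    static <K, V> List<List<Map.Entry<K, V>>> bucketize(List<Map.Entry<K, V>> pairs,
                                                        KeyPartitioner partitioner) {
        int n = partitioner.numPartitions();
        List<List<Map.Entry<K, V>>> buckets = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            buckets.add(new ArrayList<>());
        }
        for (Map.Entry<K, V> pair : pairs) {
            int id = partitioner.getPartition(pair.getKey());
            // Check the documented contract: IDs run from 0 to numPartitions - 1.
            if (id < 0 || id >= n) {
                throw new IllegalStateException("partition id out of range: " + id);
            }
            buckets.get(id).add(pair);
        }
        return buckets;
    }
}
```

Because `bucketize` rejects IDs outside the valid range, a partitioner that breaks the contract fails fast here instead of silently misrouting records.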

- Constructor Detail
  - Partitioner
```
public Partitioner()
```
- Method Detail
  - defaultPartitioner
```
public static Partitioner defaultPartitioner(RDD<?> rdd,
                                             scala.collection.Seq<RDD<?>> others)
```
    Choose a partitioner to use for a cogroup-like operation between a number of RDDs.
    If any of the RDDs already has a partitioner, that partitioner is chosen.
    Otherwise, a default `HashPartitioner` is used. If `spark.default.parallelism` is set, its number of partitions is the value of `SparkContext.defaultParallelism`; otherwise it equals the number of partitions in the largest upstream RDD, which should be the choice least likely to cause out-of-memory errors.
    The method takes two parameters (`rdd`, `others`) so that callers must pass at least one RDD.
    
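    Parameters:
    
    rdd - the first RDD taking part in the operation (at least one RDD is required)
    
    others - any additional RDDs taking part in the operation
    
    Returns:
    
    the partitioner to use for the operation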
  - numPartitions
```
public abstract int numPartitions()
```
    Returns the number of partitions; valid partition IDs run from 0 to `numPartitions() - 1`.
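  - getPartition
```
public abstract int getPartition(Object key)
```
    Returns the partition ID for the given key, in the range 0 to `numPartitions() - 1`. It must always return the same ID for the same key.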
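The selection rule documented for `defaultPartitioner` can be sketched in plain Java as below. `RddInfo`, `Choice`, and the use of an `OptionalInt` to stand in for a configured `spark.default.parallelism` are all hypothetical; the real method works on `RDD` objects and returns a `Partitioner`. When several RDDs already have a partitioner, this sketch simply takes the first one it finds, which may not match Spark's own tie-breaking.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.OptionalInt;

public class DefaultPartitionerSketch {

    /** Hypothetical stand-in for an RDD: its name, size, and whether it has a partitioner. */
    static final class RddInfo {
        final String name;
        final int numPartitions;
        final boolean hasPartitioner;

        RddInfo(String name, int numPartitions, boolean hasPartitioner) {
            this.name = name;
            this.numPartitions = numPartitions;
            this.hasPartitioner = hasPartitioner;
        }
    }

    /** Outcome: reuse an existing partitioner, or create a hash partitioner with N partitions. */
    static final class Choice {
        final String reusedFrom; // RDD whose partitioner is reused, or null for a new one
        final int numPartitions; // partition count of the chosen partitioner

        Choice(String reusedFrom, int numPartitions) {
            this.reusedFrom = reusedFrom;
            this.numPartitions = numPartitions;
        }
    }

    /** The separate first parameter forces callers to pass at least one RDD. */
    static Choice choose(OptionalInt defaultParallelism, RddInfo rdd, RddInfo... others) {
        List<RddInfo> all = new ArrayList<>();
        all.add(rdd);
        all.addAll(Arrays.asList(others));

        // Rule 1: if any RDD already has a partitioner, reuse it (first match in this sketch).
        for (RddInfo r : all) {
            if (r.hasPartitioner) {
                return new Choice(r.name, r.numPartitions);
            }
        }
        // Rule 2: a new hash partitioner sized by spark.default.parallelism when it is set...
        if (defaultParallelism.isPresent()) {
            return new Choice(null, defaultParallelism.getAsInt());
        }
        // ...otherwise sized like the largest upstream RDD.
        int max = 0;
        for (RddInfo r : all) {
            max = Math.max(max, r.numPartitions);
        }
        return new Choice(null, max);
    }
}
```

For example, if neither input has a partitioner and `spark.default.parallelism` is unset, combining a 4-partition RDD with a 10-partition RDD yields a new hash partitioner with 10 partitions.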
