pyspark.sql.functions.tuple_sketch_agg_double
- pyspark.sql.functions.tuple_sketch_agg_double(key, summary, lgNomEntries=None, mode=None)
Aggregate function: returns the compact binary representation of a DataSketches TupleSketch with double summaries, built from the key and summary columns.
New in version 4.2.0.
- Parameters
- key
Column or column name. The column containing key values.
- summary
Column or column name. The column containing double summary values.
- lgNomEntries
Column or int, optional. The log-base-2 of nominal entries (must be between 4 and 26, defaults to 12).
- mode
Column or str, optional. The summary mode: “sum” (default), “min”, “max”, or “alwaysone”.
- Returns
Column. The binary representation of the TupleSketch.
See also
Examples
>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, 10.0), (2, 20.0), (2, 30.0)], ["key", "value"])
>>> df.agg(sf.tuple_sketch_estimate_double(
...     sf.tuple_sketch_agg_double("key", "value"))).show()
+--------------------------------------------------------------------------+
|tuple_sketch_estimate_double(tuple_sketch_agg_double(key, value, 12, sum))|
+--------------------------------------------------------------------------+
|                                                                       2.0|
+--------------------------------------------------------------------------+
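The mode and lgNomEntries parameters can be easier to reason about with a small model. The following is a minimal pure-Python sketch, not the Spark or DataSketches implementation: it assumes that each mode folds the summaries of rows sharing a key into one value per key, and that lgNomEntries sets the nominal entry count to 2 ** lgNomEntries. The helper names combine_summaries and nominal_entries are hypothetical, and the hash-based sampling a real sketch performs is ignored.

```python
from collections.abc import Iterable

# Same rows as the example above: key 2 appears twice.
rows = [(1, 10.0), (2, 20.0), (2, 30.0)]


def combine_summaries(data: Iterable[tuple[int, float]], mode: str = "sum") -> dict[int, float]:
    """Hypothetical helper: fold (key, summary) rows into one summary per key.

    Models the assumed meaning of each mode; it does not build a real sketch.
    """
    combiners = {
        "sum": lambda old, new: old + new,   # add summaries of duplicate keys
        "min": min,                          # keep the smallest summary
        "max": max,                          # keep the largest summary
        "alwaysone": lambda old, new: 1.0,   # every retained key carries 1.0
    }
    if mode not in combiners:
        raise ValueError(f"unknown mode: {mode!r}")
    combine = combiners[mode]
    result: dict[int, float] = {}
    for key, value in data:
        if key in result:
            result[key] = combine(result[key], value)
        else:
            result[key] = 1.0 if mode == "alwaysone" else value
    return result


def nominal_entries(lg_nom_entries: int = 12) -> int:
    """Hypothetical helper: nominal entry count implied by lgNomEntries."""
    if not 4 <= lg_nom_entries <= 26:
        raise ValueError("lgNomEntries must be between 4 and 26")
    return 2 ** lg_nom_entries


for m in ("sum", "min", "max", "alwaysone"):
    print(m, combine_summaries(rows, m))
print(nominal_entries())
```

Under these assumptions every mode leaves two distinct keys, which is consistent with the estimate of 2.0 in the example above; only the per-key summary differs between modes.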