pyspark.sql.functions.tuple_intersection_theta_integer

pyspark.sql.functions.tuple_intersection_theta_integer(col1, col2, mode=None)

Intersects an Apache DataSketches TupleSketch that has integer summaries with a ThetaSketch.

New in version 4.2.0.

Parameters
col1 : Column or column name

The TupleSketch column with integer summaries.

col2 : Column or column name

The ThetaSketch column.

mode : Column or str, optional

The summary mode: “sum” (default), “min”, “max”, or “alwaysone”.
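A minimal sketch of what the four mode names usually mean when two integer summaries for the same key are merged, assuming they follow the Apache DataSketches IntegerSummary modes. `combine_integer_summaries` is a hypothetical helper, not part of PySpark, and the sketch does not model which summary value the ThetaSketch side contributes during this intersection.

```python
def combine_integer_summaries(mode: str, a: int, b: int) -> int:
    """Merge two integer summaries for one key (hypothetical helper)."""
    # Assumed combining rules, one per documented mode name.
    if mode == "sum":
        return a + b
    if mode == "min":
        return min(a, b)
    if mode == "max":
        return max(a, b)
    if mode == "alwaysone":
        # The merged summary is always 1, regardless of the inputs.
        return 1
    raise ValueError(f"unknown mode: {mode!r}")


for m in ("sum", "min", "max", "alwaysone"):
    print(m, combine_integer_summaries(m, 3, 5))
```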

Returns
Column

The binary representation of the intersected TupleSketch.

Examples

>>> from pyspark.sql import functions as sf
>>> df = spark.createDataFrame([(1, 1, 1), (2, 2, 2), (3, 3, 4)], ["key1", "v1", "key2"])  # noqa
>>> df = df.agg(
...     sf.tuple_sketch_agg_integer("key1", "v1").alias("sketch1"),
...     sf.theta_sketch_agg("key2").alias("sketch2")
... )
>>> df.select(sf.tuple_sketch_estimate_integer(sf.tuple_intersection_theta_integer(df.sketch1, "sketch2"))).show()  # noqa
+--------------------------------------------------------------------------------------+
|tuple_sketch_estimate_integer(tuple_intersection_theta_integer(sketch1, sketch2, sum))|
+--------------------------------------------------------------------------------------+
|                                                                                   2.0|
+--------------------------------------------------------------------------------------+
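The estimate of 2.0 can be checked by hand. With so few rows both sketches stay in exact mode (theta = 1.0), so the intersection retains exactly the keys present in both inputs, {1, 2, 3} and {1, 2, 4}, and the estimate is the retained count divided by theta. The plain-Python model below is a simplified sketch under that exact-mode assumption; `intersect_exact` and `estimate` are hypothetical names, and summary handling under `mode` is deliberately left out.

```python
from typing import Dict, Set


def intersect_exact(tuple_sketch: Dict[int, int], theta_sketch: Set[int]) -> Set[int]:
    """Keys kept by an exact-mode intersection (hypothetical model)."""
    # A key survives only if it appears in both inputs.
    # How `mode` combines summaries for surviving keys is NOT modeled here.
    return set(tuple_sketch) & theta_sketch


def estimate(retained: Set[int], theta: float = 1.0) -> float:
    # Estimate = retained entries / theta; theta is 1.0 in exact mode.
    return len(retained) / theta


# Same data as the doctest: key1 -> v1 summaries, and the key2 values.
tuple_sketch = {1: 1, 2: 2, 3: 3}
theta_sketch = {1, 2, 4}
shared = intersect_exact(tuple_sketch, theta_sketch)
print(sorted(shared), estimate(shared))  # [1, 2] 2.0
```

Once the inputs grow past the sketches' nominal size, theta drops below 1.0 and the result becomes an approximation rather than an exact count.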