public class HLL extends Object implements Cloneable
long
elements. Useful for computing
the approximate cardinality of a stream of data in very small storage.
A modified version of the
'HyperLogLog' data structure and algorithm is used, which combines both
probabilistic and nonprobabilistic techniques to improve the accuracy and
storage requirements of the original algorithm.
More specifically, initializing and storing a new HLL
will
allocate a sentinel value symbolizing the empty set (HLLType.EMPTY
).
After adding the first few values, a sorted list of unique integers is
stored in a HLLType.EXPLICIT
hash set. When configured, accuracy can
be sacrificed for memory footprint: the values in the sorted list are
"promoted" to a "HLLType.SPARSE
" mapbased HyperLogLog structure.
Finally, when enough registers are set, the mapbased HLL will be converted
to a bitpacked "HLLType.FULL
" HyperLogLog structure.
This data structure is interoperable with the implementations found at:
Modifier and Type  Field and Description 

static int 
MAXIMUM_EXPLICIT_THRESHOLD 
static int 
MAXIMUM_EXPTHRESH_PARAM 
static int 
MAXIMUM_LOG2M_PARAM 
static int 
MAXIMUM_REGWIDTH_PARAM 
static int 
MINIMUM_EXPTHRESH_PARAM 
static int 
MINIMUM_LOG2M_PARAM 
static int 
MINIMUM_REGWIDTH_PARAM 
Constructor and Description 

HLL(int log2m,
int regwidth)
Construct an empty HLL with the given
log2m and regwidth . 
HLL(int log2m,
int regwidth,
int expthresh,
boolean sparseon,
HLLType type)
NOTE: Arguments here are named and structured identically to those in the
PostgreSQL implementation, which can be found
here.

Modifier and Type  Method and Description 

void 
addRaw(long rawValue)
Adds
rawValue directly to the HLL. 
long 
cardinality()
Computes the cardinality of the HLL.

void 
clear()
Clears the HLL.

HLL 
clone()
Create a deep copy of this HLL.

static HLL 
fromBytes(byte[] bytes)
Deserializes the HLL (in
toBytes(ISchemaVersion) format) serialized
into bytes . 
HLLType 
getType() 
byte[] 
toBytes()
Serializes the HLL to an array of bytes in correspondence with the format
of the default schema version,
SerializationUtil.DEFAULT_SCHEMA_VERSION . 
byte[] 
toBytes(ISchemaVersion schemaVersion)
Serializes the HLL to an array of bytes in correspondence with the format
of the specified schema version.

void 
union(HLL other)
Computes the union of HLLs and stores the result in this instance.

public static final int MINIMUM_LOG2M_PARAM
public static final int MAXIMUM_LOG2M_PARAM
public static final int MINIMUM_REGWIDTH_PARAM
public static final int MAXIMUM_REGWIDTH_PARAM
public static final int MINIMUM_EXPTHRESH_PARAM
public static final int MAXIMUM_EXPTHRESH_PARAM
public static final int MAXIMUM_EXPLICIT_THRESHOLD
public HLL(int log2m, int regwidth, int expthresh, boolean sparseon, HLLType type)
log2m
 logbase2 of the number of registers used in the HyperLogLog
algorithm. Must be at least 4 and at most 30.regwidth
 number of bits used per register in the HyperLogLog
algorithm. Must be at least 1 and at most 8.expthresh
 tunes when the HLLType.EXPLICIT
to
HLLType.SPARSE
promotion occurs,
based on the set's cardinality. Must be at least 1 and at most 18.sparseon
 Flag indicating if the HLLType.SPARSE
representation should be used.type
 the type in the promotion hierarchy which this instance should
start at. This cannot be null
.public HLL(int log2m, int regwidth)
log2m
and regwidth
.
This is equivalent to calling HLL(log2m, regwidth, 1, true, HLLType.EMPTY)
.log2m
 logbase2 of the number of registers used in the HyperLogLog
algorithm. Must be at least 4 and at most 30.regwidth
 number of bits used per register in the HyperLogLog
algorithm. Must be at least 1 and at most 8.HLL(int, int, int, boolean, HLLType)
public HLLType getType()
null
.public void addRaw(long rawValue)
rawValue
directly to the HLL.rawValue
 the value to be added. It is very important that this
value already be hashed with a strong (but not
necessarily cryptographic) hash function. For instance, the
Murmur3 implementation in
Google's Guava library is an excellent hash function for this
purpose and, for seeds greater than zero, matches the output
of the hash provided in the PostgreSQL implementation.public long cardinality()
public void clear()
addRaw(long)
, clear
does NOT handle
transitions between HLLType
s  a probabilistic type will remain
probabilistic after being cleared.public void union(HLL other)
other
 the other HLL
instance to union into this one. This
cannot be null
.public byte[] toBytes()
SerializationUtil.DEFAULT_SCHEMA_VERSION
.null
or empty.public byte[] toBytes(ISchemaVersion schemaVersion)
schemaVersion
 the schema version dictating the serialization formatnull
or empty.public static HLL fromBytes(byte[] bytes)
toBytes(ISchemaVersion)
format) serialized
into bytes
.bytes
 the serialized bytes of new HLLnull
.toBytes(ISchemaVersion)
public HLL clone() throws CloneNotSupportedException
clone
in class Object
CloneNotSupportedException
Object.clone()
Copyright © 20002017 Apache Software Foundation. All Rights Reserved.