public class DFRSimilarityFactory extends SimilarityFactory
Factory for DFRSimilarity
You must specify the implementations for all three components of DFR (strings). In general the models are parameter-free, but two of the normalizations take floating point parameters (see below):
basicModel
: Basic model of information content:
Be
: Limiting form of Bose-Einstein
G
: Geometric approximation of Bose-Einstein
P
: Poisson approximation of the Binomial
D
: Divergence approximation of the Binomial
I(n)
: Inverse document frequency
I(ne)
: Inverse expected document frequency [mixture of Poisson and IDF]
I(F)
: Inverse term frequency [approximation of I(ne)]
afterEffect
: First normalization of information gain:
L
: Laplace's law of succession
B
: Ratio of two Bernoulli processes
none
: no first normalization
normalization
: Second (length) normalization:
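H1
: Uniform distribution of term frequency. Parameter c (float): hyper-parameter that controls the term frequency normalization with respect to the document length. The default is 1
H2
: term frequency density inversely related to length. Parameter c (float): hyper-parameter that controls the term frequency normalization with respect to the document length. The default is 1
H3
: term frequency normalization provided by Dirichlet prior. Parameter mu (float): smoothing parameter μ. The default is 800
Z
: term frequency normalization provided by a Zipfian relation. Parameter z (float): represents A/(A+1), where A measures the specificity of the language. The default is 0.3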
none
: no second normalization
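To make the role of the normalization parameters more concrete, here is a minimal, self-contained sketch of the H1, H2, and Z length normalizations in their usual DFR formulation. The class and method names are hypothetical, the parameter names `c` and `z` are assumptions, and the formulas are not copied from Lucene's source, so the real implementation may differ in detail.

```java
// Hypothetical illustration only; not part of Solr or Lucene.
final class DfrNormalizationSketch {
    private DfrNormalizationSketch() {}

    // H1: scale tf by the ratio of average field length to document length.
    // Assumption: c acts as a simple multiplier (c = 1 gives the plain ratio).
    public static double h1(double tf, double len, double avgLen, double c) {
        return tf * c * (avgLen / len);
    }

    // H2: tf density inversely related to length, tf * log2(1 + c * avgLen / len).
    public static double h2(double tf, double len, double avgLen, double c) {
        return tf * (Math.log(1.0 + c * avgLen / len) / Math.log(2.0));
    }

    // Z: Zipfian relation, tf * (avgLen / len)^z.
    public static double zipf(double tf, double len, double avgLen, double z) {
        return tf * Math.pow(avgLen / len, z);
    }

    // The Z parameter represents A / (A + 1), where A measures language specificity.
    public static double zFromA(double a) {
        return a / (a + 1.0);
    }

    public static void main(String[] args) {
        double tf = 3.0, len = 200.0, avgLen = 100.0;
        // A document twice as long as average has its term frequency reduced.
        System.out.println("H1: " + h1(tf, len, avgLen, 1.0));
        System.out.println("H2: " + h2(tf, len, avgLen, 1.0));
        System.out.println("Z:  " + zipf(tf, len, avgLen, 0.3));
        // A = 3/7 corresponds to the default of 0.3.
        System.out.println("z from A: " + zFromA(3.0 / 7.0));
    }
}
```

With the default parameter values, all three variants leave the term frequency unchanged for a document of exactly average length and reduce it for longer documents.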
Optional settings:
discountOverlaps (bool)
: Sets SimilarityBase.setDiscountOverlaps(boolean)
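Because all three DFR components must be specified as strings, it can help to check a configuration before handing it to the factory. The following is a rough sketch of such a check, not the factory's own parsing logic. The class `DfrConfigCheck` is hypothetical, and the `afterEffect` values (`L`, `B`, `none`) and the `discountOverlaps` key are assumptions about the accepted names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical helper, not part of Solr: validates a DFR parameter map.
final class DfrConfigCheck {
    static final Set<String> BASIC_MODELS =
        new HashSet<>(Arrays.asList("Be", "G", "P", "D", "I(n)", "I(ne)", "I(F)"));
    // Assumed afterEffect names: L, B, or none.
    static final Set<String> AFTER_EFFECTS =
        new HashSet<>(Arrays.asList("L", "B", "none"));
    static final Set<String> NORMALIZATIONS =
        new HashSet<>(Arrays.asList("H1", "H2", "H3", "Z", "none"));

    private DfrConfigCheck() {}

    // Returns a list of problems; an empty list means the config looks valid.
    public static List<String> validate(Map<String, String> cfg) {
        List<String> problems = new ArrayList<>();
        check(cfg, "basicModel", BASIC_MODELS, problems);
        check(cfg, "afterEffect", AFTER_EFFECTS, problems);
        check(cfg, "normalization", NORMALIZATIONS, problems);
        // Optional setting (assumed key name); must be a boolean literal if present.
        String overlaps = cfg.get("discountOverlaps");
        if (overlaps != null && !overlaps.equals("true") && !overlaps.equals("false")) {
            problems.add("discountOverlaps must be true or false: " + overlaps);
        }
        return problems;
    }

    private static void check(Map<String, String> cfg, String key,
                              Set<String> allowed, List<String> problems) {
        String value = cfg.get(key);
        if (value == null) {
            problems.add("missing required parameter: " + key);
        } else if (!allowed.contains(value)) {
            problems.add("unknown " + key + ": " + value);
        }
    }

    public static void main(String[] args) {
        Map<String, String> cfg = new HashMap<>();
        cfg.put("basicModel", "I(F)");
        cfg.put("afterEffect", "B");
        cfg.put("normalization", "H2");
        System.out.println(validate(cfg)); // prints []

        cfg.remove("normalization");
        System.out.println(validate(cfg)); // reports the missing component
    }
}
```

Collecting every problem in a list, rather than failing on the first one, lets a caller report all misconfigured components in one pass.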
Fields inherited from class SimilarityFactory: CLASS_NAME, params
Constructor and Description |
---|
DFRSimilarityFactory() |
Modifier and Type | Method and Description |
---|---|
Similarity | getSimilarity() |
void | init(SolrParams params) |
Methods inherited from class SimilarityFactory: getClassArg, getNamedPropertyValues, getParams
public void init(SolrParams params)
Overrides: init in class SimilarityFactory
public Similarity getSimilarity()
Specified by: getSimilarity in class SimilarityFactory
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.