public class TextProfileSignature extends MD5Signature
This implementation is copied from Apache Nutch.
An implementation of a page signature. It calculates an MD5 hash of a plain text "profile" of a page.
The algorithm to calculate a page "profile" takes the plain text version of a page and performs the following steps:
Token frequencies are quantized using QUANT = QUANT_RATE * maxFreq, where QUANT_RATE is 0.01f by default and maxFreq is the maximum token frequency. If maxFreq is higher than 1, then QUANT is always at least 2 (which means that tokens with frequency 1 are always discarded).
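The quantization rule above can be illustrated with a minimal, self-contained sketch. This is not the Solr or Nutch source: the class name `TextProfileSketch`, the `profile` method, its `minTokenLen` parameter, the tokenization rule, the alphabetical tie-break, and the output layout are all assumptions made for illustration. Only the QUANT formula and the discarding of frequency-1 tokens come from the description.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Illustrative sketch only; hypothetical names, not the Solr implementation. */
final class TextProfileSketch {

    /** Builds a quantized token profile ("token count" per line) of plain text. */
    static String profile(String text, float quantRate, int minTokenLen) {
        // Assumption: letters and digits are lower-cased and kept; any other
        // character ends the current token.
        Map<String, Integer> counts = new HashMap<>();
        StringBuilder token = new StringBuilder();
        int maxFreq = 0;
        for (int i = 0; i <= text.length(); i++) {
            char c = i < text.length() ? text.charAt(i) : ' ';
            if (Character.isLetterOrDigit(c)) {
                token.append(Character.toLowerCase(c));
            } else if (token.length() > 0) {
                // Assumption: tokens of minTokenLen characters or fewer are skipped.
                if (token.length() > minTokenLen) {
                    int n = counts.merge(token.toString(), 1, Integer::sum);
                    maxFreq = Math.max(maxFreq, n);
                }
                token.setLength(0);
            }
        }

        // QUANT = QUANT_RATE * maxFreq, raised to 2 whenever maxFreq > 1 so that
        // tokens seen only once are always discarded.
        int quant = Math.round(maxFreq * quantRate);
        if (quant < 2) {
            quant = maxFreq > 1 ? 2 : 1;
        }

        // Round each count down to a multiple of QUANT; drop counts below QUANT.
        List<Map.Entry<String, Integer>> kept = new ArrayList<>();
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            int quantized = (e.getValue() / quant) * quant;
            if (quantized >= quant) {
                kept.add(new AbstractMap.SimpleEntry<>(e.getKey(), quantized));
            }
        }

        // Decreasing frequency; the alphabetical tie-break is an assumption that
        // keeps this sketch deterministic.
        kept.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });

        StringBuilder out = new StringBuilder();
        for (Map.Entry<String, Integer> e : kept) {
            if (out.length() > 0) {
                out.append('\n');
            }
            out.append(e.getKey()).append(' ').append(e.getValue());
        }
        return out.toString();
    }
}
```

With the default rate of 0.01f, a short text whose most frequent token occurs three times gets QUANT = 2, so a count of 3 is rounded down to 2 and every token that occurs once is dropped. Because of this, two texts that differ only in rare words, letter case, or punctuation can produce the same profile.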
Constructor and Description |
---|
TextProfileSignature() |
public void init(SolrParams params)
public byte[] getSignature()
Overrides: getSignature in class MD5Signature
public void add(String content)
Overrides: add in class MD5Signature
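The add/getSignature pair suggests an accumulate-then-hash usage pattern. The sketch below illustrates that pattern with the JDK's `MessageDigest` only. `AccumulatingMd5Sketch` and `toHex` are hypothetical names, the sketch hashes the raw accumulated text rather than a text profile, and the UTF-8 encoding is an assumption; it is not the Solr class.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical stand-in showing the add(...) then getSignature() pattern. */
final class AccumulatingMd5Sketch {
    private final StringBuilder text = new StringBuilder();

    /** Buffers another chunk of content; nothing is hashed yet. */
    void add(String content) {
        text.append(content);
    }

    /** Hashes everything added so far and returns the 16-byte MD5 digest. */
    byte[] getSignature() {
        try {
            MessageDigest md5 = MessageDigest.getInstance("MD5");
            // Assumption: UTF-8 is used to turn the buffered text into bytes.
            return md5.digest(text.toString().getBytes(StandardCharsets.UTF_8));
        } catch (NoSuchAlgorithmException e) {
            // MD5 is a required algorithm on every Java platform.
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    /** Renders a digest as lower-case hexadecimal for display or comparison. */
    static String toHex(byte[] bytes) {
        StringBuilder hex = new StringBuilder(bytes.length * 2);
        for (byte b : bytes) {
            hex.append(String.format("%02x", b & 0xff));
        }
        return hex.toString();
    }
}
```

Buffering the content and hashing only in getSignature() matters for a profile-based signature, because the token frequencies can be computed only once the whole text has been seen.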
Copyright © 2000-2014 Apache Software Foundation. All Rights Reserved.