Class DocTermOrds
- java.lang.Object
  - org.apache.solr.uninverting.DocTermOrds
 
All Implemented Interfaces:
- org.apache.lucene.util.Accountable

Direct Known Subclasses:
- UnInvertedField
 
public class DocTermOrds extends Object implements org.apache.lucene.util.Accountable

This class enables fast access to multiple term ords for a specified field across all docIDs.

Like FieldCache, it uninverts the index and holds a packed data structure in RAM to enable fast access. Unlike FieldCache, it can handle multi-valued fields, and it does not hold the term bytes in RAM. Instead, you must obtain a TermsEnum from the getOrdTermsEnum(org.apache.lucene.index.LeafReader) method and then seek by ord to get the term's bytes.

While term ords are normally of type long, in this API they are int, because the internal representation cannot address more than Integer.MAX_VALUE unique terms. This class is also typically used on fields with relatively few unique terms compared to the number of documents. A previous internal limit (16 MB) on how many bytes each chunk of documents may consume has been increased to 2 GB.

Deleted documents are skipped during uninversion; if you look them up, you'll get 0 ords.

The returned per-document ords do not retain their original order in the document. Instead, they are returned in sorted order (by ord, i.e. by the term's BytesRef comparator). They are also de-duplicated: if a document contains the same term more than once in this field, you'll get that ord back only once.

This class creates its own internal term index, which it uses to build a wrapped TermsEnum that supports ord. The getOrdTermsEnum(org.apache.lucene.index.LeafReader) method provides this wrapped enum.

The RAM consumption of this class can be high!

- WARNING: This API is experimental and might change in incompatible ways in the next release.
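The per-document behavior described above (ords sorted and de-duplicated, deleted documents yielding no ords) can be illustrated with a small model. This is a sketch, not the Solr implementation: `UninvertSketch` is hypothetical, terms are represented only by their ords, a `boolean[]` stands in for `Bits liveDocs` (with `null` meaning "no deletions", as with Lucene's `LeafReader.getLiveDocs()`), and plain Java collections replace the packed structure. In this model, walking the terms in ord order and appending each ord to every live document in its postings list is enough to produce sorted, duplicate-free ords, because a postings list names each document at most once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical, simplified model of uninverting a multi-valued field (not Solr's code). */
final class UninvertSketch {

    /**
     * @param postings postings[ord] = doc IDs containing the term with that ord (each doc at most once)
     * @param maxDoc   number of documents
     * @param live     live[doc] == false marks a deleted doc; null means "no deletions"
     * @return ords[doc] = the document's ords, ascending and without duplicates
     */
    static int[][] uninvert(int[][] postings, int maxDoc, boolean[] live) {
        List<List<Integer>> perDoc = new ArrayList<>(maxDoc);
        for (int d = 0; d < maxDoc; d++) {
            perDoc.add(new ArrayList<>());
        }
        // Visit terms in ord order, as a TermsEnum would; appending in this order
        // leaves every per-document list already sorted.
        for (int ord = 0; ord < postings.length; ord++) {
            for (int doc : postings[ord]) {
                if (live == null || live[doc]) { // deleted docs are skipped entirely
                    perDoc.get(doc).add(ord);
                }
            }
        }
        int[][] ords = new int[maxDoc][];
        for (int d = 0; d < maxDoc; d++) {
            ords[d] = perDoc.get(d).stream().mapToInt(Integer::intValue).toArray();
        }
        return ords;
    }

    public static void main(String[] args) {
        // Sorted terms: ord 0 = "apple", 1 = "banana", 2 = "cherry".
        // Doc 0 may contain "banana" twice, but its postings list names doc 0 only once.
        int[][] postings = {{0, 2}, {0, 1}, {0, 1, 2}};
        boolean[] live = {true, true, false}; // doc 2 is deleted
        int[][] ords = uninvert(postings, 3, live);
        for (int d = 0; d < ords.length; d++) {
            System.out.println("doc " + d + " -> " + Arrays.toString(ords[d]));
        }
    }
}
```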
 
Field Summary

Fields (modifier and type, name, description):
- protected boolean checkForDocValues: If true, check and throw an exception if the field has docValues enabled.
- static int DEFAULT_INDEX_INTERVAL_BITS: Every 128th term is indexed, by default.
- protected String field: Field we are uninverting.
- protected int[] index: Holds the per-document ords or a pointer to the ords.
- protected org.apache.lucene.util.BytesRef[] indexedTermsArray: Holds the indexed (by default every 128th) terms.
- protected int maxTermDocFreq: Don't uninvert terms that exceed this count.
- protected int numTermsInField: Number of terms in the field.
- protected int ordBase: Ordinal of the first term in the field, or 0 if the PostingsFormat does not implement TermsEnum.ord().
- protected int phase1_time: Time for phase 1 of the uninvert process.
- protected org.apache.lucene.index.PostingsEnum postingsEnum: Used while uninverting.
- protected org.apache.lucene.util.BytesRef prefix: If non-null, only terms matching this prefix were indexed.
- protected long sizeOfIndexedStrings: Total bytes (sum of term lengths) for all indexed terms.
- protected long termInstances: Total number of references to term numbers.
- protected byte[][] tnums: Holds term ords for documents.
- protected int total_time: Total time to uninvert the field.
Constructor Summary

Constructors (modifier, constructor, description):
- protected DocTermOrds(String field, int maxTermDocFreq, int indexIntervalBits): Subclasses initialize with this constructor, but must then call uninvert, and only once.
- DocTermOrds(org.apache.lucene.index.LeafReader reader, org.apache.lucene.util.Bits liveDocs, String field): Inverts all terms.
- DocTermOrds(org.apache.lucene.index.LeafReader reader, org.apache.lucene.util.Bits liveDocs, String field, org.apache.lucene.util.BytesRef termPrefix): Inverts only terms starting with the given prefix.
- DocTermOrds(org.apache.lucene.index.LeafReader reader, org.apache.lucene.util.Bits liveDocs, String field, org.apache.lucene.util.BytesRef termPrefix, int maxTermDocFreq): Inverts only terms starting with the given prefix, and only terms whose docFreq (not taking deletions into account) is <= maxTermDocFreq.
- DocTermOrds(org.apache.lucene.index.LeafReader reader, org.apache.lucene.util.Bits liveDocs, String field, org.apache.lucene.util.BytesRef termPrefix, int maxTermDocFreq, int indexIntervalBits): Inverts only terms starting with the given prefix, and only terms whose docFreq (not taking deletions into account) is <= maxTermDocFreq, with a custom indexing interval (default is every 128th term).
Method Summary

Methods (all instance, concrete; modifier and type, method, description):
- org.apache.lucene.index.TermsEnum getOrdTermsEnum(org.apache.lucene.index.LeafReader reader): Returns a TermsEnum that implements ord, or null if there are no terms in the field.
- boolean isEmpty(): Returns true if no terms were indexed.
- org.apache.lucene.index.SortedSetDocValues iterator(org.apache.lucene.index.LeafReader reader): Returns a SortedSetDocValues view of this instance.
- org.apache.lucene.util.BytesRef lookupTerm(org.apache.lucene.index.TermsEnum termsEnum, int ord): Returns the term (BytesRef) corresponding to the provided ordinal.
- int numTerms(): Returns the number of terms in this field.
- long ramBytesUsed(): Returns total bytes used.
- protected void setActualDocFreq(int termNum, int df): Invoked during uninvert(org.apache.lucene.index.LeafReader, Bits, BytesRef) to record the document frequency for each uninverted term.
- protected void uninvert(org.apache.lucene.index.LeafReader reader, org.apache.lucene.util.Bits liveDocs, org.apache.lucene.util.BytesRef termPrefix): If you subclass, call this only once.
- protected void visitTerm(org.apache.lucene.index.TermsEnum te, int termNum): Subclasses can override this.
 
Field Detail
DEFAULT_INDEX_INTERVAL_BITS
public static final int DEFAULT_INDEX_INTERVAL_BITS
Every 128th term is indexed, by default.
See Also:
- Constant Field Values
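The "every 128th term" default reads naturally as an interval of 2^indexIntervalBits terms. Assuming that relationship (so the default corresponds to 7 bits, since 1 << 7 == 128), the bit arithmetic for deciding which term numbers are indexed, and how many indexed terms a field yields, might look like this. `IndexIntervalSketch` and `ASSUMED_DEFAULT_BITS` are hypothetical names, not part of Solr.

```java
/** Hypothetical helper showing the bit arithmetic behind a 2^bits term-index interval. */
final class IndexIntervalSketch {

    /** Assumed default: 7 bits, i.e. every 128th term (1 << 7 == 128). */
    static final int ASSUMED_DEFAULT_BITS = 7;

    /** Valid for 0 <= bits <= 30. */
    static int interval(int bits) {
        return 1 << bits;
    }

    /** Low-bit mask; a term number is a multiple of the interval iff (termNum & mask) == 0. */
    static int mask(int bits) {
        return interval(bits) - 1;
    }

    static boolean isIndexed(int termNum, int bits) {
        return (termNum & mask(bits)) == 0;
    }

    /** ceil(numTerms / 2^bits); the unsigned shift keeps this correct near Integer.MAX_VALUE. */
    static int numIndexedTerms(int numTerms, int bits) {
        return (numTerms + mask(bits)) >>> bits;
    }

    public static void main(String[] args) {
        int bits = ASSUMED_DEFAULT_BITS;
        System.out.println("interval = " + interval(bits));
        System.out.println("indexed terms for 1000 terms = " + numIndexedTerms(1000, bits));
    }
}
```

Using a power of two lets the "is this term indexed?" test and the "which indexed term precedes ord?" computation reduce to a mask and a shift.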

maxTermDocFreq
protected final int maxTermDocFreq
Don't uninvert terms that exceed this count.

field
protected final String field
Field we are uninverting.

numTermsInField
protected int numTermsInField
Number of terms in the field.

termInstances
protected long termInstances
Total number of references to term numbers.

total_time
protected int total_time
Total time to uninvert the field.

phase1_time
protected int phase1_time
Time for phase 1 of the uninvert process.

index
protected int[] index
Holds the per-document ords or a pointer to the ords.

tnums
protected byte[][] tnums
Holds term ords for documents.
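This page does not document how `index` and `tnums` encode each document's ords; the field descriptions only suggest that short ord lists can live directly in `index` while longer ones are reached through a pointer into `tnums`. As a generic illustration of packing a sorted, de-duplicated ord list compactly (not the exact Solr byte layout), the hypothetical `OrdPackingSketch` below delta-encodes the ords and writes each delta as a variable-length integer with 7 payload bits per byte.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

/** Hypothetical delta + varint packing of one document's sorted, unique ords (not Solr's layout). */
final class OrdPackingSketch {

    static byte[] encode(int[] sortedOrds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = -1;
        for (int ord : sortedOrds) {
            int delta = ord - prev; // >= 1 because ords are strictly increasing
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80); // low 7 bits, continuation bit set
                delta >>>= 7;
            }
            out.write(delta); // final byte, continuation bit clear
            prev = ord;
        }
        return out.toByteArray();
    }

    static int[] decode(byte[] packed) {
        int[] ords = new int[packed.length]; // each ord takes at least one byte
        int count = 0;
        int prev = -1;
        int pos = 0;
        while (pos < packed.length) {
            int delta = 0;
            int shift = 0;
            int b;
            do {
                b = packed[pos++] & 0xFF;
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta;
            ords[count++] = prev;
        }
        return Arrays.copyOf(ords, count);
    }

    public static void main(String[] args) {
        int[] ords = {3, 4, 200, 100_000};
        byte[] packed = encode(ords);
        System.out.println(packed.length + " bytes, round trip " + Arrays.toString(decode(packed)));
    }
}
```

Because consecutive ords are stored as gaps, small gaps cost a single byte each; in the example, four ords pack into 7 bytes.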

sizeOfIndexedStrings
protected long sizeOfIndexedStrings
Total bytes (sum of term lengths) for all indexed terms.

indexedTermsArray
protected org.apache.lucene.util.BytesRef[] indexedTermsArray
Holds the indexed (by default every 128th) terms.
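A minimal model of how `indexedTermsArray` and `sizeOfIndexedStrings` relate, assuming term bytes are UTF-8 and that every term whose number is a multiple of 2^indexIntervalBits is sampled. `TermIndexSketch` is hypothetical and uses `byte[]` in place of `BytesRef`.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of sampling every 2^bits-th term into an in-memory term index. */
final class TermIndexSketch {
    final List<byte[]> indexedTerms = new ArrayList<>(); // stands in for BytesRef[] indexedTermsArray
    long sizeOfIndexedStrings;                           // sum of the sampled terms' byte lengths

    TermIndexSketch(List<String> sortedTerms, int indexIntervalBits) {
        int mask = (1 << indexIntervalBits) - 1;
        for (int termNum = 0; termNum < sortedTerms.size(); termNum++) {
            if ((termNum & mask) == 0) { // term numbers 0, 2^bits, 2 * 2^bits, ...
                byte[] bytes = sortedTerms.get(termNum).getBytes(StandardCharsets.UTF_8);
                indexedTerms.add(bytes);
                sizeOfIndexedStrings += bytes.length; // bytes, not characters
            }
        }
    }
}
```

With an interval of 2 (bits = 1) over the terms "a", "bb", "ccc", "dddd", "eeeee", this model samples "a", "ccc", and "eeeee", for a total of 9 bytes.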

prefix
protected org.apache.lucene.util.BytesRef prefix
If non-null, only terms matching this prefix were indexed.

ordBase
protected int ordBase
Ordinal of the first term in the field, or 0 if the PostingsFormat does not implement TermsEnum.ord().

postingsEnum
protected org.apache.lucene.index.PostingsEnum postingsEnum
Used while uninverting.

checkForDocValues
protected boolean checkForDocValues
If true, check and throw an exception if the field has docValues enabled. Normally, docValues should be used in preference to DocTermOrds.
 
Constructor Detail
DocTermOrds
public DocTermOrds(org.apache.lucene.index.LeafReader reader, org.apache.lucene.util.Bits liveDocs, String field) throws IOException
Inverts all terms.
Throws:
- IOException
 
DocTermOrds
public DocTermOrds(org.apache.lucene.index.LeafReader reader, org.apache.lucene.util.Bits liveDocs, String field, org.apache.lucene.util.BytesRef termPrefix) throws IOException
Inverts only terms starting with the given prefix.
Throws:
- IOException
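Because terms are enumerated in sorted (unsigned byte) order, every term that starts with a given prefix lies in one contiguous run. The hypothetical `PrefixSketch` below finds that run by "seeking" to the first term greater than or equal to the prefix and scanning forward while the prefix still matches. This mirrors a seek-then-iterate pattern over a `TermsEnum`, but the sorted `byte[][]` array and the linear seek are simplifying assumptions. It requires Java 9+ for `Arrays.compareUnsigned` and the range form of `Arrays.equals`.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

/** Hypothetical model of selecting prefix-matching terms from a sorted term list (Java 9+). */
final class PrefixSketch {

    static byte[] utf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static boolean startsWith(byte[] term, byte[] prefix) {
        return term.length >= prefix.length
            && Arrays.equals(term, 0, prefix.length, prefix, 0, prefix.length);
    }

    /**
     * Returns {start, end} (end exclusive) of the terms starting with prefix.
     * sortedTerms must be in unsigned byte order, the order a TermsEnum uses.
     */
    static int[] prefixRange(byte[][] sortedTerms, byte[] prefix) {
        int start = 0;
        // "Seek": skip terms that sort before the prefix (linear here for simplicity).
        while (start < sortedTerms.length && Arrays.compareUnsigned(sortedTerms[start], prefix) < 0) {
            start++;
        }
        // Scan forward while the prefix still matches; the first mismatch ends the run.
        int end = start;
        while (end < sortedTerms.length && startsWith(sortedTerms[end], prefix)) {
            end++;
        }
        return new int[] {start, end};
    }
}
```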
 
DocTermOrds
public DocTermOrds(org.apache.lucene.index.LeafReader reader, org.apache.lucene.util.Bits liveDocs, String field, org.apache.lucene.util.BytesRef termPrefix, int maxTermDocFreq) throws IOException
Inverts only terms starting with the given prefix, and only terms whose docFreq (not taking deletions into account) is <= maxTermDocFreq.
Throws:
- IOException
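The docFreq cutoff is applied to the raw document frequency, which still counts deleted documents. A sketch of that filtering step, assuming postings are given as arrays of doc IDs per term ord (`MaxDocFreqSketch` is hypothetical):

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of the maxTermDocFreq cutoff. */
final class MaxDocFreqSketch {

    /**
     * @param postings postings[ord] = doc IDs containing the term, deleted docs included
     * @return ords of the terms that would be uninverted
     */
    static List<Integer> termsToUninvert(int[][] postings, int maxTermDocFreq) {
        List<Integer> kept = new ArrayList<>();
        for (int ord = 0; ord < postings.length; ord++) {
            // docFreq counts every doc in the postings, deleted or not.
            int docFreq = postings[ord].length;
            if (docFreq <= maxTermDocFreq) {
                kept.add(ord);
            }
        }
        return kept;
    }
}
```

In this sketch, a term occurring in three documents is excluded at maxTermDocFreq = 2 even if two of those documents are deleted.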
 
DocTermOrds
public DocTermOrds(org.apache.lucene.index.LeafReader reader, org.apache.lucene.util.Bits liveDocs, String field, org.apache.lucene.util.BytesRef termPrefix, int maxTermDocFreq, int indexIntervalBits) throws IOException
Inverts only terms starting with the given prefix, and only terms whose docFreq (not taking deletions into account) is <= maxTermDocFreq, with a custom indexing interval (default is every 128th term).
Throws:
- IOException
 
DocTermOrds
protected DocTermOrds(String field, int maxTermDocFreq, int indexIntervalBits)
Subclasses initialize with this constructor, but must then call uninvert, and only once.
 
Method Detail
ramBytesUsed
public long ramBytesUsed()
Returns total bytes used.
Specified by:
- ramBytesUsed in interface org.apache.lucene.util.Accountable
 
getOrdTermsEnum
public org.apache.lucene.index.TermsEnum getOrdTermsEnum(org.apache.lucene.index.LeafReader reader) throws IOException
Returns a TermsEnum that implements ord, or null if there are no terms in the field. We build a "private" terms index internally (WARNING: consumes RAM) and use that index to implement ord. This also enables ord on top of a composite reader. The returned TermsEnum is unpositioned.
NOTE: you must pass the same reader that was used when creating this class.
Throws:
- IOException
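Seeking by ord with a sparse term index can be modeled as: jump to the nearest indexed term at or before the requested ord, then step forward with `next()` at most 2^indexIntervalBits - 1 times. The hypothetical `SeekByOrdSketch` below simulates the underlying terms enum with a sorted `String[]` (this assumes ASCII terms, since `String` ordering is UTF-16 rather than unsigned-byte order) and records how many forward steps each lookup took. It illustrates the trade-off between index RAM and lookup cost; it is not the actual wrapped `TermsEnum`.

```java
import java.util.Arrays;

/** Hypothetical model of seek-by-ord over a sparse term index (ASCII terms assumed). */
final class SeekByOrdSketch {
    private final String[] allTerms;     // stands in for the field's terms enum, sorted and unique
    private final String[] indexedTerms; // every 2^bits-th term, like indexedTermsArray
    private final int bits;
    int lastScanSteps;                   // next() calls used by the most recent lookup

    SeekByOrdSketch(String[] sortedTerms, int bits) {
        this.allTerms = sortedTerms;
        this.bits = bits;
        int count = (sortedTerms.length + (1 << bits) - 1) >>> bits;
        this.indexedTerms = new String[count];
        for (int i = 0; i < count; i++) {
            indexedTerms[i] = sortedTerms[i << bits];
        }
    }

    String lookupTerm(int ord) {
        if (ord < 0 || ord >= allTerms.length) {
            throw new IllegalArgumentException("ord out of range: " + ord);
        }
        // 1. Seek by term to the closest indexed term at or before ord.
        String anchor = indexedTerms[ord >>> bits];
        int pos = Arrays.binarySearch(allTerms, anchor);
        // 2. Step forward; equivalent to calling next() `steps` times (at most 2^bits - 1).
        int steps = ord & ((1 << bits) - 1);
        pos += steps;
        lastScanSteps = steps;
        return allTerms[pos];
    }
}
```

A larger interval shrinks the index (fewer sampled terms held in RAM) at the cost of more forward steps per lookup.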
 
numTerms
public int numTerms()
Returns the number of terms in this field.

isEmpty
public boolean isEmpty()
Returns true if no terms were indexed.

visitTerm
protected void visitTerm(org.apache.lucene.index.TermsEnum te, int termNum) throws IOException
Subclasses can override this.
Throws:
- IOException
 
setActualDocFreq
protected void setActualDocFreq(int termNum, int df) throws IOException
Invoked during uninvert(org.apache.lucene.index.LeafReader, Bits, BytesRef) to record the document frequency for each uninverted term.
Throws:
- IOException
 
uninvert
protected void uninvert(org.apache.lucene.index.LeafReader reader, org.apache.lucene.util.Bits liveDocs, org.apache.lucene.util.BytesRef termPrefix) throws IOException
If you subclass, call this only once.
Throws:
- IOException
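The protected constructor, `uninvert`, `visitTerm`, and `setActualDocFreq` together form a template-method contract for subclasses. The miniature below is hypothetical (`UninverterBase` and `DocFreqCollector` are not Solr classes): it enforces the "call uninvert only once" rule with a flag, invokes `visitTerm` for every term, and, assuming that "actual" document frequency means the count of live documents, reports that count through `setActualDocFreq`.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical miniature of the subclass contract; not Solr's DocTermOrds. */
abstract class UninverterBase {
    private boolean uninverted; // enforces "call uninvert only once"

    protected final void uninvert(String[] sortedTerms, int[][] postings, boolean[] live) {
        if (uninverted) {
            throw new IllegalStateException("uninvert() may only be called once");
        }
        uninverted = true;
        for (int termNum = 0; termNum < sortedTerms.length; termNum++) {
            visitTerm(sortedTerms[termNum], termNum);
            int actualDf = 0; // assumption: "actual" = live docs only
            for (int doc : postings[termNum]) {
                if (live[doc]) {
                    actualDf++;
                }
            }
            setActualDocFreq(termNum, actualDf);
        }
    }

    /** Hook: called once per term, in term order. */
    protected void visitTerm(String term, int termNum) {}

    /** Hook: receives the live-document count of each term. */
    protected void setActualDocFreq(int termNum, int df) {}
}

final class DocFreqCollector extends UninverterBase {
    final List<String> visited = new ArrayList<>();
    final List<Integer> actualDocFreqs = new ArrayList<>();

    DocFreqCollector(String[] sortedTerms, int[][] postings, boolean[] live) {
        // Fields above are initialized before this body runs, so the hooks can use them.
        uninvert(sortedTerms, postings, live);
    }

    @Override
    protected void visitTerm(String term, int termNum) {
        visited.add(term);
    }

    @Override
    protected void setActualDocFreq(int termNum, int df) {
        actualDocFreqs.add(df);
    }
}
```

Calling `uninvert` from the subclass constructor body (rather than from the base constructor) matters: by then the subclass's own fields exist, so the overridden hooks do not see uninitialized state.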
 
lookupTerm
public org.apache.lucene.util.BytesRef lookupTerm(org.apache.lucene.index.TermsEnum termsEnum, int ord) throws IOException
Returns the term (BytesRef) corresponding to the provided ordinal.
Throws:
- IOException
 
iterator
public org.apache.lucene.index.SortedSetDocValues iterator(org.apache.lucene.index.LeafReader reader) throws IOException
Returns a SortedSetDocValues view of this instance.
Throws:
- IOException
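Consumers of a SortedSetDocValues view typically position on a document and then pull ords until a sentinel comes back. The class below is a stand-in written for illustration, not Lucene's `SortedSetDocValues`: it models only `advanceExact` and `nextOrd`, and uses a -1 sentinel in the style of Lucene's `NO_MORE_ORDS` (newer Lucene versions also expose `docValueCount()`, so check the version you build against). Real doc-values iterators are forward-only, so target doc IDs must not decrease; this sketch does not enforce that.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a per-document ord cursor; not Lucene's SortedSetDocValues. */
final class OrdCursorSketch {
    static final long NO_MORE_ORDS = -1; // sentinel returned once a doc's ords are exhausted

    private final int[][] ordsPerDoc; // sorted, de-duplicated ords for each doc
    private int[] current = new int[0];
    private int upto;

    OrdCursorSketch(int[][] ordsPerDoc) {
        this.ordsPerDoc = ordsPerDoc;
    }

    /** Positions on doc; returns true if it has at least one ord. */
    boolean advanceExact(int doc) {
        current = ordsPerDoc[doc];
        upto = 0;
        return current.length > 0;
    }

    /** Returns the next ord of the current doc, or NO_MORE_ORDS. */
    long nextOrd() {
        return upto < current.length ? current[upto++] : NO_MORE_ORDS;
    }

    /** Typical consumer loop: collect all ords of one document. */
    static List<Long> ordsOf(OrdCursorSketch dv, int doc) {
        List<Long> out = new ArrayList<>();
        if (dv.advanceExact(doc)) {
            for (long ord = dv.nextOrd(); ord != NO_MORE_ORDS; ord = dv.nextOrd()) {
                out.add(ord);
            }
        }
        return out;
    }
}
```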
 
 