java.lang.Object
- org.apache.solr.uninverting.DocTermOrds

All Implemented Interfaces:

org.apache.lucene.util.Accountable

Direct Known Subclasses:

UnInvertedField
```
public class DocTermOrds
extends Object
implements org.apache.lucene.util.Accountable
```
This class enables fast access to multiple term ords for a specified field across all docIDs. Like FieldCache, it uninverts the index and holds a packed data structure in RAM to enable fast access. Unlike FieldCache, it can handle multi-valued fields, and it does not hold the term bytes in RAM. Instead, you must obtain a TermsEnum from the getOrdTermsEnum(org.apache.lucene.index.LeafReader) method and then seek by ord to get the term's bytes.

Term ords are normally of type long, but in this API they are int, because the internal representation cannot address more than Integer.MAX_VALUE unique terms. This class is typically used on fields with relatively few unique terms compared to the number of documents.

The internal limit on how many bytes each chunk of documents may consume, previously 16 MB, has been increased to 2 GB.

Deleted documents are skipped during uninversion; if you look one up, you get 0 ords.

The returned per-document ords do not retain their original order in the document. Instead, they are returned in sorted order (by ord, i.e. by the term's BytesRef comparator). They are also de-duplicated: if a document has the same term more than once in this field, that ord is returned only once.

This class creates its own term index internally, which allows it to create a wrapped TermsEnum that supports ord. The getOrdTermsEnum(org.apache.lucene.index.LeafReader) method provides this wrapped enum.

The RAM consumption of this class can be high!
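
The behavior described above (ords sorted per document, duplicates collapsed, deleted documents yielding 0 ords) can be illustrated with a minimal, self-contained sketch. It does not use the Lucene or Solr API and is not the DocTermOrds implementation; the class `UninvertSketch`, the method `uninvert`, and the in-memory postings map are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; not the DocTermOrds implementation. */
final class UninvertSketch {

    /**
     * Converts term -> docIDs postings into docID -> ords.
     * The TreeMap iterates terms in sorted order, so a term's ord is its
     * iteration index (String order stands in for the BytesRef comparator).
     * live[doc] == false marks a deleted document.
     */
    static int[][] uninvert(TreeMap<String, int[]> postings, boolean[] live) {
        List<List<Integer>> perDoc = new ArrayList<>();
        for (int doc = 0; doc < live.length; doc++) {
            perDoc.add(new ArrayList<>());
        }
        int ord = 0;
        for (Map.Entry<String, int[]> entry : postings.entrySet()) {
            for (int doc : entry.getValue()) {
                List<Integer> ords = perDoc.get(doc);
                // Terms are visited in ascending ord, so a duplicate can only be the last entry.
                boolean alreadyAdded = !ords.isEmpty() && ords.get(ords.size() - 1) == ord;
                if (live[doc] && !alreadyAdded) {
                    ords.add(ord);
                }
            }
            ord++;
        }
        int[][] result = new int[live.length][];
        for (int doc = 0; doc < live.length; doc++) {
            result[doc] = perDoc.get(doc).stream().mapToInt(Integer::intValue).toArray();
        }
        return result;
    }
}
```

Because terms are visited in ascending ord, each per-document list comes out sorted without an explicit sort step, and a deleted document simply keeps an empty list.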

WARNING: This API is experimental and might change in incompatible ways in the next release.

Field Summary

Fields
Modifier and Type	Field	Description
`protected boolean`	`checkForDocValues`	If true, check and throw an exception if the field has docValues enabled.
`static int`	`DEFAULT_INDEX_INTERVAL_BITS`	Every 128th term is indexed, by default.
`protected String`	`field`	Field we are uninverting.
`protected int[]`	`index`	Holds the per-document ords or a pointer to the ords.
`protected org.apache.lucene.util.BytesRef[]`	`indexedTermsArray`	Holds the indexed (by default every 128th) terms.
`protected int`	`maxTermDocFreq`	Don't uninvert terms that exceed this count.
`protected int`	`numTermsInField`	Number of terms in the field.
`protected int`	`ordBase`	Ordinal of the first term in the field, or 0 if the `PostingsFormat` does not implement `TermsEnum.ord()`.
`protected int`	`phase1_time`	Time for phase1 of the uninvert process.
`protected org.apache.lucene.index.PostingsEnum`	`postingsEnum`	Used while uninverting.
`protected org.apache.lucene.util.BytesRef`	`prefix`	If non-null, only terms matching this prefix were indexed.
`protected long`	`sizeOfIndexedStrings`	Total bytes (sum of term lengths) for all indexed terms.
`protected long`	`termInstances`	Total number of references to term numbers.
`protected byte[][]`	`tnums`	Holds term ords for documents.
`protected int`	`total_time`	Total time to uninvert the field.

Constructor Summary

Constructors
Modifier	Constructor	Description
`protected`	`DocTermOrds(String field, int maxTermDocFreq, int indexIntervalBits)`	Subclasses initialize with this constructor; be sure to then call uninvert, and only once.
	`DocTermOrds(org.apache.lucene.index.LeafReader reader, org.apache.lucene.util.Bits liveDocs, String field)`	Inverts all terms.
	`DocTermOrds(org.apache.lucene.index.LeafReader reader, org.apache.lucene.util.Bits liveDocs, String field, org.apache.lucene.util.BytesRef termPrefix)`	Inverts only terms starting with the prefix.
	`DocTermOrds(org.apache.lucene.index.LeafReader reader, org.apache.lucene.util.Bits liveDocs, String field, org.apache.lucene.util.BytesRef termPrefix, int maxTermDocFreq)`	Inverts only terms starting with the prefix, and only terms whose docFreq (not taking deletions into account) is <= maxTermDocFreq.
	`DocTermOrds(org.apache.lucene.index.LeafReader reader, org.apache.lucene.util.Bits liveDocs, String field, org.apache.lucene.util.BytesRef termPrefix, int maxTermDocFreq, int indexIntervalBits)`	Inverts only terms starting with the prefix, and only terms whose docFreq (not taking deletions into account) is <= maxTermDocFreq, with a custom indexing interval (default is every 128th term).

Method Summary

All Methods Instance Methods Concrete Methods
Modifier and Type	Method	Description
`org.apache.lucene.index.TermsEnum`	`getOrdTermsEnum(org.apache.lucene.index.LeafReader reader)`	Returns a TermsEnum that implements ord, or null if no terms in field.
`boolean`	`isEmpty()`	Returns `true` if no terms were indexed.
`org.apache.lucene.index.SortedSetDocValues`	`iterator(org.apache.lucene.index.LeafReader reader)`	Returns a SortedSetDocValues view of this instance
`org.apache.lucene.util.BytesRef`	`lookupTerm(org.apache.lucene.index.TermsEnum termsEnum, int ord)`	Returns the term (`BytesRef`) corresponding to the provided ordinal.
`int`	`numTerms()`	Returns the number of terms in this field
`long`	`ramBytesUsed()`	Returns total bytes used.
`protected void`	`setActualDocFreq(int termNum, int df)`	Invoked during `uninvert(org.apache.lucene.index.LeafReader,Bits,BytesRef)` to record the document frequency for each uninverted term.
`protected void`	`uninvert(org.apache.lucene.index.LeafReader reader, org.apache.lucene.util.Bits liveDocs, org.apache.lucene.util.BytesRef termPrefix)`	Call this only once (if you subclass!)
`protected void`	`visitTerm(org.apache.lucene.index.TermsEnum te, int termNum)`	Subclass can override this

Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait

Methods inherited from interface org.apache.lucene.util.Accountable
getChildResources

Field Detail
- DEFAULT_INDEX_INTERVAL_BITS
```
public static final int DEFAULT_INDEX_INTERVAL_BITS
```
  Every 128th term is indexed, by default.
  
  See Also:
  
  Constant Field Values
- maxTermDocFreq
```
protected final int maxTermDocFreq
```
  Don't uninvert terms that exceed this count.
- field
```
protected final String field
```
  Field we are uninverting.
- numTermsInField
```
protected int numTermsInField
```
  Number of terms in the field.
- termInstances
```
protected long termInstances
```
  Total number of references to term numbers.
- total_time
```
protected int total_time
```
  Total time to uninvert the field.
- phase1_time
```
protected int phase1_time
```
  Time for phase1 of the uninvert process.
- index
```
protected int[] index
```
  Holds the per-document ords or a pointer to the ords.
- tnums
```
protected byte[][] tnums
```
  Holds term ords for documents.
- sizeOfIndexedStrings
```
protected long sizeOfIndexedStrings
```
  Total bytes (sum of term lengths) for all indexed terms.
- indexedTermsArray
```
protected org.apache.lucene.util.BytesRef[] indexedTermsArray
```
  Holds the indexed (by default every 128th) terms.
- prefix
```
protected org.apache.lucene.util.BytesRef prefix
```
  If non-null, only terms matching this prefix were indexed.
- ordBase
```
protected int ordBase
```
  Ordinal of the first term in the field, or 0 if the PostingsFormat does not implement TermsEnum.ord().
- postingsEnum
```
protected org.apache.lucene.index.PostingsEnum postingsEnum
```
  Used while uninverting.
- checkForDocValues
```
protected boolean checkForDocValues
```
  If true, check and throw an exception if the field has docValues enabled. Normally, docValues should be used in preference to DocTermOrds.
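
The `index` and `tnums` fields above hold per-document ords in a packed form, but this page does not specify the encoding. As a rough illustration of why packing sorted ords saves RAM, the following sketch uses delta encoding with variable-length bytes, which is one common technique for sorted integers. It is an assumption for illustration, not the layout DocTermOrds actually uses; `OrdPackingSketch`, `encode`, and `decode` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;

/** Illustrative sketch only; the real DocTermOrds layout is not documented on this page. */
final class OrdPackingSketch {

    /** Encodes ascending ords as deltas, 7 bits per byte, high bit set when more bytes follow. */
    static byte[] encode(int[] sortedOrds) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int prev = 0;
        for (int ord : sortedOrds) {
            int delta = ord - prev; // non-negative because the input is sorted
            prev = ord;
            while ((delta & ~0x7F) != 0) {
                out.write((delta & 0x7F) | 0x80);
                delta >>>= 7;
            }
            out.write(delta);
        }
        return out.toByteArray();
    }

    /** Decodes exactly count ords from bytes produced by encode. */
    static int[] decode(byte[] bytes, int count) {
        int[] ords = new int[count];
        int pos = 0;
        int prev = 0;
        for (int i = 0; i < count; i++) {
            int delta = 0;
            int shift = 0;
            byte b;
            do {
                b = bytes[pos++];
                delta |= (b & 0x7F) << shift;
                shift += 7;
            } while ((b & 0x80) != 0);
            prev += delta; // undo the delta step
            ords[i] = prev;
        }
        return ords;
    }
}
```

With this scheme, small gaps between consecutive ords cost a single byte each, instead of the 4 bytes a plain int would need.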

Constructor Detail

DocTermOrds

public DocTermOrds(org.apache.lucene.index.LeafReader reader,
                   org.apache.lucene.util.Bits liveDocs,
                   String field)
            throws IOException

Inverts all terms.

Throws: IOException

DocTermOrds

public DocTermOrds(org.apache.lucene.index.LeafReader reader,
                   org.apache.lucene.util.Bits liveDocs,
                   String field,
                   org.apache.lucene.util.BytesRef termPrefix)
            throws IOException

Inverts only terms starting with the prefix.

Throws: IOException

DocTermOrds

public DocTermOrds(org.apache.lucene.index.LeafReader reader,
                   org.apache.lucene.util.Bits liveDocs,
                   String field,
                   org.apache.lucene.util.BytesRef termPrefix,
                   int maxTermDocFreq)
            throws IOException

Inverts only terms starting with the prefix, and only terms whose docFreq (not taking deletions into account) is <= maxTermDocFreq.

Throws: IOException

DocTermOrds

public DocTermOrds(org.apache.lucene.index.LeafReader reader,
                   org.apache.lucene.util.Bits liveDocs,
                   String field,
                   org.apache.lucene.util.BytesRef termPrefix,
                   int maxTermDocFreq,
                   int indexIntervalBits)
            throws IOException

Inverts only terms starting with the prefix, and only terms whose docFreq (not taking deletions into account) is <= maxTermDocFreq, with a custom indexing interval (default is every 128th term).

Throws: IOException
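
The term selection rule shared by the constructors above (optional prefix, docFreq at most maxTermDocFreq) can be sketched without the Lucene API. This is only an illustration under simplifying assumptions: `TermSelectionSketch`, `selectTerms`, and the in-memory docFreq map are hypothetical, and terms are modeled as Strings rather than BytesRef.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Illustrative sketch only; not the DocTermOrds implementation. */
final class TermSelectionSketch {

    /**
     * Returns, in sorted order, the terms that would be uninverted:
     * those starting with prefix (null means no prefix filter) whose
     * docFreq is <= maxTermDocFreq. The docFreq values are taken as given,
     * i.e. they do not take deletions into account.
     */
    static List<String> selectTerms(TreeMap<String, Integer> docFreqByTerm,
                                    String prefix,
                                    int maxTermDocFreq) {
        List<String> selected = new ArrayList<>();
        for (Map.Entry<String, Integer> entry : docFreqByTerm.entrySet()) {
            boolean prefixMatches = prefix == null || entry.getKey().startsWith(prefix);
            boolean frequencyOk = entry.getValue() <= maxTermDocFreq;
            if (prefixMatches && frequencyOk) {
                selected.add(entry.getKey());
            }
        }
        return selected;
    }
}
```

Capping docFreq keeps very common terms out of the per-document structure, which is where most of the RAM would otherwise go.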

DocTermOrds

protected DocTermOrds(String field,
                      int maxTermDocFreq,
                      int indexIntervalBits)

Subclasses initialize with this constructor; be sure to then call uninvert, and only once.

Method Detail

ramBytesUsed
```
public long ramBytesUsed()
```
Returns total bytes used.

Specified by:

ramBytesUsed in interface org.apache.lucene.util.Accountable

getOrdTermsEnum
```
public org.apache.lucene.index.TermsEnum getOrdTermsEnum(org.apache.lucene.index.LeafReader reader)
                                                  throws IOException
```
Returns a TermsEnum that implements ord, or null if no terms in field.
A "private" terms index is built internally (WARNING: this consumes RAM) and is used to implement ord. This also enables ord on top of a composite reader. The returned TermsEnum is unpositioned.

NOTE: You must pass the same reader that was used when creating this instance.

Throws:

IOException
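
How a sparse, in-RAM terms index can support lookup by ord is sketched below. The idea, assuming an interval of 128 terms corresponds to 7 interval bits (1 << 7 = 128): keep every 128th term in RAM, seek to the nearest indexed term at or before the target ord, then step forward. This is a simplified illustration, not the DocTermOrds code; `SparseTermIndexSketch` and its members are hypothetical, and a sorted in-memory list stands in for the on-disk term dictionary.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Illustrative sketch only; not the DocTermOrds implementation. */
final class SparseTermIndexSketch {
    private final List<String> allTerms;                          // stands in for the term dictionary
    private final List<String> indexedTerms = new ArrayList<>();  // every (1 << intervalBits)-th term
    private final int intervalBits;

    SparseTermIndexSketch(List<String> sortedTerms, int intervalBits) {
        this.allTerms = sortedTerms;
        this.intervalBits = intervalBits;
        for (int ord = 0; ord < sortedTerms.size(); ord += 1 << intervalBits) {
            indexedTerms.add(sortedTerms.get(ord));
        }
    }

    int indexedTermCount() {
        return indexedTerms.size();
    }

    /** Returns the term for ord; ord must be in [0, number of terms). */
    String lookup(int ord) {
        // 1. Pick the indexed term at or before the target ord.
        String anchor = indexedTerms.get(ord >>> intervalBits);
        // 2. "Seek" to it in the full dictionary (a seek by term, not by ord).
        int pos = Collections.binarySearch(allTerms, anchor);
        // 3. Step forward the remaining distance, like repeated next() calls.
        int steps = ord & ((1 << intervalBits) - 1);
        for (int i = 0; i < steps; i++) {
            pos++;
        }
        return allTerms.get(pos);
    }
}
```

A smaller interval means fewer forward steps per lookup but more indexed terms held in RAM, which is the trade-off the indexIntervalBits constructor parameter exposes.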

numTerms
```
public int numTerms()
```
Returns the number of terms in this field

isEmpty
```
public boolean isEmpty()
```
Returns true if no terms were indexed.

visitTerm

protected void visitTerm(org.apache.lucene.index.TermsEnum te,
                         int termNum)
                  throws IOException

Subclasses can override this.

Throws: IOException

setActualDocFreq
```
protected void setActualDocFreq(int termNum,
                                int df)
                         throws IOException
```
Invoked during uninvert(org.apache.lucene.index.LeafReader,Bits,BytesRef) to record the document frequency for each uninverted term.

Throws:

IOException

uninvert

protected void uninvert(org.apache.lucene.index.LeafReader reader,
                        org.apache.lucene.util.Bits liveDocs,
                        org.apache.lucene.util.BytesRef termPrefix)
                 throws IOException

Call this only once (if you subclass!).

Throws: IOException

lookupTerm

public org.apache.lucene.util.BytesRef lookupTerm(org.apache.lucene.index.TermsEnum termsEnum,
                                                  int ord)
                                           throws IOException

Returns the term (BytesRef) corresponding to the provided ordinal.

Throws: IOException

iterator

public org.apache.lucene.index.SortedSetDocValues iterator(org.apache.lucene.index.LeafReader reader)
                                                    throws IOException

Returns a SortedSetDocValues view of this instance.

Throws: IOException
