public abstract class OffsetCorrector extends Object
Modifier and Type | Field and Description |
---|---|
protected String | docText: Document text. |
protected com.carrotsearch.hppc.IntArrayList | nonTaggableOffsets: Disjoint start and end span offsets (inclusive) of non-taggable sections. |
protected int[] | offsetPair |
protected com.carrotsearch.hppc.IntArrayList | parentChangeIds: tag id; parallel array to parentChangeOffsets |
protected com.carrotsearch.hppc.IntArrayList | parentChangeOffsets: offsets of parent tag id change (ascending order) |
protected com.carrotsearch.hppc.IntArrayList | tagInfo: Array of tag info comprised of 5 int fields: [int parentTag, int openStartOff, int openEndOff, int closeStartOff, int closeEndOff]. |
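
The five-int record layout described for tagInfo can be pictured with a small sketch. This is an illustration only, not the Solr source: it uses a plain int[] instead of the HPPC IntArrayList, the class name TagInfoLayoutSketch is hypothetical, the sample offsets are made up, and it assumes a tag id is the index of the first int of that tag's record.

```java
// Hypothetical illustration of a flat "5 ints per tag" layout; not the Solr source.
final class TagInfoLayoutSketch {
    // Field positions within one record:
    // [parentTag, openStartOff, openEndOff, closeStartOff, closeEndOff]
    static final int PARENT = 0, OPEN_START = 1, OPEN_END = 2, CLOSE_START = 3, CLOSE_END = 4;

    // Assumption: "tag" is the index of the record's first int in the flat array.
    static int getParentTag(int[] tagInfo, int tag)     { return tagInfo[tag + PARENT]; }
    static int getOpenStartOff(int[] tagInfo, int tag)  { return tagInfo[tag + OPEN_START]; }
    static int getOpenEndOff(int[] tagInfo, int tag)    { return tagInfo[tag + OPEN_END]; }
    static int getCloseStartOff(int[] tagInfo, int tag) { return tagInfo[tag + CLOSE_START]; }
    static int getCloseEndOff(int[] tagInfo, int tag)   { return tagInfo[tag + CLOSE_END]; }

    // Made-up records for the text "<a><b>x</b></a>"; -1 marks "no parent".
    static int[] sample() {
        return new int[] {
            -1, 0, 3, 11, 15,  // record at index 0: <a> ... </a>
             0, 3, 6,  7, 11   // record at index 5: <b> ... </b>, parent is the record at 0
        };
    }

    public static void main(String[] args) {
        int[] info = sample();
        int b = 5; // index of the second record
        System.out.println("parent of b = " + getParentTag(info, b));
        System.out.println("close tag of b starts at " + getCloseStartOff(info, b));
    }
}
```

Storing all records in one flat int list avoids allocating an object per tag, at the cost of index arithmetic in every accessor.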
Modifier | Constructor and Description |
---|---|
protected | OffsetCorrector(String docText, boolean hasNonTaggable): Initialize based on the document text. |
Modifier and Type | Method and Description |
---|---|
protected int | correctEndOffsetForCloseElement(int endOffset): Correct endOffset for adjacent element at the right side. |
int[] | correctPair(int leftOffset, int rightOffset): Corrects the start and end offset pair. |
protected int | getCloseEndOff(int tag) |
protected int | getCloseStartOff(int tag) |
protected int | getOpenEndOff(int tag) |
protected int | getOpenStartOff(int tag) |
protected int | getParentTag(int tag) |
protected boolean | hasNonWhitespace(int start, int end) |
protected int | lookupTag(int off) |
protected boolean | spansNonTaggable(int startOff, int endOff) |
protected boolean | tagEnclosesOffset(int tag, int off) |
protected final String docText
protected final com.carrotsearch.hppc.IntArrayList tagInfo
protected final com.carrotsearch.hppc.IntArrayList parentChangeOffsets
protected final com.carrotsearch.hppc.IntArrayList parentChangeIds
protected final int[] offsetPair
protected final com.carrotsearch.hppc.IntArrayList nonTaggableOffsets
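
The field summary describes parentChangeOffsets as ascending offsets and parentChangeIds as a parallel array of tag ids. One plausible way to use such a pair is a binary search that rounds down to the most recent change at or before a given offset. The sketch below is an assumption about how a lookup like lookupTag(int off) could work, not the actual implementation; it uses plain int[] arrays, a hypothetical class name, made-up sample data, and -1 to mean "no enclosing tag".

```java
import java.util.Arrays;

// Hypothetical sketch of a lookup over two parallel arrays; not the Solr source.
final class ParentChangeLookupSketch {
    /**
     * Returns the tag id in effect at offset off, or -1 if there is none.
     * parentChangeOffsets must be sorted ascending; parentChangeIds[i] is the
     * tag id that applies from parentChangeOffsets[i] onward.
     */
    static int lookupTag(int[] parentChangeOffsets, int[] parentChangeIds, int off) {
        if (off < 0) {
            return -1;
        }
        int idx = Arrays.binarySearch(parentChangeOffsets, off);
        if (idx < 0) {
            // Not found: binarySearch returns -(insertionPoint) - 1.
            // Step back one slot to the last change at or before off.
            idx = (-idx - 1) - 1;
        }
        return idx < 0 ? -1 : parentChangeIds[idx];
    }

    public static void main(String[] args) {
        // Made-up data: tag 0 applies from offset 0, tag 5 from 3, tag 0 again from 11, none from 15.
        int[] offsets = {0, 3, 11, 15};
        int[] ids     = {0, 5, 0, -1};
        System.out.println(lookupTag(offsets, ids, 7));
        System.out.println(lookupTag(offsets, ids, 11));
    }
}
```

Keeping the offsets sorted is what makes the O(log n) lookup possible; the ids array only needs to stay index-aligned with it.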
protected OffsetCorrector(String docText, boolean hasNonTaggable)
Parameters:
docText - non-null structured content.
hasNonTaggable - if there may be "non-taggable" tags to track
public int[] correctPair(int leftOffset, int rightOffset)
Note that the returned array is internally reused; just use it to examine the response.
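
The reuse note matters whenever results are kept across calls. The sketch below uses a hypothetical stand-in class, ReusedPairSketch, whose correctPair simply echoes its inputs through one shared array; it is not OffsetCorrector and performs no real correction. It only shows why a caller that stores results would copy the array first.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in that mimics "returns an internally reused array"; not the Solr class.
final class ReusedPairSketch {
    private final int[] offsetPair = new int[2];

    // Echoes the inputs; a real corrector would adjust them. Always returns the same array.
    int[] correctPair(int leftOffset, int rightOffset) {
        offsetPair[0] = leftOffset;
        offsetPair[1] = rightOffset;
        return offsetPair;
    }

    // Collects results for several inputs, optionally copying each returned pair.
    static List<int[]> collect(ReusedPairSketch corrector, int[][] inputs, boolean copy) {
        List<int[]> results = new ArrayList<>();
        for (int[] in : inputs) {
            int[] pair = corrector.correctPair(in[0], in[1]);
            results.add(copy ? pair.clone() : pair); // clone() snapshots the two ints
        }
        return results;
    }

    public static void main(String[] args) {
        int[][] inputs = {{1, 2}, {5, 9}};
        List<int[]> aliased = collect(new ReusedPairSketch(), inputs, false);
        List<int[]> copied = collect(new ReusedPairSketch(), inputs, true);
        // Without copying, every stored entry is the same array holding the last result.
        System.out.println("aliased first left = " + aliased.get(0)[0]);
        System.out.println("copied first left  = " + copied.get(0)[0]);
    }
}
```

Reading the two ints immediately, or cloning the array as above, both avoid the aliasing problem.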
protected int correctEndOffsetForCloseElement(int endOffset)
Correct endOffset for adjacent element at the right side. For example, the offsets might cover:
  foo</tag>
and this method pulls the end offset left to the '<'. This is necessary for use with HTMLStripCharFilter.
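
The adjustment described above can be sketched as a standalone function. This is an illustration under stated assumptions, not the Solr implementation: the class name is hypothetical, the end offset is treated as exclusive, and the start offset is passed in explicitly so the sketch never moves the end to or before the start.

```java
// Hypothetical sketch of pulling an end offset left over a trailing close tag; not the Solr source.
final class CloseElementEndSketch {
    /**
     * If the character just before endOffset is '>', returns the position of the
     * nearest preceding '<' (provided it lies after startOffset); otherwise
     * returns endOffset unchanged. endOffset is assumed to be exclusive.
     */
    static int correctEnd(String docText, int startOffset, int endOffset) {
        if (endOffset > 0 && endOffset <= docText.length()
                && docText.charAt(endOffset - 1) == '>') {
            int lt = docText.lastIndexOf('<', endOffset - 2);
            if (lt > startOffset) {
                return lt;
            }
        }
        return endOffset;
    }

    public static void main(String[] args) {
        String doc = "<tag>foo</tag>";
        int start = 5;             // index of 'f'
        int end = doc.length();    // just past the final '>'
        int corrected = correctEnd(doc, start, end);
        System.out.println(doc.substring(start, corrected));
    }
}
```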
See https://issues.apache.org/jira/browse/LUCENE-5734
protected boolean hasNonWhitespace(int start, int end)
protected boolean tagEnclosesOffset(int tag, int off)
protected int getParentTag(int tag)
protected int getOpenStartOff(int tag)
protected int getOpenEndOff(int tag)
protected int getCloseStartOff(int tag)
protected int getCloseEndOff(int tag)
protected int lookupTag(int off)
protected boolean spansNonTaggable(int startOff, int endOff)
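
The nonTaggableOffsets field is described as disjoint, inclusive start and end span offsets. A check like spansNonTaggable(int startOff, int endOff) could be built on that as an interval-overlap test. The sketch below is an assumption, not the actual implementation: it uses a plain int[] holding flat [start, end] pairs, a hypothetical class name, a simple linear scan, and treats the queried range as half-open (endOff exclusive).

```java
// Hypothetical overlap test against inclusive, disjoint spans; not the Solr source.
final class NonTaggableSpansSketch {
    /**
     * spans holds flat pairs [start0, end0, start1, end1, ...], each inclusive.
     * Returns true if the half-open range [startOff, endOff) touches any span.
     */
    static boolean spansNonTaggable(int[] spans, int startOff, int endOff) {
        for (int i = 0; i + 1 < spans.length; i += 2) {
            int spanStart = spans[i];
            int spanEnd = spans[i + 1];
            // Overlap: the range starts at or before the span's last offset
            // and ends after the span's first offset.
            if (startOff <= spanEnd && spanStart < endOff) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        int[] spans = {10, 20, 40, 50}; // made-up non-taggable sections
        System.out.println(spansNonTaggable(spans, 0, 10));
        System.out.println(spansNonTaggable(spans, 0, 11));
    }
}
```

Because the spans are disjoint and ordered, a binary search on the flat array would also work and scale better; the linear scan is used here only to keep the overlap condition easy to read.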
Copyright © 2000-2019 Apache Software Foundation. All Rights Reserved.