Class ExtractingDocumentLoader.MostlyPassthroughHtmlMapper
- java.lang.Object
  - org.apache.solr.handler.extraction.ExtractingDocumentLoader.MostlyPassthroughHtmlMapper
- All Implemented Interfaces:
org.apache.tika.parser.html.HtmlMapper
- Enclosing class:
- ExtractingDocumentLoader
public static class ExtractingDocumentLoader.MostlyPassthroughHtmlMapper extends Object implements org.apache.tika.parser.html.HtmlMapper
Field Summary
- static org.apache.tika.parser.html.HtmlMapper INSTANCE
-
Constructor Summary
- MostlyPassthroughHtmlMapper()
-
Method Summary
- boolean isDiscardElement(String name): Keep all elements and their content.
- String mapSafeAttribute(String elementName, String attributeName): Lowercases the attribute name.
- String mapSafeElement(String name): Lowercases the element name, but returns null for <BR>, which suppresses the start-element event for <BR> tags.
Method Detail
-
isDiscardElement
public boolean isDiscardElement(String name)
Keeps all elements and their content, so no element is discarded. <SCRIPT> and <STYLE> elements are apparently blocked elsewhere.
- Specified by: isDiscardElement in interface org.apache.tika.parser.html.HtmlMapper
-
mapSafeAttribute
public String mapSafeAttribute(String elementName, String attributeName)
Lowercases the attribute name.
- Specified by: mapSafeAttribute in interface org.apache.tika.parser.html.HtmlMapper
-
mapSafeElement
public String mapSafeElement(String name)
Lowercases the element name, but returns null for <BR>, which suppresses the start-element event for <BR> tags. It also returns null for <BODY>, suppressing those tags because Tika's XHTMLContentHandler handles them internally.
- Specified by: mapSafeElement in interface org.apache.tika.parser.html.HtmlMapper
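The three methods described above can be illustrated with a minimal, dependency-free sketch. It is not the Solr source. The class name PassthroughMapperSketch is hypothetical, the methods are static only so the example runs without Tika on the classpath, and the use of Locale.ROOT for lowercasing is an assumption. The null results for <BR> and <BODY> follow the description of mapSafeElement.

```java
import java.util.Locale;

/**
 * Hypothetical stand-in that mirrors the documented behavior of
 * MostlyPassthroughHtmlMapper without depending on Tika.
 */
public class PassthroughMapperSketch {

    /** Documented as keeping all elements, so nothing is discarded. */
    public static boolean isDiscardElement(String name) {
        return false;
    }

    /** Lowercases the attribute name; the element name is not consulted. */
    public static String mapSafeAttribute(String elementName, String attributeName) {
        // Locale.ROOT is an assumption made for this sketch.
        return attributeName.toLowerCase(Locale.ROOT);
    }

    /** Lowercases the element name, returning null for BR and BODY. */
    public static String mapSafeElement(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        // A null result signals that the element's events should be suppressed.
        if (lower.equals("br") || lower.equals("body")) {
            return null;
        }
        return lower;
    }

    public static void main(String[] args) {
        System.out.println(isDiscardElement("SCRIPT"));    // prints "false"
        System.out.println(mapSafeAttribute("A", "HREF")); // prints "href"
        System.out.println(mapSafeElement("DIV"));         // prints "div"
        System.out.println(mapSafeElement("BR"));          // prints "null"
    }
}
```

In real use the mapper would typically be handed to Tika through its parse context rather than called directly; the sketch only shows the input-to-output mapping each method is documented to perform.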