Class ExtractingDocumentLoader.MostlyPassthroughHtmlMapper
- java.lang.Object
- org.apache.solr.handler.extraction.ExtractingDocumentLoader.MostlyPassthroughHtmlMapper
- All Implemented Interfaces:
org.apache.tika.parser.html.HtmlMapper
- Enclosing class:
ExtractingDocumentLoader
public static class ExtractingDocumentLoader.MostlyPassthroughHtmlMapper extends Object implements org.apache.tika.parser.html.HtmlMapper
Field Summary
- static org.apache.tika.parser.html.HtmlMapper INSTANCE
Constructor Summary
- MostlyPassthroughHtmlMapper()
Method Summary
- boolean isDiscardElement(String name)
  Keep all elements and their content.
- String mapSafeAttribute(String elementName, String attributeName)
  Lowercases the attribute name.
- String mapSafeElement(String name)
  Lowercases the element name, but returns null for <BR>, which suppresses the start-element event for <BR> tags.
Method Detail
isDiscardElement
public boolean isDiscardElement(String name)
Keep all elements and their content. Apparently <SCRIPT> and <STYLE> elements are blocked elsewhere, not by this method.
- Specified by:
isDiscardElement in interface org.apache.tika.parser.html.HtmlMapper
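To make "keep all elements and their content" concrete, here is a minimal, Tika-free sketch in plain Java. It assumes, as Tika's HtmlMapper interface describes, that a true return from isDiscardElement drops an element together with everything inside it. The `HtmlNodeSketch` tree, the `collectText` walker, and the `scriptStyleDiscard` alternative are hypothetical and exist only for contrast; this is not how Tika's parser is implemented. Because <SCRIPT> and <STYLE> are apparently blocked elsewhere in the real pipeline, the passthrough result here isolates only this mapper's own decision.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.function.Predicate;

// Hypothetical, Tika-free model of an HTML element: a name plus children,
// where each child is either a String (text) or another HtmlNodeSketch.
final class HtmlNodeSketch {
    final String name;
    final List<Object> children;

    HtmlNodeSketch(String name, Object... children) {
        this.name = name;
        this.children = Arrays.asList(children);
    }
}

final class DiscardSketch {
    // Mirrors the documented behavior of this mapper: never discard anything.
    static boolean passthroughDiscard(String name) {
        return false;
    }

    // Hypothetical alternative, for contrast only: drop <script> and <style> subtrees.
    static boolean scriptStyleDiscard(String name) {
        String lower = name.toLowerCase(Locale.ROOT);
        return lower.equals("script") || lower.equals("style");
    }

    // Concatenates text, skipping the whole subtree (element and content)
    // of any element the predicate marks for discarding.
    static String collectText(HtmlNodeSketch node, Predicate<String> discard) {
        if (discard.test(node.name)) {
            return "";
        }
        StringBuilder sb = new StringBuilder();
        for (Object child : node.children) {
            if (child instanceof HtmlNodeSketch) {
                sb.append(collectText((HtmlNodeSketch) child, discard));
            } else {
                sb.append(child);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        HtmlNodeSketch doc = new HtmlNodeSketch("DIV",
                "Hello ", new HtmlNodeSketch("SCRIPT", "alert(1)"), "world");
        System.out.println(collectText(doc, DiscardSketch::passthroughDiscard));
        System.out.println(collectText(doc, DiscardSketch::scriptStyleDiscard));
    }
}
```

With the passthrough predicate the script body survives (`Hello alert(1)world`); the hypothetical discarding predicate yields `Hello world`.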
mapSafeAttribute
public String mapSafeAttribute(String elementName, String attributeName)
Lowercases the attribute name.
- Specified by:
mapSafeAttribute in interface org.apache.tika.parser.html.HtmlMapper
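A minimal sketch of the documented attribute behavior, written without a Tika dependency. `AttributeMapperSketch` is a hypothetical stand-in, ignoring `elementName` is an assumption based only on the one-line description, and the choice of `Locale.ROOT` is also an assumption: this page does not say which locale the real method uses.

```java
import java.util.Locale;

final class AttributeMapperSketch {
    // Sketch of the documented behavior: return the attribute name lowercased.
    // elementName is unused here (assumption); Locale.ROOT keeps the case
    // mapping independent of the JVM's default locale (assumption).
    static String mapSafeAttribute(String elementName, String attributeName) {
        return attributeName.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        System.out.println(mapSafeAttribute("A", "HREF"));
        System.out.println(mapSafeAttribute("IMG", "Alt"));
    }
}
```

Passing an explicit locale matters because `String.toLowerCase()` with no argument uses the default locale; under a Turkish default, for example, `"TITLE"` lowercases to `"tıtle"` with a dotless ı, which would no longer match `title`.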
mapSafeElement
public String mapSafeElement(String name)
Lowercases the element name, but returns null for <BR>, which suppresses the start-element event for <BR> tags. This also suppresses <BODY> tags, because those are handled internally by Tika's XHTMLContentHandler.
- Specified by:
mapSafeElement in interface org.apache.tika.parser.html.HtmlMapper
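The element mapping can likewise be sketched without Tika. `ElementMapperSketch` is hypothetical, `Locale.ROOT` is an assumed choice, and returning null for `body` as well as `br` is inferred from the description above rather than copied from Solr's source. Matching after lowercasing means `<Br>` and `<BODY>` are caught regardless of case. A null return stands for "emit no start-element event"; the sketch does not model what a parser then does with the element's content or end tag.

```java
import java.util.Locale;

final class ElementMapperSketch {
    // Sketch of the documented behavior: lowercase the element name, but
    // return null for BR and BODY so that no start-element event is emitted.
    static String mapSafeElement(String name) {
        String lower = name.toLowerCase(Locale.ROOT); // Locale.ROOT is an assumption
        if (lower.equals("br") || lower.equals("body")) {
            // BR is suppressed per the Javadoc; BODY because Tika's
            // XHTMLContentHandler handles it internally.
            return null;
        }
        return lower;
    }

    public static void main(String[] args) {
        for (String tag : new String[] {"P", "Br", "BODY", "Table"}) {
            System.out.println(tag + " -> " + mapSafeElement(tag));
        }
    }
}
```

Running `main` prints `P -> p`, `Br -> null`, `BODY -> null`, and `Table -> table`.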