Package org.apache.solr.update.processor
Class URLClassifyProcessor
- java.lang.Object
-
- org.apache.solr.update.processor.UpdateRequestProcessor
-
- org.apache.solr.update.processor.URLClassifyProcessor
-
- All Implemented Interfaces:
Closeable, AutoCloseable
public class URLClassifyProcessor extends UpdateRequestProcessor
Update processor which examines a URL and outputs characteristics of that URL to various other fields, including its length, the number of path levels, whether it is a top-level URL (levels==0), whether it looks like a landing/index page, a canonical representation of the URL (e.g. stripping index.html), and the domain and path parts of the URL. This processor is intended to be used when processing web resources, helping to produce values which may be used for boosting or filtering later.
-
-
Field Summary
-
Fields inherited from class org.apache.solr.update.processor.UpdateRequestProcessor
next
-
-
Constructor Summary
Constructors
URLClassifyProcessor(SolrParams parameters, SolrQueryRequest request, SolrQueryResponse response, UpdateRequestProcessor nextProcessor)
-
Method Summary
Modifier and Type, Method, Description
URL getCanonicalUrl(URL url)
Gets a canonical form of the URL for use as main URL
URL getNormalizedURL(String url)
boolean isEnabled()
boolean isLandingPage(URL url)
Calculates whether the URL is a landing page or not
boolean isTopLevelPage(URL url)
Calculates whether a URL is a top level page
int length(URL url)
Calculates the length of the URL in characters
int levels(URL url)
Calculates the number of path levels in the given URL
void processAdd(AddUpdateCommand command)
void setEnabled(boolean enabled)
-
Methods inherited from class org.apache.solr.update.processor.UpdateRequestProcessor
close, doClose, finish, processCommit, processDelete, processMergeIndexes, processRollback
-
-
-
-
Constructor Detail
-
URLClassifyProcessor
public URLClassifyProcessor(SolrParams parameters, SolrQueryRequest request, SolrQueryResponse response, UpdateRequestProcessor nextProcessor)
-
-
Method Detail
-
processAdd
public void processAdd(AddUpdateCommand command) throws IOException
- Overrides:
processAdd in class UpdateRequestProcessor
- Throws:
IOException
-
getCanonicalUrl
public URL getCanonicalUrl(URL url)
Gets a canonical form of the URL for use as main URL
- Parameters:
url - The input URL
- Returns:
The URL object representing the canonical URL
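As a rough illustration of what a canonical form can look like, the sketch below strips a trailing index-style file name from the path, following the "stripping index.html" example in the class description. The class `CanonicalUrlSketch`, the helper `canonicalize`, and the list of index file names are assumptions made for illustration. This is not the processor's actual implementation, and it works on `java.net.URI` rather than `URL` to avoid checked exceptions.

```java
import java.net.URI;
import java.util.List;

class CanonicalUrlSketch {
    // Hypothetical list of file names treated as index pages (an assumption).
    private static final List<String> INDEX_NAMES =
        List.of("index.html", "index.htm", "index.php", "default.html");

    /** Returns the URL with a trailing index-style file name removed from its path. */
    static String canonicalize(String url) {
        URI uri = URI.create(url);
        String path = uri.getPath() == null ? "" : uri.getPath();
        for (String name : INDEX_NAMES) {
            if (path.endsWith("/" + name)) {
                // Keep the trailing slash, drop only the file name.
                path = path.substring(0, path.length() - name.length());
                break;
            }
        }
        // This sketch rebuilds the URL from scheme, authority and path only;
        // any query string or fragment is dropped.
        return uri.getScheme() + "://" + uri.getAuthority() + path;
    }

    public static void main(String[] args) {
        // prints http://www.example.com/docs/
        System.out.println(canonicalize("http://www.example.com/docs/index.html"));
        // prints http://www.example.com/docs/page.html
        System.out.println(canonicalize("http://www.example.com/docs/page.html"));
    }
}
```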
-
length
public int length(URL url)
Calculates the length of the URL in characters
- Parameters:
url - The input URL
- Returns:
the length of the URL
-
levels
public int levels(URL url)
Calculates the number of path levels in the given URL
- Parameters:
url - The input URL
- Returns:
the number of levels, where a top-level URL is 0
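One plausible way to count path levels so that a top-level URL yields 0 is to drop trailing slashes and count the remaining path separators. The sketch below does that; the class `UrlLevelsSketch` and the handling of trailing slashes and file names are assumptions, so the real method may count some URLs differently.

```java
import java.net.URI;

class UrlLevelsSketch {
    /** Counts '/' separators in the path after removing trailing slashes. */
    static int levels(String url) {
        String path = URI.create(url).getPath();
        if (path == null) {
            return 0; // opaque URIs have no path
        }
        // "/a/b/" and "/a/b" are treated the same in this sketch.
        path = path.replaceAll("/+$", "");
        int count = 0;
        for (int i = 0; i < path.length(); i++) {
            if (path.charAt(i) == '/') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(levels("http://www.example.com/"));     // prints 0
        System.out.println(levels("http://www.example.com/a/b"));  // prints 2
        System.out.println(levels("http://www.example.com/a/b/")); // prints 2
    }
}
```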
-
isTopLevelPage
public boolean isTopLevelPage(URL url)
Calculates whether a URL is a top level page
- Parameters:
url - The input URL
- Returns:
true if page is a top level page
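The class description equates a top-level URL with levels==0. A minimal sketch of such a check might test for an empty path once trailing slashes are removed. Also requiring that there be no query string is an assumption of this sketch, as is the class name `TopLevelPageSketch`.

```java
import java.net.URI;

class TopLevelPageSketch {
    /** True when the path is empty (ignoring trailing slashes) and there is no query. */
    static boolean isTopLevel(String url) {
        URI uri = URI.create(url);
        String path = uri.getPath() == null ? "" : uri.getPath().replaceAll("/+$", "");
        // Assumption: a URL with a query string is not counted as top level.
        return path.isEmpty() && uri.getQuery() == null;
    }

    public static void main(String[] args) {
        System.out.println(isTopLevel("http://www.example.com/"));      // prints true
        System.out.println(isTopLevel("http://www.example.com/about")); // prints false
        System.out.println(isTopLevel("http://www.example.com/?q=1"));  // prints false
    }
}
```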
-
isLandingPage
public boolean isLandingPage(URL url)
Calculates whether the URL is a landing page or not
- Parameters:
url - The input URL
- Returns:
true if URL represents a landing page (index page)
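A landing-page heuristic of this kind could be approximated by checking whether the last path segment is a well-known index file name. In the sketch below, the class `LandingPageSketch`, the set of file names, and the rule that a query string disqualifies the URL are all assumptions for illustration; the processor's own rules may differ.

```java
import java.net.URI;
import java.util.Locale;
import java.util.Set;

class LandingPageSketch {
    // Hypothetical set of index file names (an assumption).
    private static final Set<String> INDEX_NAMES =
        Set.of("index.html", "index.htm", "index.php", "default.html");

    /** True when there is no query and the last path segment is an index file name. */
    static boolean looksLikeLandingPage(String url) {
        URI uri = URI.create(url);
        if (uri.getQuery() != null || uri.getPath() == null) {
            return false;
        }
        String path = uri.getPath().toLowerCase(Locale.ROOT);
        // Everything after the last '/' is the final path segment.
        String lastSegment = path.substring(path.lastIndexOf('/') + 1);
        return INDEX_NAMES.contains(lastSegment);
    }

    public static void main(String[] args) {
        System.out.println(looksLikeLandingPage("http://www.example.com/index.html"));         // prints true
        System.out.println(looksLikeLandingPage("http://www.example.com/about.html"));         // prints false
        System.out.println(looksLikeLandingPage("http://www.example.com/index.html?lang=en")); // prints false
    }
}
```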
-
getNormalizedURL
public URL getNormalizedURL(String url) throws MalformedURLException, URISyntaxException
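The declared exceptions suggest that normalization goes through `java.net.URI`. The sketch below shows one way that could work, using the standard `URI.normalize()` to collapse `.` and `..` path segments before converting to a `URL`. The class `NormalizeUrlSketch` and both helper methods are hypothetical; whether the processor normalizes in exactly this way is an assumption.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class NormalizeUrlSketch {
    /** Parses the string as a URI, normalizes its path, and converts it to a URL. */
    static URL normalize(String url) throws MalformedURLException, URISyntaxException {
        return new URI(url).normalize().toURL();
    }

    /** Convenience wrapper that rethrows the checked exceptions as unchecked. */
    static String normalizeToString(String url) {
        try {
            return normalize(url).toString();
        } catch (MalformedURLException | URISyntaxException e) {
            throw new IllegalArgumentException("Not a valid URL: " + url, e);
        }
    }

    public static void main(String[] args) {
        // prints http://www.example.com/a/c
        System.out.println(normalizeToString("http://www.example.com/a/./b/../c"));
    }
}
```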
-
isEnabled
public boolean isEnabled()
-
setEnabled
public void setEnabled(boolean enabled)
-
-