java.lang.Object
  org.apache.solr.update.processor.UpdateRequestProcessor
    org.apache.solr.update.processor.URLClassifyProcessor
public class URLClassifyProcessor extends UpdateRequestProcessor
Update processor that examines a URL and writes characteristics of that URL to other fields: its length, the number of path levels, whether it is a top-level URL (levels == 0), whether it looks like a landing/index page, a canonical representation of the URL (for example, with index.html stripped), and the domain and path parts of the URL.
This processor is intended for use when processing web resources. The values it produces may be used later for boosting or filtering.
Field Summary |
---|
Fields inherited from class org.apache.solr.update.processor.UpdateRequestProcessor |
---|
next |
Constructor Summary | |
---|---|
URLClassifyProcessor(SolrParams parameters,
SolrQueryRequest request,
SolrQueryResponse response,
UpdateRequestProcessor nextProcessor)
|
Method Summary | |
---|---|
URL | getCanonicalUrl(URL url) Gets a canonical form of the URL for use as the main URL |
URL | getNormalizedURL(String url) |
boolean | isEnabled() |
boolean | isLandingPage(URL url) Calculates whether the URL is a landing page |
boolean | isTopLevelPage(URL url) Calculates whether the URL is a top-level page |
int | length(URL url) Calculates the length of the URL in characters |
int | levels(URL url) Calculates the number of path levels in the given URL |
void | processAdd(AddUpdateCommand command) |
void | setEnabled(boolean enabled) |
Methods inherited from class org.apache.solr.update.processor.UpdateRequestProcessor |
---|
finish, processCommit, processDelete, processMergeIndexes, processRollback |
Methods inherited from class java.lang.Object |
---|
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait |
Constructor Detail |
---|
public URLClassifyProcessor(SolrParams parameters, SolrQueryRequest request, SolrQueryResponse response, UpdateRequestProcessor nextProcessor)
Method Detail |
---|
public void processAdd(AddUpdateCommand command) throws IOException
Overrides: processAdd in class UpdateRequestProcessor
Throws: IOException
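public URL getCanonicalUrl(URL url)
Gets a canonical form of the URL for use as the main URL.
Parameters: url - The input URL
public int length(URL url)
Calculates the length of the URL in characters.
Parameters: url - The input URL
public int levels(URL url)
Calculates the number of path levels in the given URL.
Parameters: url - The input URL
public boolean isTopLevelPage(URL url)
Calculates whether the URL is a top-level page.
Parameters: url - The input URL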
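To make the length, levels, and top-level calculations concrete, here is a minimal standalone sketch built only on java.net. It is not the processor's actual code. The class name UrlLevelsSketch, the parse helper, and the rule that every non-empty path segment (including a file name) counts as one level are assumptions made for illustration. The only detail taken from this page is that a top-level URL has levels == 0.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical illustration class; not part of Solr.
final class UrlLevelsSketch {

    /** Parses a string into a URL, rethrowing the checked exception as unchecked. */
    static URL parse(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Length of the full URL string in characters. */
    static int length(URL url) {
        return url.toString().length();
    }

    /** Assumed rule: one level per non-empty path segment. */
    static int levels(URL url) {
        int count = 0;
        for (String segment : url.getPath().split("/")) {
            if (!segment.isEmpty()) {
                count++;
            }
        }
        return count;
    }

    /** A top-level URL is one with zero path levels. */
    static boolean isTopLevelPage(URL url) {
        return levels(url) == 0;
    }

    public static void main(String[] args) {
        URL deep = parse("http://example.com/docs/guide/intro.html");
        URL root = parse("http://example.com/");
        System.out.println(length(deep));          // 40
        System.out.println(levels(deep));          // 3
        System.out.println(isTopLevelPage(root));  // true
    }
}
```

In this sketch a query string does not affect the level count. The real processor may treat queries, trailing slashes, or index files differently.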
public boolean isLandingPage(URL url)
Calculates whether the URL is a landing page.
Parameters: url - The input URL
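The relationship between isLandingPage and getCanonicalUrl can be sketched as follows: if the last path segment looks like an index file, the canonical form drops that file name and keeps the directory. This is an illustration under stated assumptions, not the processor's code. The class name LandingPageSketch, the INDEX_FILE pattern (index.* and default.* only), the rule that a URL with a query string is never a landing page, and the choice to drop any fragment are all hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of Solr.
final class LandingPageSketch {

    // Assumed pattern: a final path segment such as index.html or default.aspx.
    private static final Pattern INDEX_FILE =
            Pattern.compile("/(index|default)\\.[a-z]{3,5}$", Pattern.CASE_INSENSITIVE);

    /** Parses a string into a URL, rethrowing the checked exception as unchecked. */
    static URL parse(String s) {
        try {
            return URI.create(s).toURL();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    /** Assumed rule: no query string, and the path ends in an index-like file. */
    static boolean isLandingPage(URL url) {
        return url.getQuery() == null && INDEX_FILE.matcher(url.getPath()).find();
    }

    /** Strips the index file name from landing pages; other URLs are returned as-is. */
    static URL canonical(URL url) {
        if (!isLandingPage(url)) {
            return url;
        }
        String path = INDEX_FILE.matcher(url.getPath()).replaceFirst("/");
        // Rebuilt from scheme, authority, and path; any fragment is dropped.
        return parse(url.getProtocol() + "://" + url.getAuthority() + path);
    }

    public static void main(String[] args) {
        URL page = parse("http://example.com/docs/index.html");
        System.out.println(isLandingPage(page)); // true
        System.out.println(canonical(page));     // http://example.com/docs/
    }
}
```

When comparing results, compare the string forms rather than calling URL.equals, which may trigger host name resolution.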
public URL getNormalizedURL(String url) throws MalformedURLException, URISyntaxException
Throws: MalformedURLException, URISyntaxException
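The two declared exceptions are consistent with a normalization that parses the string as a java.net.URI, normalizes it, and converts the result to a URL, although this page does not confirm that this is how getNormalizedURL works. The sketch below shows that approach with standard JDK calls only. The class name NormalizeSketch is hypothetical.

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical illustration class; not part of Solr.
final class NormalizeSketch {

    /**
     * Parses the string as a URI, removes "." and ".." path segments,
     * and converts the result to a URL.
     */
    static URL normalize(String url) throws MalformedURLException, URISyntaxException {
        return new URI(url).normalize().toURL();
    }

    public static void main(String[] args) throws Exception {
        // "./" is removed and "b/../" collapses.
        System.out.println(normalize("http://example.com/a/./b/../c.html"));
        // prints http://example.com/a/c.html
    }
}
```

With this approach, a string containing illegal characters (such as an unescaped space) raises URISyntaxException, while a syntactically valid URI with an unsupported scheme raises MalformedURLException when it is converted to a URL.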
public boolean isEnabled()
public void setEnabled(boolean enabled)