Package org.apache.solr.cli
Class PostTool
- java.lang.Object
  - org.apache.solr.cli.ToolBase
    - org.apache.solr.cli.PostTool
-
Nested Class Summary
Modifier and Type    Class    Description
static class
PostTool.PageFetcherResult
Utility class to hold the result from a page fetch
-
Field Summary
Modifier and Type    Field    Description
static String
DEFAULT_CONTENT_TYPE
static String
DEFAULT_FILE_TYPES
-
Constructor Summary
Constructor    Description
PostTool()
PostTool(PrintStream stdout)
-
Method Summary
Modifier and Type    Method    Description
static String
appendParam(String url, String param)
Appends a URL query parameter to a URL.
void
commit()
Does a simple commit operation.
protected static String
computeFullUrl(URL baseUrl, String link)
Computes the full URL based on a base URL and a possibly relative link found in the href param of an HTML anchor.
void
execute(String mode)
After initialization, call execute to start the post job.
FileFilter
getFileFilterFromFileTypes(String fileTypes)
String
getName()
Defines the interface to a Solr tool that can be run from this command-line app.
static NodeList
getNodesFromXP(Node n, String xpath)
Gets all nodes matching an XPath.
List<org.apache.commons.cli.Option>
getOptions()
static String
getXP(Node n, String xpath, boolean concatAll)
Gets the string content of the nodes matching an XPath.
protected static String
guessType(File file)
Guesses the type of a file based on its file name suffix. Returns "application/octet-stream" if there is no corresponding mimeMap type.
static Document
makeDom(byte[] in)
Takes a byte array as input and returns a DOM.
protected static String
normalizeUrlEnding(String link)
Normalizes a URL string by removing the anchor part and trailing slash.
void
optimize()
Does a simple optimize operation.
boolean
postData(InputStream data, Long length, OutputStream output, String type, URI uri)
Reads data from the data stream and posts it to Solr, writing the response to output.
void
postFile(File file, OutputStream output, String type)
Opens the file and posts its contents to the solrUrl, writing the response to output.
int
postFiles(String[] args, int startIndexInArgs, OutputStream out, String type)
Posts all filenames provided in args.
int
postWebPages(String[] args, int startIndexInArgs, OutputStream out)
Takes a list of start URL strings for crawling, converts the URL strings to URI strings, adds each one to a backlog, and then starts crawling.
void
runImpl(org.apache.commons.cli.CommandLine cli)
static InputStream
stringToStream(String s)
Converts a string to an input stream.
protected boolean
typeSupported(String type)
Uses the mime-type map to reverse lookup whether the file ending for our type is supported by the fileTypes option.
protected int
webCrawl(int level, OutputStream out)
A very simple crawler that pulls URLs to fetch from a backlog and then recurses N levels deep if recursive>0.
-
Methods inherited from class org.apache.solr.cli.ToolBase
echo, echoIfVerbose, isVerbose, runTool
-
Field Detail
-
DEFAULT_FILE_TYPES
public static final String DEFAULT_FILE_TYPES
- See Also:
- Constant Field Values
-
DEFAULT_CONTENT_TYPE
public static final String DEFAULT_CONTENT_TYPE
- See Also:
- Constant Field Values
-
Constructor Detail
-
PostTool
public PostTool()
-
PostTool
public PostTool(PrintStream stdout)
-
Method Detail
-
getName
public String getName()
Description copied from interface: Tool
Defines the interface to a Solr tool that can be run from this command-line app.
-
getOptions
public List<org.apache.commons.cli.Option> getOptions()
-
runImpl
public void runImpl(org.apache.commons.cli.CommandLine cli) throws Exception
-
execute
public void execute(String mode) throws org.apache.solr.client.solrj.SolrServerException, IOException
After initialization, call execute to start the post job. This method delegates to the correct mode method.
- Throws:
org.apache.solr.client.solrj.SolrServerException
IOException
-
postFiles
public int postFiles(String[] args, int startIndexInArgs, OutputStream out, String type)
Posts all filenames provided in args.
- Parameters:
args - array of file names
startIndexInArgs - offset to start
out - output stream to post data to
type - default content-type to use when posting (this may be overridden in auto mode)
- Returns:
- number of files posted
-
postWebPages
public int postWebPages(String[] args, int startIndexInArgs, OutputStream out)
Takes a list of start URL strings for crawling, converts the URL strings to URI strings, adds each one to a backlog, and then starts crawling.
- Parameters:
args - the raw input args from main()
startIndexInArgs - offset for where to start
out - output stream to write results to
- Returns:
- the number of web pages posted
-
normalizeUrlEnding
protected static String normalizeUrlEnding(String link)
Normalizes a URL string by removing the anchor part and trailing slash.
- Returns:
- the normalized URL string
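The normalization described above can be illustrated with a minimal standalone sketch. It is not the Solr source: the class name UrlEndingSketch is invented here, and the exact rules PostTool applies (for example to a trailing "?") are not stated on this page, so only the two documented steps are shown.

```java
// Sketch only: mirrors the documented behavior (drop the anchor, drop a trailing slash).
final class UrlEndingSketch {

    /** Removes everything from the first '#' onward, then a single trailing slash. */
    static String normalize(String link) {
        int hash = link.indexOf('#');
        if (hash >= 0) {
            link = link.substring(0, hash);
        }
        if (link.endsWith("/")) {
            link = link.substring(0, link.length() - 1);
        }
        return link;
    }

    public static void main(String[] args) {
        // Both calls print http://example.com/docs
        System.out.println(normalize("http://example.com/docs/#intro"));
        System.out.println(normalize("http://example.com/docs"));
    }
}
```

Normalizing before comparison lets a crawler treat http://example.com/docs/ and http://example.com/docs#intro as the same page.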
-
webCrawl
protected int webCrawl(int level, OutputStream out)
A very simple crawler that pulls URLs to fetch from a backlog and then recurses N levels deep if recursive>0. Links are parsed from HTML by first getting an XHTML version using SolrCell with extractOnly, and are followed if they are local. The crawler pauses for a default delay of 10 seconds between fetches; this can be configured in the delay variable. This is only meant for test purposes, as it does not respect robots or anything else fancy :)
- Parameters:
level - which level to crawl
out - output stream to write to
- Returns:
- number of pages crawled on this level and below
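The backlog-and-levels idea can be sketched without any network access. The following is an illustration under stated assumptions, not the PostTool implementation: fetching and link extraction are replaced by a hypothetical in-memory map from a URL to its outgoing links, the delay between fetches is omitted, and the names CrawlSketch and maxDepth (standing in for the recursive setting) are invented for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Sketch only: level-by-level crawl over an in-memory link graph.
final class CrawlSketch {
    private final Map<String, List<String>> links;   // stand-in for fetch + link extraction
    private final int maxDepth;                       // plays the role of "recursive"
    private final List<Set<String>> backlog = new ArrayList<>(); // one set of URLs per level
    private final Set<String> visited = new LinkedHashSet<>();

    CrawlSketch(Map<String, List<String>> links, int maxDepth, String... startUrls) {
        this.links = links;
        this.maxDepth = maxDepth;
        backlog.add(new LinkedHashSet<>(List.of(startUrls)));
    }

    /** Visits every URL queued for this level and returns pages visited here and below. */
    int crawl(int level) {
        int numPages = 0;
        Set<String> nextLevel = new LinkedHashSet<>();
        for (String url : backlog.get(level)) {
            if (!visited.add(url)) {
                continue; // already crawled
            }
            numPages++;
            if (level < maxDepth) {
                for (String child : links.getOrDefault(url, List.of())) {
                    if (!visited.contains(child)) {
                        nextLevel.add(child);
                    }
                }
            }
        }
        if (level < maxDepth && !nextLevel.isEmpty()) {
            backlog.add(nextLevel);
            numPages += crawl(level + 1);
        }
        return numPages;
    }

    public static void main(String[] args) {
        Map<String, List<String>> graph = Map.of(
            "a", List.of("b", "c"),
            "b", List.of("d"));
        System.out.println(new CrawlSketch(graph, 1, "a").crawl(0)); // visits a, b, c
    }
}
```

Keeping one backlog set per level means the depth limit is enforced by the level index alone, and the visited set prevents a page that links back to its parent from being fetched twice.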
-
computeFullUrl
protected static String computeFullUrl(URL baseUrl, String link) throws MalformedURLException, URISyntaxException
Computes the full URL based on a base URL and a possibly relative link found in the href param of an HTML anchor.
- Parameters:
baseUrl - the base URL from where the link was found
link - the absolute or relative link
- Returns:
- the string version of the full URL
- Throws:
MalformedURLException
URISyntaxException
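Resolving a possibly relative link against a base can be demonstrated with the JDK's java.net.URI alone. This is a sketch of the general technique, not the PostTool code: the class name FullUrlSketch is invented, the base is passed as a String rather than a URL, and any extra checks the real method performs are not reproduced.

```java
import java.net.URI;

// Sketch only: relative-link resolution with java.net.URI.
final class FullUrlSketch {

    /** Resolves link against baseUrl; an absolute link is returned unchanged. */
    static String resolve(String baseUrl, String link) {
        return URI.create(baseUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.html";
        System.out.println(resolve(base, "guide.html"));         // http://example.com/docs/guide.html
        System.out.println(resolve(base, "/about"));             // http://example.com/about
        System.out.println(resolve(base, "http://other.org/x")); // http://other.org/x
    }
}
```

URI.create throws an unchecked IllegalArgumentException on malformed input, whereas the method documented above declares the checked MalformedURLException and URISyntaxException.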
-
typeSupported
protected boolean typeSupported(String type)
Uses the mime-type map to reverse lookup whether the file ending for our type is supported by the fileTypes option.
- Parameters:
type - what content-type to look up
- Returns:
- true if this is a supported content type
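A reverse lookup of this kind can be sketched as follows. Everything here is an assumption made for illustration: the contents of MIME_MAP, the class name TypeSupportSketch, and the idea that the fileTypes option is a comma-separated list of file suffixes (this page does not document its format). The real method is an instance method that takes only the content type.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Sketch only: checks whether any suffix mapped to a content type is in an allowed list.
final class TypeSupportSketch {
    // Hypothetical suffix -> content-type map; the real map is not shown on this page.
    static final Map<String, String> MIME_MAP = Map.of(
        "xml", "application/xml",
        "json", "application/json",
        "csv", "text/csv",
        "pdf", "application/pdf");

    /** Returns true if some suffix that maps to type also appears in fileTypes. */
    static boolean typeSupported(String type, String fileTypes) {
        Set<String> allowed = new HashSet<>(Arrays.asList(fileTypes.split(",")));
        for (Map.Entry<String, String> entry : MIME_MAP.entrySet()) {
            if (entry.getValue().equals(type) && allowed.contains(entry.getKey())) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println(typeSupported("application/json", "xml,json")); // true
        System.out.println(typeSupported("application/pdf", "xml,json"));  // false
    }
}
```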
-
commit
public void commit() throws IOException, org.apache.solr.client.solrj.SolrServerException
Does a simple commit operation.
- Throws:
IOException
org.apache.solr.client.solrj.SolrServerException
-
optimize
public void optimize() throws IOException, org.apache.solr.client.solrj.SolrServerException
Does a simple optimize operation.
- Throws:
IOException
org.apache.solr.client.solrj.SolrServerException
-
appendParam
public static String appendParam(String url, String param)
Appends a URL query parameter to a URL.
- Parameters:
url - the original URL
param - the parameter(s) to append, separated by "&"
- Returns:
- the string version of the resulting URL
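The effect of appending "&"-separated parameters can be sketched as below. This is an illustration, not the Solr source: AppendParamSketch is an invented name, the update URL in main is hypothetical, and any validation the real method applies to each parameter is not reproduced.

```java
// Sketch only: appends each "&"-separated parameter with the right separator.
final class AppendParamSketch {

    /** Uses '?' for the first query parameter and '&' for every later one. */
    static String appendParam(String url, String param) {
        StringBuilder result = new StringBuilder(url);
        boolean hasQuery = url.contains("?");
        for (String p : param.split("&")) {
            if (p.trim().isEmpty()) {
                continue; // skip empty segments such as in "a=1&&b=2"
            }
            result.append(hasQuery ? '&' : '?').append(p);
            hasQuery = true;
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // Prints http://localhost:8983/solr/demo/update?commit=true&wt=json
        System.out.println(appendParam("http://localhost:8983/solr/demo/update", "commit=true&wt=json"));
    }
}
```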
-
postFile
public void postFile(File file, OutputStream output, String type) throws MalformedURLException, URISyntaxException
Opens the file and posts its contents to the solrUrl, writing the response to output.
-
guessType
protected static String guessType(File file)
Guesses the type of a file based on its file name suffix. Returns "application/octet-stream" if there is no corresponding mimeMap type.
- Parameters:
file - the file
- Returns:
- the content-type guessed
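Suffix-based type guessing with a fallback can be sketched like this. The map contents and the class name GuessTypeSketch are assumptions for illustration; only the "application/octet-stream" fallback comes from the description above.

```java
import java.io.File;
import java.util.Locale;
import java.util.Map;

// Sketch only: looks up the file name suffix in a small, hypothetical mime map.
final class GuessTypeSketch {
    static final Map<String, String> MIME_MAP = Map.of(
        "xml", "application/xml",
        "json", "application/json",
        "csv", "text/csv");

    /** Returns the mapped content type, or application/octet-stream if none matches. */
    static String guessType(File file) {
        String name = file.getName();
        int dot = name.lastIndexOf('.');
        if (dot < 0) {
            return "application/octet-stream"; // no suffix at all
        }
        String suffix = name.substring(dot + 1).toLowerCase(Locale.ROOT);
        return MIME_MAP.getOrDefault(suffix, "application/octet-stream");
    }

    public static void main(String[] args) {
        System.out.println(guessType(new File("books.json"))); // application/json
        System.out.println(guessType(new File("README")));     // application/octet-stream
    }
}
```

The file does not need to exist, since only its name is inspected.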
-
postData
public boolean postData(InputStream data, Long length, OutputStream output, String type, URI uri)
Reads data from the data stream and posts it to Solr, writing the response to output.
- Returns:
- true if successful
-
stringToStream
public static InputStream stringToStream(String s)
Converts a string to an input stream.
- Parameters:
s - the string
- Returns:
- the input stream
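A conversion like this is commonly written with ByteArrayInputStream. The sketch below assumes UTF-8 encoding, which this page does not state, and uses the invented name StringStreamSketch.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Sketch only: wraps the UTF-8 bytes of a string in an in-memory stream.
final class StringStreamSketch {

    static InputStream toStream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws Exception {
        try (InputStream in = toStream("<commit/>")) {
            System.out.println(in.available()); // 9, one byte per ASCII character
        }
    }
}
```

Naming the charset explicitly avoids depending on the platform default encoding.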
-
getFileFilterFromFileTypes
public FileFilter getFileFilterFromFileTypes(String fileTypes)
-
getNodesFromXP
public static NodeList getNodesFromXP(Node n, String xpath) throws XPathExpressionException
Gets all nodes matching an XPath.
- Throws:
XPathExpressionException
-
getXP
public static String getXP(Node n, String xpath, boolean concatAll) throws XPathExpressionException
Gets the string content of the nodes matching an XPath.
- Parameters:
n - the node (or doc)
xpath - the xpath string
concatAll - if true, text from all matching nodes will be concatenated, else only the first is returned
- Throws:
XPathExpressionException
-
makeDom
public static Document makeDom(byte[] in) throws SAXException, IOException, ParserConfigurationException
Takes a byte array as input and returns a DOM.
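The three XML helpers above (getNodesFromXP, getXP and makeDom) correspond to standard JDK operations: parse bytes into a DOM, then evaluate XPath expressions against it. The sketch below shows that technique with the JDK only. It is not the PostTool code: DomXPathSketch and its method names are invented, checked exceptions are wrapped for brevity, and joining concatenated text with a single space is an assumption.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

// Sketch only: DOM parsing and XPath evaluation with the JDK's built-in XML APIs.
final class DomXPathSketch {

    /** Parses well-formed XML bytes into a DOM document. */
    static Document makeDom(byte[] in) {
        try {
            return DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(in));
        } catch (SAXException | IOException | ParserConfigurationException e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    /** Returns all nodes under n that match the XPath expression. */
    static NodeList nodes(Node n, String xpath) {
        try {
            return (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(xpath, n, XPathConstants.NODESET);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("bad XPath: " + xpath, e);
        }
    }

    /** Returns text of the first match, or of all matches joined by a space. */
    static String text(Node n, String xpath, boolean concatAll) {
        NodeList list = nodes(n, xpath);
        if (list.getLength() == 0) {
            return "";
        }
        if (!concatAll) {
            return list.item(0).getTextContent();
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < list.getLength(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(list.item(i).getTextContent());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String xhtml = "<html><body><a href=\"/one\">one</a><a href=\"/two\">two</a></body></html>";
        Document doc = makeDom(xhtml.getBytes(StandardCharsets.UTF_8));
        System.out.println(nodes(doc, "//a/@href").getLength()); // 2
        System.out.println(text(doc, "//a", true));              // one two
    }
}
```

Evaluating //a/@href against an XHTML document is the kind of query a crawler can use to collect links to follow.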
-