Package org.apache.solr.util
Class SimplePostTool
- java.lang.Object
- 
- org.apache.solr.util.SimplePostTool
 
- 
 public class SimplePostTool extends Object
 A simple utility class for posting raw updates to a Solr server. It has a main method, so it can be run from the command line. Treat it not as a best-practice code example but as a standalone tool deliberately built without external jar dependencies.
- 
- 
Nested Class Summary
Nested Classes
- static class SimplePostTool.BAOS
- static class SimplePostTool.PageFetcherResult - Utility class to hold the result from a page fetch
 - 
Constructor Summary
Constructors
- SimplePostTool()
- SimplePostTool(String mode, URL url, boolean auto, String type, String format, int recursive, int delay, String fileTypes, OutputStream out, boolean commit, boolean optimize, String[] args) - Constructor which takes in all mandatory input for the tool to work.
 - 
Method Summary
- static String appendParam(String url, String param) - Appends a URL query parameter to a URL
- protected static URL appendUrlPath(URL url, String append) - Appends to the path of the URL
- void commit() - Does a simple commit operation
- protected String computeFullUrl(URL baseUrl, String link) - Computes the full URL based on a base url and a possibly relative link found in the href param of an HTML anchor.
- void doGet(String url) - Performs a simple get on the given URL
- void doGet(URL url) - Performs a simple get on the given URL
- void execute() - After initialization, call execute to start the post job.
- FileFilter getFileFilterFromFileTypes(String fileTypes)
- static NodeList getNodesFromXP(Node n, String xpath) - Gets all nodes matching an XPath
- static String getXP(Node n, String xpath, boolean concatAll) - Gets the string content of the nodes matching an XPath
- protected static String guessType(File file) - Guesses the type of file, based on file name suffix. Returns "application/octet-stream" if no corresponding mimeMap type.
- static ByteBuffer inputStreamToByteArray(InputStream is)
- static ByteBuffer inputStreamToByteArray(InputStream is, long maxSize) - Reads an input stream into a byte array
- protected static boolean isOn(String property) - Tests if a string is either "true", "on", "yes" or "1"
- static void main(String[] args) - See usage() for valid command line usage
- static Document makeDom(byte[] in) - Takes a byte array as input and returns a DOM
- protected static String normalizeUrlEnding(String link) - Normalizes a URL string by removing anchor part and trailing slash
- void optimize() - Does a simple optimize operation
- protected static SimplePostTool parseArgsAndInit(String[] args) - Parses incoming arguments and system params and initializes the tool
- boolean postData(InputStream data, Long length, OutputStream output, String type, URL url) - Reads data from the data stream and posts it to Solr, writes the response to output
- void postFile(File file, OutputStream output, String type) - Opens the file and posts its contents to the solrUrl, writes the response to output.
- int postFiles(File[] files, int startIndexInArgs, OutputStream out, String type) - Post all filenames provided in args
- int postFiles(String[] args, int startIndexInArgs, OutputStream out, String type) - Post all filenames provided in args
- int postWebPages(String[] args, int startIndexInArgs, OutputStream out) - This method takes as input a list of start URL strings for crawling, converts the URL strings to URI strings and adds each one to a backlog and then starts crawling
- static InputStream stringToStream(String s) - Converts a string to an input stream
- protected boolean typeSupported(String type) - Uses the mime-type map to reverse lookup whether the file ending for our type is supported by the fileTypes option
- protected int webCrawl(int level, OutputStream out) - A very simple crawler, pulling URLs to fetch from a backlog and then recurses N levels deep if recursive > 0.
 
- 
- 
- 
Constructor Detail
SimplePostTool
public SimplePostTool(String mode, URL url, boolean auto, String type, String format, int recursive, int delay, String fileTypes, OutputStream out, boolean commit, boolean optimize, String[] args)
Constructor which takes in all mandatory input for the tool to work. Also see usage() for further explanation of the params.
- Parameters:
- mode - whether to post files, web pages, params or stdin
- url - the Solr base URL to post to; should end with /update
- auto - if true, we'll guess type and add resourcename/url
- type - content-type of the data you are posting
- recursive - number of levels for file/web mode, or 0 if one file only
- delay - if recursive, the wait time between posts
- fileTypes - a comma-separated list of file-name endings to accept for file/web
- out - an OutputStream to write output to, e.g. stdout to print to console
- commit - if true, will commit at end of posting
- optimize - if true, will optimize at end of posting
- args - a String[] of arguments, varies between modes
 
 - 
SimplePostTool
public SimplePostTool()
 
- 
 - 
Method Detail
main
public static void main(String[] args)
See usage() for valid command line usage.
- Parameters:
- args - the params on the command line
 
 - 
execute
public void execute()
After initialization, call execute to start the post job. This method delegates to the correct mode method.
 - 
parseArgsAndInit
protected static SimplePostTool parseArgsAndInit(String[] args)
Parses incoming arguments and system params and initializes the tool.
- Parameters:
- args - the incoming cmd line args
- Returns:
- an instance of SimplePostTool
 
 - 
postFiles
public int postFiles(String[] args, int startIndexInArgs, OutputStream out, String type)
Posts all filenames provided in args.
- Parameters:
- args - array of file names
- startIndexInArgs - offset to start
- out - output stream to post data to
- type - default content-type to use when posting (may be overridden in auto mode)
- Returns:
- number of files posted
 
 - 
postFiles
public int postFiles(File[] files, int startIndexInArgs, OutputStream out, String type)
Posts all files provided in the files array.
- Parameters:
- files - array of Files
- startIndexInArgs - offset to start
- out - output stream to post data to
- type - default content-type to use when posting (may be overridden in auto mode)
- Returns:
- number of files posted
 
 - 
postWebPages
public int postWebPages(String[] args, int startIndexInArgs, OutputStream out)
Takes a list of start URL strings for crawling, converts the URL strings to URI strings, adds each one to a backlog, and then starts crawling.
- Parameters:
- args - the raw input args from main()
- startIndexInArgs - offset for where to start
- out - OutputStream to write results to
- Returns:
- the number of web pages posted
 
 - 
normalizeUrlEnding
protected static String normalizeUrlEnding(String link)
Normalizes a URL string by removing the anchor part and the trailing slash.
- Returns:
- the normalized URL string
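
The normalization described above can be illustrated with a minimal standalone sketch. This is not the SimplePostTool source: the class name UrlEndingSketch, the method name normalize, and the example URL are assumptions made for illustration, and the sketch covers only the two steps the description names (anchor removal, then trailing-slash removal).

```java
// Illustrative sketch only; not the actual SimplePostTool implementation.
final class UrlEndingSketch {

    // Hypothetical helper: drops a "#anchor" part, then one trailing slash.
    static String normalize(String link) {
        int hash = link.indexOf('#');
        if (hash >= 0) {
            link = link.substring(0, hash);
        }
        if (link.endsWith("/")) {
            link = link.substring(0, link.length() - 1);
        }
        return link;
    }

    public static void main(String[] args) {
        // Assumed example URL; prints http://example.com/docs
        System.out.println(normalize("http://example.com/docs/#intro"));
    }
}
```

Normalizing links this way lets a crawler treat http://example.com/docs, http://example.com/docs/ and http://example.com/docs/#intro as the same page when checking whether a URL was already visited.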
 
 - 
webCrawl
protected int webCrawl(int level, OutputStream out)
A very simple crawler that pulls URLs to fetch from a backlog and then recurses N levels deep if recursive > 0. Links are parsed from HTML by first getting an XHTML version using SolrCell with extractOnly, and are followed if they are local. The crawler pauses for a default delay of 10 seconds between fetches; this can be configured through the delay variable. It is meant only for test purposes, as it does not respect robots or anything else fancy :)
- Parameters:
- level - which level to crawl
- out - output stream to write to
- Returns:
- number of pages crawled on this level and below
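
The backlog-per-level idea described above can be sketched without any network access. The following is a simplified illustration, not the SimplePostTool source: the class CrawlSketch, its fields, and the in-memory link map that stands in for fetched pages are all assumptions, and the delay between fetches and the "local links only" check are left out.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch only; pages are simulated by a map from URL to outgoing links.
final class CrawlSketch {
    private final Map<String, List<String>> links; // stand-in for fetched pages
    private final int recursive;                   // maximum depth to follow
    private final Set<String> visited = new HashSet<>();
    private final List<Set<String>> backlog = new ArrayList<>(); // one set per level

    CrawlSketch(Map<String, List<String>> links, int recursive, String startUrl) {
        this.links = links;
        this.recursive = recursive;
        Set<String> levelZero = new LinkedHashSet<>();
        levelZero.add(startUrl);
        backlog.add(levelZero);
    }

    // Returns the number of pages visited on this level and below.
    int crawl(int level) {
        int count = 0;
        Set<String> nextLevel = new LinkedHashSet<>();
        for (String url : backlog.get(level)) {
            if (!visited.add(url)) {
                continue; // already crawled
            }
            count++;
            if (level < recursive) {
                for (String link : links.getOrDefault(url, List.of())) {
                    if (!visited.contains(link)) {
                        nextLevel.add(link);
                    }
                }
            }
        }
        if (!nextLevel.isEmpty()) {
            backlog.add(nextLevel);
            count += crawl(level + 1);
        }
        return count;
    }

    public static void main(String[] args) {
        Map<String, List<String>> site = Map.of(
                "/a", List.of("/b", "/c"),
                "/b", List.of("/d"),
                "/c", List.of("/a"));
        // Depth 1 from /a visits /a, /b and /c; prints 3
        System.out.println(new CrawlSketch(site, 1, "/a").crawl(0));
    }
}
```

Keeping one backlog set per level makes the depth limit explicit: links found on level N are only queued when N is still below the configured recursion depth.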
 
 - 
inputStreamToByteArray
public static ByteBuffer inputStreamToByteArray(InputStream is) throws IOException
- Throws:
- IOException
 
 - 
inputStreamToByteArray
public static ByteBuffer inputStreamToByteArray(InputStream is, long maxSize) throws IOException
Reads an input stream into a byte array.
- Parameters:
- is - the input stream
- Returns:
- the byte array
- Throws:
- IOException - If there is a low-level I/O error.
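
Reading a stream into a ByteBuffer with a size cap might look roughly like the sketch below. It is an illustration under assumptions, not the SimplePostTool source: the class StreamBufferSketch and method readAll are hypothetical names, and throwing an IOException when maxSize is exceeded is an assumed behavior, since this page does not say what the real method does in that case.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;

// Illustrative sketch only; the over-limit behavior is an assumption.
final class StreamBufferSketch {

    static ByteBuffer readAll(InputStream is, long maxSize) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] chunk = new byte[4096];
        long total = 0;
        int n;
        while ((n = is.read(chunk)) != -1) {
            total += n;
            if (total > maxSize) {
                // Assumed policy: refuse streams larger than maxSize.
                throw new IOException("Stream exceeds " + maxSize + " bytes");
            }
            bos.write(chunk, 0, n);
        }
        return ByteBuffer.wrap(bos.toByteArray());
    }

    public static void main(String[] args) throws IOException {
        ByteBuffer buf = readAll(new ByteArrayInputStream(new byte[] {1, 2, 3}), 1024);
        System.out.println(buf.remaining()); // prints 3
    }
}
```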
 
 - 
computeFullUrl
protected String computeFullUrl(URL baseUrl, String link)
Computes the full URL based on a base url and a possibly relative link found in the href param of an HTML anchor.
- Parameters:
- baseUrl - the base url from where the link was found
- link - the absolute or relative link
- Returns:
- the string version of the full URL
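
Resolving a possibly relative link against a base URL can be sketched with the JDK's java.net.URI. This is an illustration, not the SimplePostTool source: FullUrlSketch and resolve are hypothetical names, the example URLs are assumptions, and any handling of malformed or non-HTTP links that the real method may perform is not shown.

```java
import java.net.URI;

// Illustrative sketch only; uses standard URI resolution rules.
final class FullUrlSketch {

    // Resolves link against baseUrl; absolute links are returned as they are.
    static String resolve(String baseUrl, String link) {
        return URI.create(baseUrl).resolve(link).toString();
    }

    public static void main(String[] args) {
        String base = "http://example.com/docs/index.html";
        System.out.println(resolve(base, "page2.html"));
        System.out.println(resolve(base, "/top.html"));
        System.out.println(resolve(base, "http://other.example.org/x"));
    }
}
```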
 
 - 
typeSupported
protected boolean typeSupported(String type)
Uses the mime-type map to reverse lookup whether the file ending for our type is supported by the fileTypes option.
- Parameters:
- type - what content-type to lookup
- Returns:
- true if this is a supported content type
 
 - 
isOn
protected static boolean isOn(String property)
Tests if a string is either "true", "on", "yes" or "1".
- Parameters:
- property - the string to test
- Returns:
- true if "on"
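
The test described above amounts to a membership check against four accepted values. The sketch below shows one way to write it; it is not the SimplePostTool source. The class name IsOnSketch is hypothetical, and treating the match as exact and case-sensitive, with null counting as off, is an assumption that this page does not confirm.

```java
import java.util.Set;

// Illustrative sketch only; exact, case-sensitive matching is an assumption.
final class IsOnSketch {
    private static final Set<String> ON_VALUES = Set.of("true", "on", "yes", "1");

    static boolean isOn(String property) {
        // Set.of rejects null lookups, so guard against null first.
        return property != null && ON_VALUES.contains(property);
    }

    public static void main(String[] args) {
        System.out.println(isOn("yes")); // prints true
        System.out.println(isOn("off")); // prints false
    }
}
```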
 
 - 
commit
public void commit()
Does a simple commit operation.
 - 
optimize
public void optimize()
Does a simple optimize operation.
 - 
appendParam
public static String appendParam(String url, String param)
Appends a URL query parameter to a URL.
- Parameters:
- url - the original URL
- param - the parameter(s) to append, separated by "&"
- Returns:
- the string version of the resulting URL
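
Appending one or more "&"-separated parameters comes down to choosing "?" for the first query parameter and "&" for the rest. The sketch below illustrates that rule; it is not the SimplePostTool source. AppendParamSketch is a hypothetical class name, the example URL is an assumption, and no URL encoding is applied.

```java
// Illustrative sketch only; parameters are appended as given, without encoding.
final class AppendParamSketch {

    static String appendParam(String url, String param) {
        StringBuilder sb = new StringBuilder(url);
        for (String p : param.split("&")) {
            if (p.isEmpty()) {
                continue; // skip empty segments such as a leading "&"
            }
            // Use "?" if the URL has no query string yet, otherwise "&".
            sb.append(sb.indexOf("?") >= 0 ? '&' : '?').append(p);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed example URL
        System.out.println(appendParam("http://localhost:8983/solr/update", "commit=true&wt=json"));
    }
}
```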
 
 - 
postFile
public void postFile(File file, OutputStream output, String type)
Opens the file and posts its contents to the solrUrl, then writes the response to output.
 - 
appendUrlPath
protected static URL appendUrlPath(URL url, String append) throws MalformedURLException
Appends to the path of the URL.
- Parameters:
- url - the URL
- append - the path to append
- Returns:
- the final URL version
- Throws:
- MalformedURLException
 
 - 
guessType
protected static String guessType(File file)
Guesses the type of file, based on the file name suffix. Returns "application/octet-stream" if there is no corresponding mimeMap type.
- Parameters:
- file - the file
- Returns:
- the content-type guessed
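
A suffix-based guess with an "application/octet-stream" fallback can be sketched as a map lookup. This is an illustration, not the SimplePostTool source: GuessTypeSketch is a hypothetical class, the four entries in MIME_MAP are assumed examples (the real mimeMap contents are not listed on this page), and the sketch takes a file name string instead of a File to stay minimal.

```java
import java.util.Locale;
import java.util.Map;

// Illustrative sketch only; the table entries are assumed examples.
final class GuessTypeSketch {
    static final Map<String, String> MIME_MAP = Map.of(
            "xml", "application/xml",
            "json", "application/json",
            "csv", "text/csv",
            "pdf", "application/pdf");

    static String guessType(String fileName) {
        int dot = fileName.lastIndexOf('.');
        String suffix = dot >= 0
                ? fileName.substring(dot + 1).toLowerCase(Locale.ROOT)
                : "";
        // Fall back to a generic binary type when the suffix is unknown.
        return MIME_MAP.getOrDefault(suffix, "application/octet-stream");
    }

    public static void main(String[] args) {
        System.out.println(guessType("books.csv")); // prints text/csv
        System.out.println(guessType("README"));    // prints application/octet-stream
    }
}
```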
 
 - 
doGet
public void doGet(String url)
Performs a simple get on the given URL.
 - 
doGet
public void doGet(URL url)
Performs a simple get on the given URL.
 - 
postData
public boolean postData(InputStream data, Long length, OutputStream output, String type, URL url)
Reads data from the data stream and posts it to Solr, then writes the response to output.
- Returns:
- true if success
 
 - 
stringToStream
public static InputStream stringToStream(String s)
Converts a string to an input stream.
- Parameters:
- s - the string
- Returns:
- the input stream
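
Converting a string to an input stream is typically a one-liner over the string's bytes. The sketch below is an illustration, not the SimplePostTool source: StringStreamSketch and toStream are hypothetical names, and the choice of UTF-8 is an assumption because the encoding is not stated on this page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Illustrative sketch only; UTF-8 is an assumed encoding.
final class StringStreamSketch {

    static InputStream toStream(String s) {
        return new ByteArrayInputStream(s.getBytes(StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        InputStream in = toStream("<commit/>");
        System.out.println(new String(in.readAllBytes(), StandardCharsets.UTF_8)); // prints <commit/>
    }
}
```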
 
 - 
getFileFilterFromFileTypes
public FileFilter getFileFilterFromFileTypes(String fileTypes)
 - 
getNodesFromXP
public static NodeList getNodesFromXP(Node n, String xpath) throws XPathExpressionException
Gets all nodes matching an XPath.
- Throws:
- XPathExpressionException
 
 - 
getXP
public static String getXP(Node n, String xpath, boolean concatAll) throws XPathExpressionException
Gets the string content of the nodes matching an XPath.
- Parameters:
- n - the node (or doc)
- xpath - the xpath string
- concatAll - if true, text from all matching nodes will be concatenated, else only the first returned
- Throws:
- XPathExpressionException
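
The concatAll behavior described above can be illustrated with the JDK's javax.xml.xpath API. This is a sketch under assumptions, not the SimplePostTool source: XPathSketch and getText are hypothetical names, joining values with a single space and returning an empty string when nothing matches are assumed choices, and the sample XML is made up for the demonstration.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Illustrative sketch only; separator and empty-result behavior are assumptions.
final class XPathSketch {

    static String getText(Node n, String xpath, boolean concatAll) throws XPathExpressionException {
        XPath xp = XPathFactory.newInstance().newXPath();
        NodeList nodes = (NodeList) xp.evaluate(xpath, n, XPathConstants.NODESET);
        if (nodes.getLength() == 0) {
            return "";
        }
        if (!concatAll) {
            return nodes.item(0).getTextContent(); // first match only
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nodes.getLength(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(nodes.item(i).getTextContent());
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        byte[] xml = "<r><a>x</a><a>y</a></r>".getBytes(StandardCharsets.UTF_8);
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml));
        System.out.println(getText(doc, "/r/a", true));  // prints x y
        System.out.println(getText(doc, "/r/a", false)); // prints x
    }
}
```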
 
 - 
makeDom
public static Document makeDom(byte[] in) throws SAXException, IOException, ParserConfigurationException
Takes a byte array as input and returns a DOM.
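
Parsing a byte array into a DOM with the JDK's built-in parser might look like the sketch below. It is an illustration, not the SimplePostTool source: MakeDomSketch is a hypothetical class, the sample XML is made up, and any parser configuration the real method applies is not known from this page.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Illustrative sketch only; uses default parser settings.
final class MakeDomSketch {

    static Document makeDom(byte[] in)
            throws SAXException, IOException, ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        return factory.newDocumentBuilder().parse(new ByteArrayInputStream(in));
    }

    public static void main(String[] args) throws Exception {
        Document doc = makeDom("<html><body/></html>".getBytes(StandardCharsets.UTF_8));
        System.out.println(doc.getDocumentElement().getNodeName()); // prints html
    }
}
```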
 
- 
 
-