Post Tool
Solr includes a simple command line tool for POSTing various types of content to a Solr server.
The tool is bin/post. The bin/post tool is a Unix shell script; for Windows (non-Cygwin) usage, see the section Post Tool Windows Support below.
To run it, open a terminal window and enter:
bin/post -c gettingstarted example/films/films.json
This will contact the server at localhost:8983. Specifying the collection/core name is mandatory. The -help (or simply -h) option will output information on its usage (i.e., bin/post -help).
Using the bin/post Tool
Specifying either the collection/core name or the full update URL is mandatory when using bin/post.
The basic usage of bin/post is:
$ bin/post -h
Usage: post -c <collection> [OPTIONS] <files|directories|urls|-d ["...",...]>
or post -help
collection name defaults to DEFAULT_SOLR_COLLECTION if not specified
OPTIONS
=======
Solr options:
-url <base Solr update URL> (overrides collection, host, and port)
-host <host> (default: localhost)
-p or -port <port> (default: 8983)
-commit yes|no (default: yes)
-u or -user <user:pass> (sets BasicAuth credentials)
Web crawl options:
-recursive <depth> (default: 1)
-delay <seconds> (default: 10)
Directory crawl options:
-delay <seconds> (default: 0)
stdin/args options:
-type <content/type> (default: application/xml)
Other options:
-filetypes <type>[,<type>,...] (default: xml,json,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log)
-params "<key>=<value>[&<key>=<value>...]" (values must be URL-encoded; these pass through to Solr update request)
-out yes|no (default: no; yes outputs Solr response to console)
...
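As a rough illustration of how -url relates to -c, -host, and -p, the sketch below builds the update URL that those three options are assumed to resolve to. The helper name solr_update_url is hypothetical, and the /solr/<collection>/update path is an assumption about a default installation, so check it against your own server before relying on it. The commands are printed rather than executed, so the snippet is safe to run without a Solr server.

```shell
#!/bin/sh
# Hypothetical helper: build the update URL that -c/-host/-p are assumed
# to resolve to. The /solr/<collection>/update path is an assumption.
solr_update_url() {
  collection=$1
  host=${2:-localhost}   # default taken from the usage text
  port=${3:-8983}        # default taken from the usage text
  printf 'http://%s:%s/solr/%s/update\n' "$host" "$port" "$collection"
}

# Print (do not run) two invocations that should target the same endpoint
# if the assumption above holds.
echo "bin/post -c gettingstarted -p 8984 example/films/films.json"
echo "bin/post -url $(solr_update_url gettingstarted localhost 8984) example/films/films.json"
```

Because -url overrides collection, host, and port, it is mainly useful when the update endpoint does not follow the default layout.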
Examples Using bin/post
There are several ways to use bin/post. This section presents several examples.
Indexing XML
Add all documents with file extension .xml to the collection or core named gettingstarted.
bin/post -c gettingstarted *.xml
Add all documents with file extension .xml to the gettingstarted collection/core on Solr running on port 8984.
bin/post -c gettingstarted -p 8984 *.xml
Send XML arguments to delete a document from gettingstarted.
bin/post -c gettingstarted -d '<delete><id>42</id></delete>'
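When several documents need to be deleted, the XML passed to -d can be assembled by a small helper instead of being typed by hand. The sketch below assumes that a single delete message may carry more than one id element; the function name delete_by_ids is hypothetical. IDs are inserted verbatim, so an ID containing &, < or > would need XML escaping first.

```shell
#!/bin/sh
# Hypothetical helper: build a <delete> message for one or more document IDs.
# IDs are written as-is; no XML escaping is performed.
delete_by_ids() {
  printf '<delete>'
  for id in "$@"; do
    printf '<id>%s</id>' "$id"
  done
  printf '</delete>\n'
}

# Show the generated message without contacting a server.
delete_by_ids 42 43

# Usage against a running server:
#   bin/post -c gettingstarted -d "$(delete_by_ids 42 43)"
```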
Indexing CSV
Index all CSV files into gettingstarted:
bin/post -c gettingstarted *.csv
Index a tab-separated file into gettingstarted:
bin/post -c gettingstarted -params "separator=%09" -type text/csv data.tsv
The content type (-type) parameter is required so that the file is treated as the proper type; otherwise the file is ignored and a WARNING is logged, because the tool does not know what type of content a .tsv file is. The CSV handler supports the separator parameter, which is passed through using the -params setting.
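Values given to -params must be URL-encoded, which is why the tab separator appears as %09. The sketch below shows one way to produce such values in a POSIX shell; the urlencode function is a hypothetical helper, not part of bin/post. It percent-encodes every byte except the unreserved characters (letters, digits, and - . _ ~).

```shell
#!/bin/sh
# Hypothetical helper: percent-encode a string byte by byte.
# Unreserved characters (A-Z, a-z, 0-9, '-', '.', '_', '~') pass through.
urlencode() {
  for hex in $(printf '%s' "$1" | od -An -v -tx1); do
    case $hex in
      2d|2e|5f|7e|3[0-9]|4[1-9a-f]|5[0-9a]|6[1-9a-f]|7[0-9a])
        # Convert the hex byte to an octal escape and print the character.
        printf "\\$(printf '%03o' "0x$hex")" ;;
      *)
        # Emit everything else as %XX with uppercase hex digits.
        printf '%%%s' "$(printf '%s' "$hex" | tr 'a-f' 'A-F')" ;;
    esac
  done
  printf '\n'
}

# Encode a literal tab character for use as a CSV separator.
urlencode "$(printf '\t')"

# Usage against a running server:
#   bin/post -c gettingstarted -params "separator=$(urlencode "$(printf '\t')")" -type text/csv data.tsv
```

Encode each value separately and join the key=value pairs with & afterwards, so that the & and = delimiters themselves are not encoded.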
Indexing JSON
Index all JSON files into gettingstarted.
bin/post -c gettingstarted *.json
Indexing Rich Documents (PDF, Word, HTML, etc.)
Index a PDF file into gettingstarted.
bin/post -c gettingstarted a.pdf
Automatically detect content types in a folder, and recursively scan it for documents for indexing into gettingstarted.
bin/post -c gettingstarted afolder/
Automatically detect content types in a folder, but limit it to PPT and HTML files and index into gettingstarted.
bin/post -c gettingstarted -filetypes ppt,html afolder/
Indexing to a Password Protected Solr (Basic Auth)
Index a PDF as the user "solr" with password "SolrRocks":
bin/post -u solr:SolrRocks -c gettingstarted a.pdf
Post Tool Windows Support
bin/post is a Unix shell script and as such cannot be used directly on Windows. However, it delegates its work to a cross-platform Java program called "SimplePostTool" or post.jar, which can be used in Windows environments.
The argument syntax differs significantly from bin/post, so your first step should be to print the SimplePostTool help text.
$ java -jar example\exampledocs\post.jar -h
This command prints information about all the arguments and System properties available to SimplePostTool users. There are also examples showing how to post files, crawl a website or file system folder, and send update commands (deletes, etc.) directly to Solr.
Most usage involves passing both Java System properties and program arguments on the command line. Consider the example below:
$ java -jar -Dc=gettingstarted -Dauto example\exampledocs\post.jar example\exampledocs\*
This indexes the contents of the exampledocs directory into a collection called gettingstarted.
The -Dauto System property governs whether or not SimplePostTool sends the document type to Solr during extraction.