Post Tool
Solr includes a simple command line tool, part of the bin/solr CLI, for POSTing various types of content to a Solr server.
This tool is meant for use by new users exploring Solr’s capabilities, and is not intended as a robust solution for indexing documents into production systems.
You may be familiar with SimplePostTool and the bin/post Unix shell script. While these are still available, they are deprecated and will be removed in Solr 10.
To run it, open a terminal window and enter:
$ bin/solr post -url http://localhost:8983/solr/gettingstarted/update example/films/films.json
This will contact the server at localhost:8983.
The --help (or simply -h) option will output information on its usage (i.e., bin/solr post -h).
Using the bin/solr post Tool
When using bin/solr post, you must either specify -url, the full URL of the update handler, or provide a collection/core name with -c.
Both of the following specify the same target collection: -url http://localhost:8983/solr/gettingstarted/update or -c gettingstarted.
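The relationship between the two forms can be sketched in shell. The helper name update_url and its host and port defaults are illustrative assumptions, not part of bin/solr; the URL pattern simply follows the examples on this page.

```shell
# Hypothetical helper (not part of bin/solr): build the update-handler URL
# for a collection, assuming Solr is served under /solr on the given host/port.
update_url() {
  collection="$1"
  host="${2:-localhost}"
  port="${3:-8983}"
  printf 'http://%s:%s/solr/%s/update\n' "$host" "$port" "$collection"
}

# Print the URL form that corresponds to "-c gettingstarted" on a default local install.
update_url gettingstarted
```

The printed value is what would be passed to -url; the -c form is shorter when Solr runs at the default local address.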
The basic usage of bin/solr post is:
usage: post
 -c,--name <NAME>                                 Name of the collection.
 -d,--delay <delay>                               If recursive then delay will be the wait time between posts. default: 10 for web, 0 for files
 --dry-run                                        Performs a dry run of the posting process without actually sending documents to Solr. Only works with files mode.
 -f,--format                                      sends application/json content as Solr commands to /update instead of /update/json/docs.
 -ft,--filetypes <<type>[,<type>,...]>            default: xml,json,jsonl,csv,pdf,doc,docx,ppt,pptx,xls,xlsx,odt,odp,ods,ott,otp,ots,rtf,htm,html,txt,log
 -h,--help                                        Print this message.
 --mode <mode>                                    Which mode the Post tool is running in, 'files' crawls local directory, 'web' crawls website, 'args' processes input args, and 'stdin' reads a command from standard in. default: files.
 -o,--optimize                                    Issue an optimize at end of posting documents.
 --out                                            sends Solr response outputs to console.
 -p,--params <<key>=<value>[&<key>=<value>...]>   values must be URL-encoded; these pass through to Solr update request.
 -r,--recursive <recursive>                       For web crawl, how deep to go. default: 1
 --skip-commit                                    Do not 'commit', and thus changes won't be visible till a commit occurs.
 -t,--type <content-type>                         Specify a specific mimetype to use, such as application/json.
 -u,--credentials <credentials>                   Credentials in the format username:password. Example: --credentials solr:SolrRocks
 -url,--solr-update-url <UPDATEURL>               Solr Update URL, the full url to the update handler, including the /update.
 -v,--verbose                                     Enable more verbose command output.
Examples Using bin/solr post
There are several ways to use bin/solr post.
This section presents several examples.
Indexing JSON
Index all JSON files into gettingstarted.
$ bin/solr post -url http://localhost:8983/solr/gettingstarted/update *.json
Indexing XML
Add all documents with file extension .xml to the collection named gettingstarted.
$ bin/solr post -url http://localhost:8983/solr/gettingstarted/update *.xml
Add all documents whose names start with article and have the file extension .xml to the gettingstarted collection on Solr running on port 8984.
$ bin/solr post -url http://localhost:8984/solr/gettingstarted/update article*.xml
Send XML arguments to delete a document from gettingstarted.
$ bin/solr post -url http://localhost:8983/solr/gettingstarted/update --mode args --type application/xml '<delete><id>42</id></delete>'
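When the document id comes from a variable, the delete command passed in args mode can be assembled first. The following is a minimal sketch; the helper name delete_by_id_xml is hypothetical, and it assumes the id contains no XML special characters.

```shell
# Hypothetical helper: wrap a document id in a delete-by-id XML command.
# Assumes the id contains no XML special characters (&, <, >).
delete_by_id_xml() {
  printf '<delete><id>%s</id></delete>\n' "$1"
}

payload=$(delete_by_id_xml 42)
echo "$payload"

# The payload could then be passed as the final argument, for example:
#   bin/solr post -url http://localhost:8983/solr/gettingstarted/update \
#     --mode args --type application/xml "$payload"
```

Quoting "$payload" keeps the shell from interpreting the angle brackets as redirections.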
Indexing CSV and JSON
Index all CSV and JSON files from the current directory into gettingstarted:
$ bin/solr post -c gettingstarted --filetypes json,csv .
Index a tab-separated file into the signals collection on Solr running on port 8984:
$ bin/solr post -url http://localhost:8984/solr/signals/update --params "separator=%09" --type text/csv data.tsv
The content type (--type) parameter is required to treat the file as the proper type; otherwise the file is ignored and a WARNING is logged, because the tool does not know what type of content a .tsv file is.
The CSV handler supports the separator parameter, which is passed through using the --params setting.
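Because --params values must be URL-encoded, the tab separator is written as %09 in the example above. As a rough sketch, a single ASCII character can be percent-encoded in shell with printf; the helper name encode_char is an assumption made for illustration, and it does not handle multi-byte characters.

```shell
# Hypothetical helper: percent-encode one ASCII character.
# A leading quote in a printf numeric argument yields the character's code.
encode_char() {
  printf '%%%02X\n' "'$1"
}

tab=$(printf '\t')
separator=$(encode_char "$tab")

# Print the key=value pair that would be handed to --params.
echo "separator=$separator"
```

The same helper gives %3B for a semicolon, which would suit a semicolon-delimited file.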
Indexing Rich Documents (PDF, Word, HTML, etc.)
Index a PDF file into gettingstarted.
$ bin/solr post -url http://localhost:8983/solr/gettingstarted/update a.pdf
Automatically detect content types in a folder, and recursively scan it for documents to index into gettingstarted.
$ bin/solr post -url http://localhost:8983/solr/gettingstarted/update afolder/
Automatically detect content types in a folder, but limit it to PPT and HTML files and index into gettingstarted.
$ bin/solr post -url http://localhost:8983/solr/gettingstarted/update --filetypes ppt,html afolder/
Indexing to a Password Protected Solr (Basic Auth)
Index a PDF as the user "solr" with password "SolrRocks":
$ bin/solr post -u solr:SolrRocks -url http://localhost:8983/solr/gettingstarted/update a.pdf
Crawling a Website to Index Documents
Crawl the Apache Solr website going one layer deep and indexing the pages into Solr.
See Trying Out Solr Cell to learn more about setting up Solr for extracting content from web pages.
$ bin/solr post --mode web -c gettingstarted --recursive 1 --delay 1 https://solr.apache.org/