Indexing Nested Documents
Solr supports indexing nested documents, described here, and ways to search and retrieve them very efficiently.
By way of examples: nested documents in Solr can be used to bind a blog post (parent document) with comments (child documents) — or as a way to model major product lines as parent documents, with multiple types of child documents representing individual SKUs (with unique sizes / colors) and supporting documentation (either directly nested under the products, or under individual SKUs.
The "top most" parent with all children is referred to as a "root" document (or formerly "block document"), and it explains some of the nomenclature of related features.
At query time, the Block Join Query Parser can search these relationships, and the [child
] Document Transformer can attach child (or other "descendent") documents to the result documents.
In terms of performance, indexing the relationships between documents usually yields much faster queries than an equivalent "query time join",
since the relationships are already stored in the index and do not need to be computed.
However, nested documents are less flexible than query time joins as it imposes rules that some applications may not be able to accept. Nested documents may be indexed via either the XML or JSON data syntax, and is also supported by SolrJ with javabin.
Reindexing Considerations
With the exception of in-place updates, Solr must internally reindex an entire nested document tree if there are updates to it. For some applications this may result in a lot of extra indexing overhead that may not be worth the performance gains at query time versus other modeling approaches. |
In the examples on this page, the IDs of child documents are always provided. However, you need not generate such IDs; you can let Solr populate them automatically. It will concatenate the ID of its parent with a separator and path information that should be unique. Try it out for yourself!
Example Indexing Syntax: Pseudo-Fields
This example shows what it looks like to index two root "product" documents, each containing two different types of child documents specified in "pseudo-fields": "skus" and "manuals". Two of the "sku" type documents have their own nested child "manuals" documents…
Even though the child documents in these examples are provided syntactically as field values, this is simply a matter of syntax and as such |
-
JSON
-
XML
-
SolrJ
[{ "id": "P11!prod",
"name_s": "Swingline Stapler",
"description_t": "The Cadillac of office staplers ...",
"skus": [ { "id": "P11!S21",
"color_s": "RED",
"price_i": 42,
"manuals": [ { "id": "P11!D41",
"name_s": "Red Swingline Brochure",
"pages_i":1,
"content_t": "..."
} ]
},
{ "id": "P11!S31",
"color_s": "BLACK",
"price_i": 3
} ],
"manuals": [ { "id": "P11!D51",
"name_s": "Quick Reference Guide",
"pages_i":1,
"content_t": "How to use your stapler ..."
},
{ "id": "P11!D61",
"name_s": "Warranty Details",
"pages_i":42,
"content_t": "... lifetime guarantee ..."
} ]
},
{ "id": "P22!prod",
"name_s": "Mont Blanc Fountain Pen",
"description_t": "A Premium Writing Instrument ...",
"skus": [ { "id": "P22!S22",
"color_s": "RED",
"price_i": 89,
"manuals": [ { "id": "P22!D42",
"name_s": "Red Mont Blanc Brochure",
"pages_i":1,
"content_t": "..."
} ]
},
{ "id": "P22!S32",
"color_s": "BLACK",
"price_i": 67
} ],
"manuals": [ { "id": "P22!D52",
"name_s": "How To Use A Pen",
"pages_i":42,
"content_t": "Start by removing the cap ..."
} ]
} ]
The |
<add>
<doc>
<field name="id">P11!prod</field>
<field name="name_s">Swingline Stapler</field>
<field name="description_t">The Cadillac of office staplers ...</field>
<field name="skus">
<doc>
<field name="id">P11!S21</field>
<field name="color_s">RED</field>
<field name="price_i">42</field>
<field name="manuals">
<doc>
<field name="id">P11!D41</field>
<field name="name_s">Red Swingline Brochure</field>
<field name="pages_i">1</field>
<field name="content_t">...</field>
</doc>
</field>
</doc>
<doc>
<field name="id">P11!S31</field>
<field name="color_s">BLACK</field>
<field name="price_i">3</field>
</doc>
</field>
<field name="manuals">
<doc>
<field name="id">P11!D51</field>
<field name="name_s">Quick Reference Guide</field>
<field name="pages_i">1</field>
<field name="content_t">How to use your stapler ...</field>
</doc>
<doc>
<field name="id">P11!D61</field>
<field name="name_s">Warranty Details</field>
<field name="pages_i">42</field>
<field name="content_t">... lifetime guarantee ...</field>
</doc>
</field>
</doc>
<doc>
<field name="id">P22!prod</field>
<field name="name_s">Mont Blanc Fountain Pen</field>
<field name="description_t">A Premium Writing Instrument ...</field>
<field name="skus">
<doc>
<field name="id">P22!S22</field>
<field name="color_s">RED</field>
<field name="price_i">89</field>
<field name="manuals">
<doc>
<field name="id">P22!D42</field>
<field name="name_s">Red Mont Blanc Brochure</field>
<field name="pages_i">1</field>
<field name="content_t">...</field>
</doc>
</field>
</doc>
<doc>
<field name="id">P22!S32</field>
<field name="color_s">BLACK</field>
<field name="price_i">67</field>
</doc>
</field>
<field name="manuals">
<doc>
<field name="id">P22!D52</field>
<field name="name_s">How To Use A Pen</field>
<field name="pages_i">42</field>
<field name="content_t">Start by removing the cap ...</field>
</doc>
</field>
</doc>
</add>
try (SolrClient client = getSolrClient()) {
final SolrInputDocument p1 = new SolrInputDocument();
p1.setField("id", "P11!prod");
p1.setField("name_s", "Swingline Stapler");
p1.setField("description_t", "The Cadillac of office staplers ...");
{
final SolrInputDocument s1 = new SolrInputDocument();
s1.setField("id", "P11!S21");
s1.setField("color_s", "RED");
s1.setField("price_i", 42);
{
final SolrInputDocument m1 = new SolrInputDocument();
m1.setField("id", "P11!D41");
m1.setField("name_s", "Red Swingline Brochure");
m1.setField("pages_i", 1);
m1.setField("content_t", "...");
s1.setField("manuals", m1);
}
final SolrInputDocument s2 = new SolrInputDocument();
s2.setField("id", "P11!S31");
s2.setField("color_s", "BLACK");
s2.setField("price_i", 3);
p1.setField("skus", Arrays.asList(s1, s2));
}
{
final SolrInputDocument m1 = new SolrInputDocument();
m1.setField("id", "P11!D51");
m1.setField("name_s", "Quick Reference Guide");
m1.setField("pages_i", 1);
m1.setField("content_t", "How to use your stapler ...");
final SolrInputDocument m2 = new SolrInputDocument();
m2.setField("id", "P11!D61");
m2.setField("name_s", "Warranty Details");
m2.setField("pages_i", 42);
m2.setField("content_t", "... lifetime guarantee ...");
p1.setField("manuals", Arrays.asList(m1, m2));
}
final SolrInputDocument p2 = new SolrInputDocument();
p2.setField("id", "P22!prod");
p2.setField("name_s", "Mont Blanc Fountain Pen");
p2.setField("description_t", "A Premium Writing Instrument ...");
{
final SolrInputDocument s1 = new SolrInputDocument();
s1.setField("id", "P22!S22");
s1.setField("color_s", "RED");
s1.setField("price_i", 89);
{
final SolrInputDocument m1 = new SolrInputDocument();
m1.setField("id", "P22!D42");
m1.setField("name_s", "Red Mont Blanc Brochure");
m1.setField("pages_i", 1);
m1.setField("content_t", "...");
s1.setField("manuals", m1);
}
final SolrInputDocument s2 = new SolrInputDocument();
s2.setField("id", "P22!S32");
s2.setField("color_s", "BLACK");
s2.setField("price_i", 67);
p2.setField("skus", Arrays.asList(s1, s2));
}
{
final SolrInputDocument m1 = new SolrInputDocument();
m1.setField("id", "P22!D52");
m1.setField("name_s", "How To Use A Pen");
m1.setField("pages_i", 42);
m1.setField("content_t", "Start by removing the cap ...");
p2.setField("manuals", m1);
}
client.add(Arrays.asList(p1, p2));
Schema Configuration
Indexing nested documents requires an indexed field named _root_
:
<field name="_root_" type="string" indexed="true" stored="false" docValues="false" />
Do not add this field to an index that already has data! You must reindex.
-
Solr automatically populates this field in all documents with the
id
value of it’s root document — its highest ancestor, possibly itself. -
This field must be indexed (
indexed="true"
) but doesn’t need to be either stored (stored="true"
) or use doc values (docValues="true"
), however you are free to do so if you find it useful. If you want to useuniqueBlock(_root_)
field type limitation, then you should enable docValues.
Preferably, you will also define _nest_path_
which adds features and ease-of-use:
<fieldType name="_nest_path_" class="solr.NestPathField" />
<field name="_nest_path_" type="_nest_path_" />`
-
Solr automatically populates this field for any child document but not root documents.
-
This field enables Solr to properly record & reconstruct the named and nested relationship of documents when using the
[child
] doc transformer.-
If this field does not exist, the
[child]
transformer will return all descendent child documents as a flattened list — just as if they had been indexed as anonymous children.
-
-
If you do not use
_nest_path_
it is strongly recommended that every document have some field that differentiates root documents from their nested children — and differentiates different "types" of child documents. This is not strictly necessary, so long as it’s possible to write a "filter" query that can be used to isolate and select only parent documents for use in the Block Join Query Parser and[child
] doc transformer -
It’s possible to query on this field, although at present it’s only documented how to in the context of
[child]’s `childFilter
parameter.
You might optionally want to define _nest_parent_
to store parent IDs:
<field name="_nest_parent_" type="string" indexed="true" stored="true" />
-
Solr automatically populates this field in child documents but not root documents.
Finally, understand that nested child documents are very much documents in their own right even if certain nested documents hold different information from the parent or other child documents, therefore:
-
All field names in the schema can only be configured in one — different types of child documents can not have the same field name configured in different ways.
-
It may be infeasible to use
required
for any field names that aren’t required for all types of documents. -
Even child documents need a globally unique
id
.
When using SolrCloud it is a VERY good idea to use prefix based compositeIds with a common prefix for all documents in the nested document tree. This makes it much easier to apply atomic updates to individual child documents. |
Maintaining Integrity with Updates, Deletes, and Shard Splits
Nested document trees can be modified with
atomic updates to
manipulate any document in a nested tree, and even to add new child documents.
This aspect isn’t different from updating any normal document — Solr internally deletes the old
nested document tree and adds the newly modified one.
Just be mindful to add a root
field if the partial update is to a child doc so that Solr
knows which Root doc it’s related to.
Solr demands that the id
of all documents in a collection be unique.
Solr enforces this for root documents within a shard, but not for child documents to avoid the expense of checking.
Clients should be very careful to never violate this.
To delete an entire nested document tree, you can simply delete-by-ID using the id
of the root document.
Delete-by-ID will not work with the id
of a child document, since only root document IDs are considered.
Instead, use delete-by-query (most efficient) or atomic updates to remove the child document from its parent.
If you use Solr’s delete-by-query APIs, you MUST be careful to ensure that any deletion query is structured to ensure no descendent children remain of any documents that are being deleted. Doing otherwise will violate integrity assumptions that Solr expects.
When splitting shards, child documents get transmitted and routed independently.
If you use the default router.field
, this is handled correctly for you.
If your collection has a custom router.field
, then you must ensure that all documents in a hierarchy have the same value for that field.
Indexing Anonymous Children
Although not recommended, it is also possible to index child documents "anonymously":
-
JSON
-
XML
-
SolrJ
[{ "id": "P11!prod",
"name_s": "Swingline Stapler",
"type_s": "PRODUCT",
"description_t": "The Cadillac of office staplers ...",
"_childDocuments_": [
{ "id": "P11!S21",
"type_s": "SKU",
"color_s": "RED",
"price_i": 42,
"_childDocuments_": [
{ "id": "P11!D41",
"type_s": "MANUAL",
"name_s": "Red Swingline Brochure",
"pages_i":1,
"content_t": "..."
} ]
},
{ "id": "P11!S31",
"type_s": "SKU",
"color_s": "BLACK",
"price_i": 3
},
{ "id": "P11!D51",
"type_s": "MANUAL",
"name_s": "Quick Reference Guide",
"pages_i":1,
"content_t": "How to use your stapler ..."
},
{ "id": "P11!D61",
"type_s": "MANUAL",
"name_s": "Warranty Details",
"pages_i":42,
"content_t": "... lifetime guarantee ..."
}
]
} ]
<add>
<doc>
<field name="id">P11!prod</field>
<field name="type_s">PRODUCT</field>
<field name="name_s">Swingline Stapler</field>
<field name="description_t">The Cadillac of office staplers ...</field>
<doc>
<field name="id">P11!S21</field>
<field name="type_s">SKU</field>
<field name="color_s">RED</field>
<field name="price_i">42</field>
<doc>
<field name="id">P11!D41</field>
<field name="type_s">MANUAL</field>
<field name="name_s">Red Swingline Brochure</field>
<field name="pages_i">1</field>
<field name="content_t">...</field>
</doc>
</doc>
<doc>
<field name="id">P11!S31</field>
<field name="type_s">SKU</field>
<field name="color_s">BLACK</field>
<field name="price_i">3</field>
</doc>
<doc>
<field name="id">P11!D51</field>
<field name="type_s">MANUAL</field>
<field name="name_s">Quick Reference Guide</field>
<field name="pages_i">1</field>
<field name="content_t">How to use your stapler ...</field>
</doc>
<doc>
<field name="id">P11!D61</field>
<field name="type_s">MANUAL</field>
<field name="name_s">Warranty Details</field>
<field name="pages_i">42</field>
<field name="content_t">... lifetime guarantee ...</field>
</doc>
</doc>
</add>
try (SolrClient client = getSolrClient()) {
final SolrInputDocument p1 = new SolrInputDocument();
p1.setField("id", "P11!prod");
p1.setField("type_s", "PRODUCT");
p1.setField("name_s", "Swingline Stapler");
p1.setField("description_t", "The Cadillac of office staplers ...");
{
final SolrInputDocument s1 = new SolrInputDocument();
s1.setField("id", "P11!S21");
s1.setField("type_s", "SKU");
s1.setField("color_s", "RED");
s1.setField("price_i", 42);
{
final SolrInputDocument m1 = new SolrInputDocument();
m1.setField("id", "P11!D41");
m1.setField("type_s", "MANUAL");
m1.setField("name_s", "Red Swingline Brochure");
m1.setField("pages_i", 1);
m1.setField("content_t", "...");
s1.addChildDocument(m1);
}
final SolrInputDocument s2 = new SolrInputDocument();
s2.setField("id", "P11!S31");
s2.setField("type_s", "SKU");
s2.setField("color_s", "BLACK");
s2.setField("price_i", 3);
final SolrInputDocument m1 = new SolrInputDocument();
m1.setField("id", "P11!D51");
m1.setField("type_s", "MANUAL");
m1.setField("name_s", "Quick Reference Guide");
m1.setField("pages_i", 1);
m1.setField("content_t", "How to use your stapler ...");
final SolrInputDocument m2 = new SolrInputDocument();
m2.setField("id", "P11!D61");
m2.setField("type_s", "MANUAL");
m2.setField("name_s", "Warranty Details");
m2.setField("pages_i", 42);
m2.setField("content_t", "... lifetime guarantee ...");
p1.addChildDocuments(Arrays.asList(s1, s2, m1, m2));
}
client.add(p1);
This simplified approach was common in older versions of Solr, and can still be used with "Root-Only" schemas that do not contain any other nested related fields apart from _root_
.
Many schemas in existence are this way simply because default configsets are this way, even if the application isn’t using nested documents.
This approach should NOT be used when schemas include a _nest_path_
field, as the existence of that field triggers assumptions and changes in behavior in various query time functionality, such as [child
], that will not work when nested documents do not have any intrinsic "nested path" information.
The results of indexing anonymous nested children with a "Root-Only" schema are similar to what happens if you attempt to index "pseudo field" nested documents using a "Root-Only" schema.
Notably: since there is no nested path information for the [child
] transformer to use to reconstruct the structure of a nest of documents, it returns all matching children as a flat list, similar in structure to how they were originally indexed:
-
JSON
-
XML
$ curl --globoff 'http://localhost:8983/solr/gettingstarted/select?omitHeader=true&q=id:P11!prod&fl=*,[child%20parentFilter=%22type_s:PRODUCT%22]'
{
"response":{"numFound":1,"start":0,"maxScore":0.7002023,"numFoundExact":true,"docs":[
{
"id":"P11!prod",
"name_s":"Swingline Stapler",
"type_s":"PRODUCT",
"description_t":"The Cadillac of office staplers ...",
"_version_":1673055562829398016,
"_childDocuments_":[
{
"id":"P11!D41",
"type_s":"MANUAL",
"name_s":"Red Swingline Brochure",
"pages_i":1,
"content_t":"...",
"_version_":1673055562829398016},
{
"id":"P11!S21",
"type_s":"SKU",
"color_s":"RED",
"price_i":42,
"_version_":1673055562829398016},
{
"id":"P11!S31",
"type_s":"SKU",
"color_s":"BLACK",
"price_i":3,
"_version_":1673055562829398016},
{
"id":"P11!D51",
"type_s":"MANUAL",
"name_s":"Quick Reference Guide",
"pages_i":1,
"content_t":"How to use your stapler ...",
"_version_":1673055562829398016},
{
"id":"P11!D61",
"type_s":"MANUAL",
"name_s":"Warranty Details",
"pages_i":42,
"content_t":"... lifetime guarantee ...",
"_version_":1673055562829398016}]}]
}}
$ curl --globoff 'http://localhost:8983/solr/gettingstarted/select?omitHeader=true&q=id:P11!prod&fl=*,[child%20parentFilter=%22type_s:PRODUCT%22]&wt=xml'
<?xml version="1.0" encoding="UTF-8"?>
<response>
<result name="response" numFound="1" start="0" maxScore="0.7002023" numFoundExact="true">
<doc>
<str name="id">P11!prod</str>
<str name="name_s">Swingline Stapler</str>
<str name="type_s">PRODUCT</str>
<str name="description_t">The Cadillac of office staplers ...</str>
<long name="_version_">1673055562829398016</long>
<doc>
<str name="id">P11!D41</str>
<str name="type_s">MANUAL</str>
<str name="name_s">Red Swingline Brochure</str>
<int name="pages_i">1</int>
<str name="content_t">...</str>
<long name="_version_">1673055562829398016</long></doc>
<doc>
<str name="id">P11!S21</str>
<str name="type_s">SKU</str>
<str name="color_s">RED</str>
<int name="price_i">42</int>
<long name="_version_">1673055562829398016</long></doc>
<doc>
<str name="id">P11!S31</str>
<str name="type_s">SKU</str>
<str name="color_s">BLACK</str>
<int name="price_i">3</int>
<long name="_version_">1673055562829398016</long></doc>
<doc>
<str name="id">P11!D51</str>
<str name="type_s">MANUAL</str>
<str name="name_s">Quick Reference Guide</str>
<int name="pages_i">1</int>
<str name="content_t">How to use your stapler ...</str>
<long name="_version_">1673055562829398016</long></doc>
<doc>
<str name="id">P11!D61</str>
<str name="type_s">MANUAL</str>
<str name="name_s">Warranty Details</str>
<int name="pages_i">42</int>
<str name="content_t">... lifetime guarantee ...</str>
<long name="_version_">1673055562829398016</long></doc></doc>
</result>
</response>