Add dynamic geojson fields to Solr

Solr Spatial

How to add dynamic geojson fields to Solr 6 to index and search geospatial data.

Versions used: Solr 6.6.3.

Solr can index spatial fields in two industry-standard formats: WKT and GeoJSON. In this post, I’ll go through indexing spatial fields in GeoJSON format, with a focus on polygons.

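First, you need to choose between SpatialRecursivePrefixTreeFieldType (RPT) and RptWithGeometrySpatialField (RPT with Geometry). Both support indexing polygons, but they differ in a couple of technical details, so one may suit your needs better than the other. The main difference is that RPT with Geometry also keeps the original geometry in DocValues, so it can check matches against the exact shape rather than only the grid approximation. I chose RPT with Geometry.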

For polygon support, you will need the JTS Topology Suite. Navigate to this URL, click on the required version, and download the file jts-core-VERSION#.jar. Copy this file to: Solr Installation Folder\server\solr-webapp\webapp\WEB-INF\lib\

Add the new field type to your Solr index schema.

<!-- RPT -->
<fieldType name="geojson_rpt" class="solr.SpatialRecursivePrefixTreeFieldType" spatialContextFactory="JTS" geo="true" format="GeoJSON" autoIndex="true" validationRule="repairBuffer0" distErrPct="0.025" maxDistErr="0.001" distanceUnits="kilometers" />
<!-- RPT with geometry -->
<fieldType name="geojson_rptgeom" class="solr.RptWithGeometrySpatialField" spatialContextFactory="JTS" geo="true" format="GeoJSON" autoIndex="true" validationRule="repairBuffer0" distErrPct="0.15" maxDistErr="0.001" distanceUnits="kilometers"/>

Then, add a dynamic field that maps to the newly defined field type.

<dynamicField name="*_grpt" type="geojson_rpt" indexed="true" stored="true" allowMultiOverlap="true"/>
<dynamicField name="*_grptgeom" type="geojson_rptgeom" indexed="true" stored="true" allowMultiOverlap="true"/>

Once you have done this, you can store GeoJSON data in these fields and run spatial searches against them. The best way to test whether it works is to store a value. If the value is invalid, Solr will throw an error that can be seen in the logs.
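As a minimal sketch of such a test, the update message below adds one document with a small closed polygon in a field that matches the *_grpt dynamic field. The id value, the field name polygon_grpt and the coordinates are illustrative assumptions, and the sketch also assumes that the schema's unique key field is called id.

```xml
<!-- Hypothetical test document: id, field name and coordinates are made up for illustration -->
<add>
  <doc>
    <!-- Assumes the schema's uniqueKey field is named "id" -->
    <field name="id">geojson-test-1</field>
    <!-- GeoJSON polygon: positions are [longitude, latitude]; the ring ends on its starting point -->
    <field name="polygon_grpt">{"type":"Polygon","coordinates":[[[19.0,47.4],[19.1,47.4],[19.1,47.5],[19.0,47.5],[19.0,47.4]]]}</field>
  </doc>
</add>
```

Sending this to the collection's /update handler and committing should either index the shape or surface the validation error in the Solr logs.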

For example, the following filter query returns documents whose shape intersects the given coordinate.

polygon_grpt:"Intersects(19.0497665 47.4916717)"

Notes:

  • distErrPct and maxDistErr can be tuned to trade accuracy for faster performance.
  • Solr recommends not storing these field types, as the stored value would be redundant and would increase index size.

    When using this field type, you will likely not want to mark the field as stored because it’s redundant with the DocValues data and surely larger because of the formatting (be it WKT or GeoJSON). Source

  • On the same page, Solr mentions an in-memory cache that can be enabled to improve performance.

    An optional in-memory cache can be defined in solrconfig.xml, which should be done when the data tends to have shapes with many vertices.
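As a sketch of that cache definition, based on my reading of the Solr reference guide: the entry goes inside the query section of solrconfig.xml, and its name is perSegSpatialFieldCache_ followed by the name of the field. The field name polygon_grptgeom and the size values below are assumptions for illustration. Since the cache is named per field, each concrete field created through the *_grptgeom dynamic field would presumably need its own entry.

```xml
<!-- Place inside the <query> section of solrconfig.xml -->
<!-- The name suffix must match the field name; "polygon_grptgeom" is a hypothetical field -->
<cache name="perSegSpatialFieldCache_polygon_grptgeom"
       class="solr.LRUCache"
       size="256"
       initialSize="0"
       autowarmCount="100%"
       regenerator="solr.NoOpRegenerator"/>
```

The size of 256 is only a starting point; data with many large shapes may call for a different value.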


Please let me know what you think and/or if you can spot any errors.
/eom
