insights by infactory in-factory GmbH Opensearch: Full text search in a relational contact database

One of our customers in the automotive sector had a requirement to implement a full text search on a relational Oracle database. Different hits, weighted according to so-called “scores”, were to be delivered as results.

The existing system is an Oracle DB (DataVault architecture). The full width of a contact therefore requires a join across multiple tables.

As possible technologies for the implementation, Oracle’s on-board tools were considered as well as alternative solutions. Since the customer has a strong focus on Free and Open Source Software (FOSS) components, the Opensearch Suite was selected for the implementation of this requirement.

Overview Opensearch

Opensearch is an open source distributed search and analytics suite that provides a highly scalable, highly available system for rapidly accessing large amounts of data. Opensearch is based on the Apache Lucene search library and is a branch of Elasticsearch 7.10.2.

Opensearch was further developed as a fork of the last ALv2 (Apache License version 2) version of Elasticsearch after the licensing strategy of Elastic NV was changed.

The Opensearch Suite essentially consists of three components:

– Logstash, an ETL tool, is used to import data.
– Opensearch analyzes/indexes/stores the data and provides the actual search engine.
– Opensearch dashboard provides visualization and management tools.

Project implementation

System architecture

Opensearch+Schematic

The requirements were implemented as follows:

– Logstash regularly imports all new contacts via JDBC (Java Database Connectivity) using Oracle Select statements against the relational (rel.) DB. By means of a persistence marker Logstash remembers until “where” this job has already been done, so that after the initial load only deltas are loaded.
– Within the Logstash pipeline individual fields can be manipulated and filtered.
– Logstash’s export transfers the data to Opensearch which “indexes” them and provides the search engine’s REST (Representational State Transfer) web service endpoint.

Particular features

The following particular features must be observed:

– Analysis and preparation of the data to be indexed and the search query: If you search for MuelLeR, for example, you expect “müller”, “Müller”, “mueller”, etc. This can be achieved in Opensearch by country-specific indexes to which different templates with different analyzers are applied. These parse the data into individual terms, German Umlauts are normalized, the use of synonyms is enabled, “stemming” is performed (“ging” is converted to “gehen”, for example, so that one can search for all inflections of “gehen”).
– Timeliness of the index: If you delete contacts in the relational DB, they can still be found in the search index of Opensearch. I.e. you may need different logstash pipelines to take care of the insert and delete use cases. The goal is, of course, that the search index contains an up-to-date image of the relational database’s dataset.

Author: Sascha Hirschoff

ANY QUESTIONS? WE HAVE THE ANSWERS!

Write us. We are looking forward to your message!

MAIL TO