insights by infactory in-factory GmbH Process JSON messages in traditional staging area

Many companies are currently not exploiting the potential of their data, but there is an awareness and appreciation of data treasures – but as we all know, it is not always easy to find a treasure and then to retrieve it.

At in-factory, our experience is that our customers are aware of the challenges and necessary expertise. That is why they make use of our services. We have been bringing the experience and know-how to such projects for over 25 years. It is clear to us that companies in Germany and Switzerland have recognized the potential and are deliberately focusing on data integration projects in the context of digital transformation to achieve their corporate goals.

Today we will show you a practical example of how we support our customers from the automotive industry in their transformation project to achieve their dealer and customer experience goals.

Our objective

JavaScript Object Notation (JSON) messages are to be read in the classic staging area of a data warehouse (DWH). This structure was previously used to process flat files. The incoming JSON messages are to be read-in continuously and the data stored in tables of the staging area.

The data should then be via batch-load transfered from the staging area into the Data Warehouse on a regular basis.
The functions used here are available as of Oracle Database 12c Release 1 (12.1.0.2).

Our solution

Data storage

For example, the messages are inserted from Kafka into the database as JSON.

It makes sense to store the payload in a column with the data type Binary Large Object (BLOB) or Character Large Object (Clob). Oracle recommends the use of BLOB, since here no conversion to character must be carried out and less memory is used depending on the character set. The syntax of the JSON can be validated implicitly already during the insert. To do this, the constraint “IS JSON” must be applied to the column with the STRICT or LAX option. The default option is LAX, this corresponds to JavaScript syntax for object fields, Boolean and Null Values are not case sensitive.

Basic functions

JSON_TABLE creates a row of relational data for each object within a JSON array. JSON values from this object are output as single SQL column values.

JSON_VALUE finds a specified scalar JSON value in JSON data and returns it as SQL value.

JSON_QUERY finds one or more specified JSON values in JSON data and returns the values as characters.

Practical implementation

Demo preparation

For the example, the following database (DB) objects must be created:
Table STG_CTRL_DELIVERY:

CREATE TABLE STG_CTRL_DELIVERY
(
DELIVERY_ID NUMBER,
DELIVERY_TIMESTAMP TIMESTAMP(6),
JSON_DOCUMENT CLOB,
STATUS VARCHAR2(200 BYTE)
);

View V_STG_CTRL_DELIVERY_CONTACT:

CREATE OR REPLACE VIEW V_STG_CTRL_DELIVERY_CONTACT
AS
(SELECT STG.DELIVERY_ID,
Salutation,
Firstname,
Lastname,
DateOfBirth,
CON_DATA.ROW_NUMBER AS CON_ROWNUM
FROM HARSCAX.STG_CTRL_DELIVERY STG,
JSON_TABLE (
STG.JSON_DOCUMENT,
'$."Payload".Data.Contacts.Contact[*]'
COLUMNS (
ROW_NUMBER FOR ORDINALITY,
NESTED PATH '$.ContactData.IndividualPerson[*]'
COLUMNS (
Salutation VARCHAR2 (300) PATH '$.Salutation',
Firstname VARCHAR2 (300) PATH '$.Firstname',
Lastname VARCHAR2 (300) PATH '$.Lastname',
DateOfBirth VARCHAR2 (300) PATH '$.DateOfBirth')))
CON_DATA
WHERE STG.STATUS = 'to-process');

View V_STG_CTRL_DELIVERY_CONTACT_EMAIL:

CREATE OR REPLACE VIEW V_STG_CTRL_DELIVERY_CONTACT_EMAIL
AS
SELECT STG.DELIVERY_ID,
ChannelType,
ChannelValue,
ROW_NUMBER AS con_rownum,
con_data.row_number_sub AS con_com_rownum
FROM HARSCAX.STG_CTRL_DELIVERY STG,
JSON_TABLE (
stg.JSON_DOCUMENT,
'$."Payload".Data.Contacts.Contact[*]'
COLUMNS (
ROW_NUMBER FOR ORDINALITY,
NESTED PATH '$.CommunicationChannel[*]'
COLUMNS (
row_number_sub FOR ORDINALITY,
ChannelType VARCHAR2 (300) PATH '$.ChannelType',
ChannelValue VARCHAR2 (300) PATH '$.ChannelValue')))
con_data
WHERE STG.STATUS = 'to-process';

In addition, the sample payload must be inserted into the STG_CTRL_DELIVERY table – as shown in the following code example:

Insert into STG_CTRL_DELIVERY
(DELIVERY_ID, DELIVERY_TIMESTAMP,JSON_DOCUMENT,STATUS)
Values
(1, SYSDATE, 'JSON_DOCUMENT', 'to-process');

Where ‘JSON_DOCUMENT’ must be replaced by the content of the JSON file:

{
"Payload": {
"Data": {
"Contacts": {
"Contact": [{
"ContactData": {
"ContactType": "I",
"CreationDate": "2017-01-04T08:10:45.765Z",
"ChangeDate": "2017-02-15T07:21:31.991Z",
"IndividualPerson": {
"Salutation": "Ms.",
"Firstname": "1_FIRSTNAME",
"Lastname": "1_LASTNAME",
"DateOfBirth": "1996-06-06"
}
},
"CommunicationChannel": [ {
"ActionType": "I",
"ChannelType": "Private eMail",
"ChannelValue": "1_FIRSTNAME.1_LASTNAME@test.de"
}
],
}, {
"ContactData": {
"CreationDate": "2017-01-04T08:10:45.765Z",
"ChangeDate": "2017-02-15T07:21:31.991Z",
"IndividualPerson": {
"Salutation": "Mr.",
"Firstname": "2_FIRSTNAME",
"Lastname": "2_LASTNAME",
"DateOfBirth": "1987-01-03"
}
},
"CommunicationChannel": [ {
"ChannelType": "Private eMail",
"ChannelValue": "2_FIRSTNAME.2_LASTNAME@test.net"
},
{
"ChannelType": "additional eMail",
"ChannelValue": "MAIL2_2_FIRSTNAME.2_LASTNAME@test.net"
}
],
}
] }
}
}
}

Data selection

In this demo, data from JSON messages is read from this payload using Views.

These messages are stored in a control table. An external process delivers the messages and sets the status to “to-process”. This status is the scope of the Views used here.

The views read the required attributes for each group in the JSON. Oracle JSON functions are used for this purpose. The relations between the objects are represented via generated IDs.

The data of the Views can then be transferred to tables and processed classically from there. The status must be updated after the transfer for the processed IDs to remove them from the Scope of the View.The example payload in the control table contains two contacts with various attributes, both contacts are assigned email addresses.

Steuerungs-Tabelle

Reading out the data via the Views:

Auslesen der Basis-Informationen der Kontakte
Auslesen der Email-Adressen

The most important thing here is the relation between the main object and the sub object.

Contact 1 (CON_ROWNUM 1) has an e-mail address (CON_COM_ROWNUM 1).
Contact 2 (CON_ROWNUM 2) has two e-mail addresses (CON_COM_ROWNUM 1+2).
On each level an ID is generated here (CON_ROWNUM + CON_COM_ROWNUM). This is realized via the option “ROW_NUMBER FOR ORDINALITY”.

CREATE OR REPLACE VIEW V_STG_CTRL_DELIVERY_CONTACT_EMAIL
AS
SELECT STG.DELIVERY_ID,
ChannelType,
ChannelValue,
ROW_NUMBER AS con_rownum,
con_data.row_number_sub AS con_com_rownum
FROM HARSCAX.STG_CTRL_DELIVERY STG,
JSON_TABLE (
stg.JSON_DOCUMENT,
'$."Payload".Data.Contacts.Contact[*]'
COLUMNS (
ROW_NUMBER FOR ORDINALITY,
NESTED PATH '$.CommunicationChannel[*]'
COLUMNS (
row_number_sub FOR ORDINALITY,
ChannelType VARCHAR2 (300) PATH '$.ChannelType',
ChannelValue VARCHAR2 (300) PATH '$.ChannelValue')))
con_data
WHERE STG.STATUS = 'to-process';

Further documentation is available on the Oracle website.

Author: Axel Harsch

ANY QUESTIONS? WE HAVE THE ANSWERS!

Write us. We are looking forward to your message!

MAIL TO