Process JSON messages in traditional staging area

Back to overview

Many companies are not yet exploiting the potential of their data, but the awareness and appreciation of data treasures is there - but as we all know, it is not always easy to find a treasure and then recover it.

At in-factory, we have found that our customers are aware of the challenges and expertise required. That is why they make use of our services. We have over 25 years of experience and know-how in such projects. It is clear to us that companies in Germany have recognized the potential and are consciously relying on data integration projects to achieve their corporate goals in the context of digital transformation.

JavaScript Object Notation (JSON) messages are to be read in the classic staging area of a data warehouse (DWH). This structure was previously used to process flat files. The incoming JSON messages are to be read in continuously and the data stored in tables in the staging area.

The data should then be regularly loaded from the staging area into the data warehouse via batch load.
The functions used here are available from Oracle Database 12c Release 1 (12.1.0.2).

Our solution approach

Data storage

The messages are inserted into the database as JSON from Kafka, for example.
The payload is best stored in a column with the data type Binary Large Object (BLOB) or Character Large Object (Clob). Oracle recommends the use of BLOB, as no conversion to character is required and, depending on the character set, less memory is used. The syntax of the JSON can be validated implicitly during the insert. To do this, the "IS JSON" constraint must be applied to the column with the STRICT or LAX option. The default option is LAX, which corresponds to JavaScript syntax for object fields; Boolean and null values are not case-sensitive.

Basic functions

JSON_TABLE creates a row with relational data for each object within a JSON array. JSON values from this object are output as individual SQL column values.

JSON_VALUE finds a specified scalar JSON value in JSON data and returns it as an SQL value.

JSON_QUERY finds one or more specified JSON values in JSON data and returns the values as characters.

Practical implementation

Demo preparation

The following database (DB) objects must be created for the example:
The table STG_CTRL_DELIVERY:

CREATE TABLE STG_CTRL_DELIVERY

(

DELIVERY_ID NUMBER,

DELIVERY_TIMESTAMP TIMESTAMP(6),

JSON_DOCUMENT CLOB,

STATUS VARCHAR2(200 BYTE)

);

The view V_STG_CTRL_DELIVERY_CONTACT:

CREATE OR REPLACE VIEW V_STG_CTRL_DELIVERY_CONTACT

AS

(SELECT STG.DELIVERY_ID,

Salutation,

Firstname,

Lastname,

DateOfBirth,

CON_DATA.ROW_NUMBER AS CON_ROWNUM

FROM HARSCAX.STG_CTRL_DELIVERY STG,

JSON_TABLE (

STG.JSON_DOCUMENT,

'$."Payload".Data.Contacts.Contact[*]'

COLUMNS (

ROW_NUMBER FOR ORDINALITY,

NESTED PATH '$.ContactData.IndividualPerson[*]'

COLUMNS (

Salutation VARCHAR2 (300) PATH '$.Salutation',

Firstname VARCHAR2 (300) PATH '$.Firstname',

Lastname VARCHAR2 (300) PATH '$.Lastname',

DateOfBirth VARCHAR2 (300) PATH '$.DateOfBirth')))

CON_DATA

WHERE STG.STATUS = 'to-process');

And the view V_STG_CTRL_DELIVERY_CONTACT_EMAIL:

CREATE OR REPLACE VIEW V_STG_CTRL_DELIVERY_CONTACT_EMAIL

AS

SELECT STG.DELIVERY_ID,

ChannelType,

ChannelValue,

ROW_NUMBER AS con_rownum,

con_data.row_number_sub AS con_com_rownum

FROM HARSCAX.STG_CTRL_DELIVERY STG,

JSON_TABLE (

stg.JSON_DOCUMENT,

'$."Payload".Data.Contacts.Contact[*]'

COLUMNS (

ROW_NUMBER FOR ORDINALITY,

NESTED PATH '$.CommunicationChannel[*]'

COLUMNS (

row_number_sub FOR ORDINALITY,

ChannelType VARCHAR2 (300) PATH '$.ChannelType',

ChannelValue VARCHAR2 (300) PATH '$.ChannelValue')))

con_data

WHERE STG.STATUS = 'to-process';

In addition, the sample payload must be inserted into the STG_CTRL_DELIVERY table as shown in the following code example:

Insert into STG_CTRL_DELIVERY

(DELIVERY_ID, DELIVERY_TIMESTAMP,JSON_DOCUMENT,STATUS)

Values

(1, SYSDATE, 'JSON_DOCUMENT','to-process');

Where 'JSON_DOCUMENT' must be replaced by the content of the JSON file:

{

"Payload": {

"Data": {

"Contacts": {

"Contact": [{

"ContactData": {

"ContactType": "I",

"CreationDate": "2017-01-04T08:10:45.765Z",

"ChangeDate": "2017-02-15T07:21:31.991Z",

"IndividualPerson": {

"Salutation": "Ms.",

"Firstname": "1_FIRSTNAME",

"Lastname": "1_LASTNAME",

"DateOfBirth": "1996-06-06"

}

},

"CommunicationChannel": [ {

"ActionType": "I",

"ChannelType": "Private eMail",

"ChannelValue": "1_FIRSTNAME.1_LASTNAME@test.de"

}

],

}, {

"ContactData":

"CreationDate": "2017-01-04T08:10:45.765Z",

"ChangeDate": "2017-02-15T07:21:31.991Z",

"IndividualPerson": {

"Salutation": "Mr.",

"Firstname": "2_FIRSTNAME",

"Lastname": "2_LASTNAME",

"DateOfBirth": "1987-01-03"

}

},

"CommunicationChannel": [ {

"ChannelType": "Private eMail",

"ChannelValue": "2_FIRSTNAME.2_LASTNAME@test.net"

},

{

"ChannelType": "additional eMail",

"ChannelValue": "MAIL2_2_FIRSTNAME.2_LASTNAME@test.net"

}

],

}
] }
}
}
{

Data selection

In this demo, data from JSON messages is read from this payload using views.

These messages are stored in a control table. An external process delivers the messages and sets the status to "to-process". This status is the scope of the views used here.

The views read the required attributes for each group in the JSON. Oracle JSON functions are used for this purpose. The relationships between the objects are represented via generated IDs.

The view data can then be transferred to tables and processed from there in the classic way. The status must be updated for the processed IDs after the transfer in order to remove them from the scope of the view.

The example payload in the control table contains two contacts with various attributes, both contacts are assigned e-mail addresses.

The example payload in the control table contains two contacts with various attributes, both contacts are assigned e-mail addresses.

The relationship between the main object and the sub-object is particularly important here.
Contact 1 (CON_ROWNUM 1) has one e-mail address (CON_COM_ROWNUM 1).
Contact 2 (CON_ROWNUM 2) has two e-mail addresses (CON_COM_ROWNUM 1+2).
An ID is generated at each level here (CON_ROWNUM + CON_COM_ROWNUM). This is realized via the option "ROW_NUMBER FOR ORDINALITY".


CREATE OR REPLACE VIEW V_STG_CTRL_DELIVERY_CONTACT_EMAIL
AS
SELECT STG.DELIVERY_ID,
ChannelType,
ChannelValue,
ROW_NUMBER AS con_rownum,
con_data.row_number_sub AS con_com_rownum
FROM HARSCAX.STG_CTRL_DELIVERY STG,
JSON_TABLE (
stg.JSON_DOCUMENT,
'$."Payload".Data.Contacts.Contact[*]'
COLUMNS (
ROW_NUMBER FOR ORDINALITY,
NESTED PATH '$.CommunicationChannel[*]'
COLUMNS (
row_number_sub FOR ORDINALITY,
ChannelType VARCHAR2 (300) PATH '$.ChannelType',
ChannelValue VARCHAR2 (300) PATH '$.ChannelValue')))
con_data
WHERE STG.STATUS = 'to-process';

Further documentation is available on the Oracle website.

Do you have any questions? We have the answers!

Please write to us. We look forward to hearing from you!