Last time I’ve suggested two ways of how to perform string concatenation.

Unfortunately, both approaches have serious drawbacks that you might experience.

In this post I’ll explain one approach which is not Oracle version dependent, which corrects both shortcomings of previously described methods:

  • size of concatenated string
  • speed of data processing

Part I you can find on the following link:

https://www.josip-pojatina.com/en/string-aggregation-part-i/

 

Method 3 – cast/collect

I’ll again concatenate CACHE column from my TEST table.

As CACHE column is of type Number, first I need to create appropriate TYPE:

create or replace type name_coll_type as table of number;

 

Next I need to create a function that will accept collection (previously created table of number) as input parameter and return CLOB.

There s nothing special with that function, that would require further explanation.

create or replace function name_coll_type_to_clob (
   p_name_coll    in name_coll_type
 , p_delimiter    in varchar2 default null
)
   return clob
is
   v_name_string clob := ' ';
   
begin
   for idx in p_name_coll.first..p_name_coll.last
   loop
      if idx = p_name_coll.first then
         dbms_lob.open(v_name_string, DBMS_LOB.LOB_READWRITE);
         dbms_lob.write(v_name_string, length(p_name_coll(idx)), 1, p_name_coll(idx));
      else
         DBMS_LOB.WRITEAPPEND(v_name_string, length(p_name_coll(idx) || p_delimiter), p_delimiter || p_name_coll(idx));
      end if;
   end loop;
   DBMS_LOB.CLOSE(v_name_string);
   return v_name_string;
end name_coll_type_to_clob;

 

Finally I need to execute the query:

select
name_coll_type_to_clob(
      cast(
         collect(
            cache
            order by timestamp
         )
         as name_coll_type
    )
  , ','
 ) as some_list
from test t
where rownum < 10000000;


Elapsed: 00:02:08.11

Statistics
----------------------------------------------------------
	302  recursive calls
   89615909  db block gets
   20002237  consistent gets
     105463  physical reads
	  0  redo size
      23606  bytes sent via SQL*Net to client
      14131  bytes received via SQL*Net from client
	 66  SQL*Net roundtrips to/from client
	  0  sorts (memory)
	  1  sorts (disk)
	  1  rows processed

 

Oracle COLLECT function enables aggregating data into a collection retaining multiple records within a single row – similar to the nested tables.

CAST function converts input collection (result of COLLECT function) into another collection, where I need to specify exactly which one.

That’s why I have to use “as name_coll_type”.

More about SQL functions you can find in Oracle Database SQL Language Reference book, which is available at the following link:

https://docs.oracle.com/en/database/oracle/oracle-database/19/sqlrf/COLLECT.html#GUID-A0A74602-2A97-449B-A3EC-847D38D3DA90

As you can observe, SQL completes in 2 minutes and 8 seconds which is a very good result, when compared with two previous attempt, and taking into account extensive context switching – 10 million times between SQL and PL/SQL engine.

Can we do better?

 

Method 4 – apex_string – join_clob function

In this method I’m going to use Oracel APEX join_clob function.

For this to work, you need to first install APEX.

You can download APEX from the following link:

https://www.oracle.com/tools/downloads/apex-downloads.html

 

You can choose the latest version of APEX (at the time of writing it is 19.2) no matter which database version you are running (e.g. you can install APEX 19.2 on Oracle database 12c).

If you only need one language (english), you can download smaller version.

After downloading and unpacking zip file, you need to connect as SYSDBA to create tablespace and execute the installation script:

connect / as sysdba 

create tablespace apex datafile '/u01/app/oracle/oradata/ORCL/apex01.dbf' size 10m autoextend on next 10m maxsize unlimited
extent management local autoallocate;


--execute as SYSDBA
@apexins.sql apex apex temp /i/


Position 1: Name of tablespace for Application Express application user – apex 
Position 2: Name of tablespace for Application Express files user - apex
Position 3: Name of temporary tablespace or tablespace group - temp
Position 4: Virtual directory for APEX images - i

 

You don’t need to create web listener if you just want to install APEX functions and you are not planning to run APEX apps.

Finally I can execute the query:

select
   apex_string.join_clob(
      cast(
         collect(
            rpad(t.cache, 2)
            order by t.timestamp
         )
         as apex_t_varchar2
      )
  , ''
  , 12 /* dbms_lob.call */
 ) as some_list
from test t
where rownum < 10000000;

Elapsed: 00:00:24.33

Statistics
----------------------------------------------------------
	283  recursive calls
      55942  db block gets
      92872  consistent gets
      29510  physical reads
	  0  redo size
      23610  bytes sent via SQL*Net to client
      14130  bytes received via SQL*Net from client
	 66  SQL*Net roundtrips to/from client
	  0  sorts (memory)
	  1  sorts (disk)
	  1  rows processed

 

Again I’m using cast/collect combination as in previous method (name_coll_type type/name_coll_type_to_clob function), but this time I added apex_string.join_clob function.

But this approach has brought the best result so far (from 2 minutes 8 sec → 24 sec).

 

Method 5 – json_arrayagg / json_value

This is the last method I’ll explain.

JSON_ARRAYAGG function takes as input column of SQL expression, converts it to JSON value and returns JSON array where each element is JSON value.

JSON_VALUE converts JSON value into the SQL value.

Here is the SQL code:

select
   json_value(
      replace(
         json_arrayagg(
            rpad(t.cache, 2)
            order by t.timestamp
            returning clob
         )
       , '","'
     , ''
    )
  , '$[0]' returning clob
 ) as some_list
from test t
where rownum < 10000000;


Elapsed: 00:00:14.92

Statistics
----------------------------------------------------------
	406  recursive calls
     670738  db block gets
     175185  consistent gets
      29510  physical reads
	  0  redo size
      23606  bytes sent via SQL*Net to client
      14130  bytes received via SQL*Net from client
	 66  SQL*Net roundtrips to/from client
	  0  sorts (memory)
	  1  sorts (disk)
	  1  rows processed

 

The JSON method is even better than previous one, as elapsed time dropped from 24 sec → 15 sec.

 

Summary:

There are many way to achieve desired functionality.

In this and previous posts I’ve listed five different methods.

While there are many ways to achieve the same goal, only one or two ways are usually scalable and fast.

As always, the key is to understand how, under the hood, some functionality is implemented.



Get notified when a new post is published!

 


Comments

There are no comments yet. Why not start the discussion?

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.