Very often you’ll find delimited text to be a more suitable format than having the values split across many rows.
Typical use cases for delimited text include preparing data for further analysis, reporting, importing data into a spreadsheet, etc.
To demonstrate the concept, I’ll use my demo table TEST with 10 million rows.
And here is the content of the table:
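The table contents aren’t reproduced here, but a minimal sketch of a comparable setup could look like the following. The column names cache and timestamp are taken from the queries below; the data and the row-generator logic are my assumptions, not the original DDL:
-- Hypothetical setup; the actual DDL isn't shown in the post.
-- Columns cache and timestamp match the queries used below.
create table test as
select case when mod(rownum, 2) = 0 then 'Y' else 'N' end as cache,
sysdate - rownum / 86400 as timestamp
from (select level from dual connect by level <= 10000) a
cross join (select level from dual connect by level <= 1000) b;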
Method 1 – LISTAGG
One extremely easy (and preferred) way to perform string concatenation is to use the LISTAGG function, as in the following example:
select
listagg(rpad(t.cache, 2)) within group (
order by t.timestamp
) as some_list
from test t
where rownum < 100;
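As a side note, LISTAGG also accepts an optional delimiter as its second argument, in case you want the values separated:
-- Same query with an explicit ', ' delimiter between values.
select
listagg(t.cache, ', ') within group (
order by t.timestamp
) as some_list
from test t
where rownum < 100;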
And here is the output of the string concatenation:
The problem with this method is the maximum string length of 4000 bytes (the standard VARCHAR2 limit).
If the output is larger than 4000 bytes, you’ll get the following error:
select
listagg(rpad(t.cache, 2)) within group (
order by t.timestamp
) as some_list
from test t
where rownum < 10000000;
ORA-01489: result of string concatenation is too long
01489. 00000 - "result of string concatenation is too long"
*Cause: String concatenation result is more than the maximum size.
*Action: Make sure that the result is less than the maximum size.
This means I won’t be able to process my 10 million rows using that method, due to the 4000-byte limitation.
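As a side note, if you are on 12.2 or later, LISTAGG supports an ON OVERFLOW clause that truncates the result instead of raising ORA-01489. It won’t give you the complete string, so it’s not a real fix for this use case, but it’s worth knowing:
-- Truncates the result at the size limit and appends '...' plus
-- the count of omitted values, instead of raising ORA-01489.
select
listagg(rpad(t.cache, 2) on overflow truncate '...' with count) within group (
order by t.timestamp
) as some_list
from test t
where rownum < 10000000;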
One way to avoid that error (in case you are running 12c or later) is to change the MAX_STRING_SIZE initialization parameter from STANDARD to EXTENDED, which raises the VARCHAR2 limit from 4000 to 32767 bytes.
As this is a major change, you need to perform the following steps:
CONN / AS SYSDBA
PURGE DBA_RECYCLEBIN
SHUTDOWN IMMEDIATE;
STARTUP UPGRADE;
ALTER SYSTEM SET max_string_size=extended;
@?/rdbms/admin/utl32k.sql
SHUTDOWN IMMEDIATE;
STARTUP;
Additionally, you’ll need to ensure that the COMPATIBLE initialization parameter is set to 12.0.0 or higher.
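Once the database is back up, a quick sanity check from SQL*Plus confirms both settings:
-- Verify the new values after the restart.
show parameter max_string_size
show parameter compatible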
Be warned that this is a one-way street and there is no easy way to roll back that change.
If you still decide to go ahead with that change, Oracle will store the resulting string in a VARCHAR2 as long as it fits within the 4000-byte limit; otherwise it will automatically create a LOB and a LOB index in the background.
Method 2 – XMLAGG/XMLPARSE
As the previous method is system-wide and requires a major database change, you may not want to take that approach. An alternative way to achieve the same result is XMLAGG, which takes a list of XML elements from one column and returns an aggregated XML document in a single cell.
To do that, I also need the XMLPARSE function, which parses the evaluated result of value_expr and generates an XML instance from it.
Here I also specified WELLFORMED to skip the check that value_expr resolves to a well-formed XML document, which speeds up execution.
select
xmlagg(
xmlparse(
content rpad(t.cache, 2) wellformed
)
order by t.timestamp
).getclobval() as some_list
from test t
where rownum < 100000;
Elapsed: 00:00:31.69
Statistics
----------------------------------------------------------
3 recursive calls
685 db block gets
1052 consistent gets
0 physical reads
0 redo size
23606 bytes sent via SQL*Net to client
14130 bytes received via SQL*Net from client
66 SQL*Net roundtrips to/from client
1 sorts (memory)
0 sorts (disk)
1 rows processed
In this case it takes almost 32 seconds to process only 100k rows, while my TEST table contains 10 million rows, which I can’t process within a reasonable time frame using this method.
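For completeness, if you need a delimiter with this approach, a common variation (a sketch, not part of the timing test above) appends a separator to each value and trims the trailing one:
-- Appends ',' to each value and strips the final trailing comma.
select
rtrim(
xmlagg(
xmlparse(content t.cache || ',' wellformed)
order by t.timestamp
).getclobval(),
','
) as some_list
from test t
where rownum < 100000;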
Summary:
The first two methods are not the best fit, as either I need to make a serious database-level change that I can’t (easily) revert, or, in the case of XML string manipulation, query execution is unacceptably slow.
In the next post I’ll describe a method that gives you both: much better performance and no string-size limitation.