How to split comma separated value strings into rows in Oracle Database

Often you receive a CSV string that you need to convert to rows. To do this with SQL you need to:

◉ Generate a row for each value

◉ For each row N, extract the value at position N from the string

◉ You can do this in Oracle Database with a query like:

with rws as (

select 'split,into,rows' str from dual

)

select regexp_substr (

str,

'[^,]+',

level

) value

from rws

connect by level <=

length ( str ) - length ( replace ( str, ',' ) ) + 1;

VALUE

split

into

rows

So what's going on here?

The connect by level clause generates a row for each value. It finds how many values there are by:

◉ Using replace ( str, ',' ) to remove all the commas from the string

◉ Subtracting the length of the replaced string from the original to get the number of commas

◉ Add one to this result to get the number of values

The regexp_substr extracts each value using this regular expression:

[^,]+

This searches for:

◉ Characters not in the list after the caret. So everything except a comma.

◉ The plus operator means it must match one or more of these non-comma characters

The third argument tells regexp_substr to start the search at the first character. And the final one instructs it to fetch the Nth occurrence of the pattern. So row one finds the first value, row two the second, and so on.

Oracle Database, Oracle Database Exam, Oracle Database Exam Prep, Oracle Database Tutorial and Materials, Oracle Database Certification, Oracle Database Learning, Oracle Database Career, Oracle Database Preparation

Photo by Toa Heftiba Şinca from Pexels

Before we move on, note the solution above assumes there are no commas at the start or end of the string. If this is possible, you should trim the string first:

with rws as (

select ',leading,commas,and,trailing,' str from dual

)

select regexp_substr (

str,

'[^,]+',

level

) value

from rws

connect by level <=

length ( trim ( both ',' from str ) ) -

length ( replace ( str, ',' ) ) + 1;

VALUE

leading

commas

and

trailing

You can also adapt this to split strings with other delimiters.

How to turn space, semicolon, pipe or other delimited strings into rows

CSV may be the most common format for delimited values. But often you'll want to split strings with other delimiters, such as pipes or semicolons. To do this, swap commas for the new character:

with rws as (

select 'split;semicolons;into;rows' str from dual

)

select regexp_substr (

str,

'[^;]+',

level

) value

from rws

connect by level <=

length ( str ) -

length ( replace ( str, ';' ) ) + 1;

VALUE

split

semicolons

into

rows

Using this basic method you can overcome a common challenge: passing a CSV as input to an in list.

How to do variable in lists

The in condition returns all rows where the target matches one of the values in a comma-separated list. For example:

select employee_id, first_name, last_name

from hr.employees

where last_name in ( 'King', 'Kochhar', 'De Haan' );

EMPLOYEE_ID FIRST_NAME LAST_NAME

102 Lex De Haan

156 Janette King

100 Steven King

101 Neena Kochhar

The issue here is the items in the list need to be separate values. If you pass them as a single CSV string like this:

select employee_id, first_name, last_name

from hr.employees

where last_name in ( 'King,Kochhar,De Haan' );

no rows selected

The query returns no rows!

This is because it's searching for someone with the name King,Kochhar,De Haan. Unsurprisingly, no one has that name.

This makes in hard to use with bind variables, particularly if the number of values is unknown.

Using the CSV-to-rows trick above you can pass the search string as a single value and split it into rows like this:

with rws as (

select 'King,Kochhar,De Haan' str from dual

)

select employee_id, first_name, last_name

from hr.employees

where last_name in (

select regexp_substr (

str,

'[^,]+',

level

) value

from rws

connect by level <=

length ( str ) -

length ( replace ( str, ',' ) ) + 1

);

EMPLOYEE_ID FIRST_NAME LAST_NAME

156 Janette King

100 Steven King

101 Neena Kochhar

102 Lex De Haan

This enables you to use a bind variable for the in list:

declare

emp_cur sys_refcursor;

search_str varchar2(100) := 'King,Kochhar,De Haan';

employee_id integer;

first_name varchar2(30);

last_name varchar2(30);

begin

open emp_cur for q'!select employee_id, first_name, last_name

from hr.employees

where last_name in (

select regexp_substr (

:str,

'[^,]+',

level

) value

from dual

connect by level <=

length ( :str ) - length ( replace ( :str, ',' ) ) + 1

)!' using search_str, search_str, search_str;

loop

fetch emp_cur into employee_id, first_name, last_name;

exit when emp_cur%notfound;

dbms_output.put_line (

employee_id || ' ' || first_name || ' ' || last_name

);

end loop;

end;

156 Janette King

100 Steven King

101 Neena Kochhar

102 Lex De Haan

This works well when you're splitting one value, such as a string literal or bind variable.

But what if the strings are stored in a table and you have many rows with strings to split?

Image by ejaugsburg from Pixabay

Splitting delimited values stored in columns into rows

Sometimes you'll find columns storing lists of separated values. For example:

create table csvs (

csv_id integer

primary key,

csv_text varchar2(1000)

);

insert into csvs

values ( 1, 'split,into,rows' );

insert into csvs

values ( 2, 'even,more,values,to,extract' );

commit;

This is bad design, violates first normal form, and causes many headaches.

For example, if you want to join each value to rows in another table, you need to convert the string into rows first.

You can do this with the CSV-to-rows trick above as the basis. But it needs a tweak.

If you swap dual for your table (csvs here), the query links every row with every other row in the table! This leads to generating a huge data set and slowing the query to a crawl.

There are a few ways to avoid this.

From 12c you can use a lateral join. This enables you to write a subquery that uses values from tables to its left. So the database only generates a tree corresponding to each source row.

This gives a query like:

select csv_id, regexp_substr (

csv_text,

'[^,]+',

) val

from csvs

cross join lateral (

select level rn from dual

connect by level <=

length ( csv_text ) - length ( replace ( csv_text, ',' ) ) + 1

);

CSV_ID VAL

1 split

1 into

1 rows

2 even

2 more

2 values

2 to

2 extract

If you're stuck on ancient versions of Oracle Database, you can extend the connect by clause instead:

select csv_id, regexp_substr (

csv_text,

'[^,]+',

level

) val

from csvs

connect by level <=

length ( csv_text ) - length ( replace ( csv_text, ',' ) ) + 1

and prior csv_text = csv_text

and prior sys_guid () is not null;

CSV_ID VAL

2 even

2 more

2 values

2 to

2 extract

1 split

1 into

1 rows

So what do the extra connect by clauses do?

◉ prior csv = csv ensures the query only creates new rows linked to the original

◉ prior sys_guid () is not null adds a unique bit of data to each row. This prevents cycles because Oracle Database uses all the columns with prior operators in the connect by to check for loops. If two rows in a tree have the same value for these columns, the database considers this a loop.

These methods help you turn delimited values into rows, but so far we've made a big assumption:

The separator cannot appear within the values themselves.

What do you do if it can?

Photo by Toa Heftiba Şinca from Pexels

Converting quoted values containing the separator into rows

Say you've received a CSV with the values in "lastname, firstname" format. Splitting on each comma will place last and first names on separate lines, giving twice the number of rows you need!

To overcome this change the regular expression. Instead of searching for the Nth token that excludes a comma, an alternative is to:

◉ Find the number of tokens by counting how many times "," (quote comma quote) appears; regexp_count ( str, '","' ) is a neat way to do this

◉ Find the position of the next double quote after every second comma. You can do this with instr ( str, '"', 1, ( rn * 2 ) - 1 )

◉ Return all the characters that are not a double quote from this point

Which looks something like:

with rws as (

select department_id,

listagg ( '"' || last_name || ', ' || first_name || '"', ',' )

within group ( order by employee_id ) str

from hr.employees

where department_id in ( 10, 20, 30 )

group by department_id

)

select department_id, regexp_substr (

str,

'[^"]+',

instr ( str, '"', 1, (rn*2)-1 ),

) employee

from rws, lateral (

select level rn from dual

connect by level <=

regexp_count ( str, '","' ) + 1

);

DEPARTMENT_ID EMPLOYEE

10 Whalen, Jennifer

20 Hartstein, Michael

20 Fay, Pat

30 Raphaely, Den

30 Khoo, Alexander

30 Baida, Shelli

30 Tobias, Sigal

30 Himuro, Guy

30 Colmenares, Karen

This makes two important assumptions:

◉ Every value contains exactly one comma. If there can be a different number of delimiters in each value, you'll need to adjust the logic.

◉ Each value is enclosed in quotes (or some other character not present in the values themselves)

A note on performance

At this point you may be wondering:

Why not use regexp_count in the original query to find how many values there? Why subtract the lengths of the strings?

The short answer is: you can!

But be aware that regular expressions are slower than other string manipulation functions.

On small data sets processing short CSVs you're unlikely to notice this difference. But if gaining every last microsecond matters to you, avoid the regular expressions. You can also swap out the regexp_substr expression in the select for one using substr and instr.

Whichever delimited values to rows method you use, writing the query is fiddly. As this is a common task, it'd be good to create a function to split strings for you.

You can split a single string with pipelined table functions (PTFs) like this:

create or replace type string_table as

table of varchar2 (4000);

create or replace function split_string (

delimited_string varchar2,

separator varchar2 default ','

) return string_table

pipelined

delimited_string_cleaned varchar2(32767);

substring varchar2(4000);

pos pls_integer;

begin

delimited_string_cleaned :=

trim ( both separator from delimited_string ) ||

separator;

pos := instr ( delimited_string_cleaned, separator );

substring := substr ( delimited_string_cleaned, 1, pos - 1 );

loop

exit when substring is null;

pipe row ( substring );

substring := substr (

delimited_string_cleaned,

pos + 1,

instr (

delimited_string_cleaned, separator, pos + 1

) - pos - 1

);

pos := instr ( delimited_string_cleaned, separator, pos + 1 );

end loop;

return;

end;

select *

from split_string ( 'King,Kochhar,De Haan' );

COLUMN_VALUE

King

Kochhar

De Haan

While PTFs can accept cursors, making a generic PTF that converts CSVs stored in a table to rows is hard.

Luckily Oracle Database 19c offers a better way: SQL macros!

Make reusable string-to-rows functions with SQL macros

Added in Oracle Database 19.6, SQL macros return table SQL expressions. From 21c you can also make macros that return scalar expressions.

You can think of SQL macros as query templates. They return a string containing the text of your expression. The function's parameters are placeholders in this string.

At parse time the database resolves the expression. It replaces the parameter placeholders with the text of the actual values you call it with. It then substitutes this expression back into your query.

This enables you to define a reusable query you can pass tables or columns to at runtime.

This is a big advantage over PTFs which can't accept tables as parameters.

Macros also resolve to pure SQL. This avoids any context switches from SQL to PL/SQL and gives the optimizer better visibility of the query. These benefits mean using macros is likely to be faster than PTFs.

In this post we've identified two template queries:

◉ Taking a CSV string literal and converting it into rows

◉ Extracting delimited values stored in a table

So we need two macros. One that accepts an input string, the other a table name and source column. You could create these as two functions with different names. Or create one overloaded function in a package. I'll do the latter here.

One overload will accept a string, the other a table and column containing delimited values. Both will have an optional parameter for the separator, which defaults to a comma:

create or replace package string_macros_pkg as

function split_string (

tab dbms_tf.table_t,

col dbms_tf.columns_t,

separator varchar2 default ','

) return clob sql_macro;

function split_string (

delimited_string varchar2,

separator varchar2 default ','

) return clob sql_macro;

end;

Place the template queries above into these like so:

create or replace package body string_macros_pkg as

function split_string (

tab dbms_tf.table_t,

col dbms_tf.columns_t,

separator varchar2 default ','

) return clob sql_macro as

sql_text clob;

begin

sql_text := 'select t.*,

regexp_substr (

' || col ( 1 ) || ',

''[^'' || separator || '']+'',

pos

) str,

pos

from tab t,

lateral (

select level pos

from dual

connect by level <=

length ( ' || col ( 1 ) || ' )

- length ( replace ( ' || col ( 1 ) || ', separator ) )

+ 1

)';

return sql_text;

end split_string;

function split_string (

delimited_string varchar2,

separator varchar2 default ','

) return clob sql_macro as

sql_text clob;

begin

sql_text := 'select

regexp_substr (

delimited_string,

''[^'' || separator || '']+'',

level

) str,

level pos

from dual

connect by level <=

length ( delimited_string )

- length ( replace ( delimited_string, separator ) )

+ 1';

return sql_text;

end split_string;

end;

With this in place, you can now call the macro, passing either a string or a table's column to split. Here are a few examples to get you going:

select employee_id, first_name, last_name

from hr.employees

where last_name in (

select str

from string_macros_pkg.split_string ( 'King,Kochhar,De Haan' )

);

EMPLOYEE_ID FIRST_NAME LAST_NAME

100 Steven King

156 Janette King

101 Neena Kochhar

102 Lex De Haan

select employee_id, first_name, last_name

from hr.employees

join string_macros_pkg.split_string ( 'King,Kochhar,De Haan' )

on last_name = str;

EMPLOYEE_ID FIRST_NAME LAST_NAME

100 Steven King

156 Janette King

101 Neena Kochhar

102 Lex De Haan

select csv_id, str, pos

from string_macros_pkg.split_string (

csvs, columns ( csv_text ), ','

);

CSV_ID STR POS

1 split 1

1 into 2

1 rows 3

2 even 1

2 more 2

2 values 3

2 to 4

2 extract 5

Source: oracle.com

DB Exam Study

Monday, November 22, 2021

How to split comma separated value strings into rows in Oracle Database

How to turn space, semicolon, pipe or other delimited strings into rows

How to do variable in lists

Splitting delimited values stored in columns into rows

Converting quoted values containing the separator into rows

A note on performance

Make reusable string-to-rows functions with SQL macros

0 comments:

Post a Comment

Blog Archive

Labels

Popular Posts

Total Pageviews

Google Translate

DB Exam Study

Monday, November 22, 2021

How to split comma separated value strings into rows in Oracle Database

How to turn space, semicolon, pipe or other delimited strings into rows

How to do variable in lists

Splitting delimited values stored in columns into rows

Converting quoted values containing the separator into rows

A note on performance

Make reusable string-to-rows functions with SQL macros

Related Posts

0 comments:

Post a Comment

Blog Archive

Labels

Popular Posts

Subscribe To

Total Pageviews

Google Translate