Monday, June 21, 2021

LISTAGG DISTINCT in Oracle Database 19c

Oracle Database 19c, Oracle Database Tutorial and Material, Oracle Database Career, Oracle Database Prep, Database Study Material

The LISTAGG function was introduced in Oracle 11gR2 to make string aggregation simpler. In Oracle 12cR2 it was extended to include overflow error handling. Oracle 19c includes the ability to remove duplicates from the LISTAGG results by including the DISTINCT keyword.

1. Setup

The examples in this article use the following table.

-- DROP TABLE EMP PURGE;

CREATE TABLE EMP (

  EMPNO NUMBER(4) CONSTRAINT PK_EMP PRIMARY KEY,

  ENAME VARCHAR2(10),

  JOB VARCHAR2(9),

  MGR NUMBER(4),

  HIREDATE DATE,

  SAL NUMBER(7,2),

  COMM NUMBER(7,2),

  DEPTNO NUMBER(2)

);

INSERT INTO EMP VALUES (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20);

INSERT INTO EMP VALUES (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30);

INSERT INTO EMP VALUES (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30);

INSERT INTO EMP VALUES (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20);

INSERT INTO EMP VALUES (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30);

INSERT INTO EMP VALUES (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30);

INSERT INTO EMP VALUES (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10);

INSERT INTO EMP VALUES (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20);

INSERT INTO EMP VALUES (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10);

INSERT INTO EMP VALUES (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30);

INSERT INTO EMP VALUES (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20);

INSERT INTO EMP VALUES (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30);

INSERT INTO EMP VALUES (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20);

INSERT INTO EMP VALUES (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10);

COMMIT;

2. The Problem

The default action of the LISTAGG function is shown below.

COLUMN employees FORMAT A40

SELECT deptno, LISTAGG(ename, ',') WITHIN GROUP (ORDER BY ename) AS employees

FROM   emp

GROUP BY deptno

ORDER BY deptno;

    DEPTNO EMPLOYEES

---------- ----------------------------------------

        10 CLARK,KING,MILLER

        20 ADAMS,FORD,JONES,SCOTT,SMITH

        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

SQL>

Let's add some extra people called "MILLER" into department 10, to give us duplicates in the aggregated list.

INSERT INTO emp VALUES (9998,'MILLER','ANALYST',7782,to_date('23-1-1982','dd-mm-yyyy'),1600,NULL,10);

INSERT INTO emp VALUES (9999,'MILLER','MANADER',7782,to_date('23-1-1982','dd-mm-yyyy'),1500,NULL,10);

COMMIT;

As expected, we now see multiple entries for the name "MILLER" in department 10.

COLUMN employees FORMAT A40

SELECT deptno, LISTAGG(ename, ',') WITHIN GROUP (ORDER BY ename) AS employees

FROM   emp

GROUP BY deptno

ORDER BY deptno;

    DEPTNO EMPLOYEES

---------- ----------------------------------------

        10 CLARK,KING,MILLER,MILLER,MILLER

        20 ADAMS,FORD,JONES,SCOTT,SMITH

        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

SQL>

If that's what we are expecting, great. If we want to remove duplicates, what do we do?

3. The Solution : Pre-19c

We could solve this in a number of ways. In the following example we use the ROW_NUMBER analytic function to remove any duplicates, then use the conventional LISTAGG function to aggregate the data.

COLUMN employees FORMAT A40

SELECT e2.deptno, LISTAGG(e2.ename, ',') WITHIN GROUP (ORDER BY e2.ename) AS employees

FROM   (SELECT e.*,

               ROW_NUMBER() OVER (PARTITION BY e.deptno, e.ename ORDER BY e.empno) AS myrank

        FROM   emp e) e2

WHERE  e2.myrank = 1

GROUP BY e2.deptno

ORDER BY e2.deptno;

    DEPTNO EMPLOYEES

---------- ----------------------------------------

        10 CLARK,KING,MILLER

        20 ADAMS,FORD,JONES,SCOTT,SMITH

        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

SQL>

Alternatively we could use DISTINCT in an inline view to remove the duplicate rows, then use the conventional LISTAGG function call to aggregate the data.

COLUMN employees FORMAT A40

SELECT e2.deptno, LISTAGG(e2.ename, ',') WITHIN GROUP (ORDER BY e2.ename) AS employees

FROM   (SELECT DISTINCT e.deptno, e.ename

        FROM   emp e) e2

GROUP BY e2.deptno

ORDER BY e2.deptno;

    DEPTNO EMPLOYEES

---------- ----------------------------------------

        10 CLARK,KING,MILLER

        20 ADAMS,FORD,JONES,SCOTT,SMITH

        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

SQL>

4. The Solution: 19c Onward

Oracle 19c introduced a simpler solution. We can now include the DISTINCT keyword directly in the LISTAGG function call.

COLUMN employees FORMAT A40

SELECT deptno, LISTAGG(DISTINCT ename, ',') WITHIN GROUP (ORDER BY ename) AS employees

FROM   emp

GROUP BY deptno

ORDER BY deptno;

    DEPTNO EMPLOYEES

---------- ----------------------------------------

        10 CLARK,KING,MILLER

        20 ADAMS,FORD,JONES,SCOTT,SMITH

        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

SQL>

The default functionality is to include all results, which we can express explicitly using the ALL keyword.

SELECT deptno, LISTAGG(ALL ename, ',') WITHIN GROUP (ORDER BY ename) AS employees

FROM   emp

GROUP BY deptno

ORDER BY deptno;

    DEPTNO EMPLOYEES

---------- ----------------------------------------

        10 CLARK,KING,MILLER,MILLER,MILLER

        20 ADAMS,FORD,JONES,SCOTT,SMITH

        30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD

3 rows selected.

SQL>

Related Posts

0 comments:

Post a Comment