The LISTAGG function was introduced in Oracle 11gR2 to make string aggregation simpler. In Oracle 12cR2 it was extended to include overflow error handling. Oracle 19c includes the ability to remove duplicates from the LISTAGG results by including the DISTINCT keyword.
1. Setup
The examples in this article use the following table.
-- DROP TABLE EMP PURGE;
CREATE TABLE EMP (
EMPNO NUMBER(4) CONSTRAINT PK_EMP PRIMARY KEY,
ENAME VARCHAR2(10),
JOB VARCHAR2(9),
MGR NUMBER(4),
HIREDATE DATE,
SAL NUMBER(7,2),
COMM NUMBER(7,2),
DEPTNO NUMBER(2)
);
INSERT INTO EMP VALUES (7369,'SMITH','CLERK',7902,to_date('17-12-1980','dd-mm-yyyy'),800,NULL,20);
INSERT INTO EMP VALUES (7499,'ALLEN','SALESMAN',7698,to_date('20-2-1981','dd-mm-yyyy'),1600,300,30);
INSERT INTO EMP VALUES (7521,'WARD','SALESMAN',7698,to_date('22-2-1981','dd-mm-yyyy'),1250,500,30);
INSERT INTO EMP VALUES (7566,'JONES','MANAGER',7839,to_date('2-4-1981','dd-mm-yyyy'),2975,NULL,20);
INSERT INTO EMP VALUES (7654,'MARTIN','SALESMAN',7698,to_date('28-9-1981','dd-mm-yyyy'),1250,1400,30);
INSERT INTO EMP VALUES (7698,'BLAKE','MANAGER',7839,to_date('1-5-1981','dd-mm-yyyy'),2850,NULL,30);
INSERT INTO EMP VALUES (7782,'CLARK','MANAGER',7839,to_date('9-6-1981','dd-mm-yyyy'),2450,NULL,10);
INSERT INTO EMP VALUES (7788,'SCOTT','ANALYST',7566,to_date('13-JUL-87','dd-mm-rr')-85,3000,NULL,20);
INSERT INTO EMP VALUES (7839,'KING','PRESIDENT',NULL,to_date('17-11-1981','dd-mm-yyyy'),5000,NULL,10);
INSERT INTO EMP VALUES (7844,'TURNER','SALESMAN',7698,to_date('8-9-1981','dd-mm-yyyy'),1500,0,30);
INSERT INTO EMP VALUES (7876,'ADAMS','CLERK',7788,to_date('13-JUL-87', 'dd-mm-rr')-51,1100,NULL,20);
INSERT INTO EMP VALUES (7900,'JAMES','CLERK',7698,to_date('3-12-1981','dd-mm-yyyy'),950,NULL,30);
INSERT INTO EMP VALUES (7902,'FORD','ANALYST',7566,to_date('3-12-1981','dd-mm-yyyy'),3000,NULL,20);
INSERT INTO EMP VALUES (7934,'MILLER','CLERK',7782,to_date('23-1-1982','dd-mm-yyyy'),1300,NULL,10);
COMMIT;
2. The Problem
The default action of the LISTAGG function is shown below.
COLUMN employees FORMAT A40
SELECT deptno, LISTAGG(ename, ',') WITHIN GROUP (ORDER BY ename) AS employees
FROM emp
GROUP BY deptno
ORDER BY deptno;
DEPTNO EMPLOYEES
---------- ----------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
3 rows selected.
SQL>
Let's add some extra people called "MILLER" into department 10, to give us duplicates in the aggregated list.
INSERT INTO emp VALUES (9998,'MILLER','ANALYST',7782,to_date('23-1-1982','dd-mm-yyyy'),1600,NULL,10);
INSERT INTO emp VALUES (9999,'MILLER','MANADER',7782,to_date('23-1-1982','dd-mm-yyyy'),1500,NULL,10);
COMMIT;
As expected, we now see multiple entries for the name "MILLER" in department 10.
COLUMN employees FORMAT A40
SELECT deptno, LISTAGG(ename, ',') WITHIN GROUP (ORDER BY ename) AS employees
FROM emp
GROUP BY deptno
ORDER BY deptno;
DEPTNO EMPLOYEES
---------- ----------------------------------------
10 CLARK,KING,MILLER,MILLER,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
3 rows selected.
SQL>
If that's what we are expecting, great. If we want to remove duplicates, what do we do?
3. The Solution : Pre-19c
We could solve this in a number of ways. In the following example we use the ROW_NUMBER analytic function to remove any duplicates, then use the conventional LISTAGG function to aggregate the data.
COLUMN employees FORMAT A40
SELECT e2.deptno, LISTAGG(e2.ename, ',') WITHIN GROUP (ORDER BY e2.ename) AS employees
FROM (SELECT e.*,
ROW_NUMBER() OVER (PARTITION BY e.deptno, e.ename ORDER BY e.empno) AS myrank
FROM emp e) e2
WHERE e2.myrank = 1
GROUP BY e2.deptno
ORDER BY e2.deptno;
DEPTNO EMPLOYEES
---------- ----------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
3 rows selected.
SQL>
Alternatively we could use DISTINCT in an inline view to remove the duplicate rows, then use the conventional LISTAGG function call to aggregate the data.
COLUMN employees FORMAT A40
SELECT e2.deptno, LISTAGG(e2.ename, ',') WITHIN GROUP (ORDER BY e2.ename) AS employees
FROM (SELECT DISTINCT e.deptno, e.ename
FROM emp e) e2
GROUP BY e2.deptno
ORDER BY e2.deptno;
DEPTNO EMPLOYEES
---------- ----------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
3 rows selected.
SQL>
4. The Solution: 19c Onward
Oracle 19c introduced a simpler solution. We can now include the DISTINCT keyword directly in the LISTAGG function call.
COLUMN employees FORMAT A40
SELECT deptno, LISTAGG(DISTINCT ename, ',') WITHIN GROUP (ORDER BY ename) AS employees
FROM emp
GROUP BY deptno
ORDER BY deptno;
DEPTNO EMPLOYEES
---------- ----------------------------------------
10 CLARK,KING,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
3 rows selected.
SQL>
The default functionality is to include all results, which we can express explicitly using the ALL keyword.
SELECT deptno, LISTAGG(ALL ename, ',') WITHIN GROUP (ORDER BY ename) AS employees
FROM emp
GROUP BY deptno
ORDER BY deptno;
DEPTNO EMPLOYEES
---------- ----------------------------------------
10 CLARK,KING,MILLER,MILLER,MILLER
20 ADAMS,FORD,JONES,SCOTT,SMITH
30 ALLEN,BLAKE,JAMES,MARTIN,TURNER,WARD
3 rows selected.
SQL>
0 comments:
Post a Comment