Look in:

Web oracle-core-dba.blogspot.com

Monday, January 07, 2008

Deleting Duplicate Records

You may have come across numerous situations where duplicate records needs to be deleted from some table quickly. More often than not this is due to an application or data transformation issue and the number of duplicate rows are generally small related to the overall number of rows in the table. DBAs struggle with this task and spend way too much time and so coming up with elaborate scripts.

Assuming the table looks like this:

ACCOUNT
ACCOUNT_ID
STATUS_CODE
FIRST_NAME
LAST_NAME
STREET_ADDRESS

In this example, there is no primary key or unique constraint preventing duplicates from being inserted. The "logical primary key" is ACCOUNT_ID and STATUS_CODE.

First we might want to view the duplicate records to assist with troubleshooting and this can be done with this query:

SELECT Account_Id,
Status_Code,
COUNT(* ) OccurAnces
FROM ACCOUNT
GROUP BY Account_Id,Status_Code
HAVING COUNT(* ) > 1;

And to delete them we rely on Oracle’s rowed which is a unique "key" that exists on every table row:

DELETE
FROM ACCOUNT A
WHERE
(ACCOUNT_ID,STATUS_CODE) IN (
SELECT
ACCOUNT_ID,
STATUS_CODE,
FROM ACCOUNT
GROUP BY
ACCOUNT_ID,
STATUS_CODE
HAVING
COUNT(*) > 1) AND
ROWID != (
SELECT
MIN(ROWID)
FROM ACCOUNT
WHERE
ACCOUNT_ID = A.ACCOUNT_ID AND
STATUS_CODE = A.STATUS_CODE);

Alternatively this more simplified query can be used, but in some cases it might not execute as fast as the above query because of having to do more random I/O:

DELETE FROM ACCOUNT a
WHERE RowId != (SELECT MIN(RowId)
FROM ACCOUNT
WHERE Account_Id = a.Account_Id
AND Status_Code = a.Status_Code);

Please make sure that you thoroughly understand your duplicate record problem and what makes a record unique before deleting duplicate records in a table. Make sure you test the above SQL statements in a test environment with a representative test case before attempting this in a production environment

No comments: