Duplicate records: delete all but one.

There are duplicate records in a table. No unique id. We want leave only one row and delete the rest. I will show four different ways how to do it.

if exists (select * from #t) drop table #t;
create table #t (c1 char (3), c2 char(3));
insert into #t values ('aaa','aaa')
insert into #t values ('aaa','bbb')
insert into #t values ('aaa','bbb')
insert into #t values ('aaa','ccc')
insert into #t values ('aaa','ccc')
insert into #t values ('aaa','ccc')



select * from #t
c1 c2
---- ----
aaa aaa
aaa bbb
aaa bbb
aaa ccc
aaa ccc
aaa ccc

1). MSSQL2005 and above can use common table expression (CTE):

WITH xxx AS
(
    SELECT c1, c2,
    ROW_NUMBER() OVER (ORDER BY c1,c2) AS 'RowNumber'
    FROM #t
)
DELETE xxx WHERE RowNumber not in
(SELECT min(RowNumber) FROM xxx GROUP BY c1,c2)


select * from #t
c1 c2
---- ----
aaa aaa
aaa bbb
aaa ccc


2). Older version of SQL server do not support CTE. I used different techniques with older versions. Select distinct row values into temporary table, delete all duplicated rows, then insert temp table back into original table. Simple script, but could be problems with keys.

if exists (select * from #t) drop table #t;
create table #t (c1 char (3), c2 char(3));
insert into #t values ('aaa','aaa')
insert into #t values ('aaa','bbb')
insert into #t values ('aaa','bbb')
insert into #t values ('aaa','ccc')
insert into #t values ('aaa','ccc')
insert into #t values ('aaa','ccc')
go
select distinct c1, c2 into #t1 from #t 
delete #t
insert into #t select * from #t1


select * from #t
c1 c2
---- ----
aaa aaa
aaa bbb
aaa ccc


3). One more way is by using cursor. Cursor works with one row at the time even if rows are identical. That feature we will exploit. Open cursor, fetch value, update record then close cursor. I prefer update record first, visually check updated records making sure it updated correct records, then delete updated. This method works best when table has two duplicate rows. Here is an example of cursor update first duplicate record.

if exists (select * from #t) drop table #t;
create table #t (c1 char (3));
insert into #t values ('aaa')
insert into #t values ('aaa')
insert into #t values ('aaa')
go
declare @str char(3)
declare  pcur cursor for
select c1 from #t where c1='aaa'
for update
open pcur
fetch pcur into @str
update #t set c1='***' where current of pcur
close pcur
deallocate pcur


select * from #t
c1
----
***
aaa
aaa

4). And the last and easiest way for SQL2005 and older – use TOP clause.

if exists (select * from #t) drop table #t;
go
create table #t (c1 char (3));
insert into #t values ('aaa')
insert into #t values ('aaa')
insert into #t values ('aaa')
go
select * from #t
delete top (2) from #t
select * from #t
Advertisements

One Response to Duplicate records: delete all but one.

  1. Pingback: How to Remove Duplicate Rows in SQL Server - SQL Training Online

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: