Duplicate records: delete all but one.
September 5, 2012 1 Comment
There are duplicate records in a table. No unique id. We want leave only one row and delete the rest. I will show four different ways how to do it.
if exists (select * from #t) drop table #t; create table #t (c1 char (3), c2 char(3)); insert into #t values ('aaa','aaa') insert into #t values ('aaa','bbb') insert into #t values ('aaa','bbb') insert into #t values ('aaa','ccc') insert into #t values ('aaa','ccc') insert into #t values ('aaa','ccc')
select * from #t
c1 c2
---- ----
aaa aaa
aaa bbb
aaa bbb
aaa ccc
aaa ccc
aaa ccc
1). MSSQL2005 and above can use common table expression (CTE):
WITH xxx AS ( SELECT c1, c2, ROW_NUMBER() OVER (ORDER BY c1,c2) AS 'RowNumber' FROM #t ) DELETE xxx WHERE RowNumber not in (SELECT min(RowNumber) FROM xxx GROUP BY c1,c2)
select * from #t
c1 c2
---- ----
aaa aaa
aaa bbb
aaa ccc
2). Older version of SQL server do not support CTE. I used different techniques with older versions. Select distinct row values into temporary table, delete all duplicated rows, then insert temp table back into original table. Simple script, but could be problems with keys.
if exists (select * from #t) drop table #t; create table #t (c1 char (3), c2 char(3)); insert into #t values ('aaa','aaa') insert into #t values ('aaa','bbb') insert into #t values ('aaa','bbb') insert into #t values ('aaa','ccc') insert into #t values ('aaa','ccc') insert into #t values ('aaa','ccc') go select distinct c1, c2 into #t1 from #t delete #t insert into #t select * from #t1
select * from #t
c1 c2
---- ----
aaa aaa
aaa bbb
aaa ccc
3). One more way is by using cursor. Cursor works with one row at the time even if rows are identical. That feature we will exploit. Open cursor, fetch value, update record then close cursor. I prefer update record first, visually check updated records making sure it updated correct records, then delete updated. This method works best when table has two duplicate rows. Here is an example of cursor update first duplicate record.
if exists (select * from #t) drop table #t; create table #t (c1 char (3)); insert into #t values ('aaa') insert into #t values ('aaa') insert into #t values ('aaa') go declare @str char(3) declare pcur cursor for select c1 from #t where c1='aaa' for update open pcur fetch pcur into @str update #t set c1='***' where current of pcur close pcur deallocate pcur
select * from #t
c1
----
***
aaa
aaa
4). And the last and easiest way for SQL2005 and older – use TOP clause.
if exists (select * from #t) drop table #t; go create table #t (c1 char (3)); insert into #t values ('aaa') insert into #t values ('aaa') insert into #t values ('aaa') go select * from #t delete top (2) from #t select * from #t
Pingback: How to Remove Duplicate Rows in SQL Server - SQL Training Online