Remove duplicate records in two or more columns

If we wanna have a primary key or unique index on two or more columns, using CTE and DERIVED TABLE with ROW_NUMBER is very useful to remove dup records. In fact that’s CTE usage is because we cannot use RANKING FUNCTIONS in WHERE clause.

For select dup records, we use CTE like this:

-- This is just an example
-- Don't worry about sample table structure

;with  T ( RN, GroupId, UserName )
       as ( select ROW_NUMBER() over ( partition by GroupId, UserName order by GroupId ) as RN
                 , GroupId
                 , UserName
                 from Tbl )
   select GroupId
        , UserName
        from T
        where T.RN > 1 ;

For delete those dup records, just change CTE to a DERIVED TABLE like this:

delete T
     from ( select ROW_NUMBER() over ( partition by GroupId, UserName order by GroupId ) as RN
                 , GroupId
                 , UserName
                 from Tbl ) T
     where T.RN > 1 ;

Advertisements
This entry was posted in Joy of T-SQL. Bookmark the permalink.

Leave a comment

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s