MySQL Mastery: Conquering Duplicate Rows with DELETE JOIN and ROW_NUMBER()
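DELETE JOIN:
This code removes duplicates from the table my_table, treating rows that share the same name and email as duplicates and keeping the row with the highest id in each group.
DELETE t1
FROM my_table t1
INNER JOIN my_table t2
  ON t1.name = t2.name
  AND t1.email = t2.email
  AND t1.id < t2.id;
Explanation:
- We use DELETE t1 to delete only the rows matched through the alias t1.
- t1 and t2 are aliases for the my_table table, which is joined to itself.
- INNER JOIN pairs rows whose name and email values match between t1 and t2. Joining on id alone would never find a duplicate, because one row's id cannot be both equal to and less than another row's id.
- t1.id < t2.id ensures we only delete the older row (based on the id order), so the row with the highest id in each group survives. Use t1.id > t2.id instead to keep the oldest row.
- Rows where name or email is NULL are never paired, because = does not treat two NULL values as equal.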
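Before running a self-join delete, it can help to preview the rows it would remove. The query below is a minimal sketch that assumes duplicates are defined by matching name and email values, as in the other examples in this article; substitute your own columns.

```sql
-- Preview: list the rows a self-join delete would remove.
-- Assumes rows with the same name and email are duplicates and that
-- the row with the highest id in each group is the one to keep.
SELECT DISTINCT t1.*
FROM my_table t1
INNER JOIN my_table t2
  ON t1.name = t2.name
  AND t1.email = t2.email
  AND t1.id < t2.id
ORDER BY t1.name, t1.email, t1.id;
```

DISTINCT is there because a row with several higher-id copies would otherwise be listed once per copy.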
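ROW_NUMBER() function (MySQL 8.0 and later):
DELETE FROM my_table
WHERE id IN (SELECT id FROM (
  SELECT id, ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id) AS rn FROM my_table
) AS ranked WHERE rn > 1);
- DELETE FROM my_table specifies the table to modify.
- The WHERE clause filters the rows to be deleted.
- ROW_NUMBER() assigns a sequential number to each row within partitions defined by name and email (ordered by id), so the row with the lowest id in each partition gets 1.
- MySQL does not allow a window function directly in a WHERE clause, so the numbering is computed inside the derived table ranked and filtered from the outside.
- The condition rn > 1 targets rows where the number isn't 1 (indicating duplicates).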
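To see what the numbering looks like before deleting anything, the same window expression can be run as a plain SELECT. This is a sketch that assumes the id, name, and email columns used above and a server with window function support (MySQL 8.0 or later).

```sql
-- Show every row with its position inside its (name, email) group.
-- Rows with rn > 1 are the ones treated as duplicates.
SELECT id, name, email,
       ROW_NUMBER() OVER (PARTITION BY name, email ORDER BY id) AS rn
FROM my_table
ORDER BY name, email, id;
```

For example, if ids 1 and 2 share the same name and email while id 3 is unique, the rn values are 1, 2, and 1 respectively, so only id 2 counts as a duplicate.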
Remember:
- Replace my_table with your actual table name.
- Modify the columns used in the examples (id, name, and email) to match your definition of unique rows.
- It's recommended to back up your table before running these DELETE statements.
Using a Temporary Table:
This method involves creating a temporary table to store the unique rows and then replacing the original table's contents with the filtered data.
CREATE TEMPORARY TABLE unique_data AS
SELECT DISTINCT *
FROM your_table;
TRUNCATE TABLE your_table;
INSERT INTO your_table
SELECT * FROM unique_data;
DROP TEMPORARY TABLE unique_data;
- This method first creates a temporary table, unique_data.
- SELECT DISTINCT * FROM your_table selects all columns with duplicates removed using DISTINCT. Because DISTINCT compares entire rows, this only helps when duplicate rows are identical in every column, including any id column.
- TRUNCATE TABLE your_table removes all existing data from the original table. In MySQL this statement commits implicitly and cannot be rolled back.
- INSERT INTO your_table SELECT * FROM unique_data populates the original table with the unique entries from the temporary table.
- Finally, DROP TEMPORARY TABLE unique_data removes the temporary table.
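Since DISTINCT works on whole rows, a quick count comparison shows in advance whether this rebuild would remove anything. The query below is a sketch against the same your_table placeholder.

```sql
-- If total_rows equals distinct_rows, no two rows are identical in every
-- column, and the temporary-table rebuild would leave the table unchanged.
SELECT
  (SELECT COUNT(*) FROM your_table) AS total_rows,
  (SELECT COUNT(*) FROM (SELECT DISTINCT * FROM your_table) AS d) AS distinct_rows;
```

When the two numbers match but the table still holds logical duplicates (for example, the same name and email under different ids), one of the column-based methods is the better fit.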
Using GROUP BY:
This method uses the GROUP BY clause to group rows by the columns that define a duplicate, then keeps only the first occurrence (the lowest id) in each group.
DELETE FROM your_table
WHERE id NOT IN (
  SELECT min_id FROM (
    SELECT MIN(id) AS min_id
    FROM your_table
    GROUP BY column1, column2, ...
  ) AS keepers
);
- This approach uses DELETE to remove rows.
- The innermost subquery uses GROUP BY to group rows based on the specified columns (column1, column2, etc.).
- MIN(id) selects the minimum id within each group, effectively keeping the first occurrence.
- The grouped query is wrapped in a derived table (keepers) because MySQL rejects a subquery that reads directly from the table being deleted from (error 1093).
- id NOT IN (...) identifies rows where the id doesn't match the minimum IDs from the groups, meaning they are duplicates.
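The same grouping can be used to list the duplicate groups before deleting them. This sketch uses two placeholder columns, column1 and column2; extend the column list to match your own definition of a duplicate.

```sql
-- List each duplicated combination, how many rows share it,
-- and the id that a MIN(id)-based delete would keep.
SELECT column1, column2, COUNT(*) AS copies, MIN(id) AS kept_id
FROM your_table
GROUP BY column1, column2
HAVING COUNT(*) > 1
ORDER BY copies DESC;
```

An empty result means no combination occurs more than once, so there is nothing to delete.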
Choosing the Right Method:
- DELETE JOIN is efficient for smaller tables and when you want to delete duplicates based on specific conditions.
- ROW_NUMBER() is flexible for defining unique rows based on various columns and ordering, but requires MySQL 8.0 or later.
- Temporary Table is a good option for larger tables but might require more temporary storage space.
- GROUP BY is efficient for simple duplicate removal based on specific columns and can be combined with aggregate functions.