How can I remove duplicates in SQL?

Question:

Angelo M

2012-07-23 16:27:03 UTC

I'm having trouble getting rid of duplicates in SQL. My requirements are spcifically like this. Consider the the table:

C_ID | REQ_ID | EXP_DATE
24 211 Jun-18-12
39 211 Jun-18-12
26 211 April-7-12

So what I need is to choose the req_id with the latest date. But in the case of a tie, the tie is broken by choosing the record with the highest c_id. So in the example above, the result would be:

C_ID | REQ_ID | EXP_DATE
39 211 Jun-18-12

So far I'm doing something like this: (pseudo-code)

select max(c_id), req_id, exp_date
from (select req_id, max(exp_date)
from table
group by req_id) t1
join
(select c_id, req_id, exp_date
from table) t2 on (t1.exp_date = t2.exp_date)
group by req_id, exp_date

I'm using PL/SQL. It works, but its not very efficient. I'm wondering if anyone can help me make this better. Thank you very much.

Six answers:

Lonely Rogue

2012-07-25 00:06:52 UTC

What you've written is perfect and is efficient.

Moreover, @DAVID solution has a loop-hole that brings the wrong data. Suppose, you have below records -

c_id req_id exp_date

24 211 300

26 211 50

Per your requirement, the right answer is the first record - 24 211 300

but @DAVID's solution would bring this output - 26 211 300

because of the usage of MAX for two columns against the "req_id".

Below 2 solutions would give you the right answer.

1. This is your solution, but a slight syntactical, logical & formatting corrections were made.

SELECT MAX(c_id), t1.req_id, Maxexp_date

FROM

(

SELECT req_id, Maxexp_date=MAX(exp_date)

FROM TABLE_S GROUP BY req_id

) t1

JOIN ( SELECT c_id, req_id, exp_date FROM TABLE_S ) t2

ON t1.Maxexp_date = t2.exp_date AND t1.req_id=t2.req_id

GROUP BY t1.req_id, Maxexp_date

2. Uses RANKing functions. Check if your DBMS supports this ( actually, majority systems support this with a little variation in syntax )

SELECT * FROM

(

SELECT c_id,req_id,exp_date,

CorrectRecord= RANK() OVER ( PARTITION BY req_id

ORDER BY req_id ASC ,exp_date DESC,c_id DESC

)

FROM TABLE_S

) SQ

WHERE CorrectRecord=1

-- In 'thoughts'...

Lonely Rogue.

https://twitter.com/LonelyRogue

galt_57

2012-07-23 16:55:39 UTC

Maybe something like

SELECT TOP(1) *

FROM table

WHERE req_id = 211

ORDER BY exp_date, c_id

;

It is not clear what you are really trying to accomplish.

DAVID

2012-07-23 17:25:09 UTC

I think this might do it more efficiently --

SELECT MAX([C_ID]), MAX([REQ_ID]), [EXP_DATE]

FROM "TABLE"

GROUP BY [EXP_DATE]

ORDER BY [EXP_DATE] DESC

Then, take the first record off the top of the result set.

DavidR

eSwitchBoard.com

Jurgen

2012-07-23 16:37:08 UTC

You probably shouldn't allow duplicates in the first place. Maybe you could add a TRIGGER that checks if a record is already available and removes it if this is the case.

2016-12-10 08:58:46 UTC

sq.> sq.> -- sq.> opt for * from type; CLASSID FACID SCHED ROOM ------- ---------- ------- ---- ART103A F101 MWF9 H221 CSC201A F105 TUTHF10 M110 CSC203A F105 MTHF12 M110 HST205A F202 MWF11 H221 MTH101B F110 MTUTH9 H225 MTH103C F110 MWF11 H225 6 rows chosen. sq.> sq.> -- sq.> opt for C.room, F.facid, C.facid 2 FROM type C, type F 3 the place c.room = f.room 4 AND F.facid < C.facid; ROOM FACID FACID ---- ---------- ---------- H221 F101 F202 a million row chosen. sq.> sq.>

Dani1

2012-07-23 16:27:39 UTC

hmmm

ⓘ

This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.

about - legalese