Question:
How can I remove duplicates in SQL?
Angelo M
2012-07-23 16:27:03 UTC
I'm having trouble getting rid of duplicates in SQL. My requirements are spcifically like this. Consider the the table:

C_ID | REQ_ID | EXP_DATE
24 211 Jun-18-12
39 211 Jun-18-12
26 211 April-7-12

So what I need is to choose the req_id with the latest date. But in the case of a tie, the tie is broken by choosing the record with the highest c_id. So in the example above, the result would be:

C_ID | REQ_ID | EXP_DATE
39 211 Jun-18-12

So far I'm doing something like this: (pseudo-code)

select max(c_id), req_id, exp_date
from (select req_id, max(exp_date)
from table
group by req_id) t1
join
(select c_id, req_id, exp_date
from table) t2 on (t1.exp_date = t2.exp_date)
group by req_id, exp_date

I'm using PL/SQL. It works, but its not very efficient. I'm wondering if anyone can help me make this better. Thank you very much.
Six answers:
Lonely Rogue
2012-07-25 00:06:52 UTC
What you've written is perfect and is efficient.



Moreover, @DAVID solution has a loop-hole that brings the wrong data. Suppose, you have below records -



c_id req_id exp_date

24 211 300

26 211 50



Per your requirement, the right answer is the first record - 24 211 300

but @DAVID's solution would bring this output - 26 211 300

because of the usage of MAX for two columns against the "req_id".



Below 2 solutions would give you the right answer.



1. This is your solution, but a slight syntactical, logical & formatting corrections were made.



SELECT MAX(c_id), t1.req_id, Maxexp_date

FROM

(

SELECT req_id, Maxexp_date=MAX(exp_date)

FROM TABLE_S GROUP BY req_id

) t1

JOIN ( SELECT c_id, req_id, exp_date FROM TABLE_S ) t2

ON t1.Maxexp_date = t2.exp_date AND t1.req_id=t2.req_id

GROUP BY t1.req_id, Maxexp_date



2. Uses RANKing functions. Check if your DBMS supports this ( actually, majority systems support this with a little variation in syntax )



SELECT * FROM

(

SELECT c_id,req_id,exp_date,

CorrectRecord= RANK() OVER ( PARTITION BY req_id

ORDER BY req_id ASC ,exp_date DESC,c_id DESC

)

FROM TABLE_S

) SQ

WHERE CorrectRecord=1



-- In 'thoughts'...

Lonely Rogue.

https://twitter.com/LonelyRogue
galt_57
2012-07-23 16:55:39 UTC
Maybe something like



SELECT TOP(1) *

FROM table

WHERE req_id = 211

ORDER BY exp_date, c_id

;



It is not clear what you are really trying to accomplish.
DAVID
2012-07-23 17:25:09 UTC
I think this might do it more efficiently --



SELECT MAX([C_ID]), MAX([REQ_ID]), [EXP_DATE]

FROM "TABLE"

GROUP BY [EXP_DATE]

ORDER BY [EXP_DATE] DESC



Then, take the first record off the top of the result set.



DavidR

eSwitchBoard.com
Jurgen
2012-07-23 16:37:08 UTC
You probably shouldn't allow duplicates in the first place. Maybe you could add a TRIGGER that checks if a record is already available and removes it if this is the case.
2016-12-10 08:58:46 UTC
sq.> sq.> -- sq.> opt for * from type; CLASSID FACID SCHED ROOM ------- ---------- ------- ---- ART103A F101 MWF9 H221 CSC201A F105 TUTHF10 M110 CSC203A F105 MTHF12 M110 HST205A F202 MWF11 H221 MTH101B F110 MTUTH9 H225 MTH103C F110 MWF11 H225 6 rows chosen. sq.> sq.> -- sq.> opt for C.room, F.facid, C.facid 2 FROM type C, type F 3 the place c.room = f.room 4 AND F.facid < C.facid; ROOM FACID FACID ---- ---------- ---------- H221 F101 F202 a million row chosen. sq.> sq.>
Dani1
2012-07-23 16:27:39 UTC
hmmm


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.
Loading...