Ranjish (Beginner)
Joined: 22 Dec 2002  Posts: 64  Topics: 28  Location: Chennai
Posted: Wed May 07, 2003 11:47 am    Post subject: Removing the duplicates
Hi,
I know that we can remove the duplicates by using
SUM FIELDS=NONE
But can we remove the duplicates without doing a sort?
When I tried
SORT FIELDS=COPY
SUM FIELDS=NONE
it did not work. Is there any way of doing this?
cheers
Ranjish
coolman (Intermediate)
Joined: 03 Jan 2003  Posts: 283  Topics: 27  Location: US
Posted: Wed May 07, 2003 12:36 pm
Ranjish,
SORT FIELDS=COPY means you are just copying the input to the output; you can't combine it with SUM FIELDS=NONE. The explanation is given below:
Duplicate removal:
1 -> How do you identify whether a record is a duplicate or not?
2 -> You need a key field to determine that. Only when two records in the file have the same key would you say they are duplicate records.
3 -> So, essentially, what you need to do is change your sort card like this:
Code:
  SORT FIELDS=(1,10,CH,A)    * assuming the first 10 bytes are the key
  SUM FIELDS=NONE
and run the job.
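For context, a minimal job step wrapping those two control statements might look like this (dataset names, space, and disposition are placeholders; adjust them to your shop's conventions):

```jcl
//DEDUP   EXEC PGM=SORT
//SYSOUT  DD SYSOUT=*
//SORTIN  DD DSN=MY.INPUT.FILE,DISP=SHR
//SORTOUT DD DSN=MY.OUTPUT.FILE,DISP=(NEW,CATLG,DELETE),
//           UNIT=SYSDA,SPACE=(CYL,(5,5)),DCB=*.SORTIN
//SYSIN   DD *
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=NONE
/*
```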
Hope this helps...
Cheers,
Coolman.
P.S.: Why would you want to remove the duplicates other than by using SORT? BTW, there are lots of ways of doing it, but SORT is the most efficient and neatest way of doing it.
Frank Yaeger (Sort Forum Moderator)
Joined: 02 Dec 2002  Posts: 1618  Topics: 31  Location: San Jose
Posted: Wed May 07, 2003 5:16 pm
Ranjish,
Theoretically, you could simulate SUM FIELDS=NONE while copying by storing all of the records in storage as you read them, determining which ones are the duplicates (that is, which ones have the same key) after all of the records are read, and then only writing out the non-duplicates and the first record of each group of duplicates from storage. You could probably even do this with DFSORT using an E15 or E35 exit.
But unless you only have a few records to deal with, this isn't practical. It makes more sense to SORT the records so the duplicates are in order. That way, you can remove the unwanted duplicates as they're read without having to store them all. That's what SORT with SUM FIELDS=NONE does.
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at: www.ibm.com/storage/dfsort
Ranjish (Beginner)
Posted: Thu May 08, 2003 12:55 am
Frank/Coolman,
Thanks a lot for the replies.
So does this mean that we don't have any option to preserve the original sequence and still remove the duplicates?
cheers
Ranjish
CaptBill (Beginner)
Joined: 02 Dec 2002  Posts: 100  Topics: 2  Location: Pasadena, California, USA
Posted: Thu May 08, 2003 4:19 pm
Sure you can preserve the original sequence and still get rid of duplicates. How did you get the file into that sequence to begin with? Figure that out, then just redo that sort or other process AFTER you remove the duplicates.
I suppose what you are saying is that you have a file in sequence by FIELD-1, and you want to remove the duplicates based upon FIELD-2. So sort it by FIELD-2 with SUM FIELDS=NONE, then take the output of that and re-sort it by FIELD-1.
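As a sketch, those two passes could look like this, assuming (hypothetically) FIELD-2 is a 10-byte key in columns 1-10 and FIELD-1 an 8-byte key in columns 11-18; adjust the positions to your record layout:

```jcl
* Step 1: sort on FIELD-2 and drop the duplicates
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=NONE
* Step 2 (a separate job step reading step 1's output):
* re-sort on FIELD-1 to get back the original sequence
  SORT FIELDS=(11,8,CH,A)
```

Note this only restores the original order if that order really was produced by a sort on FIELD-1, which is why you first need to know how the file got its sequence.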
Premkumar (Moderator)
Joined: 28 Nov 2002  Posts: 77  Topics: 7  Location: Chennai, India
Posted: Thu May 08, 2003 11:03 pm
You can preserve the original sequence by following these steps:
- Add a sequence number to each record before removing duplicates,
- Remove the duplicates by sorting on the key with SUM FIELDS=NONE,
- Sort by the sequence number to restore the original sequence, and
- Remove the sequence number.
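A sketch of those steps as two DFSORT passes, assuming 80-byte fixed-length records with the key in columns 1-10 (SEQNUM in INREC is a later DFSORT feature; on older levels the sequence number would have to be added some other way, e.g. in an exit or a prior step):

```jcl
* Pass 1: append a sequence number, sort on the key, drop duplicates
  INREC FIELDS=(1,80,SEQNUM,8,ZD)    * record is now 88 bytes
  SORT FIELDS=(1,10,CH,A),EQUALS     * EQUALS keeps the FIRST of each key
  SUM FIELDS=NONE
* Pass 2 (a separate step, reading pass 1's output):
* sort back into original order, then strip the sequence number
  SORT FIELDS=(81,8,ZD,A)
  OUTREC FIELDS=(1,80)
```

EQUALS matters here: it guarantees the surviving record of each key is the one with the lowest sequence number, i.e. the first occurrence in the original file.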