MVSFORUMS.com
A Community of and for MVS Professionals
 

Removing the duplicates

 
Ranjish
Beginner


Joined: 22 Dec 2002
Posts: 64
Topics: 28
Location: Chennai

Posted: Wed May 07, 2003 11:47 am    Post subject: Removing the duplicates

Hi,

I know that we can remove the duplicates by using
SUM FIELDS=NONE
But can we remove the duplicates without doing a sort?

When I coded
SORT FIELDS=COPY
SUM FIELDS=NONE
it did not work.

Is there any way of doing this ?

cheers
Ranjish
coolman
Intermediate


Joined: 03 Jan 2003
Posts: 283
Topics: 27
Location: US

Posted: Wed May 07, 2003 12:36 pm

Ranjish,
SORT FIELDS=COPY means you are just copying the input to the output, so you can't combine it with SUM FIELDS=NONE. Here's why:

Duplicate removal :
-----------------------
1 -> How do you identify whether a record is a duplicate or not?
2 -> You need a key field to determine that. Only when two records in the file have the same key would you say they are duplicate records.
3 -> So, essentially, what you need to do is change your sort card like this:

Code:
 
* Assuming the first 10 bytes are the key
  SORT FIELDS=(1,10,CH,A)
  SUM FIELDS=NONE


and run the job.
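To make the effect of those two control statements concrete, here is a small Python simulation (not DFSORT itself; the function name `sort_sum_none` is made up for illustration). It assumes fixed-width text records with the key in bytes 1-10, sorts on that key, and keeps one record per key. Python compares strings by code point rather than EBCDIC, so the order of mixed letters and digits can differ from a mainframe run.

```python
from itertools import groupby


def sort_sum_none(records, key_start=1, key_len=10):
    """Simulate SORT FIELDS=(key_start,key_len,CH,A) plus SUM FIELDS=NONE.

    Positions are 1-based, as in DFSORT. Records are assumed to be
    fixed-width text strings (an assumption of this sketch).
    """
    def key(rec):
        return rec[key_start - 1:key_start - 1 + key_len]

    # Sort so that records with equal keys become adjacent
    # (Python's sort is stable, so equal keys keep input order).
    ordered = sorted(records, key=key)
    # Keep only the first record of each run of equal keys.
    return [next(group) for _, group in groupby(ordered, key=key)]


if __name__ == "__main__":
    data = ["KEY0000002 B", "KEY0000001 A", "KEY0000002 C", "KEY0000001 D"]
    for rec in sort_sum_none(data):
        print(rec)  # prints "KEY0000001 A", then "KEY0000002 B"
```

Note that the output comes back in key order, not input order, which is the side effect the rest of this thread is about.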

Hope this helps...

Cheers,
Coolman.

P.S.: Why would you want to remove the duplicates by some means other than SORT? There are lots of ways of doing it, but SORT is the most efficient and neatest way.
Frank Yaeger
Sort Forum Moderator


Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

Posted: Wed May 07, 2003 5:16 pm

Ranjish,

Theoretically, you could simulate SUM FIELDS=NONE while copying by storing all of the records in storage as you read them, determining which ones are the duplicates (that is, which ones have the same key) after all of the records are read, and then only writing out the non-duplicates and first record of each group of duplicates from storage. You could probably even do this with DFSORT using an E15 or E35 exit.
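A rough Python sketch of that in-storage approach follows. It is plain Python, not an E15/E35 exit; `dedupe_in_storage` is a hypothetical name, and fixed-width records with the key in bytes 1-10 are assumed. The whole file is held in memory, which is exactly why this only makes sense for small inputs.

```python
def dedupe_in_storage(records, key_start=1, key_len=10):
    """Copy-order duplicate removal: hold every record in storage,
    find the duplicates once all records are read, then write the
    non-duplicates plus the first record of each duplicate group,
    in their original order."""
    start, end = key_start - 1, key_start - 1 + key_len
    stored = list(records)          # every record is kept in storage
    first_seen = {}                 # key -> index of its first record
    for i, rec in enumerate(stored):
        first_seen.setdefault(rec[start:end], i)
    keep = set(first_seen.values())
    # Write out in the original (input) order; no sort is done.
    return [rec for i, rec in enumerate(stored) if i in keep]


if __name__ == "__main__":
    data = ["KEY0000002 B", "KEY0000001 A", "KEY0000002 C", "KEY0000001 D"]
    print(dedupe_in_storage(data))  # ['KEY0000002 B', 'KEY0000001 A']
```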

But unless you only have a few records to deal with, this isn't practical. It makes more sense to SORT the records so the duplicates are in order. That way, you can remove the unwanted duplicates as they're read without having to store them all. That's what SORT with SUM FIELDS=NONE does.
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Ranjish
Beginner


Joined: 22 Dec 2002
Posts: 64
Topics: 28
Location: Chennai

Posted: Thu May 08, 2003 12:55 am

Frank/Coolman,

Thanks a lot for the replies.
So does this mean that we don't have any option to preserve the original sequence and still remove the duplicates?

cheers
Ranjish
CaptBill
Beginner


Joined: 02 Dec 2002
Posts: 100
Topics: 2
Location: Pasadena, California, USA

Posted: Thu May 08, 2003 4:19 pm

Sure, you can preserve the original sequence and still get rid of duplicates. How did you get the file in that sequence to begin with? Figure that out, then just redo that sort or other process AFTER you remove the duplicates.

I suppose what you mean is that you have a file in sequence by FIELD-1 and you want to remove all the duplicates based on FIELD-2. So sort it by FIELD-2 with SUM FIELDS=NONE, then take the output of that and re-sort it by FIELD-1.
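That two-pass idea can be sketched in Python under a few assumptions: records are fixed-width strings, FIELD-1 and FIELD-2 are given as hypothetical (start, length) pairs with 1-based positions, and the original sequence can be reproduced simply by sorting on FIELD-1, which is the premise of this approach. The helper names are made up for illustration.

```python
from itertools import groupby


def field(start, length):
    """Return a key function extracting a 1-based fixed-width field."""
    return lambda rec: rec[start - 1:start - 1 + length]


def dedupe_then_resort(records, field1, field2):
    """Pass 1: sort on FIELD-2 and keep one record per FIELD-2 value
    (like SORT on FIELD-2 with SUM FIELDS=NONE).
    Pass 2: re-sort the survivors on FIELD-1."""
    k1, k2 = field(*field1), field(*field2)
    by_field2 = sorted(records, key=k2)
    survivors = [next(group) for _, group in groupby(by_field2, key=k2)]
    return sorted(survivors, key=k1)


if __name__ == "__main__":
    # FIELD-1 is in columns 1-3, FIELD-2 in columns 4-5 (assumed layout).
    data = ["001AA", "002BB", "003AA", "004CC"]
    print(dedupe_then_resort(data, (1, 3), (4, 2)))  # ['001AA', '002BB', '004CC']
```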
Premkumar
Moderator


Joined: 28 Nov 2002
Posts: 77
Topics: 7
Location: Chennai, India

Posted: Thu May 08, 2003 11:03 pm

You can preserve the original sequence by following these steps:

  1. Add a sequence number to the records before removing duplicates.
  2. Remove the duplicates by sorting on the key with SUM FIELDS=NONE.
  3. Sort by the sequence number to restore the original sequence.
  4. Remove the sequence number.
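Here is a Python simulation of those four steps (the function name is hypothetical, and fixed-width records with the key in bytes 1-10 are assumed). It relies on Python's stable sort so that the earliest record of each key survives step 2; on the mainframe, which duplicate SUM FIELDS=NONE keeps is worth checking against the DFSORT documentation for the EQUALS option.

```python
from itertools import groupby


def dedupe_keep_order(records, key_start=1, key_len=10):
    """Remove duplicate keys while preserving the original record order."""
    start, end = key_start - 1, key_start - 1 + key_len

    def key(pair):
        return pair[1][start:end]

    # Step 1: attach a sequence number to each record.
    numbered = list(enumerate(records, start=1))
    # Step 2: sort on the key and keep one record per key. The sort is
    # stable, so the record with the lowest sequence number is kept.
    by_key = sorted(numbered, key=key)
    survivors = [next(group) for _, group in groupby(by_key, key=key)]
    # Step 3: sort by sequence number to restore the original order.
    survivors.sort(key=lambda pair: pair[0])
    # Step 4: drop the sequence number.
    return [rec for _, rec in survivors]


if __name__ == "__main__":
    data = ["KEY0000002 B", "KEY0000001 A", "KEY0000002 C", "KEY0000001 D"]
    print(dedupe_keep_order(data))  # ['KEY0000002 B', 'KEY0000001 A']
```

Unlike keeping every record in storage, each step here is an ordinary sort or a single pass, which is what makes the approach practical for large files.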
All times are GMT - 5 Hours