MVSFORUMS.com Forum Index MVSFORUMS.com
A Community of and for MVS Professionals
 

Eliminate duplicates using OUTFIL- SYNCSORT

 
rasprasads
Beginner


Joined: 10 Dec 2002
Posts: 59
Topics: 20
Location: Chennai

Posted: Fri Dec 13, 2002 2:20 am    Post subject: Eliminate duplicates using OUTFIL- SYNCSORT

How can I eliminate duplicates when using the OUTFIL statement in SYNCSORT?

My SYSIN looks like this:

Code:

   SORT FIELDS=(509,5,PD,A)
   ...
   OUTFIL FILES=2,                                   
     INCLUDE=(35,3,CH,EQ,C'010')
 ...


Here I need only the first record to be present in output file 2, as all the records will have the same data at position 509 (length 5).
_________________
Rasprasad S
Mike Tebb
Beginner


Joined: 02 Dec 2002
Posts: 20
Topics: 0
Location: Yorkshire, England

Posted: Fri Dec 13, 2002 3:04 am

SUM FIELDS=NONE will remove records whose sort fields are duplicated, keeping one record for each key value.
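
As a minimal sketch, assuming the key from the original post (509,5,PD), the control statements might look like the following. OPTION EQUALS is an addition that does not come from this thread: as far as I know, without EQUALS in effect the record kept from each set of duplicates is not guaranteed to be the first one read, so check the manual for your sort product.

Code:

* OPTION EQUALS is an assumption added for illustration; it asks the
* sort to preserve input order among records with equal keys.
  OPTION EQUALS
  SORT FIELDS=(509,5,PD,A)
* Keep one record for each value of the sort key.
  SUM FIELDS=NONE
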
_________________
Cheers - Mike

rasprasads

Posted: Fri Dec 13, 2002 3:48 am

Mike,
Thanks for your response, but what I wanted to know was how to eliminate the duplicates when using OUTFIL.

I have tried

Code:

//SYSIN    DD  *                                     
  SORT FIELDS=(509,5,PD,A)                               
  OUTFIL FILES=1,                                     
    INCLUDE=(35,3,CH,EQ,C'010')                       
  OUTFIL FILES=2,                                     
    INCLUDE=(35,3,CH,EQ,C'020'),   
    SUM FIELDS=NONE


But this produces the error OUTFIL STATEMENT : SYNTAX ERROR.

I hope you now understand the problem. I need to eliminate the duplicates only in the second file.

I need a solution for this, please.
_________________
Rasprasad S
kolusu
Site Admin


Joined: 26 Nov 2002
Posts: 12368
Topics: 75
Location: San Jose

Posted: Fri Dec 13, 2002 5:35 am

Rasprasad,

The following JCL will give you the desired results. You cannot code a SUM FIELDS statement on OUTFIL, so we first write the '020' records to a temp file and then sort that file to remove the duplicates.

Code:

//STEP0100 EXEC PGM=SYNCTOOL
//*
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//IN       DD DSN=YOUR INPUT FILE,
//            DISP=SHR
//T1       DD DSN=&T1,DISP=(,PASS),UNIT=SYSDA,SPACE=(CYL,(X,Y),RLSE)
//OUT1     DD DSN=OUTPUT FILE1,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(X,Y),RLSE)
//OUT2     DD DSN=OUTPUT FILE2,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(X,Y),RLSE)
//TOOLIN   DD *
  SORT   FROM(IN) USING(CTL1)
  SORT   FROM(T1) USING(CTL2)
//CTL1CNTL DD *
  SORT FIELDS=(509,5,PD,A)
  OUTFIL FNAMES=OUT1,INCLUDE=(35,3,CH,EQ,C'010')
  OUTFIL FNAMES=T1,INCLUDE=(35,3,CH,EQ,C'020')
//CTL2CNTL DD *
  SORT FIELDS=(509,5,PD,A)
  SUM FIELDS=NONE
  OUTFIL FNAMES=OUT2
/*


Hope this helps...

cheers

kolusu

Mike Tebb

Posted: Fri Dec 13, 2002 6:24 am

Kolusu,

I had come up with the following:

Code:

//TOOLIN   DD  *                   
  SORT FROM(IN) TO(OUT1) USING(CTL1)
  SORT FROM(IN) TO(OUT2) USING(CTL2)
/*                                 
//CTL1CNTL DD  *                   
  SORT    FIELDS=(509,5,PD,A)       
  INCLUDE COND=(35,3,CH,EQ,C'010') 
/*                                 
//CTL2CNTL DD  *                   
  SORT    FIELDS=(509,5,PD,A)       
  SUM FIELDS=NONE                   
  INCLUDE COND=(35,3,CH,EQ,C'020') 
/*                                 


My knowledge of SYNCTOOL is basically what I have picked up from your posts, so could you comment on whether my solution is okay, efficient, etc.?
I suspect that my solution may involve two passes of the file, so would affect run times on a large input file.
_________________
Cheers - Mike

kolusu

Posted: Fri Dec 13, 2002 6:38 am

Mike,

Technically speaking, your solution is okay, but it is not the most efficient. The second pass has to read the entire input file again instead of just the specific records.

Let us assume that the input file has 1 million records: 800,000 have '010' in position 35 and only 200,000 have '020' in position 35. Of these 200,000, only 50,000 are duplicates.

If we split the file into two files in the first pass, the sum sort in the second pass only has to read the smaller file of 200,000 '020' records.

Your solution reads all 1 million records a second time, which affects performance.

For relatively small files you won't even notice the difference.

Even in my posted solution, the first step could be a COPY instead of a SORT, but I assumed rasprasad wanted both files sorted on the field at position 509.

I hope I explained it clearly. Let me know if you have any questions.

Thanks

Kolusu

Mike Tebb

Posted: Fri Dec 13, 2002 6:40 am

That is exactly what I was guessing would happen.

Thanks (as ever) for your advice.
_________________
Cheers - Mike

kolusu

Posted: Fri Dec 13, 2002 6:55 am

Mike,

Well, this is another version that implements your logic. We first sum sort the entire file to eliminate the duplicates, but we write the duplicates to the SORTXSUM file. The XSUM parameter of the SUM statement sends all deleted records to the //SORTXSUM DD.

The second step takes the SORTXSUM file as input, includes only the '010' records, and appends them to the OUT1 file.

Code:

//STEP0100 EXEC PGM=SORT
//*
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=YOUR INPUT FILE,
//            DISP=SHR
//OUT1     DD DSN=OUTPUT FILE1,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(X,Y),RLSE)
//OUT2     DD DSN=OUTPUT FILE2,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(X,Y),RLSE)
//SORTXSUM DD DSN=&T1,DISP=(,PASS),SPACE=(CYL,(X,Y),RLSE)
//SYSIN    DD *
  SORT FIELDS=(509,5,PD,A)
  SUM FIELDS=NONE,XSUM
  OUTFIL FNAMES=OUT1,INCLUDE=(35,3,CH,EQ,C'010')
  OUTFIL FNAMES=OUT2,INCLUDE=(35,3,CH,EQ,C'020')
/*
//STEP0200 EXEC PGM=SORT
//*
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=YOUR SORTXSUM  FILE CREATED ABOVE,
//            DISP=(OLD,PASS)
//OUT1     DD DSN=OUTPUT FILE1,
//            DISP=(MOD,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(X,Y),RLSE)
//SYSIN    DD *
  SORT FIELDS=COPY
  INCLUDE COND=(35,3,CH,EQ,C'010')
* Route the copied records to the OUT1 DD (DISP=MOD appends them).
  OUTFIL FNAMES=OUT1
/*



Hope this helps...

cheers

kolusu

Mike Tebb

Posted: Fri Dec 13, 2002 9:20 am

For what it's worth my first thought had been to use two sort steps.
My first step would output all the type 010 and 020 records to the relevant file:

Code:

//SYSIN    DD *
  SORT FIELDS=(509,5,PD,A)
  OUTFIL FILES=01,INCLUDE=(35,3,CH,EQ,C'010')
  OUTFIL FILES=02,INCLUDE=(35,3,CH,EQ,C'020')
/*


The second step would then remove the duplicates from SORTOF02.

Code:

//SYSIN    DD *
  SORT    FIELDS=(509,5,PD,A)
  SUM FIELDS=NONE     
/*
       


I suppose the most efficient version depends on the expected data.
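
For illustration, the two steps described above could be wired together roughly as follows. This is only a sketch: the step names, the temporary data set &&T2 and the data set name placeholders are assumptions, not taken from the thread, and the space values are left as X,Y as in the earlier examples.

Code:

//STEP0100 EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=YOUR INPUT FILE,
//            DISP=SHR
//* SORTOF01 and SORTOF02 correspond to OUTFIL FILES=01 and FILES=02
//SORTOF01 DD DSN=OUTPUT FILE1,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(X,Y),RLSE)
//* &&T2 is a hypothetical temporary data set passed to the next step
//SORTOF02 DD DSN=&&T2,DISP=(,PASS),UNIT=SYSDA,SPACE=(CYL,(X,Y),RLSE)
//SYSIN    DD *
  SORT FIELDS=(509,5,PD,A)
  OUTFIL FILES=01,INCLUDE=(35,3,CH,EQ,C'010')
  OUTFIL FILES=02,INCLUDE=(35,3,CH,EQ,C'020')
/*
//STEP0200 EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=&&T2,DISP=(OLD,DELETE)
//SORTOUT  DD DSN=OUTPUT FILE2,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(X,Y),RLSE)
//SYSIN    DD *
  SORT FIELDS=(509,5,PD,A)
  SUM FIELDS=NONE
/*

Only the '020' records reach the second step, so the duplicate removal works on the smaller file.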

N.B. It should be noted for DFSORT users that XSUM is only available in SYNCSORT, as I have discovered when trying to 'advise' users of that product.
_________________
Cheers - Mike

kolusu

Posted: Fri Dec 13, 2002 9:38 am

Mike,

Your idea of two sorts is better than my idea of appending data using the XSUM feature. Maybe my brain had not started working at 6 in the morning.

Kolusu
Frank Yaeger
Sort Forum Moderator


Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

Posted: Fri Dec 13, 2002 11:14 am

Mike said "It should be noted for DFSORT users that XSUM is only available in SYNCSORT, as I have discovered when trying to 'advise' users of that product".

Mike,

Although DFSORT does not support XSUM, it does support the same function (and more) through ICETOOL's SELECT with DISCARD feature (which Syncsort does not support). For more information on that, see the "Keep dropped duplicate records (XSUM)" Smart DFSORT Trick at:

http://www.ibm.com/servers/storage/support/software/sort/mvs/tricks/

That should help you advise users of DFSORT/ICETOOL correctly.
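
A minimal sketch of the SELECT with DISCARD approach for ICETOOL, assuming the same key as in this thread (509,5,PD). The step name, the DD names IN, OUT and DUPS, and the data set name placeholders are assumptions chosen for illustration; the trick page linked above remains the authoritative source.

Code:

//S1       EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//IN       DD DSN=YOUR INPUT FILE,
//            DISP=SHR
//OUT      DD DSN=OUTPUT FILE,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(X,Y),RLSE)
//DUPS     DD DSN=DUPLICATES FILE,
//            DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,
//            SPACE=(CYL,(X,Y),RLSE)
//TOOLIN   DD *
* Keep the first record for each key in OUT; write the rest to DUPS.
  SELECT FROM(IN) TO(OUT) ON(509,5,PD) FIRST DISCARD(DUPS)
/*

Here OUT plays the role of the deduplicated output and DUPS the role of the SORTXSUM file.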

Note that the DFSORT and ICETOOL documentation are freely available on the Web for reference at:

http://www.ibm.com/servers/storage/support/software/sort/mvs/srtmpub.html
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort


Last edited by Frank Yaeger on Thu Sep 14, 2006 4:42 pm; edited 1 time in total

Mike Tebb

Posted: Fri Dec 13, 2002 11:27 am

Frank,

Rest assured that I was merely making the point that XSUM is a SYNCSORT-only feature, in the context of the solutions given in this (SYNCSORT) thread.

I am certainly not qualified to talk about the alternative options in DFSORT as I do not have access to your product.
_________________
Cheers - Mike

Frank Yaeger

Posted: Fri Dec 13, 2002 1:45 pm

Mike,

I guess I misinterpreted what you said.

At any rate, even if your site doesn't have access to a DFSORT license, you still have access to the online DFSORT books in case you're ever curious about DFSORT/ICETOOL/ICEGENER.

And if anybody is interested in the DFSORT Team's analysis of DFSORT's advantages, contact me offline (yaeger@us.ibm.com) and I'll send you a document on that.
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort

All times are GMT - 5 Hours