MVSFORUMS.com - A Community of and for MVS Professionals

remove duplicates from file with LRECL 3709

 
Martin
Beginner


Posted: Fri Feb 12, 2010 10:50 am    Post subject: remove duplicates from file with LRECL 3709

Hi All,

I have two files with an LRECL of 3709. I need to compare these two files and remove the duplicates.

My question is NOT how to remove the duplicates. I would like to know whether SORT can handle a file with an LRECL of 3709.

P.S.: The sort job keeps throwing the following error message:

"ON" LENGTH IS NOT 1 TO 1500

Any pointers are much appreciated.

Thanks,
Martin
kolusu
Site Admin
Posted: Fri Feb 12, 2010 11:24 am    Post subject: Re: remove duplicates from file with LRECL 3709

Martin,

The limitation of 1500 is for a single ON parm. You can have multiple ON parms, but the total key length for comparison should not exceed 4088 bytes. Try this DFSORT/ICETOOL job

Code:

//STEP0100 EXEC PGM=ICETOOL                                 
//TOOLMSG  DD SYSOUT=*                                       
//DFSMSG   DD SYSOUT=*                                       
//IN       DD DSN=your input fb 3709 file,DISP=SHR
//OUT      DD SYSOUT=*                                       
//TOOLIN   DD *                                             
  SELECT FROM(IN) TO(OUT) NODUPS -                           
  ON(1,1500,CH) ON(1501,1500,CH) ON(3001,709,CH)
//*

_________________
Kolusu
www.linkedin.com/in/kolusu
Martin
Beginner


Posted: Fri Feb 12, 2010 11:40 am

Thanks, Kolusu!!

I will try this out...

I have another question:

What if the LRECL exceeds 4088 bytes? How does SORT handle this?
kolusu
Site Admin
Posted: Fri Feb 12, 2010 11:54 am

Martin wrote:
What if the LRECL exceeds 4088 bytes? How does SORT handle this?


Well, there is a trick for handling it when the total key length exceeds 4088 bytes. However, it would involve multiple passes of the data.
Martin
Beginner


Posted: Fri Feb 12, 2010 12:07 pm

kolusu wrote:
Martin wrote:
What if the LRECL exceeds 4088 bytes? How does SORT handle this?


Well, there is a trick for handling it when the total key length exceeds 4088 bytes. However, it would involve multiple passes of the data.


Could you please share it with lesser mortals like us? :D

I have a file with an LRECL of 23200, and as mentioned in the original post, I need to compare the entire record between TWO files and remove the duplicates.

Help is much appreciated.

Thanks,
Martin
kolusu
Site Admin
Posted: Fri Feb 12, 2010 1:33 pm

Martin wrote:
I have a file with an LRECL of 23200, and as mentioned in the original post, I need to compare the entire record between TWO files and remove the duplicates.


Wow, 23,200 bytes, eh? Do any of these files have duplicates?
Martin
Beginner


Posted: Fri Feb 12, 2010 1:41 pm

kolusu wrote:
Martin wrote:
I have a file with an LRECL of 23200, and as mentioned in the original post, I need to compare the entire record between TWO files and remove the duplicates.


Wow, 23,200 bytes, eh? Do any of these files have duplicates?


Yes... there are duplicate records in both files.
kolusu
Site Admin
Posted: Fri Feb 12, 2010 1:42 pm

Martin,

How do you plan to compare the duplicates? Let's say file 1 has 4 duplicates and file 2 has 12 duplicates; what do you do in that case?
Martin
Beginner


Posted: Fri Feb 12, 2010 2:05 pm

Hi Kolusu,

In this case, I want all the duplicate records from File1 to be removed from the output file.

Ex:

File1
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111


File 2:
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
BBBBBBBB22222
CCCCCCC11111

Output file:

aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
aaaaaaaaa11111
BBBBBBBB22222
CCCCCCC11111

Note: File1 is always a subset of File2. As shown above, the first 4 records from File1 will also be present in File2. In addition, File2 can have 8 more such records, which should NOT be removed.
kolusu
Site Admin
Posted: Fri Feb 12, 2010 2:26 pm

Martin,

As it is, matching with longer keys is complicated, and now you have thrown a monkey wrench into it with the duplicates. I will see if I can come up with an elegant solution.
Martin
Beginner


Posted: Fri Feb 12, 2010 2:49 pm

kolusu wrote:
Martin,

As it is, matching with longer keys is complicated, and now you have thrown a monkey wrench into it with the duplicates. I will see if I can come up with an elegant solution.


Thanks!! If you are unable to come up with a solution, please show me how to match the long keys.
kolusu
Site Admin
Posted: Fri Feb 12, 2010 5:27 pm

Martin wrote:
kolusu wrote:
Martin,

As it is, matching with longer keys is complicated, and now you have thrown a monkey wrench into it with the duplicates. I will see if I can come up with an elegant solution.


Thanks!! If you are unable to come up with a solution, please show me how to match the long keys.



Martin,

I did not get a chance to work on your problem, and IMHO it is NOT possible to do it with SORT given the number of duplicates involved in each file.

Here is a sample DFSORT JCL which will remove duplicate records from a file with an LRECL of 4500.

A brief explanation of the job:

Step0100: Using INREC, we add a flag of "U" (for Unique) at the end of every record. Using the SELECT operator, we pick the LASTDUP on the first 4084 bytes (the maximum) and put those records in the DUP file (these are the potential duplicates); we still need to check the bytes from 4085 to the end of the record to see if they are indeed duplicates. We also override their flag to "D" for Duplicate.

If the first 4084 bytes aren't equal, then we don't have to perform any validation, as the records canNOT be duplicates, so we write them to the UNQ file.

Step0200: Now we concatenate the above files once again (the DUP file should be first in the list) and sort them on the first 4084 bytes with the EQUALS option. Using WHEN=GROUP, we push the contents of the D-flagged record onto the next record.

Using an OMIT condition, we validate the bytes from 4085 to the end of the record in chunks of up to 256 bytes: if they are equal we eliminate the record, and if they aren't equal we write the records out to the output file.

Code:

//STEP0100 EXEC PGM=ICETOOL                                     
//TOOLMSG  DD SYSOUT=*                                         
//DFSMSG   DD SYSOUT=*                                         
//IN       DD DSN=&&INP,DISP=SHR                               
//DUP      DD DSN=&&DUP,DISP=(,PASS),SPACE=(CYL,(1,1),RLSE)     
//UNQ      DD DSN=&&UNQ,DISP=(,PASS),SPACE=(CYL,(1,1),RLSE)     
//TOOLIN   DD *                                                 
  SELECT FROM(IN) TO(DUP) LASTDUP DISCARD(UNQ) -               
  ON(1,1500,CH) ON(1501,1500,CH) ON(3001,1084,CH) USING(CTL1)   
//CTL1CNTL DD *                                                 
  INREC OVERLAY=(4501:C'U')                                     
  OUTFIL FNAMES=DUP,OVERLAY=(4501:C'D')                         
//*                                                             
//STEP0200 EXEC PGM=SORT                                           
//SYSOUT   DD SYSOUT=*                                             
//SORTIN   DD DSN=&&DUP,DISP=SHR                                   
//         DD DSN=&&UNQ,DISP=SHR                                   
//SORTOUT  DD SYSOUT=*                                             
//SYSIN    DD *                                                     
  SORT FIELDS=(1,4084,CH,A),EQUALS                                 
  OUTREC IFTHEN=(WHEN=GROUP,BEGIN=(4501,1,CH,EQ,C'D'),             
  PUSH=(4502:4501,1,1,4500),RECORDS=2)                             
  OUTFIL IFOUTLEN=4500,                                             
  OMIT=(4501,1,CH,EQ,C'D',OR,(4501,002,CH,EQ,C'UD',AND,             
        4085,256,CH,EQ,8587,256,CH,AND,4341,160,CH,EQ,8843,160,CH)),
  IFTHEN=(WHEN=(4501,2,CH,EQ,C'UD'),BUILD=(1,4500,/,4503,4500))     
//*

Sqlcode
Intermediate


Posted: Tue Feb 23, 2010 10:48 pm

Throwing out something which may or may not be possible...

Steps
1) Break each input record (23,200 bytes) into pieces of 4,000 bytes each. This will create 6 records for each input record. Make sure your LRECL is 4000 bytes. You may need multiple passes over each input file. Also create a record-id for each record, which will be used later to merge them back.

Here is what I tested for a 30-byte record, breaking it into 23-byte records (15 bytes of data + an 8-byte record-id). Once again, I don't know if it's correct or not.

In the IFTHEN here, use any condition that is valid for your data. Can we just test the entire record for greater than spaces? I don't know...

Code:
//SORT01   EXEC PGM=SORT                                     
//SORTIN   DD  *                                             
123456789012345ABCDEFGHIJKLMNO                               
111112222233333AAAAABBBBBCCCCC                               
//SORTOUT  DD  DSN=TSOID.SPLIT.TEST,
//             DISP=(,CATLG,DELETE),
//             UNIT=SYSDA,SPACE=(TRK,(1,1),RLSE),
//             RECFM=FB,LRECL=23
//SYSIN DD *                                                 
* Build a 46-byte record: data bytes 1-15, an 8-byte ZD sequence number,
* data bytes 16-30, and the same sequence number again
   INREC BUILD=(1,15,SEQNUM,08,ZD,16,15,SEQNUM,08,ZD)
   SORT FIELDS=COPY                                         
   OUTFIL IFOUTLEN=23,IFTHEN=(WHEN=(1,1,CH,GT,C' '),         
                         BUILD=(01,15,16,08,/,24,15,39,08)) 
/*                                                           
//SYSOUT DD SYSOUT=*                                         
//*                                                         

2) Do your comparison using many-to-many compare logic. Now that the record length is reduced, you may be able to use the solution described above.

3) After your comparison, when you have identified the records to be kept, join them back together using the record-id, as in the rough sketch below.
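
Just to sketch what step 3 could look like for the 23-byte test records above: once you know which pieces to keep, sort them back into record-id order, drop the 8-byte id, and let ICETOOL's RESIZE operator glue every two 15-byte pieces back into a 30-byte record. This is only a rough, untested sketch: TSOID.SPLIT.KEPT and TSOID.SPLIT.RESTORED are made-up dataset names, and it assumes every kept record still has all of its pieces and that the pieces within each record-id are in their original order (EQUALS preserves that order during the sort).

Code:

//RESTORE  EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//KEPT     DD DSN=TSOID.SPLIT.KEPT,DISP=SHR
//ORDERED  DD DSN=&&ORD,DISP=(,PASS),UNIT=SYSDA,
//            SPACE=(TRK,(1,1),RLSE)
//OUT      DD DSN=TSOID.SPLIT.RESTORED,
//            DISP=(,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(TRK,(1,1),RLSE),
//            RECFM=FB,LRECL=30
//TOOLIN   DD *
* Group the 23-byte pieces by record-id and strip the id, then
* combine every two 15-byte pieces into one 30-byte record
  SORT FROM(KEPT) TO(ORDERED) USING(CTL1)
  RESIZE FROM(ORDERED) TO(OUT) TOLEN(30)
//CTL1CNTL DD *
  SORT FIELDS=(16,08,ZD,A),EQUALS
  OUTREC BUILD=(1,15)
//*

This only covers the toy 30-byte case. For the real 23,200-byte records the last 4,000-byte piece of each record is only partly filled, so the rebuild would need some extra trimming, and I have not tried it at that size.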