MVSFORUMS.com Forum Index MVSFORUMS.com
A Community of and for MVS Professionals
 
 FAQFAQ   SearchSearch   Quick Manuals   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

Split file into two based on occurrence of another field

 
Post new topic   Reply to topic   printer-friendly view    MVSFORUMS.com Forum Index -> Utilities
View previous topic :: View next topic  
Author Message
somuk
Beginner


Joined: 04 Feb 2003
Posts: 113
Topics: 37

PostPosted: Tue Jan 12, 2016 6:23 pm    Post subject: Split file into two based on occurrence of another field Reply with quote

I have a requirement to split an input file into two. The input file has millions of records pertaining to different account numbers. My system has a maximum of 20,000 account numbers. At any given time I will be expecting less than 20,000 of account related data present in the file I mentioned above. Each account can have multiple records present in the input file. I want to split the file when I encounter 10,000th account (and it's related records) and put it into one file and the remaining records should be kept into the second file. Here is the sample file. LRECL is 80, FB. Account number starts at 72nd position and it’s 9 byte length.
Code:

11111111111111111111111111111111111111111111111111111111    222222222
11111111111111111111111111111111111111111111111111111111    222222222
11111111111111111111111111111111111111111111111111111111    222222222
11111111111111111111111111111111111111111111111111111111    222222222
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    333333333
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    333333333
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    333333333
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    333333333
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444   
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz    555555555
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz    555555555
123231231232zzzzzzzzzzzzzz1232312321323zzzzzzzz123231231    666666666
1232131213asdasddasdffe4343243423432afddfdfdfwdf43434324    777777777
1232131213asdasddasdffe4343243423432afddfdfdfwdf43434324    777777777
1232131213asdasddasdffe4343243423432afddfdfdfwdf43434324    777777777
1232131213asdasddasdffe4343243423432afddfdfdfwdf43434324    777777777


As shown above 222222222,333333333,444444444 etc. are account numbers. For example if 444444444 is the 10000th account, then the first file should be:

Code:

11111111111111111111111111111111111111111111111111111111    222222222
11111111111111111111111111111111111111111111111111111111    222222222
11111111111111111111111111111111111111111111111111111111    222222222
11111111111111111111111111111111111111111111111111111111    222222222
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    333333333
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    333333333
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    333333333
xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx    333333333
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444   
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444
yyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyyy    444444444


Second file should be:
Code:


zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz    555555555
zzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzzz    555555555
123231231232zzzzzzzzzzzzzz1232312321323zzzzzzzz123231231    666666666
1232131213asdasddasdffe4343243423432afddfdfdfwdf43434324    777777777
1232131213asdasddasdffe4343243423432afddfdfdfwdf43434324    777777777
1232131213asdasddasdffe4343243423432afddfdfdfwdf43434324    777777777
1232131213asdasddasdffe4343243423432afddfdfdfwdf43434324    777777777


Is it possible to achive this using DFSORT? Please help.
_________________
Regds,
Somu
Back to top
View user's profile Send private message Yahoo Messenger
William Collins
Supermod


Joined: 03 Jun 2012
Posts: 437
Topics: 0

PostPosted: Wed Jan 13, 2016 2:27 am    Post subject: Reply with quote

Not only possible, but easy.

WHEN=GROUP has an option to mark each record within a group with a group-number (you PUSH an ID). Append that to each record (assuming fixed-length records, prepend with variable-length records).

Use OUTFIL INCLUDE=/OMIT= to only select records with an ID less-than-or-equal-to 10000. Use BUILD there to return records to their original size.

Use a second OUTFIL with SAVE to output all the records which don't already appear on an OUTFIL. Again use BUILD to return the records to their original size.
Back to top
View user's profile Send private message
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12378
Topics: 75
Location: San Jose

PostPosted: Wed Jan 13, 2016 10:43 am    Post subject: Reply with quote

somuk,

William pretty much explained what needs to be done and his control cards translate to 4 lines of control cards.

Code:

//SYSIN    DD *                                             
  OPTION COPY                                               
  INREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(72,9),PUSH=(81:ID=8))
 
  OUTFIL FNAMES=OUT1,INCLUDE=(81,8,ZD,LE,10000),BUILD=(1,80)
  OUTFIL FNAMES=OUT2,SAVE,BUILD=(1,80)                       
//*

_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
somuk
Beginner


Joined: 04 Feb 2003
Posts: 113
Topics: 37

PostPosted: Wed Jan 13, 2016 11:38 am    Post subject: Reply with quote

Thanks Kolusu and Williams. Much appreciated your response. I have tried with the above control card (which Kolusu provided) to test the following data to obtain two account numbers and related data
in the first file and rest in second. Modified the control card to LE,2 in INCLUDE condition.
Code:
 

//SYSIN    DD *                                           
  OPTION COPY                                             
  INREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(72,9),PUSH=(81:ID=8))
                                                           
  OUTFIL FNAMES=OUT1,INCLUDE=(81,8,ZD,LE,2),BUILD=(1,80)   
  OUTFIL FNAMES=OUT2,SAVE,BUILD=(1,80)                     
//*                                                       


Input file:

Code:

11111111111111111111111111111111111111111111111111111111    111111111
11111111111111111111111111111111111111111111111111111111    111111111
11111111111111111111111111111111111111111111111111111111    111111111
22222222222222222222222222222222222222222222222222222222    222222222
22222222222222222222222222222222222222222222222222222222    222222222
22222222222222222222222222222222222222222222222222222222    222222222
33333333333333333333333333333333333333333333333333333333    333333333
33333333333333333333333333333333333333333333333333333333    333333333


First file I'm expecting was (firts two account numbers and related data):
Code:

11111111111111111111111111111111111111111111111111111111    111111111
11111111111111111111111111111111111111111111111111111111    111111111
11111111111111111111111111111111111111111111111111111111    111111111
22222222222222222222222222222222222222222222222222222222    222222222
22222222222222222222222222222222222222222222222222222222    222222222
22222222222222222222222222222222222222222222222222222222    222222222


Second file I'm expecting was (remaining account numbers and associated data):
Code:

33333333333333333333333333333333333333333333333333333333    333333333
33333333333333333333333333333333333333333333333333333333    333333333


However the first file looks as below:

Code:

11111111111111111111111111111111111111111111111111111111    111111111
11111111111111111111111111111111111111111111111111111111    111111111


Second file looks as below:
Code:

11111111111111111111111111111111111111111111111111111111    111111111
22222222222222222222222222222222222222222222222222222222    222222222
22222222222222222222222222222222222222222222222222222222    222222222
22222222222222222222222222222222222222222222222222222222    222222222
33333333333333333333333333333333333333333333333333333333    333333333
33333333333333333333333333333333333333333333333333333333    333333333

_________________
Regds,
Somu
Back to top
View user's profile Send private message Yahoo Messenger
William Collins
Supermod


Joined: 03 Jun 2012
Posts: 437
Topics: 0

PostPosted: Wed Jan 13, 2016 12:12 pm    Post subject: Reply with quote

When you get a problem like this it is good to see the data before it is returned to its original size:

Code:
//SYSIN    DD *                                           
  OPTION COPY                                             
  INREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(72,9),PUSH=(81:ID=8))
                                                           
  OUTFIL FNAMES=OUT1,INCLUDE=(81,8,ZD,LE,2) BUILD=(1,80)   
  OUTFIL FNAMES=OUT2,SAVE BUILD=(1,80)


All I've done is changed a comma to a blank, so that the BUILDs become comments. You can then see the values that have been PUSHed.

Edit: What I suspect is that you have your test-data in a numbered PDS/PDSE member with RECFM 80, so what you think is your key is actually an ISPF sequence number.


Last edited by William Collins on Wed Jan 13, 2016 12:14 pm; edited 1 time in total
Back to top
View user's profile Send private message
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12378
Topics: 75
Location: San Jose

PostPosted: Wed Jan 13, 2016 12:13 pm    Post subject: Reply with quote

somuk,

Is your Input file FB file or a VB file? and are you sure that the account number starts at 72 for 9 bytes?

There is a quick way to check it. Assuming your Input hass FB RECFM and the account number does start at position 72, use the following control cards to display the account number and the counter

Add the SORTOUT dd to the JCL

Code:

//SORTOUT  DD SYSOUT=*                                         
//SYSIN    DD *                                               
  OPTION COPY                                                 
  INREC IFTHEN=(WHEN=INIT,BUILD=(72,9)),                       
        IFTHEN=(WHEN=GROUP,KEYBEGIN=(01,9),PUSH=(15:ID=8))     
                                                               
//*

_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12378
Topics: 75
Location: San Jose

PostPosted: Wed Jan 13, 2016 12:16 pm    Post subject: Reply with quote

William Collins wrote:
Edit: What I suspect is that you have your test-data in a numbered PDS/PDSE member with RECFM 80, so what you think is your key is actually an ISPF sequence number.


William,


I was about to quote you that it is probably the line numbers that is messing up Somuk's test JCL. Like Minds think alike. Smile
_________________
Kolusu
www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
William Collins
Supermod


Joined: 03 Jun 2012
Posts: 437
Topics: 0

PostPosted: Wed Jan 13, 2016 12:17 pm    Post subject: Reply with quote

Mmmm.... actual sample data shown finishes short of column 72, the alleged start-column...
Back to top
View user's profile Send private message
somuk
Beginner


Joined: 04 Feb 2003
Posts: 113
Topics: 37

PostPosted: Wed Jan 13, 2016 6:55 pm    Post subject: Reply with quote

Thanks William and Kolusu.
You are correct. My bad. Made a mistake when I tried with the SYSIN instream data. Thanks a lot for your help.
_________________
Regds,
Somu
Back to top
View user's profile Send private message Yahoo Messenger
Display posts from previous:   
Post new topic   Reply to topic   printer-friendly view    MVSFORUMS.com Forum Index -> Utilities All times are GMT - 5 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


MVSFORUMS
Powered by phpBB © 2001, 2005 phpBB Group