Break records after particular Count and column value change
Select messages from
# through # FAQ
[/[Print]\]

MVSFORUMS.com -> Utilities

#1: Break records after particular Count and column value change Author: Suchay PostPosted: Fri Jul 28, 2017 4:58 pm
    —
We extract around 320000 records thru Db2 utility and split the records into 8 files when the count is greater than 40000 and the first column value change after reading 40000k records. We used seqnum currently to split on 40000, it works fine but we want to break on column value change and count should be greater than 40000 so it goes into 8 datasets

Please advise

Code:

col1 col2
001  10 
001  10 
001  10
001  10  Record 40000
....
001  10  records 41032 -- Write to file D0001A since after 40000 records the col2 changes in the next row
001  11  Record 41033 
001  11 
001  11
..
001  12 Record 56000
001  12 records 80000
..
001  12 records 82312 -   Write to file D0001B since after 40000 records the col2 changes in the next row
001  13   records 82313

Similarly, we have to break after 40000 records where the last record was written and when col2  changes

Currently, we have coded using SEQNUM, but the column changes are not handled

//SYSIN   DD *                                           
  OPTION COPY                                           
                                                         
  INREC  IFTHEN=(WHEN=(1,3,CH,EQ,C'001'),OVERLAY=(110:SEQNUM,8,ZD))     
           
 OUTFIL FNAMES=D0001A,                                               
        INCLUDE=(1,3,CH,EQ,C'001',AND,110,8,ZD,LE,40000)             
 OUTFIL FNAMES=D0001B,                                               
        INCLUDE=(1,3,CH,EQ,C'001',AND,110,8,ZD,GT,40000,AND,         
                                     110,8,ZD,LE,80000)               
  OUTFIL FNAMES=D0001C,                                       
       INCLUDE=(1,3,CH,EQ,C'001',AND,110,8,ZD,GT,80000,AND, 
                                    110,8,ZD,LE,120000)     
OUTFIL FNAMES=D0001D,                                       
       INCLUDE=(1,3,CH,EQ,C'001',AND,110,8,ZD,GT,120000,AND,
                                    110,8,ZD,LE,160000)     
OUTFIL FNAMES=D0001E,                                       
       INCLUDE=(1,3,CH,EQ,C'001',AND,110,8,ZD,GT,160000,AND,
                                    110,8,ZD,LE,200000)     
OUTFIL FNAMES=D0001F,                                       
       INCLUDE=(1,3,CH,EQ,C'001',AND,110,8,ZD,GT,200000,AND,
                                    110,8,ZD,LE,240000)     
                                                             
OUTFIL FNAMES=D0001G,                                       
       INCLUDE=(1,3,CH,EQ,C'001',AND,110,8,ZD,GT,240000,AND,
                                    110,8,ZD,LE,280000)     

#2:  Author: kolusuLocation: San Jose PostPosted: Sat Jul 29, 2017 12:14 pm
    —
Suchay,

If your intention is to split the file based on COL2 , why bother about about the limit of 40,000? Use WHEN=GROUP with KEYBEGIN on COL2 and add an id number at the end.

Try the following untested control cards ( I assumed that your input is already sorted on col2, if it is not the replace the COPY statement with SORT FIELDS=(6,2,CH,A)
Code:

//SYSIN    DD *                                             
  OPTION COPY   
  INCLUDE COND=(1,3,CH,EQ,C'001')                                               
  OUTREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(6,2),PUSH=(110:ID=2)) 
                                                             
  OUTFIL FNAMES=D0001A,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,1)
  OUTFIL FNAMES=D0001B,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,2)
  OUTFIL FNAMES=D0001C,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,3)
  OUTFIL FNAMES=D0001D,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,4)
  OUTFIL FNAMES=D0001E,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,5)
  OUTFIL FNAMES=D0001F,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,6)
  OUTFIL FNAMES=D0001G,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,7)
  OUTFIL FNAMES=D0001H,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,8)
/*

#3:  Author: Suchay PostPosted: Sat Jul 29, 2017 12:49 pm
    —
Reason for spitting into 8 files: we provide these reports to users into spreadsheets, so we it would be cumbersome for them to look into the reports, so we break into 40,000 records and when the col2 changes after 40000 records we create separate files

#4:  Author: kolusuLocation: San Jose PostPosted: Sat Jul 29, 2017 3:53 pm
    —
Suchay wrote:
Reason for spitting into 8 files: we provide these reports to users into spreadsheets, so we it would be cumbersome for them to look into the reports, so we break into 40,000 records and when the col2 changes after 40000 records we create separate files


Suchay,

The control cards i gave will split the file into 8 files. It is not lmited to 40000 records. Are you telling that you have more than 40000 records per col2 values? If that is the case just add SEQ=6 in the Oush statement after id and check that value on OUTFIL
Code:


  OUTREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(6,2),PUSH=(110:ID=,SEQ=6))


Code:


INCLUDE=(110,2,ZD,EQ,1,AND,112,6,ZD,LE,40000))

#5:  Author: Suchay PostPosted: Sat Jul 29, 2017 10:16 pm
    —
Kolusu, Col2 is sorted and the catch is Col2 might have values between 40000. IF you check the first post after 40000 records we have 11 and 12 in the next 40000 and after 40000 record ,col2 value changes to 13, then we create a separate file.


col1 col2
001 10
001 10
001 10
001 10 Record 40000
....
001 10 records 41032 -- Write to file D0001A since after 40000 records the col2 changes in the next row
001 11 Record 41033
001 11
001 11
..
001 12 Record 56000
001 12 records 80000
..
001 12 records 82312 - Write to file D0001B since after 40000 records the col2 changes in the next row
001 13 records 82313


[\Quote]

#6:  Author: kolusuLocation: San Jose PostPosted: Mon Jul 31, 2017 11:18 am
    —
Suchay wrote:

001 10 records 41032 -- Write to file D0001A since after 40000 records the col2 changes in the next row


Suchay,

Your requirements are conflicting. Shouldn't be written to D0001B since you are past 40000 record mark?

Lets clear the requirement.

Lets say your COL2 have the following number of records.

Code:

10 - 200,000 records
11- 8,000 records
12- 15,000 records
13- 97,000 records
14 - 5000 records
total number of records = 325,000


Now how do you plan to split these records?
Code:

D0001A -  00001   to  40,000 records of 10
D0001B - 40,001   to  80,000 records of 10
D0001C - 80,001   to 120,000 records of 10
D0001D - 120,001  to 160,000 records of 10
D0001E - 160,001  to 200,000 records of 10

D0001F -  8,000 of 11 + 15,000 of 12 + 0001  to 7,000 of 13
D0001G -  7,0001  to 47,000 records of 13
D0001H -  47,0001  to 87,000 records of 13


If you did split it like that then you have still 10,000 records of 13 and 5000 records of 14.

#7:  Author: Suchay PostPosted: Mon Jul 31, 2017 11:58 am
    —
Kolusu,

Generally, we don't get more than 30000 records for a particular COl2 record.
We just try to keep the count close to 40000 in a file. Currently, we do this manually we check the col2 value
and break the files whenever the col2 value changes after each 40000 record.


Code:

10 - 20,000 records
11- 8,000 records
12- 15,000 records
13- 17,000 records
14 - 15,000 records
15 - 9500 records
16 - 23,000 records
17 -18,500 records.



Let us try to keep it simple with 3 files with above data and output should like below. is it possible



Code:

D0001A -  00001   to  43,000 records of 10  ,11, 12
D0001B - 43,001   to  84,500 records of 13, 14, 15
D0001C - 84,501   to 126,000 records of 16, 17

#8:  Author: kolusuLocation: San Jose PostPosted: Mon Jul 31, 2017 12:32 pm
    —
Suchay wrote:
Let us try to keep it simple with 3 files with above data and output should like below. is it possible


Suchay,


You need to design solutions that would work for any kind of scenario. If you requirements are clear it is a simple task.

All you had to say is "I have to split an input file into chunks of 40000 records but if there are more records for the last key beyond 40000 records, I want them included in the same file even though the number of records included would be greater than 40000."

You spent 2 days to get this simple requirement right. You could have a written a COBOL program by now to get the desired results. We can ONLY read your text and interpret it and if your requirements aren't clear you would just be wasting my time as well as your time.

Any way use the following control cards which I think will give you the desired results
Code:

//SYSIN    DD *                                                     
  OPTION COPY                                                       
  INREC IFTHEN=(WHEN=GROUP,RECORDS=40000,PUSH=(110:ID=2,113:SEQ=6)),
        IFTHEN=(WHEN=GROUP,KEYBEGIN=(6,2),PUSH=(120:SEQ=6)),       
        IFTHEN=(WHEN=GROUP,                                         
               BEGIN=(113,6,ZD,EQ,40000,AND,120,6,ZD,GT,1),         
                 END=(120,6,ZD,EQ,1),                               
                PUSH=(128:110,2)),                                 
        IFTHEN=(WHEN=(120,6,ZD,NE,1,AND,128,2,CH,GT,C' '),         
       OVERLAY=(110:128,2))                                         
                                                                   
  OUTFIL FNAMES=D0001A,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,1)       
  OUTFIL FNAMES=D0001B,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,2)       
  OUTFIL FNAMES=D0001C,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,3)       
  OUTFIL FNAMES=D0001D,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,4)       
  OUTFIL FNAMES=D0001E,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,5)       
  OUTFIL FNAMES=D0001F,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,6)       
  OUTFIL FNAMES=D0001G,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,7)       
  OUTFIL FNAMES=D0001H,BUILD=(1,109),INCLUDE=(110,2,ZD,EQ,8)       
/*

#9:  Author: Magesh_J PostPosted: Mon Jul 31, 2017 5:42 pm
    —
kolusu wrote:

Code:

        IFTHEN=(WHEN=GROUP,                                         
               BEGIN=(113,6,ZD,EQ,40000,AND,120,6,ZD,GT,1),         
                 END=(120,6,ZD,EQ,1),                               
                PUSH=(128:110,2)),                                 
        IFTHEN=(WHEN=(120,6,ZD,NE,1,AND,128,2,CH,GT,C' '),         
       OVERLAY=(110:128,2))     



Impressive thinking !!!...

Thanks
Magesh

#10:  Author: Magesh_J PostPosted: Mon Jul 31, 2017 5:47 pm
    —
Also we may have as follows, so we may not miss any records.
Code:

OUTFIL FNAMES=D0001H,BUILD=(1,109),INCLUDE=(110,2,ZD,GE,8)


Regards,
Magesh

#11:  Author: kolusuLocation: San Jose PostPosted: Tue Aug 01, 2017 10:27 am
    —
Magesh_J wrote:
Also we may have as follows, so we may not miss any records.
Code:

OUTFIL FNAMES=D0001H,BUILD=(1,109),INCLUDE=(110,2,ZD,GE,8)


Regards,
Magesh


Magesh,

You can use it, however I prefer SAVE instead of INCLUDE.
Code:

OUTFIL FNAMES=REMAIN,BUILD=(1,109),SAVE


I am creating a 2 Byte ID for every 40000 records. So the maximum is 99 values for 40000*99 = 3,960,000 records and it restarts from 00. With a greater than condition you will skip the 00. But with SAVE you will not.

OP needs to change the ID to 8 byte value to make sure that he doesn't miss any records. I will leave the exercise to him given that he already has a template solution.

#12:  Author: Suchay PostPosted: Tue Aug 01, 2017 4:23 pm
    —
Thanks Kolusu sort card worked fine



MVSFORUMS.com -> Utilities


output generated using printer-friendly topic mod. All times are GMT - 5 Hours

Page 1 of 1

Powered by phpBB © 2001, 2005 phpBB Group