Posted: Fri Jul 28, 2017 4:58 pm Post subject: Break records after particular Count and column value change
We extract around 320000 records thru Db2 utility and split the records into 8 files when the count is greater than 40000 and the first column value change after reading 40000k records. We used seqnum currently to split on 40000, it works fine but we want to break on column value change and count should be greater than 40000 so it goes into 8 datasets
Please advise
Code:
col1 col2
001 10
001 10
001 10
001 10 Record 40000
....
001 10 records 41032 -- Write to file D0001A since after 40000 records the col2 changes in the next row
001 11 Record 41033
001 11
001 11
..
001 12 Record 56000
001 12 records 80000
..
001 12 records 82312 - Write to file D0001B since after 40000 records the col2 changes in the next row
001 13 records 82313
Similarly, we have to break after 40000 records where the last record was written and when col2 changes
Currently, we have coded using SEQNUM, but the column changes are not handled
Joined: 26 Nov 2002 Posts: 12359 Topics: 75 Location: San Jose
Posted: Sat Jul 29, 2017 12:14 pm Post subject:
Suchay,
If your intention is to split the file based on COL2 , why bother about about the limit of 40,000? Use WHEN=GROUP with KEYBEGIN on COL2 and add an id number at the end.
Try the following untested control cards ( I assumed that your input is already sorted on col2, if it is not the replace the COPY statement with SORT FIELDS=(6,2,CH,A)
Code:
//SYSIN DD *
OPTION COPY
INCLUDE COND=(1,3,CH,EQ,C'001')
OUTREC IFTHEN=(WHEN=GROUP,KEYBEGIN=(6,2),PUSH=(110:ID=2))
Reason for spitting into 8 files: we provide these reports to users into spreadsheets, so we it would be cumbersome for them to look into the reports, so we break into 40,000 records and when the col2 changes after 40000 records we create separate files
Joined: 26 Nov 2002 Posts: 12359 Topics: 75 Location: San Jose
Posted: Sat Jul 29, 2017 3:53 pm Post subject:
Suchay wrote:
Reason for spitting into 8 files: we provide these reports to users into spreadsheets, so we it would be cumbersome for them to look into the reports, so we break into 40,000 records and when the col2 changes after 40000 records we create separate files
Suchay,
The control cards i gave will split the file into 8 files. It is not lmited to 40000 records. Are you telling that you have more than 40000 records per col2 values? If that is the case just add SEQ=6 in the Oush statement after id and check that value on OUTFIL
Kolusu, Col2 is sorted and the catch is Col2 might have values between 40000. IF you check the first post after 40000 records we have 11 and 12 in the next 40000 and after 40000 record ,col2 value changes to 13, then we create a separate file.
col1 col2
001 10
001 10
001 10
001 10 Record 40000
....
001 10 records 41032 -- Write to file D0001A since after 40000 records the col2 changes in the next row
001 11 Record 41033
001 11
001 11
..
001 12 Record 56000
001 12 records 80000
..
001 12 records 82312 - Write to file D0001B since after 40000 records the col2 changes in the next row
001 13 records 82313
Joined: 26 Nov 2002 Posts: 12359 Topics: 75 Location: San Jose
Posted: Mon Jul 31, 2017 11:18 am Post subject:
Suchay wrote:
001 10 records 41032 -- Write to file D0001A since after 40000 records the col2 changes in the next row
Suchay,
Your requirements are conflicting. Shouldn't be written to D0001B since you are past 40000 record mark?
Lets clear the requirement.
Lets say your COL2 have the following number of records.
Code:
10 - 200,000 records
11- 8,000 records
12- 15,000 records
13- 97,000 records
14 - 5000 records
total number of records = 325,000
Now how do you plan to split these records?
Code:
D0001A - 00001 to 40,000 records of 10
D0001B - 40,001 to 80,000 records of 10
D0001C - 80,001 to 120,000 records of 10
D0001D - 120,001 to 160,000 records of 10
D0001E - 160,001 to 200,000 records of 10
D0001F - 8,000 of 11 + 15,000 of 12 + 0001 to 7,000 of 13
D0001G - 7,0001 to 47,000 records of 13
D0001H - 47,0001 to 87,000 records of 13
If you did split it like that then you have still 10,000 records of 13 and 5000 records of 14. _________________ Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Generally, we don't get more than 30000 records for a particular COl2 record.
We just try to keep the count close to 40000 in a file. Currently, we do this manually we check the col2 value
and break the files whenever the col2 value changes after each 40000 record.
Code:
10 - 20,000 records
11- 8,000 records
12- 15,000 records
13- 17,000 records
14 - 15,000 records
15 - 9500 records
16 - 23,000 records
17 -18,500 records.
Let us try to keep it simple with 3 files with above data and output should like below. is it possible
Code:
D0001A - 00001 to 43,000 records of 10 ,11, 12
D0001B - 43,001 to 84,500 records of 13, 14, 15
D0001C - 84,501 to 126,000 records of 16, 17
Joined: 26 Nov 2002 Posts: 12359 Topics: 75 Location: San Jose
Posted: Mon Jul 31, 2017 12:32 pm Post subject:
Suchay wrote:
Let us try to keep it simple with 3 files with above data and output should like below. is it possible
Suchay,
You need to design solutions that would work for any kind of scenario. If you requirements are clear it is a simple task.
All you had to say is "I have to split an input file into chunks of 40000 records but if there are more records for the last key beyond 40000 records, I want them included in the same file even though the number of records included would be greater than 40000."
You spent 2 days to get this simple requirement right. You could have a written a COBOL program by now to get the desired results. We can ONLY read your text and interpret it and if your requirements aren't clear you would just be wasting my time as well as your time.
Any way use the following control cards which I think will give you the desired results
You can use it, however I prefer SAVE instead of INCLUDE.
Code:
OUTFIL FNAMES=REMAIN,BUILD=(1,109),SAVE
I am creating a 2 Byte ID for every 40000 records. So the maximum is 99 values for 40000*99 = 3,960,000 records and it restarts from 00. With a greater than condition you will skip the 00. But with SAVE you will not.
OP needs to change the ID to 8 byte value to make sure that he doesn't miss any records. I will leave the exercise to him given that he already has a template solution.
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum