Posted: Tue May 25, 2010 10:15 am Post subject: Split file by a specific count across groups
Hello,
I am looking for a way to split a large file (4.5 million records) into smaller chunks of about 100,000 records each, while keeping a specific grouping together.
The file contains student information, and all students that belong to a campus must stay in the same file, even if the split point falls in the middle of that campus.
Is there a way to do this in the sort step (DFSORT or SyncSort), or is it easier to just write a separate routine?
Joined: 26 Nov 2002 Posts: 12378 Topics: 75 Location: San Jose
Posted: Tue May 25, 2010 10:23 am Post subject:
kkittinger,
Show us a sample of the input and the desired output, with the split taken into consideration for a group. Also, what are the LRECL and RECFM of the input and output files?
The number of records in each file has not been determined yet, nor has the number of files. I just know that I will need to split the data into manageable chunks and keep records with the same first 9 digits in the same file.
[ Key Info ]
256999001 student information ...................................
256999001 student information ...................................
256999001 student information ...................................
256999001 student information ...................................
256999001 student information ...................................
256999002 student information ...................................
256999002 student information ...................................
256999002 student information ...................................
256999002 student information ...................................
256999002 student information ...................................
256999002 student information ...................................
256999002 student information ...................................
256999002 student information ...................................
257999001 student information ...................................
257999001 student information ...................................
257999002 student information ...................................
257999003 student information ...................................
257999004 student information ...................................
So, if splitting on a record count of 10, then:
File 1: (13 records since the group/key needs to be kept together)
256999001 student information ...................................
256999001 student information ...................................
256999001 student information ...................................
256999001 student information ...................................
256999001 student information ...................................
256999002 student information ...................................
256999002 student information ...................................
256999002 student information ...................................
256999002 student information ...................................
256999002 student information ...................................
256999002 student information ...................................
256999002 student information ...................................
256999002 student information ...................................
File 2:
257999001 student information ...................................
257999001 student information ...................................
257999002 student information ...................................
257999003 student information ...................................
257999004 student information ...................................
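To make the rule concrete, here is a minimal Python sketch of the behavior described above: a file is closed only at a key boundary, once it holds at least the target number of records. This only illustrates the desired result and is not DFSORT or SyncSort syntax. The function name `split_keeping_groups` is made up, and the sketch assumes the input is already in key order with the key in the first 9 bytes.

```python
from itertools import groupby

def split_keeping_groups(records, target, key_len=9):
    """Split key-ordered records into chunks of at least `target` records,
    never separating records that share the same leading key."""
    chunks, current = [], []
    for _, group in groupby(records, key=lambda r: r[:key_len]):
        current.extend(group)
        # Close the chunk only at a key boundary, once the target is reached.
        if len(current) >= target:
            chunks.append(current)
            current = []
    if current:
        chunks.append(current)  # leftover records form the last chunk
    return chunks

# Same key distribution as the sample above.
sample = (["256999001"] * 5 + ["256999002"] * 8 + ["257999001"] * 2
          + ["257999002", "257999003", "257999004"])
records = [f"{k} student information" for k in sample]
chunks = split_keeping_groups(records, target=10)
print([len(c) for c in chunks])  # [13, 5]
```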
Posted: Tue May 25, 2010 10:55 am Post subject:
kkittinger,
How about splitting based on the groups instead of going by record count? Each output file could hold a maximum of 10 groups or so.
Kolusu
www.linkedin.com/in/kolusu
There will be approximately 11,000 groups, since this covers all the schools in the state of Texas. True, some of the groupings will be small, but in the Houston and Dallas areas they get kinda big.
This data is being used to populate our server-side databases, and I am being told they want it in manageable chunks so that, in case of errors, they only need to reload that piece.
Posted: Tue May 25, 2010 11:45 am Post subject:
kkittinger,
Try this DFSORT JCL. Here I am splitting the records into chunks of 10. You can change that number to any number you want. The parameter RECORDS=n specifies the maximum number of records in a group; n can be 1 to 2000000000.
Code:
//STEP0100 EXEC PGM=SORT
//SYSOUT DD SYSOUT=*
//SORTIN DD *
256999001 STUDENT INFORMATION ...................................
256999001 STUDENT INFORMATION ...................................
256999001 STUDENT INFORMATION ...................................
256999001 STUDENT INFORMATION ...................................
256999001 STUDENT INFORMATION ...................................
256999002 STUDENT INFORMATION ...................................
256999002 STUDENT INFORMATION ...................................
256999002 STUDENT INFORMATION ...................................
256999002 STUDENT INFORMATION ...................................
256999002 STUDENT INFORMATION ...................................
256999002 STUDENT INFORMATION ...................................
256999002 STUDENT INFORMATION ...................................
256999002 STUDENT INFORMATION ...................................
257999001 STUDENT INFORMATION ...................................
257999001 STUDENT INFORMATION ...................................
257999002 STUDENT INFORMATION ...................................
257999003 STUDENT INFORMATION ...................................
257999004 STUDENT INFORMATION ...................................
//OUT1 DD SYSOUT=*
//OUT2 DD SYSOUT=*
//SYSIN DD *
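* 101-108: SEQNUM, RESTARTING WHEN THE KEY IN 1-9 CHANGES
* 109-116: ID OF EACH CONSECUTIVE BLOCK OF 10 RECORDS
* 117-124: BLOCK ID OF A KEY'S FIRST RECORD, PUSHED ONTO THE KEY
  SORT FIELDS=COPY
  INREC IFTHEN=(WHEN=INIT,OVERLAY=(101:SEQNUM,8,ZD,RESTART=(1,9))),
        IFTHEN=(WHEN=GROUP,RECORDS=10,PUSH=(109:ID=8)),
        IFTHEN=(WHEN=GROUP,BEGIN=(101,8,ZD,EQ,1),PUSH=(117:109,8))
  OUTFIL FNAMES=OUT1,INCLUDE=(117,8,ZD,EQ,1),BUILD=(1,100)
  OUTFIL FNAMES=OUT2,INCLUDE=(117,8,ZD,EQ,2),BUILD=(1,100)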
//*
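To see why this keeps each key together, here is a rough Python model of the three IFTHEN clauses. It is only a sketch of one reading of the control statements, not DFSORT itself. The function name `assign_files` is made up for illustration, and the model assumes RECORDS=10 alone starts a new block every 10 records.

```python
def assign_files(records, chunk_size=10, key_len=9):
    """Model of the INREC logic: return the output file number per record."""
    file_ids = []
    prev_key = None
    pushed_id = None
    for index, rec in enumerate(records):
        key = rec[:key_len]
        # WHEN=GROUP,RECORDS=n,PUSH=(109:ID=8): a new ID every n records.
        block_id = index // chunk_size + 1
        # SEQNUM with RESTART=(1,9) is 1 on the first record of a key.
        # The last WHEN=GROUP pushes that record's block ID onto every
        # record that follows, until the next key starts.
        if key != prev_key:
            pushed_id = block_id
            prev_key = key
        file_ids.append(pushed_id)
    return file_ids

# Same key distribution as the SORTIN data above.
keys = (["256999001"] * 5 + ["256999002"] * 8 + ["257999001"] * 2
        + ["257999002", "257999003", "257999004"])
ids = assign_files(keys)
print(ids.count(1), ids.count(2))  # 13 5
```

If this model is right, a key that crosses a block boundary stays with the block where it started, so a file can hold more than 10 records. A key longer than a whole block can also leave an ID with no records at all: 25 records of one key followed by a new key would produce IDs 1 and 3 only. Each ID that can occur needs its own OUTFIL statement.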