Posted: Thu Apr 06, 2017 2:31 am Post subject: Extract a variable group of records from a VB
Hello everybody,
I have a variable-length file which contains logical groups of records. Each group pertains to a specific client and is composed of different record types. The record types follow a specific sequence which is always the same, but the number of records in a group cannot be determined in advance. The file length is 504 (the record is 500, plus 4 bytes for the variable length, I suppose) and the record format is VB.
Here is an example of 2 groups:
So these are two distinct groups. I used "..." to indicate a variable number of records; I don't know in advance how many there will be (it depends on how many operations that client did during the past year). The BEGINNING and END records are NOT part of any group. The starting record is just a placeholder, but the ending record is not: it MUST contain 'END' followed by the number of groups I have extracted (70 in my example), the number of records extracted (3882 in my example), and finally the number of bytes extracted. So the ending record is composed of 3 counters like this:
Code:
END
0000070 number of groups extracted (I will have to calculate it)
0000003882 number of records extracted except ending one (needs to be calculated)
001241167297 number of bytes extracted (I guess I can just avoid it because probably it will be calculated again later)
STDANN, ST100 and INO are record types; they are always present and always in that sequence. Each group begins with a STDANN record, so a group is the variable number of records between one STDANN and the next. I have also noticed that every group always ends with a D11 record; this D11 is a 93-byte record that always begins with 'D11' followed by all blanks. I don't know if this is important, but obviously every record type might have a different length (we are dealing with a VB file).
I have to:
1) write the first and the last records to the output (they identify the beginning and the end of the file). The first and last records of the input must also be the first and last records of the output. In the last record I will have to specify how many groups I have extracted, how many records, and probably the number of bytes (not sure about that; at first I can just set it to zeroes).
2) write to the output all the variable groups of records whose INO record looks like this:
so it probably begins with INO and has '00000' at position 36 (I guess it will be position 40 because of the 4 variable-length bytes, so 36+4=40). These groups of records must be complete and identical to the input, obviously.
3) the input is already sorted, so I don't need to sort it again.
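For anyone who wants to check the selection rule outside DFSORT, here is a minimal Python sketch of requirements 1 and 2. It only models the logic and is not a replacement for the sort job. It assumes records are plain strings without the RDW (so the '00000' test uses data position 36, 1-based) and that the first and last input records are the BEGINNING and END records. The function name `extract_groups` and the sample records are hypothetical.

```python
def extract_groups(records, flag="00000", flag_pos=36):
    """Split records into header, wanted groups and trailer.

    records: list of str holding the data only (no RDW).
    A group starts at every record beginning with 'STDANN'.
    A group is kept when one of its INO records carries `flag`
    at 1-based position `flag_pos`.
    """
    header, body, trailer = records[0], records[1:-1], records[-1]

    groups = []
    for rec in body:
        if rec.startswith("STDANN"):
            groups.append([rec])        # a new group begins here
        elif groups:
            groups[-1].append(rec)      # record belongs to the current group

    start = flag_pos - 1                # convert to a 0-based index

    def wanted(group):
        return any(
            r.startswith("INO") and r[start:start + len(flag)] == flag
            for r in group
        )

    kept = [g for g in groups if wanted(g)]
    return header, kept, trailer


# Hypothetical sample: two groups, only the first has the flagged INO record.
ino_hit = "INO".ljust(35) + "00000"
ino_miss = "INO".ljust(35) + "12345"
sample = ["BEGINNING",
          "STDANN A", "ST100 A", ino_hit, "D11",
          "STDANN B", "ST100 B", ino_miss, "X1", "D11",
          "END"]
header, kept, trailer = extract_groups(sample)
print(len(kept), sum(len(g) for g in kept))  # groups kept, records kept
```

The two numbers printed are the group count and record count that would go into the END record.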
Joined: 26 Nov 2002 Posts: 12360 Topics: 75 Location: San Jose
Posted: Thu Apr 06, 2017 7:32 am Post subject:
fab wrote:
STDANN, ST100 and INO are record types; they are always present and always in that sequence
Fab,
I thought by now you would be well versed in using the JOINKEYS trick to extract groups of records.
If the INO record is always present in the group, then it is quite easy to extract the groups of records that have INO 0000.
Here is an untested job which I think will give you the desired results. Please excuse me if I missed something, as it is too early and I just had my first coffee.
Brief explanation of the job.
Since your input is VB we need to preserve the length as is, so any temporary fields that we add have to go right after the RDW. For fixed-length records, we usually add temp fields at the end of the record and chop them off later. But for variable-length files you cannot add temporary fields at the end, because that would pad every record out to the full length. So to preserve the variable lengths we add the temp fields right after the RDW. Once we remove the temp fields later, we are left with the original variable-length records.
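To make the point about temp fields concrete, here is a small Python sketch. It models a VB record as a 4-byte RDW (a 2-byte big-endian length that includes the RDW itself, then 2 zero bytes) followed by the data. The helper names are hypothetical, and the 9-byte width simply mirrors the 8-byte group key plus 1 pick byte used in this job.

```python
import struct


def make_vb(data: bytes) -> bytes:
    """Build a VB record: RDW (length incl. RDW, 2 zero bytes) + data."""
    return struct.pack(">HH", len(data) + 4, 0) + data


def add_temp(rec: bytes, temp: bytes) -> bytes:
    """Insert temp bytes right after the RDW and rebuild the length."""
    return make_vb(temp + rec[4:])


def drop_temp(rec: bytes, width: int) -> bytes:
    """Remove `width` temp bytes that sit right after the RDW."""
    return make_vb(rec[4 + width:])


original = make_vb(b"STDANN CLIENT 0001")
tagged = add_temp(original, b"00000001P")   # 8-byte key + 1 pick byte

# Data that began at position 5 (1-based) now begins at position 14.
print(tagged[13:19])
# Removing the temp bytes gives back the original record, length included.
print(drop_temp(tagged, 9) == original)
```

Because the temp bytes sit in front of the data, each record grows by the same fixed amount and shrinks back to its own original length afterwards.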
JNF1CNTL - uses WHEN=INIT to insert 9 bytes after the RDW (8 bytes for the group key + 1 pick byte for the INO rec)
Using WHEN=GROUP, we tag every group that begins with STDANN with an 8-byte group number
Note that I checked position 14 instead of 5 because the INIT statement inserted 9 bytes before the actual data. So now your actual data starts at 14.
JNF2CNTL - picks ONLY the INO records with INCLUDE COND (this acts before the INREC statement, so we use the original record position)
Since we just need the INO record's sequence number for matching, I chopped the length to just 13 bytes (4-byte RDW + 8-byte seqnum + 1 pick byte)
If the INO record has 0000 at position 49, then we update the pick byte to 'P' so that it will match with file 1
Note that I checked position 49 instead of 40 because the INIT statement inserted 9 bytes before the actual data. So now your actual data starts at 14.
The join matches the files, and we extract the matched records as well as the unmatched records from file 1
SYSIN - Using an INCLUDE cond we filter out the desired records, i.e. matched records plus the BEGIN and END records.
Since you wanted the BYTE count, we once again rebuild the record using INREC. The byte count is in the first 2 bytes of the RDW. A minor caveat is that it includes 1 additional byte because of the match indicator (?) on the REFORMAT statement, so we need to subtract 1 from the RDW length to get the correct value.
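As a rough illustration of the byte-count arithmetic, here is a Python sketch that reads the length from the first 2 bytes of each RDW and subtracts a fixed number of extra bytes per record. It only models the one-byte adjustment described above; the function names are hypothetical, and any other temporary bytes present in the joined record would have to be passed in `extra` as well.

```python
import struct


def rdw_length(rec: bytes) -> int:
    """Length stored in the first 2 bytes of the RDW (big-endian binary)."""
    return struct.unpack(">H", rec[:2])[0]


def byte_total(records, extra=1):
    """Sum the RDW lengths, removing `extra` bytes from each record."""
    return sum(rdw_length(r) - extra for r in records)


# Two hypothetical joined records with RDW lengths 10 and 20.
recs = [struct.pack(">HH", 10, 0) + b"x" * 6,
        struct.pack(">HH", 20, 0) + b"y" * 16]
print(byte_total(recs))
```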
Using WHEN=GROUP, we once again tag each matched group with a group number so that we can use it to generate the stats you need at the end.
Using OUTFIL BUILD we remove the temp fields that we added and get the original records back as they were
Using the reporting feature TRAILER1 we get the counts and the byte total that you want.
  OUTFIL REMOVECC,
  BUILD=(1,4, $ RDW
         15), $ ACTUAL DATA
  TRAILER1=(5,8,
            ' NUMBER OF GROUPS EXTRACTED',/,
            COUNT-1=(M11,LENGTH=8),
            ' NUMBER OF RECORDS EXTRACTED',/,
            TOT=(13,2,BI,M11,LENGTH=8),
            ' NUMBER OF BYTES EXTRACTED')
//*
//JNF1CNTL DD *
  OPTION VLSHRT
  INREC IFTHEN=(WHEN=INIT,
        BUILD=(1,4, $ RDW
               8X, $ SPACES FOR GROUP KEY
               C'P', $ PICK FOR INO REC
               5)), $ ACTUAL DATA
        IFTHEN=(WHEN=GROUP,
         BEGIN=(14,6,CH,EQ,C'STDANN'),
          PUSH=(05:ID=8))
//*
//JNF2CNTL DD *
  OPTION VLSCMP
  INCLUDE COND=(5,3,CH,EQ,C'INO')
  INREC IFOUTLEN=13,
        IFTHEN=(WHEN=INIT,
        BUILD=(1,4, $ RDW
               SEQNUM,8,ZD, $ SEQNUM FOR GROUP KEY
               X, $ SPACES FOR INO REC
               5)), $ ACTUAL DATA
        IFTHEN=(WHEN=(14,3,CH,EQ,C'INO',AND,
                      49,4,CH,EQ,C'0000'),
        OVERLAY=(13:C'P')) $ PICK FOR INO REC
//*
_________________ Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Kolusu lol
I am lucky because you always answer me, and this is amazing... the bad part is that we have approx. a 9-hour gap (you are behind). I spent the whole morning editing that file, extracting hundreds of groups by hand, because they did not give me time to think; the problem here is too urgent for that. So no, I am not confident with JOINKEYS (not yet)... But I have not finished yet, so if your solution works I will definitely be lucky. I am now taking my time to study your code... hope they will leave me in peace
Edited
This morning I was very perplexed because INO is not at the beginning of the group, so I thought it could not be done...
Posted: Thu Apr 06, 2017 10:36 am Post subject:
Fab wrote:
It worked, just ending record is not formatted as I would need
Fab,
Glad that worked. I guess I understood it wrong when you showed this
Code:
and finally by the number of bytes extracted, so ending record is composed by 3 counters like this:
END
0000070 number of groups extracted (I will have to calculate it)
0000003882 number of records extracted except ending one (needs to be calculated)
001241167297 number of bytes extracted (I guess I can just avoid it because probably it will be calculated again later)
I thought you wanted the counters on separate lines.
Do you really need the BEGIN and END from your original input file or can they be generated? If they can be generated it makes it quite easy. You wouldn't need the INCLUDE and JOIN UNPAIRED,F1 statements.
Also, what are the lengths of the group count, record count and byte count? I had them at 8 bytes each, but you seem to have more.
As you can see, it is composed of:
- END (a constant)
- 0330061 a 7-digit counter for the number of groups extracted/processed
- 0017430781 a 10-digit counter for the number of records
- 001241167297 a 12-digit counter for the number of bytes
Posted: Fri Apr 07, 2017 7:17 am Post subject:
Fab,
Here are the modified untested control cards. However, I would like to point out that the byte counts of the header record (the TSTDEMA record) and the trailer record (END) are not added to the byte total. So your byte count may be off by around 80-90 bytes. The header record, if I counted correctly, is about 47 bytes and the END record is about 43 bytes, inclusive of the RDW. If you do need these bytes added, we can do that by adding them to one of the records. Let me know if you want them.
I added +1 to the count to account for the header record. If you don't want to count the header record, then simply change it to COUNT instead of COUNT+1 on the TRAILER1.
Ideally I would have preferred the counters to be separated by a delimiter (|), so that I clearly know where each count ends. Something like this:
Code:
END0330061|0017430781|001241167297
Either way it is quite easy to add the delimiters on the END record. I will leave it for you to handle it if you need them.
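The END layout discussed in this thread (the constant END, a 7-digit group count, a 10-digit record count and a 12-digit byte count) can be sketched in Python as below, with an optional delimiter. This is only an illustration of the field widths; the function name `format_end` is hypothetical, and the assumption that the undelimited form has no blanks between END and the counters is mine.

```python
WIDTHS = (7, 10, 12)  # groups, records, bytes


def format_end(groups: int, records: int, nbytes: int, sep: str = "") -> str:
    """Build the END record from the three zero-padded counters."""
    parts = []
    for value, width in zip((groups, records, nbytes), WIDTHS):
        text = f"{value:0{width}d}"
        if len(text) > width:
            raise ValueError(f"{value} does not fit in {width} digits")
        parts.append(text)
    return "END" + sep.join(parts)


print(format_end(330061, 17430781, 1241167297, sep="|"))
# END0330061|0017430781|001241167297
```

Calling it without `sep` gives the same three counters run together, so a reader of the file has to rely on the fixed widths alone.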