In other words, the result should contain ONLY records whose keys (Cols 1-6) match in both files, not records found in just one of the files.
Also, the files are quite large, so I want to avoid, if possible, creating a large file of all matching records from both files and flagging the ones to select by placing a temporary indicator somewhere on the record.
File A contains over 40 million records and most records have duplicate keys, so doing an "AllDups" on the two files will result in a very large file and I would like to avoid that if possible.
File B is a subset of FileA and does not contain duplicates.
For smaller files, I can do this with DFSORT by "Flagging" the records on both files using a SPLICE, then exclude all other records. But I can't use this method here because the intermediate result (i.e. ALLDUPS) will be too large.
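To make the requirement concrete, here is a minimal Python sketch of the selection being asked for: keep every FileA record whose 6-byte key also occurs in FileB, without building a combined file. It only illustrates the matching logic and is not DFSORT. The sample records and the name `records_matching_keys` are made up for this example, and it assumes the small file's keys fit in memory.

```python
from typing import Iterable, Iterator

KEY = slice(0, 6)  # columns 1-6 of the record data (illustrative layout)


def records_matching_keys(file_a: Iterable[str], file_b: Iterable[str]) -> Iterator[str]:
    """Yield FileA records, unchanged, whose key also occurs in FileB."""
    wanted = {rec[KEY] for rec in file_b}  # FileB is small and has no duplicate keys
    for rec in file_a:                     # FileA is read once and never copied
        if rec[KEY] in wanted:
            yield rec


# Made-up sample data: FileA has duplicate keys, FileB is a unique subset.
file_a = ["112233A1", "334455A1", "334455A2", "445566A1", "445566A2"]
file_b = ["112233B", "445566B"]
result = list(records_matching_keys(file_a, file_b))
```

Note that the duplicates of key 334455 drop out because that key is not in FileB, while both 445566 records from FileA are kept as they were.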
Joined: 01 Feb 2007 Posts: 1075 Topics: 7 Location: At Home
Posted: Thu May 03, 2012 9:49 am Post subject:
JOINKEYS? And you might get a better response if you had posted in the Utilities part of the forum. It has a description that starts 'DFsort' so it should have been hard to miss. _________________ Utility and Program control cards are NOT, repeat NOT, JCL.
Joined: 20 Oct 2006 Posts: 1411 Topics: 26 Location: germany
Posted: Thu May 03, 2012 10:04 am Post subject:
Santlou,
why do you have 445566B... twice in the output?
and why are the others there at all?
if the files are presorted (as you probably know)
the process will be quicker.
do you actually want a match? or a join of the records?
a match would mean a minimum of 2 records in output.
you could use JOINKEYS in one process to create 1 fixed length record for each 'match'
since file 2 is a subset of file 1,
the number of potential duplicates of file1 matching a file2 record
dictates how many output records there will be,
but with a record size equal to file 1 max record size PLUS file 2 max record size.
on the output side of the JOINKEYS,
you could 'split' the records, though i do not see why you would want to.
Frank/Kolusu will be along shortly (or even now)
and provide you with a more relevant answer. _________________ Dick Brenholtz
American living in Varel, Germany
Joined: 20 Oct 2006 Posts: 1411 Topics: 26 Location: germany
Posted: Thu May 03, 2012 10:06 am Post subject:
delta403.
the splice would require a sort,
where the JOINKEYS on OPTION COPY with presorted files, would not. _________________ Dick Brenholtz
American living in Varel, Germany
Joined: 26 Nov 2002 Posts: 12378 Topics: 75 Location: San Jose
Posted: Thu May 03, 2012 10:34 am Post subject:
delta403 wrote:
@dbz: SPLICE will sort the records automatically, we don't need to specify any sort step for this.
*Sigh* Joinkeys reads the files in 1 pass and is much more efficient than the 3-pass solution using SPLICE. Remember that SPLICE cannot handle a MANY-to-MANY match. _________________ Kolusu
www.linkedin.com/in/kolusu
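As a rough illustration of why a single pass is enough when both inputs are already in key order, here is a Python sketch of the general merge-match technique. It is not a description of how DFSORT implements JOINKEYS internally. The function name `merge_match` and the sample data are hypothetical, and the sketch assumes the second input has unique keys while the first may have duplicates.

```python
from typing import Callable, Iterable, Iterator


def merge_match(sorted_a: Iterable[str], sorted_b: Iterable[str],
                key: Callable[[str], str] = lambda rec: rec[:6]) -> Iterator[str]:
    """One pass over two inputs sorted ascending by key; yield the matching A records.

    Assumes keys in sorted_b are unique; sorted_a may contain duplicate keys.
    """
    b_iter = iter(sorted_b)
    b_rec = next(b_iter, None)
    for a_rec in sorted_a:
        # Advance B while its key is still lower than the current A key.
        while b_rec is not None and key(b_rec) < key(a_rec):
            b_rec = next(b_iter, None)
        if b_rec is None:
            break  # B is exhausted, so no later A record can match
        if key(b_rec) == key(a_rec):
            yield a_rec


# Made-up, presorted sample data.
a = ["112233A1", "334455A1", "334455A2", "445566A1", "445566A2", "556677A1"]
b = ["112233B", "445566B"]
matched = list(merge_match(a, b))
```

Each input is read front to back exactly once, and nothing is written except the matching records.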
Joined: 26 Nov 2002 Posts: 12378 Topics: 75 Location: San Jose
Posted: Thu May 03, 2012 10:41 am Post subject:
Santlou,
What are the LRECL and RECFM of both files? You said your files are variable blocked files, so the key actually starts at position 5, as the first 4 bytes hold the RDW. Is the data already presorted on the key? _________________ Kolusu
www.linkedin.com/in/kolusu
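The position shift for variable-length records can be shown with a small Python sketch. It assumes the usual non-spanned RDW layout (a 2-byte big-endian length that includes the RDW itself, followed by 2 zero bytes) and uses ASCII bytes purely for readability, whereas real mainframe data would normally be EBCDIC. The helpers `build_vb_record` and `key_of` are hypothetical names for this example.

```python
import struct


def build_vb_record(data: bytes) -> bytes:
    """Prefix data with a 4-byte RDW: 2-byte big-endian length (RDW included) + 2 zero bytes."""
    return struct.pack(">HH", len(data) + 4, 0) + data


def key_of(record: bytes, start: int = 5, length: int = 6) -> bytes:
    """Return the key using 1-based positions that count the RDW, as sort control cards do."""
    return record[start - 1:start - 1 + length]


rec = build_vb_record(b"445566 some payload")
rdw_length = struct.unpack(">H", rec[:2])[0]  # record length as stored in the RDW
key = key_of(rec)                             # data columns 1-6 are record positions 5-10
```

So a key described as "Cols 1-6" of the data is addressed as positions 5-10 once the RDW is counted.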
Thanks for all your responses. I appreciate all the help.
Delta, Yes. I would not have a problem using ICETOOL if it would get the results that I am looking for. However, your solution basically combines both files first, which makes it inefficient for my files. If FileA has 40 million records and FileB has 1000 records, copying both files to TEMP means copying many millions of records that we don't want. I'm looking for a solution that extracts ONLY the records from FILEA that match FILEB, without building another file that includes both FILEA and FILEB. With 40 million records in FileA, I would not have enough DASD to build TEMP. Also, using the SPLICE that you suggested, wouldn't I still keep the records that are duplicated on FileA (i.e. Key=334455) but are not on FileB?
DBZ... I have 445566B twice in the output because I originally wanted all records from FILEB and only those records from FILEA that MATCH FILEB. But I can live with just a match, i.e. an output file that includes only the records from FileA that match records in FileB. Also, yes, the files can be pre-sorted, and creating a result file with only the FileA records that match FileB records would suffice. However, the output result file MUST be in the SAME FORMAT as FILEA. Both FileA and FileB are LRECL=10004, RECFM=VB, DSORG=PS. The result has to be in the same format, with the same record lengths as the original records from FileA. How can using JOINKEYS to create 1 fixed length record help me obtain a result file in VB format? I appreciate your input and I admit that I have never used JOINKEYS, so pardon my ignorance here. However, would your solution, given pre-sorted files, provide a result file that includes all the records from FileA that match positions 1-6 (yes, for variable records I would specify positions 5-10 in the sort cards) of FileB?
Thanks for your assistance and your expertise.
I know that I can achieve my results by using ICETOOL to basically combine both files, then use a SPLICE to identify only the keys from FileB, then do an ALLDUPS, then remove unwanted records by selecting only those records flagged by the SPLICE. But as I stated, this basically combines both files into one big file and then eliminates unwanted records, which is simply too inefficient. This works fine for smaller files. However, with a FileA of 40 million+ records and a FileB of about 1000 records, the result file should have only about 800,000 records, far fewer than the 40 million+ records in FileA.
Also, the requirement that the format of the result file, including the record lengths of all records, be the same as FileA (LRECL=10004, RECFM=VB, DSORG=PS) is an issue for me, since I would normally add an extra byte (if it was FB) to the end of each record to indicate a FileB record and use that byte to SPLICE into all the matching records. This allows me to remove all dups on FileA that are not on FileB. But it also requires me to basically combine both files into one with an ALLDUPS, creating a file that contains about 39 million records that I do not need.
This is what I've done in the past, but this won't work because of the file size:
Step 1: Copy FILEB to TMP. TMP LRECL is 1 byte bigger than FILEB to accommodate a "B" flag that I will insert for FILEB records. Also copy FileA to TMP1 to add this extra byte.
Step 2: Concatenate TMP and TMP1 and SELECT ALLDUPS. This will result in one file that has all duplicates from the combined files, making sure that the records from FileB end up at the top of each group in sorted sequence.
Step 3: SPLICE the records from Step 2 so that I put the "B" flag on all records that match the keys from FILEB.
Step 4: Remove all unwanted records by selecting only those records flagged with a "B" (i.e. INCLUDE COND=...)
However, to do this for 40 million records is not very efficient. What I'm looking for is a way to achieve this without having to create a "Combined" file. Also, the variable LRECL is an issue for me since I need to persist the LRECL of the Original record from FILEA - I basically have no place to put the "B" without losing the LRECL on the VB file.
I appreciate any assistance and I apologize if my description is not detailed enough.
Joined: 26 Nov 2002 Posts: 12378 Topics: 75 Location: San Jose
Posted: Thu May 03, 2012 12:41 pm Post subject:
santlou,
If your intention is just to get the matched records from FileA, then it is very easy with Joinkeys. The following DFSORT JCL will give all the records from FileA that have a matching record in FILEB.
Huh? Could below be the issue? What is the DD name(s) for your input files?
Code:
ICE005A 0 STATEMENT DEFINER ERROR
ICE056A 0 SORTIN NOT DEFINED
ICE751I 0 C5-K90013 C6-K90013 C7-K90000 C8-K90013 E7-K24705
Quote:
A DD statement for a specified JOINKEYS F1 or F2 ddname was not
present. The F1 ddname is SORTJNF1 (or ccccJNF1 if SORTDD=cccc is in
effect) if FILE=F1 or FILES=F1 is specified. The F2 ddname is SORTJNF2
(or ccccJNF2 if SORTDD=cccc is in effect) if FILE=F2 or FILES=F2 is
specified.
Joined: 01 Feb 2007 Posts: 1075 Topics: 7 Location: At Home
Posted: Mon May 07, 2012 10:56 am Post subject:
I don't think JOINKEYS is available in 1.5 except, perhaps, with a PTF. I believe it is outdated and unsupported now. May be mis-remembering posts on another forum but Kolusu will set us all straight. _________________ Utility and Program control cards are NOT, repeat NOT, JCL.
Since the JOINKEYS statements are not recognized, DFSORT throws them out and expects a SORTIN DD statement. The INA and INB DD statements that I reference in the F1 and F2 parameters of my JOINKEYS statements are also ignored, because DFSORT does not know what JOINKEYS is.
According to what I'm seeing in the DFSORT docs, I should not need a SORTIN statement.