1. A master file, which is usually very large and contains duplicates on the field we have to match on (say, account number).
2. A daily file, which contains a list of unique account numbers.
Now, we need to match these two datasets and extract from the master file all occurrences of the accounts listed in the daily file.
Could you please create and run a test job and provide me with the runtime and CPU time statistics? Assume file 1 (the master file, which has dups) has nearly 5 million records and file 2 has nearly 3000 unique accounts.
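To make the requirement concrete, here is a minimal Python sketch of the matching logic only. It is not the DFSORT job or the COBOL program discussed in this thread. The function name `extract_matches` and the (account, record) tuple layout are assumptions made for illustration.

```python
from typing import Iterable, Iterator, Tuple


def extract_matches(
    master: Iterable[Tuple[str, str]],
    daily_accounts: Iterable[str],
) -> Iterator[Tuple[str, str]]:
    """Yield every master (account, record) pair whose account is in the daily list.

    Duplicates in the master are kept, so all occurrences come out.
    """
    # The daily list is small (about 3000 unique accounts), so hold it in a set.
    wanted = set(daily_accounts)
    # The master is large, so stream it once instead of loading it.
    for account, record in master:
        if account in wanted:
            yield account, record
```

For example, a master of `[("A1", "x"), ("B2", "y"), ("A1", "z")]` with a daily list of `["A1", "C3"]` yields both `A1` records and nothing else.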
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Fri Dec 24, 2004 11:16 am Post subject:
Phantom,
I can do that, but the runtime and cpu time statistics will reflect the hardware and software I'm running on, which may or may not be similar to the hardware and software you'd be running on. So any timing comparisons between what I get here for the DFSORT job and what you get there for the COBOL job may or may not mean anything. I suppose if I could duplicate your COBOL job here, then I could get a valid comparison, but I'm NOT a COBOL programmer, so you'd have to give me everything I need for the setup. I'd be happy to discuss this with you further offline (yaeger@us.ibm.com).
However, if you want me to do the run anyway, then I need to know what you want me to use for the RECFM and LRECL of each file, and the starting position and length you want me to use for the account number. _________________ Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Joined: 07 Jan 2003 Posts: 1056 Topics: 91 Location: The Blue Planet
Posted: Fri Dec 24, 2004 11:34 am Post subject:
Thanks Frank,
I'll send you the COBOL code and the JCL once I go back to the office on Monday. You can probably try running the DFSORT version now. Please find the dataset properties below.
Code:
Master File: nearly 5 Million Records
LRECL = 300
RECFM = FB
ACCT POS = 1 to 9 (9 characters - Alphanumeric) - Contains dups
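As a rough illustration of the master layout above, the following Python sketch reads fixed-length 300-byte records and slices out the 9-byte account field at position 1. It assumes the data is available as a plain byte stream with no record delimiters, and it treats the key as opaque bytes (no EBCDIC conversion). The names `read_fb_records` and `account_of` are hypothetical.

```python
import io
from typing import BinaryIO, Iterator

LRECL = 300      # record length from the properties above
ACCT_START = 0   # position 1 in 1-based terms
ACCT_LEN = 9     # 9-character alphanumeric account number


def read_fb_records(stream: BinaryIO, lrecl: int = LRECL) -> Iterator[bytes]:
    """Yield consecutive fixed-length records from a binary stream."""
    while True:
        rec = stream.read(lrecl)
        if not rec:
            return
        if len(rec) != lrecl:
            raise ValueError("short record: expected %d bytes, got %d" % (lrecl, len(rec)))
        yield rec


def account_of(rec: bytes) -> bytes:
    """Return the account field (bytes 1 to 9 of the record)."""
    return rec[ACCT_START:ACCT_START + ACCT_LEN]


# Two padded sample records, made up for this sketch.
sample = io.BytesIO(
    b"ACCT00001".ljust(LRECL, b" ") + b"ACCT00002".ljust(LRECL, b" ")
)
keys = [account_of(r) for r in read_fb_records(sample)]
```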
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Fri Dec 24, 2004 11:54 am Post subject:
Phantom,
Ok. Please send me the compile and linkedit JCL for the COBOL program as well. As I said, I'm not a COBOL programmer. _________________ Frank Yaeger - DFSORT Development Team (IBM)
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Thu Dec 30, 2004 7:28 pm Post subject:
Phantom,
I never received your COBOL program. Did you send it? _________________ Frank Yaeger - DFSORT Development Team (IBM)
Joined: 07 Jan 2003 Posts: 1056 Topics: 91 Location: The Blue Planet
Posted: Fri Dec 31, 2004 12:41 am Post subject:
Frank,
I was on vacation for the past four days and was working from home, so I didn't get a chance to send you the COBOL code and JCL. Sorry for the delay. I'm back in the office today. I will try to send them by tonight.
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Sat Jan 01, 2005 1:27 pm Post subject:
Phantom,
I received your COBOL program. It appears that it's pretty well optimized for doing what it does in minimal CPU time by decreasing the number of compares it has to do (this is the kind of situation where a well-written program with optimized logic for a specific task can gain efficiency over a general purpose utility).
I ran the experiment using three different DFSORT IFTHEN methods (all three set up the IFTHEN clauses dynamically from the transaction file):
Method 1 is the brute force method of using two IFTHENs, each with 1500 conditions, and testing each master account number against all 3000 conditions.
Method 2 uses two IFTHENs, each with 1500 conditions, but only tests each master account number against 1500 conditions.
Method 3 uses six IFTHENs, each with 500 conditions, and only tests each master account number against 500 conditions.
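The difference in the number of tests per master record can be sketched in Python as follows. This only counts comparisons. It is not DFSORT syntax and says nothing about EXCPs, elapsed time or CPU time. The post does not say how a master record is routed to a single group of conditions, so the range-based routing in `check_one_group` is an assumption, and all function names are hypothetical. Neither function stops at the first match, to mirror the counts quoted in the methods above.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def split_into_groups(accounts: Sequence[str], n_groups: int) -> List[List[str]]:
    """Sort the accounts and cut them into n_groups contiguous groups."""
    ordered = sorted(accounts)
    size = max(1, -(-len(ordered) // n_groups))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def check_all_groups(key: str, groups: List[List[str]]) -> Tuple[bool, int]:
    """Compare the key against every condition in every group (method 1 style)."""
    compares, matched = 0, False
    for group in groups:
        for acct in group:
            compares += 1
            if key == acct:
                matched = True
    return matched, compares


def check_one_group(key: str, groups: List[List[str]]) -> Tuple[bool, int]:
    """Pick one group by key range (assumed), then test only that group."""
    upper_bounds = [g[-1] for g in groups]
    idx = bisect_left(upper_bounds, key)
    if idx == len(groups):
        return False, 0  # key is above every group, nothing to test
    compares, matched = 0, False
    for acct in groups[idx]:
        compares += 1
        if key == acct:
            matched = True
    return matched, compares


# 3000 made-up account numbers, split the way methods 2 and 3 describe.
accounts = [f"{i:09d}" for i in range(3000)]
two_groups = split_into_groups(accounts, 2)   # 2 groups of 1500
six_groups = split_into_groups(accounts, 6)   # 6 groups of 500
```

With this setup, `check_all_groups` does 3000 compares per key, while `check_one_group` does 1500 with two groups and 500 with six.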
So methods 2 and 3 improve EXCPs very significantly and elapsed time significantly, but degrade CPU time significantly.
By extrapolating to more IFTHENs, each with fewer conditions, we might be able to improve things even more, but the setup becomes more tedious as we do that. We could discuss that offline if you want to pursue it. _________________ Frank Yaeger - DFSORT Development Team (IBM)