MVSFORUMS.com
A Community of and for MVS Professionals
 

Matching two datasets using DFSORT Dec 2004 release

 
Phantom
Data Mgmt Moderator


Joined: 07 Jan 2003
Posts: 1056
Topics: 91
Location: The Blue Planet

Posted: Fri Dec 24, 2004 10:10 am    Post subject: Matching two datasets using DFSORT Dec 2004 release

Frank,

I need your help on this. One of the common tasks that we do in COBOL is to match two datasets.

something like this:
http://www.mvsforums.com/helpboards/viewtopic.php?t=3403

1. A master file, which is usually very large and contains duplicates on the field we have to match on (say, account number).

2. A daily file, which contains a list of unique account numbers.

Now, we need to match these two datasets and extract from the master file all occurrences of the accounts listed in the daily file.

Could you please create and run a test job and give me the runtime and CPU time statistics? Assume file 1 (the master file, which has dups) has nearly 5 million records and file 2 has nearly 3000 unique accounts.

Thanks in advance for your help,

Regards,
Phantom
Frank Yaeger
Sort Forum Moderator


Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

Posted: Fri Dec 24, 2004 11:16 am

Phantom,

I can do that, but the runtime and cpu time statistics will reflect the hardware and software I'm running on, which may or may not be similar to the hardware and software you'd be running on. So any timing comparisons between what I get here for the DFSORT job and what you get there for the COBOL job may or may not mean anything. I suppose if I could duplicate your COBOL job here, then I could get a valid comparison, but I'm NOT a COBOL programmer, so you'd have to give me everything I need for the setup. I'd be happy to discuss this with you further offline (yaeger@us.ibm.com).

However, if you want me to do the run anyway, then I need to know what you want me to use for the RECFM and LRECL of each file, and the starting position and length you want me to use for the account number.
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Phantom
Posted: Fri Dec 24, 2004 11:34 am

Thanks Frank,

I'll send you the COBOL code and the JCL once I'm back in the office on Monday. In the meantime, you could probably try running the DFSORT version. Please find the dataset properties below.

Code:

Master File:  nearly 5 Million Records
LRECL  =  300
RECFM  =  FB
ACCT POS  = 1 to 9 (9 characters - Alphanumeric) - Contains dups

Transaction File: approx 3000 Records
LRECL  = 80
RECFM  =  FB
ACCT POS  = 1 to 9 (9 characters - Alphanumeric) - UNIQUE


I will send the COBOL code to your IBM mail ID.
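
For readers following along, the matching task with the dataset properties above can be sketched in a few lines of Python. This is only an illustration of the logic, not the COBOL program or the DFSORT job discussed in this thread; the function names and the use of an in-memory set are assumptions made for the example.

```python
from typing import BinaryIO, Iterator

# Layout taken from the dataset properties in the post:
# the account number is in positions 1 to 9 (1-based) of both files.
KEY_START = 0
KEY_LEN = 9
MASTER_LRECL = 300
TRANS_LRECL = 80


def read_fixed(stream: BinaryIO, lrecl: int) -> Iterator[bytes]:
    """Yield fixed-length records from a binary stream (hypothetical helper)."""
    while True:
        record = stream.read(lrecl)
        if len(record) < lrecl:  # end of data; a short trailing fragment is ignored
            return
        yield record


def match_master(master: BinaryIO, transactions: BinaryIO) -> Iterator[bytes]:
    """Yield every master record whose account number is in the transaction file."""
    # Assumption: the ~3000 unique accounts fit comfortably in memory.
    wanted = {
        rec[KEY_START:KEY_START + KEY_LEN]
        for rec in read_fixed(transactions, TRANS_LRECL)
    }
    # One pass over the master file; duplicates are all kept, in input order.
    for rec in read_fixed(master, MASTER_LRECL):
        if rec[KEY_START:KEY_START + KEY_LEN] in wanted:
            yield rec
```

The keys are compared as raw bytes, so the sketch does not care whether the data is EBCDIC or ASCII as long as both files use the same encoding.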

Thanks,
Phantom
Frank Yaeger
Posted: Fri Dec 24, 2004 11:54 am

Phantom,

Ok. Please send me the compile and linkedit JCL for the COBOL program as well. As I said, I'm not a COBOL programmer.
Frank Yaeger
Posted: Thu Dec 30, 2004 7:28 pm

Phantom,

I never received your COBOL program. Did you send it?
Phantom
Posted: Fri Dec 31, 2004 12:41 am

Frank,

I was on vacation for the past four days and working from home, so I didn't get a chance to send you the COBOL code & JCL. Sorry for the delay. I'm back in the office today and will try to send them by tonight.

Thanks,
Phantom
Frank Yaeger
Posted: Sat Jan 01, 2005 1:27 pm

Phantom,

I received your COBOL program. It appears that it's pretty well optimized for doing what it does in minimal CPU time by decreasing the number of compares it has to do (this is the kind of situation where a well-written program with optimized logic for a specific task can gain efficiency over a general purpose utility).

I ran the experiment using three different DFSORT IFTHEN methods (all three set up the IFTHEN clauses dynamically from the transaction file):

Method 1 is the brute force method of using two IFTHENs, each with 1500 conditions, and testing each master account number against all 3000 conditions.

Method 2 uses two IFTHENs, each with 1500 conditions, but only tests each master account number against 1500 conditions.

Method 3 uses six IFTHENs, each with 500 conditions, and only tests each master account number against 500 conditions.
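
To see why fewer conditions per group means fewer compares, here is a rough Python model of the idea. It is not DFSORT syntax, and the post does not say how a record is steered to a single group of conditions; the sketch assumes the transaction keys are sorted and split into contiguous ranges, and that a key is routed by range before being tested. The function names are hypothetical.

```python
from bisect import bisect_left
from typing import List, Sequence, Tuple


def split_into_groups(keys: Sequence[str], group_count: int) -> List[List[str]]:
    """Sort the keys and split them into contiguous, nearly equal groups."""
    ordered = sorted(keys)
    size = max(1, -(-len(ordered) // group_count))  # ceiling division
    return [ordered[i:i + size] for i in range(0, len(ordered), size)]


def lookup(key: str, groups: List[List[str]]) -> Tuple[bool, int]:
    """Return (matched, equality_tests) for one master key.

    Assumption: the key is routed to the one group whose range could hold it,
    then tested against that group's keys one at a time.
    """
    upper_bounds = [group[-1] for group in groups]
    index = bisect_left(upper_bounds, key)
    if index == len(groups):  # key is higher than every transaction key
        return False, 0
    tests = 0
    for candidate in groups[index]:
        tests += 1
        if candidate == key:
            return True, tests
    return False, tests


# 3000 unique 9-character keys; only even numbers are present.
keys = [f"A{2 * n:08d}" for n in range(3000)]
for group_count in (1, 2, 6):
    groups = split_into_groups(keys, group_count)
    matched, tests = lookup("A00000001", groups)  # a key that is not present
    print(group_count, matched, tests)
```

In this model a non-matching key costs as many equality tests as there are keys in its group, so one group of 3000, two groups of 1500, and six groups of 500 correspond to the worst-case compare counts described for methods 1, 2, and 3.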

Here are the results:

Code:

             EXCPs     CPU     Elapsed
COBOL pgm    80866    1.64         1.0
Method 1      1782   50.64         1.0
Method 2      1813   34.15         0.6
Method 3      2123   18.92         0.4


So methods 2 and 3 improve EXCPs very significantly and elapsed time significantly, but degrade CPU time significantly.
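
To put rough numbers on that trade-off, the table can be turned into ratios against the COBOL run. The post does not state the units for CPU or elapsed time, so this sketch only computes unitless ratios; the variable and function names are made up for the example.

```python
from typing import Dict, Tuple

# (EXCPs, CPU, Elapsed) exactly as reported in the table above.
RESULTS: Dict[str, Tuple[float, float, float]] = {
    "COBOL pgm": (80866, 1.64, 1.0),
    "Method 1": (1782, 50.64, 1.0),
    "Method 2": (1813, 34.15, 0.6),
    "Method 3": (2123, 18.92, 0.4),
}


def relative_to_baseline(
    results: Dict[str, Tuple[float, float, float]], baseline: str = "COBOL pgm"
) -> Dict[str, Tuple[float, ...]]:
    """Divide each run's figures by the baseline run's figures."""
    base = results[baseline]
    return {
        name: tuple(value / ref for value, ref in zip(values, base))
        for name, values in results.items()
        if name != baseline
    }


ratios = relative_to_baseline(RESULTS)
for name, (excps, cpu, elapsed) in ratios.items():
    print(f"{name}: EXCPs x{excps:.3f}, CPU x{cpu:.1f}, Elapsed x{elapsed:.1f}")
```

A ratio below 1 means the DFSORT method used less of that resource than the COBOL program, and a ratio above 1 means it used more.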

By extrapolating to more IFTHENs, each with fewer conditions, we might be able to improve things even more, but the setup becomes more tedious as we do that. We could discuss that offline if you want to pursue it.
All times are GMT - 5 Hours