Posted: Fri Jan 13, 2006 7:01 am Post subject: Optimum Execution time to process huge data.
Hi all,
This is the requirement.
I have a tape file with a large number of records, say file1, and another sequential file with far fewer records, say file2. What I basically want is to read each and every record from file1 and check whether it is present in file2. If it is present, pull the record and write it to the output file. This read is carried out until a match is found; once the record is written, I take the next record from file2 and continue the process. Both files are sorted on the same key.
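Since both files are already sorted on the same key, the match can be done in one pass instead of repeated scans. If your sort product's level supports it, a JOINKEYS step expresses this directly. This is only a sketch: the dataset names, record length (80) and key position (1,10) are assumptions to be replaced with your own, and the JOINKEYS feature may not exist on older sort releases:

```jcl
//MATCH    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTJNF1 DD DSN=MY.TAPE.FILE1,DISP=SHR       LARGE TAPE FILE (ASSUMED NAME)
//SORTJNF2 DD DSN=MY.DASD.FILE2,DISP=SHR       SMALL DRIVER FILE (ASSUMED NAME)
//SORTOUT  DD DSN=MY.MATCHED.OUT,DISP=(NEW,CATLG,DELETE),
//            UNIT=SYSDA,SPACE=(CYL,(10,5),RLSE)
//SYSIN    DD *
* INNER JOIN: KEEP ONLY FILE1 RECORDS WHOSE KEY APPEARS IN FILE2.
* SORTED TELLS THE JOIN THAT THE INPUTS ARE ALREADY IN KEY ORDER.
  JOINKEYS FILES=F1,FIELDS=(1,10,A),SORTED
  JOINKEYS FILES=F2,FIELDS=(1,10,A),SORTED
  REFORMAT FIELDS=(F1:1,80)
  SORT FIELDS=COPY
/*
```

With no JOIN statement coded, the default is matched records only, which is exactly the extract described above; a single pass over the tape should be far cheaper than rescanning it.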
We already have a program in place to process the above request. The issue is that the program takes almost 90 minutes of execution time, which eats a considerable amount of the production window. So we thought of reducing its execution time.
What is the best way to do this? I tried with sort, but the sort still takes around 80 minutes of execution time.
Your view and ideas are welcome.
(I tried searching for a similar post here but did not find any matching my criteria. If I missed something, please let me know the link.)
Posted: Fri Jan 13, 2006 7:29 am Post subject:
Mouli,
Quote:
I tried with Sort, still sort takes around 80 min of execution time.
Well, there are different ways to achieve this extraction process using sort. What sort product do you have, and what is its version? Can you show us the sort job you used, so we can see if something could be tweaked in it?
Certain commands work only with specific versions of sort, so please provide complete information about the sort product and its version. (To get the version, just run a dummy sort job and look at the first line of the sysout.)
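For example, a minimal copy job like this one (all names and the inline record are throwaway) prints the product banner, with the sort product name and release, as the first line of its sysout:

```jcl
//GETVER  EXEC PGM=SORT
//SYSOUT  DD SYSOUT=*
//SORTIN  DD *
DUMMY RECORD
/*
//SORTOUT DD SYSOUT=*
//SYSIN   DD *
  SORT FIELDS=COPY
/*
```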
Thanks.
Here is the sort that I am using. It is a simple sort to extract a particular record from the tape. Basically we need to extract more than one record from the tape file at the same position; I am just quoting one as an example.
Your first post talks about a problem you have, and now your second post shows a piece of JCL which has little relation to the problem described in your first post. Why don't you show us the FULL procedure (all the JCL etc.) that you are using and having a problem with? I.e., post one talks about two files being compared, and yet post two has one file and one hardcoded entry.
Have you looked at the SyncSort "Programmer's Guide" manual? I have just had a quick look now, and there is an entire chapter called "Performance Considerations". Undoubtedly your shop has this manual, and ferreting out its whereabouts should not be a problem.
Yes. In my first post I described my requirement, and in my second post I mentioned the other approach I tried for it.
O.K., let me try to describe my requirement in a little more detail.
We already have a program in production which has 1 main driver file and 3 other files. The three other files are on tape and the driver file is on DASD. The 3 tape files have far more records.
The logic of the program is: it reads the driver file sequentially, and for each driver record it reads the tape file record by record until a match is found. Once the match is found, the matching records are pulled from the tape file and written to a new sequential file (a separate output file for each tape).
The same is done for the other two tape files; once all 3 files are completed, the program terminates. That's it.
As I mentioned, each tape file takes approximately 90 minutes of execution time, so the whole job needs approximately 270 minutes.
My question was:
a) Is there any "other" way of reducing the execution time for this job?
One option I had was to split the job to process the three files separately. This would solve the problem, but the question we had was whether it is advisable to use multiple tape drives in parallel in different jobs. Can anyone tell me about using multiple tape drives in parallel?
I hope I have made my requirement somewhat clearer. Let me know if you have any questions. _________________ Regards,
Mouli
Instead of writing out (those three files) to tape, can you write out to a KSDS instead? Examine the overall system before just saying "No. If it is tape now then it must remain tape forever... someone else did it, therefore I am not allowed to change it..."
Imagine that on one of the tapes you have 100 million records, and there are only two records which match the small input file: one record is at the beginning of the tape and the other is at the end. How clever is this?
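If the KSDS route is worth exploring, the cluster could be defined with IDCAMS along these lines. The name, key length/offset, record size and space figures below are placeholders for illustration, not recommendations:

```jcl
//DEFKSDS  EXEC PGM=IDCAMS
//SYSPRINT DD SYSOUT=*
//SYSIN    DD *
  DEFINE CLUSTER (NAME(MY.MATCH.KSDS) -
         INDEXED -
         KEYS(10 0) -
         RECORDSIZE(80 80) -
         CYLINDERS(50 10))
/*
```

A keyed VSAM read can then go straight to the two matching records instead of dragging the whole tape past the heads.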
Are you using buffers in your JCL? Look at "BUFNO" and experiment with the number of buffers. More is not necessarily better.
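For a QSAM input, the buffer count can be overridden on the DD statement without changing the program. The dataset name and the count of 30 below are just examples to experiment with:

```jcl
//SORTIN  DD DSN=MY.TAPE.FILE1,DISP=SHR,DCB=BUFNO=30
```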
As for tape drives, how many physical and virtual drives have you got? Would Operations be happy for you to hog three physical ones at the same time? You need to speak to your local storage guru about this and ask for advice.
Ask about running in a special class in order to get priority over other jobs, assuming your stuff is more critical than the other jobs.
Splitting the file will really help. Or think about reducing the number of records you are processing, for example by using PIPE processing. I don't know much about PIPE processing, but it should take care of parallel processing.