Joined: 26 Nov 2002 Posts: 12378 Topics: 75 Location: San Jose
Posted: Fri Feb 12, 2010 11:24 am Post subject: Re: remove duplicates from file with LRECL 3709
Martin,
The 1500-byte limit applies to a single ON parm. You can have multiple ON parms, but the total key length for comparison should not exceed 4088 bytes. Try this DFSORT/ICETOOL job
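As a rough illustration of the two limits only (this is not the job itself), the Python sketch below splits a record of a given LRECL into ON fields of at most 1500 bytes and refuses anything over 4088 bytes in total. The helper name `on_fields` and the choice of the CH format are assumptions made for the example.

```python
def on_fields(lrecl, max_field=1500, max_total=4088, fmt="CH"):
    """Return ICETOOL-style ON(p,m,f) strings that cover bytes 1..lrecl.

    max_field and max_total are the limits quoted in the post above;
    the function itself is a hypothetical helper, not part of DFSORT.
    """
    if lrecl > max_total:
        raise ValueError(f"total key length {lrecl} exceeds {max_total} bytes")
    fields = []
    pos = 1  # DFSORT positions are 1-based
    while pos <= lrecl:
        length = min(max_field, lrecl - pos + 1)
        fields.append(f"ON({pos},{length},{fmt})")
        pos += length
    return fields


if __name__ == "__main__":
    for field in on_fields(3709):
        print(field)
```

For the LRECL of 3709 in this thread, that gives three ON fields of 1500, 1500 and 709 bytes.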
Posted: Fri Feb 12, 2010 11:54 am Post subject:
Martin wrote:
What if the LRECL exceeds 4088 bytes? How does SORT handle this?
Well, there is a trick to handle a total key length that exceeds 4088 bytes. However, it would involve multiple passes over the data.
_________________
Kolusu
www.linkedin.com/in/kolusu
Posted: Fri Feb 12, 2010 1:42 pm Post subject:
Martin,
How do you plan to compare duplicates? Let's say file 1 has 4 duplicates and file 2 has 12 duplicates. What do you do in this case?
Note: File 1 is always a subset of file 2. As mentioned above, the first 4 records from file 1 will also be present in file 2. In addition, file 2 can have 8 more such records, which should NOT be removed.
Posted: Fri Feb 12, 2010 2:26 pm Post subject:
Martin,
As it is, matching with longer keys is complicated, and you have now thrown a monkey wrench into it with duplicates. I will see if I can come up with an elegant solution.
kolusu wrote:
As it is, matching with longer keys is complicated, and you have now thrown a monkey wrench into it with duplicates. I will see if I can come up with an elegant solution.
Thanks! If you are unable to come up with a solution, please show me how to match the long keys.
Posted: Fri Feb 12, 2010 5:27 pm Post subject:
Martin,
I did not get a chance to work on your problem, and IMHO it is NOT possible to do it with sort, given the number of duplicates involved in each file.
Here is a sample DFSORT JCL which will remove duplicate records from a file of LRECL 4500.
A brief explanation of the job:
STEP0100: Using INREC, we add a flag at the end of every record, set to "U" for unique. Using the SELECT operator with LASTDUP on the first 4084 bytes (the max value), we pick the potential duplicates and put them in the DUP file, overriding the flag to "D" for duplicate. We still need to check bytes 4085 through the end of the record to see if they are indeed duplicates.
If the first 4084 bytes aren't equal, we don't have to perform any validation, as the records canNOT be dups, so we write them to the UNQ file.
STEP0200: Now concatenate the above files once again (the DUP file should be first in the list) and sort them again on the first 4084 bytes with the EQUALS option. Using WHEN=GROUP, we push the D-flag record onto the next record.
Using an OMIT condition, we validate bytes 4085 through the end of the record in chunks of 256 bytes. If they are equal, we eliminate the record; if they aren't equal, we write it to the output file.
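The two steps can be sketched in Python to make the logic easier to follow. This is only one reading of the description above, not the DFSORT job itself: the names `remove_duplicates` and `tails_equal` are made up for the example, the in-memory lists stand in for the DUP and UNQ files, and Python's stable `sorted` plays the role of the EQUALS option.

```python
from itertools import groupby

KEY_LIMIT = 4084  # longest prefix used as the sort key, per the post
CHUNK = 256       # the tail is compared in pieces of this size, per the post


def tails_equal(a, b, start=KEY_LIMIT, chunk=CHUNK):
    """Compare everything after `start` in pieces of `chunk` bytes."""
    if len(a) != len(b):
        return False
    for pos in range(start, len(a), chunk):
        if a[pos:pos + chunk] != b[pos:pos + chunk]:
            return False
    return True


def remove_duplicates(records, key_limit=KEY_LIMIT, chunk=CHUNK):
    # Pass 1 (STEP0100): the last record of every group that shares a
    # prefix is flagged "D" (DUP file); every other record keeps "U" (UNQ file).
    counts, last_seen = {}, {}
    for i, rec in enumerate(records):
        prefix = rec[:key_limit]
        counts[prefix] = counts.get(prefix, 0) + 1
        last_seen[prefix] = i
    dup, unq = [], []
    for i, rec in enumerate(records):
        prefix = rec[:key_limit]
        if counts[prefix] > 1 and last_seen[prefix] == i:
            dup.append((rec, "D"))
        else:
            unq.append((rec, "U"))

    # Pass 2 (STEP0200): DUP first, then UNQ, stable sort on the prefix so
    # the "D" record leads its group.
    merged = sorted(dup + unq, key=lambda item: item[0][:key_limit])
    output = []
    for _, group in groupby(merged, key=lambda item: item[0][:key_limit]):
        pushed = None  # the "D" record pushed onto the rest of the group
        for rec, flag in group:
            if flag == "D":
                pushed = rec
            elif pushed is not None and tails_equal(rec, pushed, key_limit, chunk):
                continue  # same prefix and same tail: a true duplicate
            output.append(rec)
    return output
```

Under this reading, only records whose tail matches the flagged "D" record are dropped; a group that contains several different repeated tails would need further passes.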
Throwing out something which may or may not be possible...
Steps
1) Break your input record (23200 bytes) into pieces of 4000 bytes each. This will create 6 records for each input record. Make sure your LRECL is 4000 bytes. You may need multiple passes for each input file. Also create a record-id for each record, which will be used later to merge them back.
Here is what I tested for a 30-byte record, breaking it into 23-byte records (15 bytes data + 8 bytes record-id). Once again, I don't know if it's correct or not.
Here in the IFTHEN condition, use any valid condition to populate the record-id. Can we use "entire record greater than spaces"? Don't know...
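To show the idea outside of DFSORT, here is a small Python sketch of the split and merge, using the same 30-byte test record and 15 + 8 layout. The function names, the zero-padded numeric record-id, and the space padding of a short last piece are all assumptions made for the example.

```python
def split_record(record, record_id, data_len=15, id_width=8):
    """Cut one record into pieces of data_len bytes, each tagged with the record-id."""
    rid = f"{record_id:0{id_width}d}".encode("ascii")
    pieces = []
    for pos in range(0, len(record), data_len):
        # Pad a short last piece with spaces so every piece has the same length.
        data = record[pos:pos + data_len].ljust(data_len, b" ")
        pieces.append(data + rid)
    return pieces


def merge_pieces(pieces, data_len=15, id_width=8):
    """Rebuild records from pieces; pieces of one record must arrive in order."""
    parts_by_id = {}
    for piece in pieces:
        rid = int(piece[data_len:data_len + id_width])
        parts_by_id.setdefault(rid, []).append(piece[:data_len])
    return {rid: b"".join(parts) for rid, parts in parts_by_id.items()}


if __name__ == "__main__":
    record = b"ABCDEFGHIJKLMNOPQRSTUVWXYZ0123"  # 30 bytes
    pieces = split_record(record, record_id=1)
    print(pieces)
    print(merge_pieces(pieces)[1] == record)
```

With `data_len=4000`, a 23200-byte record yields 6 pieces, the last one padded from 3200 to 4000 bytes.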