Posted: Fri Sep 15, 2006 3:18 pm Post subject: 2 file comparison using ICETOOL
I need help to compare 2 files and create a third file that should contain only the changed records and new records, using ICETOOL or SORT.
Except for the first 7 bytes, any field value may change in File-2. All fields are alphanumeric and the file has a fixed record length of 500 bytes.
The third file must be created from File-2.
Any help ASAP please !!!
Field-1: From 1, 7 bytes
Field-2: From 8, 10 bytes
Field-3: From 18, 2 bytes
Field-4: From 20, 2 bytes
Field-5: From 22, 2 bytes
Field-6: From 24, 2 bytes
Field-7: From 26, 2 bytes
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Fri Sep 15, 2006 4:13 pm Post subject:
It would have helped if you'd shown your expected output.
If I understand correctly, you want the records from file2 that are not in file1 based on the first 27 bytes. If so, here's a DFSORT/ICETOOL job that will give you that:
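For readers who want to see the matching logic outside of DFSORT, here is a minimal Python sketch of the same idea. It is not the ICETOOL job itself, only an illustration: the function name, the `pad` helper and the sample records are all hypothetical, and it assumes both files fit in memory as lists of 500-byte strings.

```python
KEY_LEN = 27  # Field-1 through Field-7 together span positions 1-27


def records_only_in_file2(file1_records, file2_records, key_len=KEY_LEN):
    """Return the file2 records whose first key_len bytes match no file1 record."""
    file1_keys = {rec[:key_len] for rec in file1_records}
    return [rec for rec in file2_records if rec[:key_len] not in file1_keys]


def pad(text, length=500):
    """Pad a hypothetical sample record with blanks to the fixed record length."""
    return text.ljust(length)


# Hypothetical sample data (not taken from the thread).
file1 = [
    pad("1111111AAAAAAAAAA0102030405"),
    pad("2222222BBBBBBBBBB0102030405"),
]
file2 = [
    pad("1111111AAAAAAAAAA0102030405"),  # unchanged in positions 1-27
    pad("2222222BBBBBBBBBB0102030499"),  # Field-7 changed
    pad("3333333CCCCCCCCCC0102030405"),  # new record
]

result = records_only_in_file2(file1, file2)
for rec in result:
    print(rec[:KEY_LEN])
```

With this sample data, the changed 2222222 record and the new 3333333 record are kept, and the unchanged 1111111 record is not.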
_________________ Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Thank you for your immediate response. I will try it with my input files and give you feedback.
I forgot to include some additional requirements.
(1) As a first step I want to eliminate the records that are common in both the Files. Then
(2) I need to add the date in ccyymmdd in last 8 bytes in all the records (for File-1 the previous month's date and for File-2 the current date)
(3) Also I need to find the file that is dropped from FILE-2.
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Mon Sep 18, 2006 2:26 pm Post subject:
Please show the expected output.
Please explain more clearly what you want to do.
Quote:
I want to eliminate the records that are common in both the Files.
Common on which positions?
Quote:
I need to add the date in ccyymmdd in last 8 bytes in all the records
You mean in positions 28-35 of the output records?
Quote:
I need to find the file that is dropped from FILE-2.
You mean the "records that are dropped from File2"? What exactly does that mean? Once you "find" these records, what do you want to do with them?
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Tue Sep 19, 2006 2:46 pm Post subject:
Your new post really doesn't clarify things. You didn't answer my previous questions and you didn't explain the rules for determining if a record was "dropped", "added" or "updated". And I still don't know which bytes you're matching on.
Quote:
Q) If I use
The first and second records are both from file1, but you label one "dropped" and remove the other one. The third and fourth records are both from file2, but you label one "updated" and one "added". You need to explain the rules you're using to determine if a record is "dropped", "updated" or "added". I suspect it has something to do with comparing other bytes than all 500, but I can't read your mind to tell what the rules are, so you need to clarify them. I'm trying to help, but you're not giving me enough to go on.
Joined: 02 Dec 2002 Posts: 1618 Topics: 31 Location: San Jose
Posted: Tue Sep 19, 2006 5:19 pm Post subject:
Quote:
Now we may take the first 7 bytes as the key (that will not change)
If you use the first 7 bytes, then the only record without a match is the 8888888 record from file2 (e.g. you have two 4444444 records in file1 and one 4444444 record in file2, so those match). So you can't get the output records you show based on using 7 bytes as the key.
The only way I can make sense out of what you've shown so far is to use the following rules:
Compare the records in file1 vs file2 using all 500 bytes as the "key". Keep the records that don't match and apply these rules to them:
1. If the first 16 bytes are the same for a record from file1 and file2, but a byte anywhere in positions 17-500 is different for the two records, put the record from file2 in output file2 and mark it as "updated".
2. If the first 16 bytes are only found in a record from file1, put it in output file1 and mark it as "dropped"
3. If the first 16 bytes are only found in a record from file2, put it in output file2 and mark it as "added".
That's the kind of thing I mean when I ask for the rules.
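To make the three rules concrete, here is a small Python sketch of how they could be applied. It is an illustration only, not a DFSORT job. The names `classify` and `rec` and the sample records are hypothetical, and two assumptions go beyond the rules as stated: the 16-byte key is looked up only among the records left after the full 500-byte comparison, and exact duplicate records within a file are collapsed to one.

```python
def classify(file1_records, file2_records, key_len=16):
    """Apply the dropped/updated/added rules to two lists of fixed-length records.

    Returns (out1, out2): lists of (record, label) pairs for output file1
    and output file2.
    """
    set1, set2 = set(file1_records), set(file2_records)

    # Compare on the whole record and keep only the non-matching ones.
    # dict.fromkeys removes exact duplicates while preserving order (assumption).
    only1 = [r for r in dict.fromkeys(file1_records) if r not in set2]
    only2 = [r for r in dict.fromkeys(file2_records) if r not in set1]

    keys1 = {r[:key_len] for r in only1}
    keys2 = {r[:key_len] for r in only2}

    # Rule 2: key found only in file1 -> "dropped", goes to output file1.
    out1 = [(r, "dropped") for r in only1 if r[:key_len] not in keys2]

    # Rule 1: key found in both -> file2 record is "updated".
    # Rule 3: key found only in file2 -> "added".
    out2 = [(r, "updated" if r[:key_len] in keys1 else "added") for r in only2]
    return out1, out2


def rec(key, rest, key_len=16, rec_len=500):
    """Build a hypothetical 500-byte sample record: key in 1-16, data after it."""
    return (key.ljust(key_len) + rest).ljust(rec_len)


# Hypothetical sample data (not the records from the thread).
file1 = [rec("1111111", "same"), rec("4444444", "gone"), rec("5555555", "old")]
file2 = [rec("1111111", "same"), rec("5555555", "new"), rec("8888888", "brand new")]

out1, out2 = classify(file1, file2)
for record, label in out1 + out2:
    print(record[:7], label)
```

Passing a different `key_len` is one way to reuse the same logic for files whose keys have other lengths.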
By those rules, when we match on all 500 bytes, the non-matching records are:
The first record satisfies rule 2.
The second and third records satisfy rule 1.
The fourth record satisfies rule 3.
If those are the rules you want to use, let me know. If those are not the rules you want to use, then tell me what rules you do want to use and show how they match the input and output records in your example.
You are correct; these are the rules that I need to use. I have more than a dozen files with different key lengths that need the same process. Also, there should not be any duplicates in the output files.