MVSFORUMS.com Forum Index MVSFORUMS.com
A Community of and for MVS Professionals
 
 FAQFAQ   SearchSearch   Quick Manuals   RegisterRegister 
 ProfileProfile   Log in to check your private messagesLog in to check your private messages   Log inLog in 

remove dup but keep mapping

 
Post new topic   Reply to topic   printer-friendly view    MVSFORUMS.com Forum Index -> Utilities
View previous topic :: View next topic  
Author Message
videlord
Beginner


Joined: 09 Dec 2004
Posts: 147
Topics: 19

PostPosted: Fri Jan 07, 2005 4:15 am    Post subject: remove dup but keep mapping Reply with quote

I want remove duplicate reocord in a data set, but i want to keep the mapping of removed record. 1 input => 2 outputs, one is from input without duplication and seq number, the other is seqnum of original file and seq num in new file.
for example:

INPUT:
AAA
BBB
AAA
CCC
BBB

OUTPUT1:
AAA 01
BBB 02
CCC 03

OUTPUT2:
01 01
02 02
03 03
04 01
05 02

Is there an easy way to do it?
Thanks.

Charlie
Back to top
View user's profile Send private message
Phantom
Data Mgmt Moderator
Data Mgmt Moderator


Joined: 07 Jan 2003
Posts: 1056
Topics: 91
Location: The Blue Planet

PostPosted: Fri Jan 07, 2005 6:56 am    Post subject: Reply with quote

Kolusu,

The links you posted actually prints the no. of occurences of each duplicate key but Videlord is not looking for that. He wants to insert seqnum in the file and then eliminate the duplicates on the key portion and write to a file - say SORTOUT. Later, in the other file SORTXSUM which contains the eliminated records he wants to map each duplicate key to the seqnum of its first occurance (ie. the seqnum in SORTOUT).

Please correct me if I'm wrong.

Thanks,
Phantom
Back to top
View user's profile Send private message
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12357
Topics: 75
Location: San Jose

PostPosted: Fri Jan 07, 2005 9:12 am    Post subject: Reply with quote

Phantom,

You are right . I did not read the question properly. Sorry.

Videlord,

Your output2 is misleading. can you tell us more as to what your output should be ? especially 3rd and 4th record in the output2

Kolusu
_________________
Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort

www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
videlord
Beginner


Joined: 09 Dec 2004
Posts: 147
Topics: 19

PostPosted: Fri Jan 07, 2005 9:34 pm    Post subject: Reply with quote

Kolusu,

Thanks.

The output2 is the file with the same number of records as input.
The first field is the seq# of input, the 2nd field the seq# in output1.
We can use table to describe my problem again:
input - TableA (COL1, SEQNO)
output1 - TableB (COL1, SEQNO)
output2 - TableC (SEQNO1,SEQNO2)
SEQNO in TableA and TableB are generated by database

TableB Data:
Insert into tableB (COL1)
(select distinct COL1 from TableA)

TableC Data:
Insert into tableC
(select TableA.SEQNO, TableB.SEQNO from TableA, TableB
where TableA.COL1 = TableB.COL2)

I hope it described my problem clearly.
I have one idea,
Step 1. copy input1 add seqno.
2. sort input1 add seqno keepnodups
3. Splice 2 & 1 generate the mapping
It seems some complex and not efficient.
Did you have any suggestion?

Thanks.

Charlie
Back to top
View user's profile Send private message
Phantom
Data Mgmt Moderator
Data Mgmt Moderator


Joined: 07 Jan 2003
Posts: 1056
Topics: 91
Location: The Blue Planet

PostPosted: Sat Jan 08, 2005 5:58 am    Post subject: Reply with quote

videlord,

I think you already gave the solution. I don't think this could be achieved using the traditional features of sort without using SPLICE. As you said, you can get the result in 1 Step - 2 Pass using ICETOOL / SYNCTOOL.

Code:

Pass 1:
a.  Use inrec and insert Seqnums
b.  Sort on key and remove dups using SUM FIELDS=NONE & XSUM or DISCARD(dd)

Pass 2:
a.  Use SPLICE to match the key & seqnums.


Kolusu / Frank,
your thoughts on this ? Do you feel that this could be done in a better way ?

Thanks & Regards,
Phantom
Back to top
View user's profile Send private message
Frank Yaeger
Sort Forum Moderator
Sort Forum Moderator


Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

PostPosted: Sat Jan 08, 2005 11:32 am    Post subject: Reply with quote

Quote:
The output2 is the file with the same number of records as input.
The first field is the seq# of input, the 2nd field the seq# in output1.


Charlie,

This doesn't seem to match what you actually show for output2.

Breaking it down:

>The first field is the seq# of input

Code:

Input
AAA 01
BBB 02
AAA 03
CCC 04
BBB 05


>output2 is the file with the same number of records as input.
>the 2nd field the seq# in output1.

Code:

OUTPUT1:
AAA 01
BBB 02
CCC 03


So I would expect the following for OUTPUT2:

Code:

01 01   01 from first AAA in input, 01 from AAA in output1
02 02   02 from first BBB in input, 02 from BBB in output1
03 01   03 from second AAA in input, 01 from AAA in output1
04 03   04 from CCC in input, 03 from CCC in output1
05 02   05 from second BBB in input, 02 from BBB in output1


What am I misunderstanding here? Can you please break down the input and output in the same way I did here to show how you want it done.
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Back to top
View user's profile Send private message Send e-mail Visit poster's website
videlord
Beginner


Joined: 09 Dec 2004
Posts: 147
Topics: 19

PostPosted: Sun Jan 09, 2005 8:32 pm    Post subject: Reply with quote

Frank,

Oh, sorry, i think i make a mistake when i type the INPUT file, the 4th record of input should be AAA instead of CCC.

So your explaination of Output2 is correct.

Thanks.

Charlie
Back to top
View user's profile Send private message
Frank Yaeger
Sort Forum Moderator
Sort Forum Moderator


Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

PostPosted: Mon Jan 10, 2005 11:11 am    Post subject: Reply with quote

Quote:
Oh, sorry, i think i make a mistake when i type the INPUT file, the 4th record of input should be AAA instead of CCC.



Charlie,

That still doesn't make sense. Please start over and show the CORRECT input, output1 and output2 broken down in the same way I did it, to show how you want it done. Remember: garbage in, garbage out.
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Back to top
View user's profile Send private message Send e-mail Visit poster's website
videlord
Beginner


Joined: 09 Dec 2004
Posts: 147
Topics: 19

PostPosted: Tue Jan 11, 2005 12:58 am    Post subject: Reply with quote

Frank,

Hope the following example can clearly describe my question.

INPUT: (total 8 records, added the seq#)
AAA 001
BBB 002
CCC 003
AAA 004
DDD 005
EEE 006
BBB 007
FFF 008

OUT1: (6 records no dup)
AAA 001
BBB 002
CCC 003
DDD 004
EEE 005
FFF 006

OUT2: (6 records)
001 001
002 002
003 003
004 001
005 004
006 005
007 002
008 006
Back to top
View user's profile Send private message
Frank Yaeger
Sort Forum Moderator
Sort Forum Moderator


Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

PostPosted: Tue Jan 11, 2005 11:07 am    Post subject: Reply with quote

Ok, now it makes sense. But it will require multiple passes. I don't see any easier way to do it.
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort


Last edited by Frank Yaeger on Tue Jan 11, 2005 4:47 pm; edited 1 time in total
Back to top
View user's profile Send private message Send e-mail Visit poster's website
kolusu
Site Admin
Site Admin


Joined: 26 Nov 2002
Posts: 12357
Topics: 75
Location: San Jose

PostPosted: Tue Jan 11, 2005 3:59 pm    Post subject: Reply with quote

Videlord,

The following DFSORT/ICETOOL jcl will give you desired results. If your shop has syncsort then change the pgm name to synctool.

A brief explanation of the job. This is similar to the trick of sorting records with header and detail records explained in here.

http://www.mvsforums.com/helpboards/viewtopic.php?t=3432

The only difference here is that we do not have the Header. So we create a header for every record using sections and header3 parm.

The first operator (SORT) adds a seqnum and sorts on the key and creates a header for every key.

The second operator (COPY) takes this file and splits into 2 files viz.. header and detail records.

The third operator (SORT) concatenates these 2 files and sorts on the orginal seqnum and writes out the files.

Code:

//STEP0100 EXEC PGM=ICETOOL                               
//DFSMSG   DD SYSOUT=*                                   
//TOOLMSG  DD SYSOUT=*                                   
//IN       DD *                                           
AAA 001                                                   
BBB 002                                                   
CCC 003                                                   
AAA 004                                                   
DDD 005                                                   
EEE 006                                                   
BBB 007                                                   
FFF 008                                                   
//T1       DD DSN=&T1,DISP=(,PASS),SPACE=(CYL,(X,Y),RLSE)
//T2       DD DSN=&T2,DISP=(,PASS),SPACE=(CYL,(X,Y),RLSE)
//T3       DD DSN=&T3,DISP=(,PASS),SPACE=(CYL,(X,Y),RLSE)
//CON      DD DSN=*.T2,VOL=REF=*.T2,DISP=SHR             
//         DD DSN=*.T3,VOL=REF=*.T3,DISP=SHR             
//OUT1     DD SYSOUT=*                                   
//OUT2     DD SYSOUT=*                         
//TOOLIN   DD *                                                 
  SORT FROM(IN)   USING(CTL1)                                   
  COPY FROM(T1)   USING(CTL2)                                   
  SORT FROM(CON)  USING(CTL3)                                   
//CTL1CNTL DD *                                                 
  INREC FIELDS=(1,80,SEQNUM,8,ZD)                               
  SORT FIELDS=(1,3,CH,A)                                         
  OUTREC FIELDS=(1,88,C'Z')                                     
  OUTFIL FNAMES=T1,REMOVECC,                                     
  SECTIONS=(1,3,HEADER3=(1,88,C'H'))                             
//CTL2CNTL DD *                                                 
  INREC  FIELDS=(1,89,SEQNUM,8,ZD)                               
  OUTFIL FNAMES=T2,INCLUDE=(89,1,CH,EQ,C'H'),                   
         OUTREC=(1,88,8C'0',SEQNUM,8,ZD,C'H')                   
  OUTFIL FNAMES=T3,SAVE,                                         
         OUTREC=(1,88,90,8,SEQNUM,8,ZD,C'Z')                     
//CTL3CNTL DD *                                                 
  OPTION EQUALS                                                 
  INREC  FIELDS=(1,104,(89,8,ZD,SUB,97,8,ZD),M11,LENGTH=8,105,1)
  SORT   FIELDS=(81,8,ZD,A)                                     
  OUTFIL FNAMES=OUT1,INCLUDE=(113,1,CH,EQ,C'H'),                 
  OUTREC=(1,3,X,105,8)                                           
  OUTFIL FNAMES=OUT2,SAVE,                                       
  OUTREC=(81,8,X,105,8)                                         
/*


The output from this job is :

OUT1:
Code:

AAA 00000001
BBB 00000002
CCC 00000003
DDD 00000004
EEE 00000005
FFF 00000006


OUT2:
Code:

00000001 00000001
00000002 00000002
00000003 00000003
00000004 00000001
00000005 00000004
00000006 00000005
00000007 00000002
00000008 00000006


Hope this helps...

Cheers

Kolusu
_________________
Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort

www.linkedin.com/in/kolusu
Back to top
View user's profile Send private message Send e-mail Visit poster's website
Phantom
Data Mgmt Moderator
Data Mgmt Moderator


Joined: 07 Jan 2003
Posts: 1056
Topics: 91
Location: The Blue Planet

PostPosted: Wed Jan 12, 2005 1:07 am    Post subject: Reply with quote

Kolusu,
Excellent Solution. Gr8 job.

Code:

The only difference here is that we do not have the Header. So we create a header for every record using sections and header3 parm.


I was thinking of the same - using headers, but could not guess a way to insert a header ! This logic could be used in many cases. I have a similar situation - matching a unique key against a file with dup keys. This logic will come in handy in those cases.

Thanks a lot,
Phantom
Back to top
View user's profile Send private message
Display posts from previous:   
Post new topic   Reply to topic   printer-friendly view    MVSFORUMS.com Forum Index -> Utilities All times are GMT - 5 Hours
Page 1 of 1

 
Jump to:  
You cannot post new topics in this forum
You cannot reply to topics in this forum
You cannot edit your posts in this forum
You cannot delete your posts in this forum
You cannot vote in polls in this forum


MVSFORUMS
Powered by phpBB © 2001, 2005 phpBB Group