MVSFORUMS.com Forum Index MVSFORUMS.com
A Community of and for MVS Professionals
 

Remove duplicates from a string.

 
sjetty
Beginner


Joined: 28 Feb 2017
Posts: 8
Topics: 2

Posted: Wed Dec 13, 2017 9:45 am    Post subject: Remove duplicates from a string.

Hi,

I need to remove duplicate characters from a string. Say the string 'Sivaa' is in the first ten bytes of a record; I want to remove the duplicate, i.e. the second 'a', from the string. Can we achieve this using SORT?

Thanks,
Siva.
kolusu
Site Admin


Joined: 26 Nov 2002
Posts: 12357
Topics: 75
Location: San Jose

Posted: Wed Dec 13, 2017 11:27 am

sjetty,

You want to remove duplicates while sorting the data horizontally, which is kind of weird.

Either way, what are the rules for removing the duplicates?

SIVAA = SIVA
SIVAS = SIVA
SSSSS = S

Do you need to remove a duplicate character if it occurs anywhere in the first 10 bytes, or only if the duplicate characters are next to each other?

SIVAA = A is a duplicate as it is in adjacent positions 4 and 5
SIVAS = S is not a duplicate as it is at positions 1 and 5

What are the LRECL and RECFM of the file? Is the data always character data?
_________________
Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort

www.linkedin.com/in/kolusu
Nic Clouston
Advanced


Joined: 01 Feb 2007
Posts: 1075
Topics: 7
Location: At Home

Posted: Wed Dec 13, 2017 11:41 am

What if the string is HARRY? Should it remain as HARRY or be changed to HARY?
_________________
Utility and Program control cards are NOT, repeat NOT, JCL.


Last edited by Nic Clouston on Thu Dec 14, 2017 8:20 am; edited 1 time in total
sjetty
Beginner


Joined: 28 Feb 2017
Posts: 8
Topics: 2

Posted: Thu Dec 14, 2017 2:21 am

Hi Kolusu,

The input file has LRECL = 80 and RECFM = FB. We need to remove duplicates wherever they occur in the string, whether adjacent or not, as in the examples below. Only the first occurrence of each character should be retained.

SIVAA = SIVA
SIVAS = SIVA
SIVAI = SIVA

These are just examples; the actual data in the first 10 bytes does not represent names.

Thanks,
Siva.
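
The rule stated in this post (keep only the first occurrence of each character, wherever the repeat falls) can be written down as a small reference function for checking expected results. This is a Python sketch, not DFSORT; the function name is made up for illustration, and the thread does not say how blanks should be treated, so they are handled here like any other character.

```python
def keep_first_occurrences(text: str) -> str:
    """Drop every repeated character, keeping only its first occurrence."""
    seen = set()
    kept = []
    for ch in text:
        if ch not in seen:
            seen.add(ch)
            kept.append(ch)
    return "".join(kept)


# The three examples from the post all reduce to SIVA.
for sample in ("SIVAA", "SIVAS", "SIVAI"):
    print(sample, "=", keep_first_occurrences(sample))
```

Under this rule the HARRY case asked about earlier in the thread becomes HARY.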
kolusu
Site Admin


Joined: 26 Nov 2002
Posts: 12357
Topics: 75
Location: San Jose

Posted: Fri Dec 15, 2017 11:20 am

sjetty wrote:

These are just examples; the actual data in the first 10 bytes does not represent names.

Thanks,
Siva.


Siva,

I asked earlier whether the data is ALL character data, and you never answered.

I am guessing that you need to remove the dups not just for the first 10 bytes but for the entire 80 bytes.

If it is just 10 bytes of character data, then it can be done in a single pass over the data by subtracting the binary values of the bytes from one another and then validating the results.

An alternative approach is to use RESIZE or "/" to split each record into 10 records, sort them to remove the duplicates, and then assemble the record back again. This would require 2 or 3 passes over the data.
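
The split, sort, and reassemble idea can be illustrated outside DFSORT. The Python sketch below is only a model of the data flow, with hypothetical names throughout; it does not show RESIZE or "/" syntax, and it assumes the blank is treated as an ordinary character, so the first blank in the field survives.

```python
from itertools import groupby


def dedupe_by_split_and_sort(records, width=10):
    """Model of the multi-pass idea: split, sort and dedupe, reassemble."""
    # Pass 1: one row per byte, tagged with its record number and position.
    rows = [(rec_no, ch, pos)
            for rec_no, rec in enumerate(records)
            for pos, ch in enumerate(rec[:width])]

    # Pass 2: sort by record and character, then keep the first row of each
    # (record, character) group, which is the one with the lowest position.
    rows.sort()
    firsts = [next(group)
              for _, group in groupby(rows, key=lambda r: (r[0], r[1]))]

    # Pass 3: sort back into original position order and rebuild each field.
    firsts.sort(key=lambda r: (r[0], r[2]))
    rebuilt = {rec_no: "".join(ch for _, ch, _ in group)
               for rec_no, group in groupby(firsts, key=lambda r: r[0])}

    # Pad the field back to its width and reattach the rest of the record.
    return [rebuilt.get(rec_no, "").ljust(width) + rec[width:]
            for rec_no, rec in enumerate(records)]


print(dedupe_by_split_and_sort(["SIVAS     REST OF RECORD", "DAVE BACH"]))
```

The position tag is what lets the final step restore the original order after the sort has grouped identical characters together.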
sjetty
Beginner


Joined: 28 Feb 2017
Posts: 8
Topics: 2

Posted: Wed Dec 20, 2017 9:37 am

Hi Kolusu,

The data is in character format. I was trying the first option you mentioned, the one that can be done in a single pass, but I am not sure how to subtract the binary values and validate them. If possible, could you please show me an example? That would be helpful.

Thanks,
Siva
kolusu
Site Admin


Joined: 26 Nov 2002
Posts: 12357
Topics: 75
Location: San Jose

Posted: Wed Dec 20, 2017 12:19 pm

sjetty wrote:
Hi Kolusu,

The data is in character format. I was trying the first option you mentioned, the one that can be done in a single pass, but I am not sure how to subtract the binary values and validate them. If possible, could you please show me an example? That would be helpful.

Thanks,
Siva


Siva,

You subtract each of bytes 2 thru 10 from byte 1 individually, treating them as binary. If a byte is a duplicate of byte 1, the subtraction results in X'00'. You repeat this for every individual byte, so the number of subtractions goes 9, 8, 7 and so on down to 1.

The validation is done using the CHANGE parameter: a result of X'00' replaces the duplicate byte with a blank, and any other result leaves the byte as it is. Try this DFSORT JCL, which will give you the desired results:

Code:

//STEP0100 EXEC PGM=SORT                                                 
//SYSOUT   DD SYSOUT=*                                                   
//SORTIN   DD *                                                         
----+----1----+----2----+----3----+----4----+----5----+----6----+----7---
SJETTY     WANTS TO REMOVE DUPLICATES FROM 1ST 10 BYTES                 
VREELAND                                                                 
KOLUSU                                                                   
DAVE BACH                                                               
ARUN                                                                     
ALBERT                                                                   
CLOUSTON                                                                 
KEEP                                                                     
SOMARAJAN                                                               
WILLIAM                                                                 
HARRY                                                                   
SSS                                                                     
A B C D E                                                               
//SORTOUT  DD SYSOUT=*                                                   
//SYSIN    DD *                                                         
  OPTION COPY                                                           
  ALTSEQ CODE=(407B)           $ CHANGE ' ' TO '#' FOR EMBED ' ' SAVE   
  INREC IFOUTLEN=80,                                               
        IFTHEN=(WHEN=INIT,                                         
        OVERLAY=(081:01,10,JFY=(SHIFT=LEFT,TRAIL=C'@@@@@@@@@@',     
                                LENGTH=10),                         
                 081:81,10,TRAN=ALTSEQ,                             
                                                                   
                 091:81,1,BI,SUB,82,01,BI,BI,LENGTH=1,             
                     81,1,BI,SUB,83,01,BI,BI,LENGTH=1,             
                     81,1,BI,SUB,84,01,BI,BI,LENGTH=1,             
                     81,1,BI,SUB,85,01,BI,BI,LENGTH=1,             
                     81,1,BI,SUB,86,01,BI,BI,LENGTH=1,             
                     81,1,BI,SUB,87,01,BI,BI,LENGTH=1,             
                     81,1,BI,SUB,88,01,BI,BI,LENGTH=1,             
                     81,1,BI,SUB,89,01,BI,BI,LENGTH=1,             
                     81,1,BI,SUB,90,01,BI,BI,LENGTH=1,             
                                                                   
                 090:099,1,CHANGE=(1,X'00',C' '),NOMATCH=(90,1),   
                 089:098,1,CHANGE=(1,X'00',C' '),NOMATCH=(89,1),   
                 088:097,1,CHANGE=(1,X'00',C' '),NOMATCH=(88,1),   
                 087:096,1,CHANGE=(1,X'00',C' '),NOMATCH=(87,1),   
                 086:095,1,CHANGE=(1,X'00',C' '),NOMATCH=(86,1),   
                 085:094,1,CHANGE=(1,X'00',C' '),NOMATCH=(85,1),   
                 084:093,1,CHANGE=(1,X'00',C' '),NOMATCH=(84,1),   
                 083:092,1,CHANGE=(1,X'00',C' '),NOMATCH=(83,1),   
                 082:091,1,CHANGE=(1,X'00',C' '),NOMATCH=(82,1),   
                                                                   
                 091:8X,                                           
                 091:82,1,BI,SUB,83,01,BI,BI,LENGTH=1,             
                     82,1,BI,SUB,84,01,BI,BI,LENGTH=1,             
                     82,1,BI,SUB,85,01,BI,BI,LENGTH=1,             
                     82,1,BI,SUB,86,01,BI,BI,LENGTH=1,             
                     82,1,BI,SUB,87,01,BI,BI,LENGTH=1,             
                     82,1,BI,SUB,88,01,BI,BI,LENGTH=1,             
                     82,1,BI,SUB,89,01,BI,BI,LENGTH=1,             
                     82,1,BI,SUB,90,01,BI,BI,LENGTH=1,             
                                                                   
                 090:098,1,CHANGE=(1,X'00',C' '),NOMATCH=(90,1),     
                 089:097,1,CHANGE=(1,X'00',C' '),NOMATCH=(89,1),     
                 088:096,1,CHANGE=(1,X'00',C' '),NOMATCH=(88,1),     
                 087:095,1,CHANGE=(1,X'00',C' '),NOMATCH=(87,1),     
                 086:094,1,CHANGE=(1,X'00',C' '),NOMATCH=(86,1),     
                 085:093,1,CHANGE=(1,X'00',C' '),NOMATCH=(85,1),     
                 084:092,1,CHANGE=(1,X'00',C' '),NOMATCH=(84,1),     
                 083:091,1,CHANGE=(1,X'00',C' '),NOMATCH=(83,1),     
                                                                     
                 091:7X,                                             
                 091:83,1,BI,SUB,84,01,BI,BI,LENGTH=1,               
                     83,1,BI,SUB,85,01,BI,BI,LENGTH=1,               
                     83,1,BI,SUB,86,01,BI,BI,LENGTH=1,               
                     83,1,BI,SUB,87,01,BI,BI,LENGTH=1,               
                     83,1,BI,SUB,88,01,BI,BI,LENGTH=1,               
                     83,1,BI,SUB,89,01,BI,BI,LENGTH=1,               
                     83,1,BI,SUB,90,01,BI,BI,LENGTH=1,               
                                                                     
                 090:097,1,CHANGE=(1,X'00',C' '),NOMATCH=(90,1),     
                 089:096,1,CHANGE=(1,X'00',C' '),NOMATCH=(89,1),     
                 088:095,1,CHANGE=(1,X'00',C' '),NOMATCH=(88,1),     
                 087:094,1,CHANGE=(1,X'00',C' '),NOMATCH=(87,1),     
                 086:093,1,CHANGE=(1,X'00',C' '),NOMATCH=(86,1),     
                 085:092,1,CHANGE=(1,X'00',C' '),NOMATCH=(85,1),     
                 084:091,1,CHANGE=(1,X'00',C' '),NOMATCH=(84,1),     
                                                                     
                 091:6X,                                             
                 091:84,1,BI,SUB,85,01,BI,BI,LENGTH=1,               
                     84,1,BI,SUB,86,01,BI,BI,LENGTH=1,               
                     84,1,BI,SUB,87,01,BI,BI,LENGTH=1,               
                     84,1,BI,SUB,88,01,BI,BI,LENGTH=1,               
                     84,1,BI,SUB,89,01,BI,BI,LENGTH=1,               
                     84,1,BI,SUB,90,01,BI,BI,LENGTH=1,               

                 090:096,1,CHANGE=(1,X'00',C' '),NOMATCH=(90,1),   
                 089:095,1,CHANGE=(1,X'00',C' '),NOMATCH=(89,1),   
                 088:094,1,CHANGE=(1,X'00',C' '),NOMATCH=(88,1),   
                 087:093,1,CHANGE=(1,X'00',C' '),NOMATCH=(87,1),   
                 086:092,1,CHANGE=(1,X'00',C' '),NOMATCH=(86,1),   
                 085:091,1,CHANGE=(1,X'00',C' '),NOMATCH=(85,1),   
                                                                   
                 091:5X,                                           
                 091:85,1,BI,SUB,86,01,BI,BI,LENGTH=1,             
                     85,1,BI,SUB,87,01,BI,BI,LENGTH=1,             
                     85,1,BI,SUB,88,01,BI,BI,LENGTH=1,             
                     85,1,BI,SUB,89,01,BI,BI,LENGTH=1,             
                     85,1,BI,SUB,90,01,BI,BI,LENGTH=1,             
                                                                   
                 090:095,1,CHANGE=(1,X'00',C' '),NOMATCH=(90,1),   
                 089:094,1,CHANGE=(1,X'00',C' '),NOMATCH=(89,1),   
                 088:093,1,CHANGE=(1,X'00',C' '),NOMATCH=(88,1),   
                 087:092,1,CHANGE=(1,X'00',C' '),NOMATCH=(87,1),   
                 086:091,1,CHANGE=(1,X'00',C' '),NOMATCH=(86,1),   
                                                                   
                 091:4X,                                           
                 091:86,1,BI,SUB,87,01,BI,BI,LENGTH=1,             
                     86,1,BI,SUB,88,01,BI,BI,LENGTH=1,             
                     86,1,BI,SUB,89,01,BI,BI,LENGTH=1,             
                     86,1,BI,SUB,90,01,BI,BI,LENGTH=1,             
                                                                   
                 090:094,1,CHANGE=(1,X'00',C' '),NOMATCH=(90,1),   
                 089:093,1,CHANGE=(1,X'00',C' '),NOMATCH=(89,1),   
                 088:092,1,CHANGE=(1,X'00',C' '),NOMATCH=(88,1),   
                 087:091,1,CHANGE=(1,X'00',C' '),NOMATCH=(87,1),   
                                                                   
                 091:3X,                                           
                 091:87,1,BI,SUB,88,01,BI,BI,LENGTH=1,             
                     87,1,BI,SUB,89,01,BI,BI,LENGTH=1,             
                     87,1,BI,SUB,90,01,BI,BI,LENGTH=1,             
                                                                   
                 090:093,1,CHANGE=(1,X'00',C' '),NOMATCH=(90,1),   
                 089:092,1,CHANGE=(1,X'00',C' '),NOMATCH=(89,1),   
                 088:091,1,CHANGE=(1,X'00',C' '),NOMATCH=(88,1),   
                                                                   
                 091:2X,                                           
                 091:88,1,BI,SUB,89,01,BI,BI,LENGTH=1,             
                     88,1,BI,SUB,90,01,BI,BI,LENGTH=1,             
                                                                   
                 090:092,1,CHANGE=(1,X'00',C' '),NOMATCH=(90,1),   
                 089:091,1,CHANGE=(1,X'00',C' '),NOMATCH=(89,1),   
                                                                   
                 091:X,                                           
                 091:89,1,BI,SUB,90,01,BI,BI,LENGTH=1,             
                                                                   
                 090:091,1,CHANGE=(1,X'00',C' '),NOMATCH=(90,1),   
                                                                   
                 081:081,10,SQZ=(SHIFT=LEFT))),                   
                                                                   
        IFTHEN=(WHEN=INIT,                                         
                FINDREP=(IN=(C'#',C'@'),OUT=C' ')),               
        IFTHEN=(WHEN=INIT,                                         
               OVERLAY=(01:81,10))                                 
//*   
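
For readers who want to trace the logic without running the job, the steps above can be modelled in a few lines of Python. This is a sketch of the technique as read from the control statements, not a substitute for them: the function name is hypothetical, the step labels in the comments are an interpretation of the listing, and the model assumes the field itself contains no '#' or '@', since the job uses those two characters as placeholders and turns them into blanks at the end.

```python
def dedupe_by_subtraction(field: str, width: int = 10) -> str:
    """Model of the pairwise-subtraction technique on one fixed-width field."""
    # Left-justify and fill the unused tail with '@' (the JFY step).
    text = (field[:width].strip(" ") + "@" * width)[:width]
    # Turn embedded blanks into '#' (the ALTSEQ step) so that a blank can
    # later mean "this byte was removed".
    work = list(text.replace(" ", "#"))

    # Compare byte i with every later byte: 9 comparisons, then 8, then 7...
    for i in range(width - 1):
        for j in range(i + 1, width):
            # A zero difference between two byte values marks a repeat.
            if ord(work[i]) - ord(work[j]) == 0:
                work[j] = " "  # blank out the later occurrence

    # Squeeze out the blanks (the SQZ step), then turn the placeholders
    # back into blanks (the FINDREP step).
    squeezed = "".join(ch for ch in work if ch != " ").ljust(width)
    return squeezed.replace("#", " ").replace("@", " ")


for sample in ("SJETTY", "DAVE BACH", "HARRY", "SSS", "A B C D E"):
    print(repr(sample), "->", repr(dedupe_by_subtraction(sample)))
```

In this model the embedded blanks count as repeats of one another, so 'A B C D E' keeps only its first blank and comes out as 'A BCDE'.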

All times are GMT - 5 Hours