MVSFORUMS.com
A Community of and for MVS Professionals
 

Deduplication using DFSORT

 
MVSFORUMS.com Forum Index -> Utilities
vkphani
Intermediate


Joined: 05 Sep 2003
Posts: 483
Topics: 48

Posted: Mon Jun 28, 2004 11:53 pm    Post subject: Deduplication using DFSORT

Hi,

I have an input file with a record length of 32756.
I want to remove duplicate records from this file.
I used the following SELECT statement to do this.

SELECT FROM(IN) TO(OUT) ON(1,32756,CH) FIRST DISCARD(SORTXSUM)

But it is not working. It works only if I select positions 1 through 80; beyond position 80, this SELECT statement does not work.

Can anybody please help me with this?
kolusu
Site Admin


Joined: 26 Nov 2002
Posts: 12382
Topics: 75
Location: San Jose

Posted: Tue Jun 29, 2004 4:34 am

Vkphani,

With DFSORT R14 PTF UQ99331, the maximum length of a character (CH) field in an ON condition is 1500 bytes, and you can have a maximum of 10 ON conditions. So even with 10 conditions the total comes to only 15,000 bytes, which means you cannot use the SELECT statement to find duplicates across the whole 32756-byte record. With Syncsort the limit is even lower: the maximum for a character field is only 256 bytes. You are better off coding a program for this.

If your input file is VB, you can use the RDW to drop duplicates with each length regardless of what's in the record (that is, keep only the first record of each length):

Code:

//TOOLIN    DD   *
  SELECT FROM(IN) TO(OUT) ON(1,4,BI) FIRST DISCARD(SORTXSUM)


Hope this helps...

Cheers

Kolusu
_________________
Kolusu
www.linkedin.com/in/kolusu
vkphani
Intermediate


Joined: 05 Sep 2003
Posts: 483
Topics: 48

Posted: Tue Jun 29, 2004 5:00 am

Kolusu,

The following RDW is not eliminating the duplicate records.

//TOOLIN DD *
SELECT FROM(IN) TO(OUT) ON(1,4,BI) FIRST DISCARD(SORTXSUM)
vkphani
Intermediate


Joined: 05 Sep 2003
Posts: 483
Topics: 48

Posted: Tue Jun 29, 2004 5:04 am

Kolusu,

This RDW key is eliminating records whenever positions 1 through 4 are the same.
kolusu
Site Admin


Joined: 26 Nov 2002
Posts: 12382
Topics: 75
Location: San Jose

Posted: Tue Jun 29, 2004 5:10 am

vkphani,

Did you read this in my post?

Code:

you can drop duplicates with each length regardless of what's in the record


Kolusu
Frank Yaeger
Sort Forum Moderator


Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

Posted: Tue Jun 29, 2004 10:37 am

Kolusu wrote
Quote:
So even with 10 conditions the total comes to only 15,000 bytes.


Actually, you can't go that high.

For DFSORT, the maximum number of bytes you can use for the "key" to check for duplicates is 4084 (with EQUALS). This is true whether you use ICETOOL's SELECT or DFSORT's SUM FIELDS=NONE.
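For comparison, a minimal sketch of the DFSORT SUM FIELDS=NONE route, assuming RECFM=FB input and the 4084-byte key limit stated above. The step name, data set names and SPACE values are placeholders:

Code:

//DEDUP    EXEC PGM=SORT
//SYSOUT   DD SYSOUT=*
//SORTIN   DD DSN=YOUR.INPUT.FILE,DISP=SHR
//SORTOUT  DD DSN=YOUR.OUTPUT.FILE,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(100,50),RLSE)
//SYSIN    DD *
* EQUALS keeps the first record (in input order) of each duplicate set
  OPTION EQUALS
* Key is bytes 1-4084; anything after byte 4084 is NOT compared
  SORT FIELDS=(1,4084,CH,A)
* Write one record per key and drop the rest
  SUM FIELDS=NONE
/*


Unlike SELECT with DISCARD(SORTXSUM), this writes one record per key to SORTOUT but does not save the dropped duplicates anywhere, and the output comes out in key order.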

Paneendra,

You can't use a key of 32756 bytes. The maximum key you can use is 4084 bytes (with EQUALS).

Do you really expect your records to be the same for 4084 bytes and then different after that?

With DFSORT R14 PTF UQ90053 (Feb, 2003), the limit for a CH ON field used with ICETOOL's SELECT is 1500 bytes. So to use the maximum of 4084 bytes, you would specify:

ON(1,1500,CH) ON(1501,1500,CH) ON(3001,1084,CH)
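Put together as a complete ICETOOL step, a sketch might look like the following. The step name, data set names and SPACE values are placeholders; the hyphen at the end of the first SELECT line continues the statement onto the next line:

Code:

//DEDUP    EXEC PGM=ICETOOL
//TOOLMSG  DD SYSOUT=*
//DFSMSG   DD SYSOUT=*
//IN       DD DSN=YOUR.INPUT.FILE,DISP=SHR
//OUT      DD DSN=YOUR.OUTPUT.FILE,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(100,50),RLSE)
//SORTXSUM DD DSN=YOUR.DUPS.FILE,DISP=(NEW,CATLG,DELETE),
//            SPACE=(CYL,(100,50),RLSE)
//TOOLIN   DD *
* Key is bytes 1-4084, split into three CH fields (1500+1500+1084)
  SELECT FROM(IN) TO(OUT) ON(1,1500,CH) ON(1501,1500,CH) -
    ON(3001,1084,CH) FIRST DISCARD(SORTXSUM)
/*


OUT gets the first record for each distinct 4084-byte key, and SORTXSUM gets the remaining records.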

If you do not have PTF UQ90053 installed, ask your System Programmer to install it (it's free). It's been available for over a year now. Without UQ90053, the limit for a CH field is 80 bytes and you can have a maximum of 10 ON fields, so that will only get you up to a key of 800 bytes.
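Without UQ90053, the 800-byte maximum would be coded as ten 80-byte ON fields covering positions 1 through 800. A sketch, again using hyphens to continue the statement:

Code:

//TOOLIN   DD *
  SELECT FROM(IN) TO(OUT) -
    ON(1,80,CH)   ON(81,80,CH)  ON(161,80,CH) ON(241,80,CH) -
    ON(321,80,CH) ON(401,80,CH) ON(481,80,CH) ON(561,80,CH) -
    ON(641,80,CH) ON(721,80,CH) FIRST DISCARD(SORTXSUM)
/*
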
_________________
Frank Yaeger - DFSORT Development Team (IBM)
Specialties: JOINKEYS, FINDREP, WHEN=GROUP, ICETOOL, Symbols, Migration
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
Frank Yaeger
Sort Forum Moderator


Joined: 02 Dec 2002
Posts: 1618
Topics: 31
Location: San Jose

Posted: Tue Jun 29, 2004 10:44 am

Paneendra wrote
Quote:
The following RDW is not eliminating the duplicate records.

//TOOLIN DD *
SELECT FROM(IN) TO(OUT) ON(1,4,BI) FIRST DISCARD(SORTXSUM)


The length is actually in the first 2 bytes of the RDW, although the second two bytes are usually zeros. ON(VLEN) is better to use than ON(1,4,BI). ON(VLEN) is equivalent to ON(1,2,BI).

However, ON(VLEN) is only useful when your input data set has RECFM=VB. In this case, your SELECT statement will write the first record with each length to OUT and the other records to SORTXSUM. I doubt that's what you want.
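For reference, the same SELECT with ON(VLEN) in place of ON(1,4,BI) would be the following sketch (RECFM=VB input only). It still keeps the first record of each length, not the first copy of each distinct record:

Code:

//TOOLIN   DD *
  SELECT FROM(IN) TO(OUT) ON(VLEN) FIRST DISCARD(SORTXSUM)
/*
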
All times are GMT - 5 Hours