Joined: 26 Nov 2002 Posts: 12357 Topics: 75 Location: San Jose
Posted: Wed Mar 15, 2017 7:48 am Post subject:
oerdgie,
You can use the trick of joinkeys of matching the file to itself and treating the 2 columns as keys while padding the columns with a specific pseudo key which would not match.
Here is an Untested JCL which will give you the desired results. I assumed that your column1 and column2 have a max length of 10 bytes.
I tried it and it works great !
It's really tricky... not quite easy to understand, especially the thing with the pseudo key. But I will take time to understand...
Joined: 26 Nov 2002 Posts: 12357 Topics: 75 Location: San Jose
Posted: Wed Mar 15, 2017 11:12 am Post subject:
oerdgie wrote:
It's really tricky... not quite easy to understand, especially the thing with the pseudo key. But I will take time to understand...
Oeridge,
There is nothing tricky nor complicated that is difficult to understand. If you break down your problem logically the solution will look simple.
So lets just forget about joinkeys and see how we would solve this problem logically
Problem statement : I have 2 columns with "n" number of data items and I want to count the unique data items in those 2 columns.
Solving the problem
Step1 : Gee I wish I had the 2 columns data in a single column and then it would easy to find the counts of the data values.
Find ways to get the data into a single column
Step2 : Lets assume that we got the 2 columns data into a single column and now I can easily count the unique data items.
hmmm I got the data into a single column and I now sorted the data as a single column but then I lost the ability to identify which column the data item belongs to. I wish I had the ability to uniquely identify the column values.
Solution : why don't you tag column value with a unique identifier.
Cool I can simply add the column-name to the data items and sort on the full key and then count the unique items.
Wait the long column name makes the sorting key length more and I would need more resources to sort them.
Solution : Just tag the column values with 1 byte indicator (A or B)
Wow now it is easy. I can sort on the tag+column-data-value as a key and eliminate the duplicates count the unique data items.
See how simple the solution is?
Now translate that to joinkeys
Look at JNF1CNTL. It is building the tag + column-1-data
Code:
//JNF1CNTL DD *
INREC BUILD=(C'A', $ Tag
01,10) $ colum-1 data
SUM FIELDS=NONE $ Eliminate Dups
//*
Similarly look at JNF2CNTL. It is building the tag + column-2-data
Code:
//JNF2CNTL DD *
INREC BUILD=(C'B', $ Tag
15,10) $ colum-2 data
SUM FIELDS=NONE $ Eliminate Dups
//*
Now we match these and none of them match, so you wiil have all tag "A" records first and then followed by tag "B" records.
The REFORMAT statement is building Just the tags.
Code:
REFORMAT FIELDS=(F1:1,1,F2:1,1)
Now all you need to do is the count the number of "A" and "B"
So replace the value of A and B with 1 and total them up.
It is simple as that. _________________ Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum