MVSFORUMS.com

misi01

We are testing various programs where the files come from production. Obviously (?), in test, we don't want to test using hundreds of thousands of records, so we create a sample extract of those records using SORT and the SAMPLE keyword. This works fine, but .....

is there a simple way of ensuring that the output file contains samples of all the "key" values found in the input file. What I mean by this is the following. Assuming your key values in the file are in position 5 and contain FRED, BERT and JOHN. Now, let's assume that records 1-1000 all contain FRED, 1001-1002 contain BERT, and the rest contain JOHN. If I run SORT using a simple sample of every 300 records, I'll get examples of FRED & JOHN, but none of BERT.

What I would like to do would be to run sort and either specify the values in my keys (FRED, BERT, JOHN here) or, even better, let SORT figure them out, and get SORT to produce an output file of max 20 records for each unique key. Is this doable (and does my append make sense)
_________________
Michael

misi01 · Posted: Tue Jul 08, 2014 2:24 am Post subject:

Googling at the same time. I should mention that if the answer is to USE ICETOOL, that's not a problem
_________________
Michael

misi01 · Posted: Tue Jul 08, 2014 2:57 am Post subject: Found the answer to that as well

Wow !!!! ICETOOL is slick. I found the original answer.
but I'll include my solution as well

misi01 · Posted: Tue Jul 08, 2014 3:33 am Post subject: A follow-on question to the above

It all works fine but for the small detail that the input file has a header date record. Is there any way of using the HEADER option so that the first record remains in place (as it is now, it gets sorted/selected as the last record in the output file as positions 11,5 contains a date in the form 2014-).

I have tried the following, but it didn't seem to work

misi01 · Posted: Tue Jul 08, 2014 4:06 am Post subject: Got it

The following code will do this

William Collins · Supermod Joined: 03 Jun 2012 Posts: 437 Topics: 0

If you need to SORT and preserve positions of headers/trailers, ICETOOL has DATASORT.

If you don't need to SORT, did you try turning it off for the SELECT? You could relocate your ON field to an extension of the record in INREC, clobber it to some low value for the header, and remove it later in OUTFIL.

A sequence number from WHEN=GROUP might also do what you want.

If you SORT Descending to keep the header at the front, haven't you ruined your sampling, by no longer representing your file (data not in the same order)?

kolusu · Posted: Tue Jul 08, 2014 10:25 am Post subject:

misi01,

If your input is already SORTED on the Key (11,5,CH) a simple COPY operation with WHEN=GROUP will give you the desired results.
_________________
Kolusu
www.linkedin.com/in/kolusu

misi01 · Posted: Wed Jul 09, 2014 1:30 am Post subject: In answer to your suggestions

William. I tried/looked at the DATASORT and have used it elsewhere. Trouble was, I couldn't figure out how to "extract" groups of similar records (as ON(11,5,CH) FIRST(10) does with select) using DATASORT.

Kolusu. The data isn't sorted, that's why the header record ends up as the last record using the SELECT statement.

Both of you. WHEN=GROUP. I've googled it but didn't really understand how it would apply. I also tried variations on the following

William Collins · Supermod Joined: 03 Jun 2012 Posts: 437 Topics: 0

Forgetting any header and trailer records, is the data a;ready in the sequence that you want before you extract it? Are data for the same "key" contiguous? Forget any header/trailer records for the moment. Is the data in the sequence you want already?

If you get error messages, you look them up.

If you want to look at something new, you read the manual and search here for some working examples.

WHEN=GROUP is told when a record is the start of a "group" (using BEGIN or KEYBEGIN) and can optionally be told ho to identify the end of a group (END or RECORDS). RECORDS is almost certainly not doing what you think.

With WHEN=GROUP, to achieve anything, you need to use PUSH. This puts data onto your records, the same data for each record in the group. The source of the data must be the first record of the group, or the special data SEQ= or ID=. SEQ= gets you a sequence number within the group. ID= gets you a sequence number of the groups.

You want to have 10 records, you PUSH a SEQ and on OUTFIL use INCLUDE=/OMIT= for less-than-or-equal-to-10 (or however you want to phrase it).

misi01 · Posted: Wed Jul 09, 2014 2:59 am Post subject: Thanks William

Forgetting any header and trailer records, is the data a;ready in the sequence that you want before you extract it? Are data for the same "key" contiguous? Forget any header/trailer records for the moment. Is the data in the sequence you want already? Yes

If you get error messages, you look them up. Hmmm, not quite THAT easy. Here is the error message from one job.

William Collins · Supermod Joined: 03 Jun 2012 Posts: 437 Topics: 0

Did you look up the message? There's a manual, there's IBM LookAt. You may find that knowing what the 6 means is useful. The message is located after INREC or OUTREC, so that bit is not so hard. DUPLICATE you can eyeball. Just leaves CONFLICTING and MISSING. MISSING is easier, so take the element you have coded, and look in the manual for the syntax and you should find the MISSING bit.

If the data is in order, GROUP is going to allow you to say (on OUTFIL) "give me the first 10 records of this key and not the other 9381, without having to sort the data. I thought that is what you were after.

misi01 · Posted: Wed Jul 09, 2014 6:16 am Post subject:

Sorry William. What I am after is reducing the file so that each UNIQUE identifier (in pos 11-15) is written to the output file a max of 10 times. Ie, I have 100 KEY1, 150 KEY2 and 3 KEY3. The resulting output file would have 10xKEY1, 10xKEY2 and 3xKEY3
_________________
Michael

William Collins · Supermod Joined: 03 Jun 2012 Posts: 437 Topics: 0

And that is what you'd get if you PUSH a SEQ and use INCLUDE= on OUTFIL for the SEQ-field being less-than-or-equal-to-10. You just need to drop the SEQ-field afterwards with BUILD on OUTFIL.

Here's a an example you could apply, at a very quick search: http://www.mvsforums.com/helpboards/viewtopic.php?p=57821#57821 (I used GROUP SEQ INCLUDE and selected to search for all words, and user kolusu).

kolusu · Posted: Wed Jul 09, 2014 10:56 am Post subject:

Misi,

As I mentioned earlier it is quite easy to pick the first "n" records using WHEN=GROUP. If your input is already sorted and you want to pick first "5" records, then you simply need to use SEQ on WHEN=GROUP . I assumed your Input has RECFM=FB and LRECL=30. If you want to understand how the SEQ and ID of WHEN=GROUP work, run the jobs removing the OUTFIL statements and see the contents at position 31 and 40.

ex:

misi01 · Posted: Mon Jul 14, 2014 4:03 am Post subject: Thanks Kolusu

Tried your suggestion (the first bit) and removed the OUTFIL statement. I understand now (I think) how it works.

That ISN'T to say I'll be able to put together the next variation on this theme without a load of experimenation.

Thanks again
_________________
Michael