MVSFORUMS.com
A Community of and for MVS Professionals
 

Split HTML record using tags

 
guhanath
Beginner


Joined: 31 Oct 2006
Posts: 12
Topics: 2

Posted: Thu Sep 08, 2016 8:16 am    Post subject: Split HTML record using tags

Hi,

I have an input file whose records contain HTML tags. I would like to split each input record after every '</TD>' string. The sample input and the required output are given below.

Input file (FB, LRECL=300):

Code:

REC1:
<TD><B>AAAAAAA</B></TD><TD><B>BBBBBB</B></TD><TD><B>CCCCCCC</B></TD><TD><B>DDDDDDD</B></TD><TD><B>AAAAAAA</B></TD><TD><B>BBBBBB</B></TD><TD><B>CCCCCCC</B></TD><TD><B>DDDDDDD</B></TD><TD><B>AAAAAAA</B></TD><TD><B>BBBBBB</B></TD><TD><B>CCCCCCC</B></TD><TD><B>DDDDDDD</B></TD>

REC2:
<TD><B>PPP</B></TD><TD><B>QQQ</B></TD><TD><B>RRR</B></TD><TD><B>SSS</B></TD><TD><B>PPP</B></TD><TD><B>QQQ</B></TD><TD><B>RRR</B></TD><TD><B>SSS</B></TD><TD><B>PPP</B></TD><TD><B>QQQ</B></TD><TD><B>RRR</B></TD><TD><B>SSS</B></TD>



Required output file (FB, LRECL=80):

Code:

<TD><B>AAAAAAA</B></TD>
<TD><B>BBBBBB</B></TD>
<TD><B>CCCCCCC</B></TD>
<TD><B>DDDDDDD</B></TD>
<TD><B>AAAAAAA</B></TD>
<TD><B>BBBBBB</B></TD>
<TD><B>CCCCCCC</B></TD>
<TD><B>DDDDDDD</B></TD>
<TD><B>AAAAAAA</B></TD>
<TD><B>BBBBBB</B></TD>
<TD><B>CCCCCCC</B></TD>
<TD><B>DDDDDDD</B></TD>
<TD><B>PPP</B></TD>
<TD><B>QQQ</B></TD>
<TD><B>RRR</B></TD>
<TD><B>SSS</B></TD>
<TD><B>PPP</B></TD>
<TD><B>QQQ</B></TD>
<TD><B>RRR</B></TD>
<TD><B>SSS</B></TD>
<TD><B>PPP</B></TD>
<TD><B>QQQ</B></TD>
<TD><B>RRR</B></TD>
<TD><B>SSS</B></TD>



The total number of output records is 24.

The length of the data between the tags (<TD><B> and </B></TD>) varies; it is not fixed. I just want to break the records after each '</TD>' string.

Looking forward to your help on this!


Thanks,
Nath
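
For readers who want to see the splitting rule outside of a sort product, here is a minimal Python sketch of the requirement as stated above. It is only an illustration, not the DFSORT or SyncSort solution: the function names (split_after_tag, to_fixed) and the shortened sample records are assumptions made for the example, and each record is treated as a plain 300-character string.

```python
TAG = "</TD>"

def split_after_tag(record, tag=TAG):
    """Return the pieces of one record, each piece ending with `tag`."""
    pieces = []
    start = 0
    while True:
        pos = record.find(tag, start)
        if pos < 0:
            break
        end = pos + len(tag)
        pieces.append(record[start:end].strip())
        start = end
    # Keep any non-blank text after the last tag rather than dropping it.
    tail = record[start:].strip()
    if tail:
        pieces.append(tail)
    return pieces

def to_fixed(piece, lrecl=80):
    """Blank-pad a piece to the output LRECL; refuse to truncate silently."""
    if len(piece) > lrecl:
        raise ValueError(f"piece of {len(piece)} bytes does not fit LRECL={lrecl}")
    return piece.ljust(lrecl)

# Two blank-padded 300-byte records standing in for the FB input file.
records = [
    "<TD><B>AAAAAAA</B></TD><TD><B>BBBBBB</B></TD>".ljust(300),
    "<TD><B>AAA</B></TD><TD><B>BB</B></TD>".ljust(300),
]

output = [to_fixed(p) for rec in records for p in split_after_tag(rec)]
for line in output:
    print(line.rstrip())
```

Raising an error for a piece longer than the output LRECL is a deliberate choice in this sketch, so that data is never cut off without notice.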
Magesh_J
Intermediate


Joined: 21 Jun 2014
Posts: 259
Topics: 54

Posted: Thu Sep 08, 2016 9:52 am

guhanath,

Can a <TD> start in line 1 while its matching </TD> ends in line 2?

In other words, please let us know whether REC1 is always in a single line or whether it can span multiple lines.
guhanath
Beginner


Joined: 31 Oct 2006
Posts: 12
Topics: 2

Posted: Thu Sep 08, 2016 12:19 pm

Hi Magesh,

Thanks for looking into this.

REC1 and REC2 are each in a single line (a single record) with LRECL=300.
kolusu
Site Admin


Joined: 26 Nov 2002
Posts: 12367
Topics: 75
Location: San Jose

Posted: Thu Sep 08, 2016 1:06 pm    Post subject: Re: Split HTML record using tags

guhanath wrote:

Required output file (FB, LRECL=80):

The length of the data between the tags (<TD><B> and </B></TD>) varies; it is not fixed. I just want to break the records after each '</TD>' string.


Your input records are 300 bytes long, and the <TD></TD> tag pair takes 9 bytes, so the string between the tags can be as long as 300 - 9 = 291 bytes.

Code:

<TD>string of 291 bytes</TD>  = 300 bytes


So how do you plan to have 291 bytes in an output file of 80 bytes?
_________________
Kolusu - DFSORT Development Team (IBM)
DFSORT is on the Web at:
www.ibm.com/storage/dfsort

www.linkedin.com/in/kolusu
t-bonham@scc.net
Supermod


Joined: 18 Oct 2012
Posts: 30
Topics: 0
Location: Minneapolis, MN

Posted: Thu Sep 08, 2016 3:50 pm

I do this frequently with a single SPF Edit command:
Code:
Change "</TD>" "</TD>[cr][lf]"

(You will have to replace the [cr] and [lf] with the hex codes for carriage return and line feed in ASCII or EBCDIC, whichever format your file is in.)
guhanath
Beginner


Joined: 31 Oct 2006
Posts: 12
Topics: 2

Posted: Fri Sep 09, 2016 3:55 am

Hi Kolusu,

Sorry for the confusion. My input file has a record length of 300 (FB), and each record contains the string '</TD>'. I want to break the record after each '</TD>' string.

Suppose one record has 10 '</TD>' strings; then that record has to be split into 10 records, one per '</TD>' string. The output record length can be 80, 100, or even 300 (FB). I expect an output record length of 80 to be enough in this case.

As shown in my first post, I just want to break the record after each '</TD>' string. The position of the '</TD>' string varies from record to record.

Ex:
input:
<TD><B>AAAAAAA</B></TD><TD><B>BBBBBB</B></TD>
<TD><B>AAA</B></TD><TD><B>BB</B></TD>

Output:
<TD><B>AAAAAAA</B></TD>
<TD><B>BBBBBB</B></TD>
<TD><B>AAA</B></TD>
<TD><B>BB</B></TD>

-Nath
kolusu
Site Admin


Joined: 26 Nov 2002
Posts: 12367
Topics: 75
Location: San Jose

Posted: Fri Sep 09, 2016 10:39 am

guhanath wrote:

Suppose one record has 10 '</TD>' strings; then that record has to be split into 10 records, one per '</TD>' string. The output record length can be 80, 100, or even 300 (FB). I expect an output record length of 80 to be enough in this case.


Guhanath,

I understood the requirement from the first post itself. I just wanted to account for the longest string possible. Right now every string in your sample data is shorter than 80 bytes, but is that true of your real production data?

Suppose you have a string longer than 80 bytes, like this:
Code:

<TD><B>string of 120 bytes</B></TD>


And if you want only the first 80 bytes, you would get
Code:

<TD><B>first 73 bytes of the string


without the ending </B></TD> tags.
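
The truncation concern can be checked with a few lines of Python. This is a hedged illustration only: fixlen and longest_cell are hypothetical helper names, and the 120-byte payload is the made-up value from the example above. With the 7-byte <TD><B> prefix, an 80-byte cut leaves room for 73 bytes of data and drops the closing tags.

```python
LRECL_OUT = 80
OPEN_TAGS = "<TD><B>"      # 7 bytes
CLOSE_TAGS = "</B></TD>"   # 9 bytes

def fixlen(text, length=LRECL_OUT):
    """Truncate or blank-pad text to exactly `length` bytes."""
    return text[:length].ljust(length)

def longest_cell(record, end="</TD>"):
    """Length of the longest piece that ends with `end` in one record."""
    pieces = record.split(end)[:-1]   # text in front of each closing tag
    return max((len(p.strip()) + len(end) for p in pieces), default=0)

cell = OPEN_TAGS + "X" * 120 + CLOSE_TAGS   # 7 + 120 + 9 = 136 bytes
cut = fixlen(cell)

data_kept = len(cut) - len(OPEN_TAGS)       # 80 - 7
print(data_kept)
print(cut.endswith("</TD>"))

# Scanning the real input this way shows whether LRECL=80 is safe.
print(longest_cell(cell.ljust(300)))
```

Running a scan like longest_cell over the production file before choosing the output LRECL would show whether 80 bytes is really enough.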

Either way, if you just want 80 bytes, here is the JCL to get the desired results.

Code:

//STEP0100 EXEC PGM=ICETOOL                         
//TOOLMSG  DD SYSOUT=*                               
//DFSMSG   DD SYSOUT=*                               
//IN       DD DISP=SHR,DSN=Your Input FB 300 Byte LRECL file
//OUT      DD SYSOUT=*                               
//TOOLIN   DD *                                     
  RESIZE FROM(IN) TO(OUT) TOLEN(080) USING(CTL1)     
//CTL1CNTL DD *                                     
  OPTION COPY                                       
  INREC PARSE=(%01=(STARTAT=C'<TD>',ENDAT=C'</TD>', 
                     FIXLEN=080,REPEAT=30)),         
        BUILD=(%01,%02,%03,%04,%05,%06,             
               %07,%08,%09,%10,%11,%12,             
               %13,%14,%15,%16,%17,%18,             
               %19,%20,%21,%22,%23,%24,             
               %25,%26,%27,%28,%29,%30)             
                                                     
  OUTFIL OMIT=(1,1,CH,EQ,C' ')                       
//*
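
To make the flow of the job above easier to follow, here is a rough Python simulation of its three stages as I read them: PARSE/BUILD lays out up to 30 fixed 80-byte fields side by side (a 2400-byte record), RESIZE with TOLEN(080) cuts that long record back into 80-byte records, and the OUTFIL OMIT drops the records that start with a blank. The function names are invented for the sketch, and the handling of a missing end tag is an assumption; it is not a substitute for the DFSORT documentation.

```python
FIXLEN = 80
REPEAT = 30

def parse_cells(record, start="<TD>", end="</TD>"):
    """Imitate PARSE with STARTAT/ENDAT/FIXLEN/REPEAT: always REPEAT fields."""
    fields = []
    pos = 0
    for _ in range(REPEAT):
        s = record.find(start, pos)
        if s < 0:
            fields.append(" " * FIXLEN)     # no more cells: blank field
            continue
        e = record.find(end, s)
        # Assumption: with no end tag, take everything to the end of the record.
        stop = len(record) if e < 0 else e + len(end)
        fields.append(record[s:stop][:FIXLEN].ljust(FIXLEN))
        pos = stop
    return fields

def resize(long_record, tolen=FIXLEN):
    """Imitate RESIZE TOLEN(080): cut one long record into short ones."""
    return [long_record[i:i + tolen] for i in range(0, len(long_record), tolen)]

def run(records):
    out = []
    for rec in records:
        built = "".join(parse_cells(rec))   # BUILD=(%01,...,%30): 2400 bytes
        for short in resize(built):
            if short[0] != " ":             # OUTFIL OMIT=(1,1,CH,EQ,C' ')
                out.append(short)
    return out

rec1 = ("<TD><B>AAAAAAA</B></TD><TD><B>BBBBBB</B></TD>"
        "<TD><B>CCCCCCC</B></TD><TD><B>DDDDDDD</B></TD>" * 3).ljust(300)
rec2 = ("<TD><B>PPP</B></TD><TD><B>QQQ</B></TD>"
        "<TD><B>RRR</B></TD><TD><B>SSS</B></TD>" * 3).ljust(300)

out = run([rec1, rec2])
print(len(out))   # 24, matching the record count in the first post
```

Note that the fixed 80-byte fields are what truncate a longer cell, which is the limitation discussed earlier in this post.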

kolusu
Site Admin


Joined: 26 Nov 2002
Posts: 12367
Topics: 75
Location: San Jose

Posted: Fri Sep 09, 2016 10:46 am

t-bonham@scc.net wrote:
I do this frequently with a single SPF Edit command:
Code:
Change "</TD>" "</TD>[cr][lf]"

(You will have to replace the [cr] and [lf] with the hex codes for carriage return and line feed in ASCII or EBCDIC, whichever format your file is in.)


t-bonham,

Unless the input file is in a zFS file system, I do not see how a CHANGE command would split the record into multiple records. After the change, you would need to FTP the file to yourself once more to break the single record into multiple records.

Secondly, since you are inserting 2 new delimiter bytes, your CHANGE command will have trouble adding them if there are no trailing blanks to absorb the additional CRLF bytes.
guhanath
Beginner


Joined: 31 Oct 2006
Posts: 12
Topics: 2

Posted: Sat Sep 10, 2016 7:34 am

Hi Kolusu,

Thanks for your help. I tried your code but got RC=12 with the error messages below.

Code:

SYT000I  SYNCTOOL RELEASE 1.7.0 - COPYRIGHT 2008  SYNCSORT INC.                 
SYT001I  INITIAL PROCESSING MODE IS "STOP"                                     
SYT002I  "TOOLIN" INTERFACE BEING USED                                         
                                                                               
          RESIZE FROM(IN) TO(OUT) TOLEN(080) USING(CTL1)                       
SYT048E  STATEMENT DOES NOT BEGIN WITH A VALID OPERATOR                         
SYT030I  OPERATION COMPLETED WITH RETURN CODE 12                               
                                                                               
SYT015I  PROCESSING MODE CHANGED FROM "STOP" TO "SCAN" DUE TO OPERATION FAILURE
                                                                               
SYT004I  SYNCTOOL PROCESSING COMPLETED WITH RETURN CODE 12                     



Please advise.

-Nath
Magesh_J
Intermediate


Joined: 21 Jun 2014
Posts: 259
Topics: 54

Posted: Sat Sep 10, 2016 11:56 am

guhanath,

1. You are running an old release of Syncsort; the SYNCTOOL banner shows a 2008 copyright.
2. Syncsort documentation is not freely available, but since your site uses the product you should have access to it.
3. Search your Syncsort manual (the 2008-level one) for RESIZE, REPEAT, and PARSE.
4. If you find PARSE in the manual, then the following code might work for you.
5. Note: this code works in DFSORT, but I am not sure about Syncsort.

Code:

//STEP01  EXEC PGM=SORT
//SORTIN  DD DISP=SHR,DSN=Your Input FB 300 Byte LRECL file
//SORTOUT DD DISP=(,PASS),
//           SPACE=(CYL,(10,10),RLSE),
//           DSN=&&TEMP
//SYSOUT  DD SYSOUT=*
//SYSIN   DD *
  OPTION COPY                                                     
  OUTFIL PARSE=(%00=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),   
              %01=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %02=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %03=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %04=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %05=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %06=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %07=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %08=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %09=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %10=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %11=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %12=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %13=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %14=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %15=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %16=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %17=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),     
              %18=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),
              %19=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),
              %20=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),
              %21=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),
              %22=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),
              %23=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),
              %24=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),
              %25=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),
              %26=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),
              %27=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),
              %28=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),
              %29=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>'),
              %30=(STARTAT=C'<TD>',FIXLEN=80,ENDAT=C'</TD>')),
      BUILD=(%00,/,%01,/,%02,/,%03,/,%04,/,%05,/,                 
             %06,/,%07,/,%08,/,%09,/,%10,/,%11,/,                 
             %12,/,%13,/,%14,/,%15,/,%16,/,%17,/,
             %18,/,%19,/,%20,/,%21,/,%22,/,%23,/,
             %24,/,%25,/,%26,/,%27,/,%28,/,%29,/,%30)
//STEP02  EXEC PGM=SORT                 
//SORTIN  DD DSN=&&TEMP,DISP=(OLD,DELETE)
//SORTOUT DD SYSOUT=*                   
//SYSOUT  DD SYSOUT=*                   
//SYSIN   DD *                         
  OPTION COPY                           
  INCLUDE COND=(1,80,CH,NE,C' ')       


Thanks
Magesh
kolusu
Site Admin


Joined: 26 Nov 2002
Posts: 12367
Topics: 75
Location: San Jose

Posted: Sat Sep 10, 2016 12:26 pm

guhanath wrote:
Hi Kolusu,

Thanks for your help. I tried your code but got RC=12 with the error messages below.


Your SYT messages indicate that you are using Syncsort's SYNCTOOL rather than DFSORT's ICETOOL. DFSORT and Syncsort are competing products. I'm a DFSORT developer. I'm happy to answer questions on DFSORT and DFSORT's ICETOOL, but I don't answer questions on Syncsort.
guhanath
Beginner


Joined: 31 Oct 2006
Posts: 12
Topics: 2

Posted: Mon Sep 12, 2016 12:16 am

Hi Magesh,

Thanks for the SORT code. I will try it and post back whether or not it works.

@Kolusu,

Thanks. Yes, I understand the business rules. Anyhow, I will check it with Syncsort. If I have any DFSORT-related queries, I will definitely post them in this forum.

-Nath
All times are GMT - 5 Hours

 