MVSFORUMS.com

karavi2000 · Posted: Fri Aug 10, 2007 5:00 pm Post subject: XML PARSE issue

Hi,
We receive vendor data in the form of XML files, and we were planning to build an application that uses COBOL's XML PARSE to parse them and update the database.

Now, I see there is a restriction in XML PARSE that it cannot parse the '&' and '<' symbols. Our files can contain these symbols and we have no control over that. I don't want to replace the symbols, parse the file, and then change them back to their original values. Is there any other way around this? (Also, I cannot add a CDATA tag, as the file comes from the vendor.) Please let us know.

Thanks & Regards,
Ravishankar

CICS Guy · Intermediate Joined: 30 Apr 2007 Posts: 292 Topics: 3

Could you point me to where it says you can't parse '&' and '<'?
I haven't run into that restriction, and my current production data could have that content too.

semigeezer · Posted: Sat Aug 11, 2007 3:34 pm Post subject:

Literal < and & aren't, as far as I know(*), valid in the content of well-formed XML. To represent these, you need to use the XML entities &lt;, &amp;, etc. If you are receiving invalid XML from a third party and have to use a compliant XML parser, you will need to massage (preprocess) the data yourself. It is easy for things like '&' but you will need to ensure that < does not start a valid tag. This may or may not be hard if you know the valid tags ahead of time. You might want to do this using a language that has better parsing capabilities than COBOL (**).

* I don't really know XML well, but have used the occasional Java parser for it.
** I also don't know COBOL well -- just enough to be dangerous :)

-- but from what I've seen and written in COBOL, almost any language is better at string parsing than COBOL. Guess I'm spoiled by Rexx, PL/I & Java
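
A minimal sketch of the preprocessing described above, in Python rather than COBOL. It assumes the set of valid element names is known ahead of time; the function name and the tag set passed to it are hypothetical. It escapes any '&' that does not already start an entity or character reference, then any '<' that does not open or close a known tag. It leaves '<?' and '<!' alone so that declarations and comments survive, but it does not protect text inside CDATA sections, and a vendor-defined entity such as &nbsp; would be escaped as well.

```python
import re

# '&' that is NOT already the start of a predefined entity or a
# numeric character reference.
_BARE_AMP = re.compile(r"&(?!(?:amp|lt|gt|apos|quot|#\d+|#x[0-9A-Fa-f]+);)")


def escape_stray_markup(text, known_tags):
    """Escape '&' and '<' characters that are data rather than markup.

    known_tags is the set of element names the vendor is expected to
    send (an assumption: you must supply this list yourself).
    """
    # Step 1: bare ampersands become &amp;.
    text = _BARE_AMP.sub("&amp;", text)

    # Step 2: a '<' is kept only if it starts '<?', '<!', or an opening
    # or closing tag whose name is in known_tags; otherwise it is data.
    tag_alternatives = "|".join(re.escape(t) for t in sorted(known_tags))
    bare_lt = re.compile(r"<(?![?!]|/?(?:%s)[\s/>])" % tag_alternatives)
    return bare_lt.sub("&lt;", text)
```

For example, with known_tags={"EmployerName"}, the text "<EmployerName>XYZ <Financial & Co </EmployerName>" keeps both tags and only the stray '<' and '&' inside the element are escaped.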

dbzTHEdinosauer · Posted: Sun Aug 12, 2007 4:15 am Post subject:

actually this is 4.0xhtml compliance requirements. in all fairness to the ibm xmlparser, it is written in c and is close to state of the art, and complies with the JAN2007 xpath/xquery/xslt agreement - which is industry best practice. and will continue to comply - which means that you have to have your systems people stay up-to-date with apars and releases. (same problem as dfsort.)

karavi2000,

the problem of second-rate/garbage xml generation was a problem with the early edi back in the 80's. as semigeezer said, you have to pre-process these files, which means you have to recognize them.

Past experience: each originator had his own version of bastardization. Back in the edi days (before the web), the IP's (for a price) would normalize the files. Now-a-days, that is pretty expensive.

This is going to occur for more than one of your vendors - the old if you want my business, eat my ****. Don't expect too much support from marketing/management, just make sure they realize the development costs and take that into consideration with pricing.

Some xml validators will actually tell you what is invalid and where, so that you can find it and correct the problem - but these are not free - they love to sell corporate licenses.
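
For a basic check of where a file stops being well-formed, a standard-library parser may be enough. The sketch below uses Python's xml.etree.ElementTree, whose ParseError carries a (line, column) position. It only checks well-formedness, not validity against a schema or DTD, and it stops at the first error. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET


def first_wellformedness_error(xml_text):
    """Return (line, column, message) for the first parse error,
    or None if the text is well-formed XML."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError as err:
        line, column = err.position  # position reported by the parser
        return line, column, str(err)
    return None
```

Calling it on "<a>Tom & Jerry</a>" returns a tuple that points into line 1, while "<a>Tom &amp; Jerry</a>" returns None.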

If you write it yourself, design it well. we had over 1000 edi customers and it required a rather large 'normalization' and denormalization sub-system in addition to the edi translator (xml parser in your case). yeah, plan for the fact that the vendors that send you garbage, expect garbage and may not ever be compliant with current standards.
_________________
Dick Brenholtz
American living in Varel, Germany

Earl · Beginner Joined: 09 Jun 2007 Posts: 26 Topics: 1

cobol XML parser works well in my experience.

karavi2000 · Posted: Mon Aug 13, 2007 1:24 pm Post subject:

CICS Guy and Earl, please follow the below link
http://www.w3schools.com/xml/xml_cdata.asp

dbzTHEdinosauer and semigeezer,
Thanks a lot for the response. I believe I still need to use the old formula (INSPECT) to get things done with a relatively new function (XML PARSE). :(

CICS Guy · Intermediate Joined: 30 Apr 2007 Posts: 292 Topics: 3

Thanks Dick, I guess I never got handed "illegal" characters....
I did get all the CDATA junk and stripped it out prior to running the data through COBOL's XML parser....

dbzTHEdinosauer · Posted: Mon Aug 13, 2007 4:18 pm Post subject:

karavi2000,

rkinfy · Beginner Joined: 12 Nov 2010 Posts: 11 Topics: 3

Hi All,

Here's what I'm getting while parsing the below message.

<Name>#Accountant</Name><Description>Some description</Description><EmployerName>XYZ &lt;Financial Services </EmployerName><Department>Finance Department</Department><Industry>%Insurance</Industry><EndReason>Promoted</EndReason><RecordedDate>2003-09-20</RecordedDate>

********************************* TOP OF DATA **********************************
#Accountant
Some description
XYZ
Financial Services
Finance Department
%Insurance
Promoted
2003-09-20
******************************** BOTTOM OF DATA ********************************

The XML has valid data (&lt;), but the PARSE command is not giving the proper output. I was expecting

"XYZ <Financial Services"

but got

XYZ
Financial Services
(in two different lines)

Can someone tell me how to handle this?
_________________
RK.

Nic Clouston · Posted: Wed Dec 08, 2010 7:18 am Post subject:

Which parser are you using?
_________________
Utility and Program control cards are NOT, repeat NOT, JCL.

dbzTHEdinosauer · Posted: Wed Dec 08, 2010 7:56 am Post subject:

according to the COBOL manual, predefined entities must follow the standards.
Here is a link to the subject of predefined entities in the standards:
http://www.w3.org/TR/REC-xml/#sec-predefined-ent

you may want to insert the entity declarations in your document before processing.
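
For reference, the five predefined entities in the XML standard are &lt; (<), &gt; (>), &amp; (&), &apos; (') and &quot; ("). The sketch below shows the round trip between raw text and its entity-escaped form using Python's xml.sax.saxutils; the two wrapper function names are hypothetical.

```python
from xml.sax.saxutils import escape, unescape

# escape()/unescape() handle &, < and > by default; the two quote
# characters are passed in as extra mappings.
_QUOTES = {'"': "&quot;", "'": "&apos;"}
_UNQUOTES = {"&quot;": '"', "&apos;": "'"}


def to_xml_text(raw):
    """Replace the five special characters with predefined entities."""
    return escape(raw, _QUOTES)


def from_xml_text(escaped):
    """Turn predefined entities back into the characters they stand for."""
    return unescape(escaped, _UNQUOTES)
```

With these, to_xml_text('XYZ <Financial "A&B"') gives 'XYZ &lt;Financial &quot;A&amp;B&quot;', and from_xml_text() reverses it.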
_________________
Dick Brenholtz
American living in Varel, Germany

RonB · Posted: Wed Dec 08, 2010 8:26 am Post subject:

I believe that you should have gotten FIVE returns for the <EmployerName> Element:

rkinfy · Beginner Joined: 12 Nov 2010 Posts: 11 Topics: 3

Hi All,

Thanks a lot for the responses.

RonB, as you mentioned, I was handling only the CONTENT-CHARACTERS event and not CONTENT-CHARACTER. That was the issue. Now I'm able to get the correct output.

Thanks again!!
_________________
RK.
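
The fix above reflects a general property of event-driven parsers: the text of one element can arrive in several pieces, and an entity reference such as &lt; may be delivered as a piece of its own, so the handler has to collect every piece until the end tag. The sketch below is a Python analogue using xml.sax, not COBOL. The characters() callback stands in for the CONTENT-CHARACTERS and CONTENT-CHARACTER events taken together, and the Employment root element in the usage note is a hypothetical wrapper, since the posted fragment has no single root.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Accumulate all character data of an element until its end tag."""

    def __init__(self):
        super().__init__()
        self.values = {}   # element name -> complete text
        self._chunks = []  # pieces seen since the last start or end tag

    def startElement(self, name, attrs):
        self._chunks = []

    def characters(self, content):
        # May be called more than once per element; never assume that
        # a single call holds the whole text.
        self._chunks.append(content)

    def endElement(self, name):
        self.values[name] = "".join(self._chunks)
        self._chunks = []


def collect_text(xml_text):
    """Parse xml_text and return a dict of element name -> text."""
    handler = TextCollector()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.values
```

Called on "<Employment><EmployerName>XYZ &lt;Financial Services </EmployerName></Employment>", collect_text() returns "XYZ <Financial Services " for EmployerName as a single value, however many pieces the parser delivered.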