MVSFORUMS.com
A Community of and for MVS Professionals
 

XML PARSE issue

 
karavi2000
Beginner


Joined: 17 Aug 2003
Posts: 51
Topics: 26
Location: Chennai

Posted: Fri Aug 10, 2007 5:00 pm    Post subject: XML PARSE issue

Hi,
We receive vendor data in the form of XML files, and we were planning to build an application that uses COBOL's XML PARSE to parse them and update the database.

Now I see there is a restriction in XML PARSE: it cannot parse the '&' and '<' symbols. Our files can contain these symbols and we have no control over that. I don't want to replace the symbols, parse the file, and then change them back to their original values. Is there any other way around this? (I also cannot add CDATA tags, as the file comes from the vendor.) Please let us know.

Thanks & Regards,
Ravishankar
CICS Guy
Intermediate


Joined: 30 Apr 2007
Posts: 292
Topics: 3

Posted: Fri Aug 10, 2007 8:28 pm

Could you point me to where it says you can't parse '&' and '<'?
I haven't run into that restriction, and my current production data could have that content too.
semigeezer
Supermod


Joined: 03 Jan 2003
Posts: 1014
Topics: 13
Location: Atlantis

Posted: Sat Aug 11, 2007 3:34 pm

Unescaped < and & aren't, as far as I know(*), valid in the content of well-formed XML. To represent them, you need to use the XML entities &lt;, &amp;, etc. If you are receiving invalid XML from a third party and have to use a compliant XML parser, you will need to massage (preprocess) the data yourself. That is easy for things like '&', but for '<' you will need to make sure you only escape the ones that do not start a valid tag. This may or may not be hard, depending on whether you know the valid tags ahead of time. You might want to do this in a language that has better parsing capabilities than COBOL (**).

* I don't really know XML well, but have used the occasional Java parser for it.
** I also don't know COBOL well -- just enough to be dangerous :) -- but from what I've seen and written in COBOL, almost any language is better at string parsing than COBOL. Guess I'm spoiled by Rexx, PL/I & Java.
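
A minimal sketch of the preprocessing described above, written in Python rather than COBOL. The element names in KNOWN_TAGS and the function name escape_stray are hypothetical. The sketch assumes tags carry no attributes and that the data contains no comments, CDATA sections or processing instructions, all of which a real preprocessor would have to handle.

```python
import re

# Hypothetical list of element names the vendor is known to send.
KNOWN_TAGS = ("Name", "EmployerName")

# An '&' that does not begin a predefined or numeric entity reference.
STRAY_AMP = re.compile(r"&(?!(?:amp|lt|gt|quot|apos|#[0-9]+|#x[0-9A-Fa-f]+);)")

# A '<' that does not begin an opening, closing or empty tag for a known element.
STRAY_LT = re.compile(
    r"<(?!/?(?:%s)\s*/?>)" % "|".join(re.escape(tag) for tag in KNOWN_TAGS)
)


def escape_stray(text):
    """Escape '&' and '<' characters that cannot be XML markup."""
    # Ampersands go first so the '&lt;' inserted next is not escaped again.
    text = STRAY_AMP.sub("&amp;", text)
    return STRAY_LT.sub("&lt;", text)


print(escape_stray("<Name>A & B <C</Name>"))
```

Entity references that are already present, such as &amp; or &#38;, are left alone, so running the function over data that is already clean should not change it.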
dbzTHEdinosauer
Supermod


Joined: 20 Oct 2006
Posts: 1411
Topics: 26
Location: germany

Posted: Sun Aug 12, 2007 4:15 am

actually this is 4.0xhtml compliance requirements. in all fairness to the ibm xmlparser, it is written in c and is close to state of the art, and complies with the JAN2007 xpath/xquery/xslt agreement - which is industry best practice. it will continue to comply - which means that you have to have your systems people stay up-to-date with apars and releases. (same problem as dfsort.)

karavi2000,

the problem of second-rate/garbage xml generation was a problem with the early edi back in the 80's. as semigeezer said, you have to pre-process these files, which means you have to recognize them.

Past experience: each originator had his own version of bastardization. Back in the edi days (before the web), the IP's (for a price) would normalize the files. Now-a-days, that is pretty expensive.

This is going to occur for more than one of your vendors - the old 'if you want my business, eat my ****'. Don't expect too much support from marketing/management, just make sure they realize the development costs and take that into consideration with pricing.

Some xml validators will actually tell you what is invalid and where, so that you can find it and correct the problem - but these are not free; they love to sell corporate licenses.

If you write it yourself, design it well. we had over 1000 edi customers and it required a rather large 'normalization' and denormalization sub-system in addition to the edi translator (xml parser in your case). yeah, plan for the fact that the vendors that send you garbage expect garbage, and may not ever be compliant with current standards.
_________________
Dick Brenholtz
American living in Varel, Germany
Earl
Beginner


Joined: 09 Jun 2007
Posts: 26
Topics: 1

Posted: Sun Aug 12, 2007 10:15 am

cobol XML parser works well in my experience.
karavi2000
Beginner


Joined: 17 Aug 2003
Posts: 51
Topics: 26
Location: Chennai

Posted: Mon Aug 13, 2007 1:24 pm

CICS Guy and Earl, please see the link below:
http://www.w3schools.com/xml/xml_cdata.asp

dbzTHEdinosauer and semigeezer,
Thanks a lot for the response. I believe I still need to use the old formula (INSPECT) to get things done from a relatively new function (XML PARSE). :(
CICS Guy
Intermediate


Joined: 30 Apr 2007
Posts: 292
Topics: 3

Posted: Mon Aug 13, 2007 2:05 pm

Thanks Dick, I guess I never got handed "illegal" characters....
I did get all the CDATA junk and stripped it out prior to running the data through COBOL's XML parser....
dbzTHEdinosauer
Supermod


Joined: 20 Oct 2006
Posts: 1411
Topics: 26
Location: germany

Posted: Mon Aug 13, 2007 4:18 pm

karavi2000,

Quote:
I believe I still need to use the old formula (INSPECT) to get things done

yeah, that's what we told you. that will get you thru, but at the expense of INSPECTing every xml document. If you have really high volume, that could be too expensive.

Quote:
to get things done from a relatively new function (XML PARSE)

don't blame the xmlparser. it's your crappy input data. As I said, if you INSPECT every line of xml you get, you will not be happy with the performance. you will eventually need a small sub-system to pre-process the xml before it reaches the COBOL subsystem that issues the XML PARSE statement. You will need to identify vendors and cater to their specific needs. It's called the 'price of doing business'.
_________________
Dick Brenholtz
American living in Varel, Germany
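
One way to organise the per-vendor pre-processing described above is a table of fix-up steps keyed by vendor. This is only a sketch in Python: the vendor IDs, the fix-up functions and the rules inside them are all made up for illustration, and a real sub-system would derive its rules from each vendor's actual files.

```python
def drop_form_feeds(text):
    # Hypothetical rule: the vendor embeds form-feed characters,
    # which XML 1.0 does not allow.
    return text.replace("\f", "")


def escape_rd_ampersand(text):
    # Hypothetical rule: the vendor sends the literal string "R&D" unescaped.
    return text.replace("R&D", "R&amp;D")


# Hypothetical vendor IDs, each mapped to the ordered steps its files need.
NORMALIZERS = {
    "VENDOR_A": [drop_form_feeds, escape_rd_ampersand],
    "VENDOR_B": [drop_form_feeds],
}


def normalize(vendor_id, text):
    """Apply the vendor's fix-up steps in order; unknown vendors pass through."""
    for step in NORMALIZERS.get(vendor_id, []):
        text = step(text)
    return text


print(normalize("VENDOR_A", "<Dept>R&D\f</Dept>"))
```

Keeping the rules per vendor means a feed that is already clean pays for no extra scanning, which is the performance concern raised above.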
rkinfy
Beginner


Joined: 12 Nov 2010
Posts: 11
Topics: 3

Posted: Wed Dec 08, 2010 6:39 am

Hi All,

Here's what I'm getting while parsing the below message.

<Name>#Accountant</Name><Description>Some description</Description><EmployerName>XYZ &lt;Financial Services </EmployerName><Department>Finance Department</Department><Industry>%Insurance</Industry><EndReason>Promoted</EndReason><RecordedDate>2003-09-20</RecordedDate>


********************************* TOP OF DATA **********************************
#Accountant
Some description
XYZ
Financial Services
Finance Department
%Insurance
Promoted
2003-09-20
******************************** BOTTOM OF DATA ********************************

The XML contains valid data (&lt;), but the PARSE statement is not giving the output I expected. I was expecting

"XYZ <Financial Services"

but got

XYZ
Financial Services
(in two different lines)

Can someone tell me how to handle this?
_________________
RK.
Nic Clouston
Advanced


Joined: 01 Feb 2007
Posts: 1075
Topics: 7
Location: At Home

Posted: Wed Dec 08, 2010 7:18 am

Which parser are you using?
_________________
Utility and Program control cards are NOT, repeat NOT, JCL.
dbzTHEdinosauer
Supermod


Joined: 20 Oct 2006
Posts: 1411
Topics: 26
Location: germany

Posted: Wed Dec 08, 2010 7:56 am

according to the COBOL manual, predefined entities must follow the standards.
Here is a link to the subject of predefined entities in the standards:
http://www.w3.org/TR/REC-xml/#sec-predefined-ent

you may want to insert the entity declarations in your document before processing.
_________________
Dick Brenholtz
American living in Varel, Germany
RonB
Beginner


Joined: 02 Dec 2002
Posts: 93
Topics: 0
Location: Orlando, FL

Posted: Wed Dec 08, 2010 8:26 am

I believe that you should have gotten FIVE returns for the <EmployerName> Element:
Code:
XML-EVENT          XML-TEXT

START-OF-ELEMENT   EmployerName
CONTENT-CHARACTERS XYZ
CONTENT-CHARACTER  <
CONTENT-CHARACTERS Financial Services
END-OF-ELEMENT     EmployerName


Does your Parse Routine recognize, and deal with the XML-EVENT of CONTENT-CHARACTER (with no trailing S) as well as the XML-EVENT of CONTENT-CHARACTERS (with a trailing S)? If not, then that could be the problem.
_________________
A computer once beat me at chess, but it was no match for me at kick boxing.
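
The same trap exists in other event-driven parsers, so here is a rough illustration using Python's xml.sax (not the COBOL parser): the text of one element can arrive in several events, and the handler has to join the pieces itself. The class and function names are made up for the example, and it only collects text for elements that have no child elements.

```python
import xml.sax


class TextCollector(xml.sax.ContentHandler):
    """Joins every text chunk reported for an element into one value."""

    def __init__(self):
        super().__init__()
        self.chunks = []   # text pieces seen since the last start or end tag
        self.values = {}   # element name -> complete text

    def startElement(self, name, attrs):
        self.chunks = []

    def characters(self, content):
        # The parser may call this more than once per element,
        # for example on either side of an entity reference such as "&lt;".
        self.chunks.append(content)

    def endElement(self, name):
        self.values[name] = "".join(self.chunks)
        self.chunks = []


def element_text(xml_bytes):
    """Parse a document and return the joined text of each element."""
    handler = TextCollector()
    xml.sax.parseString(xml_bytes, handler)
    return handler.values


doc = b"<Job><EmployerName>XYZ &lt;Financial Services </EmployerName></Job>"
print(repr(element_text(doc)["EmployerName"]))
```

In the COBOL case the equivalent is to append XML-TEXT to a work field on both CONTENT-CHARACTERS and CONTENT-CHARACTER events, and to treat the field as complete only at END-OF-ELEMENT.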
rkinfy
Beginner


Joined: 12 Nov 2010
Posts: 11
Topics: 3

Posted: Thu Dec 09, 2010 9:00 am

Hi All,

Thanks a lot for the responses.

RonB, as you mentioned, I was using only CONTENT-CHARACTERS not CONTENT-CHARACTER. That was the issue. Now I'm able to get the correct output.

Thanks again!!
_________________
RK.
All times are GMT - 5 Hours