Author |
Message |
karavi2000 Beginner
Joined: 17 Aug 2003 Posts: 51 Topics: 26 Location: Chennai
|
Posted: Fri Aug 10, 2007 5:00 pm Post subject: XML PARSE issue |
|
|
Hi,
We receive vendor data in the form of XML files, and we were planning to build an application that uses COBOL's XML PARSE to parse them and update the database.
Now I see there is a restriction in XML PARSE: it cannot parse the '&' and '<' symbols. These symbols can appear in our files, and we have no control over that. I don't want to replace these symbols, parse the file, and then change them back to their original values. Is there any other way around this? (I also cannot add CDATA sections, since the file comes from the vendor.) Please let us know.
Thanks & Regards,
Ravishankar |
|
|
 |
CICS Guy Intermediate
Joined: 30 Apr 2007 Posts: 292 Topics: 3
|
Posted: Fri Aug 10, 2007 8:28 pm Post subject: |
|
|
Could you point me to where it says you can't parse '&' and '<'?
I haven't run into that restriction, and my current production data could have that content too. |
|
|
 |
semigeezer Supermod
Joined: 03 Jan 2003 Posts: 1014 Topics: 13 Location: Atlantis
|
Posted: Sat Aug 11, 2007 3:34 pm Post subject: |
|
|
Bare < and & characters aren't, as far as I know(*), valid in the text of well-formed XML. To represent them, you need to use the XML entities &lt;, &amp;, etc. If you are receiving invalid XML from a third party and have to use a compliant XML parser, you will need to massage (preprocess) the data yourself. It is easy for things like '&', but you will need to ensure that a < does not start a valid tag. This may or may not be hard, depending on whether you know the valid tags ahead of time. You might want to do this using a language that has better parsing capabilities than COBOL (**).
* I don't really know XML well, but have used the occasional Java parser for it.
** I also don't know COBOL well -- just enough to be dangerous -- but from what I've seen and written in COBOL, almost any language is better at string parsing than COBOL. Guess I'm spoiled by Rexx, PL/I & Java |
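A rough sketch of the preprocessing step described above, written in Python rather than COBOL. It is only an illustration under stated assumptions: `KNOWN_TAGS` is a hypothetical whitelist of the element names a vendor sends, the function name `escape_stray` is made up, and anything starting with `<?` or `<!` (declarations, comments, CDATA) is left untouched.

```python
import re

# Hypothetical whitelist: element names this vendor is known to send.
KNOWN_TAGS = {"Name", "EmployerName", "Department"}

# An '&' that does not begin an entity or character reference (&amp; &#38; &#x26;).
STRAY_AMP = re.compile(r"&(?!(?:[A-Za-z][A-Za-z0-9]*|#[0-9]+|#x[0-9A-Fa-f]+);)")

# A '<' (not '<?' or '<!'), an optional '/', then an optional candidate tag name.
LT = re.compile(r"<(?![?!])(/?)([A-Za-z_][\w.\-]*)?")


def escape_stray(text, known_tags=KNOWN_TAGS):
    """Escape '&' and '<' characters that cannot be real markup."""
    text = STRAY_AMP.sub("&amp;", text)

    def fix_lt(match):
        # Keep the '<' only when it opens or closes a known element.
        if match.group(2) in known_tags:
            return match.group(0)
        return "&lt;" + match.group(0)[1:]

    return LT.sub(fix_lt, text)
```

The whitelist is what makes the '<' case tractable: without knowing the valid tags there is no reliable way to tell a stray '<' from the start of an element. Text that is already escaped (for example `&lt;`) passes through unchanged.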
|
|
 |
dbzTHEdinosauer Supermod
Joined: 20 Oct 2006 Posts: 1411 Topics: 26 Location: germany
|
Posted: Sun Aug 12, 2007 4:15 am Post subject: |
|
|
actually this is 4.0xhtml compliance requirements. in all fairness to the ibm xmlparser, it is written in c and is close to state of the art, and complies with the JAN2007 xpath/xquery/xslt agreement - which is industry best practice. and will continue to comply - which means that you have to have your systems people stay up-to-date with apars and releases. (same problem as dfsort.)
karavi2000,
the problem of second-rate/garbage xml generation was a problem with the early edi back in the 80's. as semigeezer said, you have to pre-process these files, which means you have to recognize them.
Past experience: each originator had his own version of bastardization. Back in the edi days (before the web), the IP's (for a price) would normalize the files. Now-a-days, that is pretty expensive.
This is going to occur for more than one of your vendors - the old if you want my business, eat my ****. Don't expect too much support from marketing/management, just make sure they realize the development costs and take that into consideration with pricing.
Some xml validators will actually tell you what is invalid and where, so that you can find it and correct the problem - but these are not free - they love to sell corporate licenses.
If you write it yourself, design it well. we had over 1000 edi customers and it required a rather large 'normalization' and denormalization sub-system in addition to the edi translator (xml parser in your case). yeah, plan for the fact that the vendors that send you garbage expect garbage, and may never be compliant with current standards. _________________ Dick Brenholtz
American living in Varel, Germany |
|
|
 |
Earl Beginner
Joined: 09 Jun 2007 Posts: 26 Topics: 1
|
Posted: Sun Aug 12, 2007 10:15 am Post subject: |
|
|
cobol XML parser works well in my experience. |
|
|
 |
karavi2000 Beginner
Joined: 17 Aug 2003 Posts: 51 Topics: 26 Location: Chennai
|
Posted: Mon Aug 13, 2007 1:24 pm Post subject: |
|
|
CICS Guy and Earl, please follow the below link
http://www.w3schools.com/xml/xml_cdata.asp
dbzTHEdinosauer and semigeezer,
Thanks a lot for the response. I believe I still need to use the old formula (INSPECT) to get things done from a relatively new function (XML PARSE). |
|
|
 |
CICS Guy Intermediate
Joined: 30 Apr 2007 Posts: 292 Topics: 3
|
Posted: Mon Aug 13, 2007 2:05 pm Post subject: |
|
|
Thanks Dick, I guess I never got handed "illegal" characters....
I did get all the CDATA junk and stripped it out prior to running the data through COBOL's XML parser.... |
|
|
 |
dbzTHEdinosauer Supermod
Joined: 20 Oct 2006 Posts: 1411 Topics: 26 Location: germany
|
Posted: Mon Aug 13, 2007 4:18 pm Post subject: |
|
|
karavi2000,
Quote: | I believe I still need to use the old formula (INSPECT) to get things done |
yeah, that's what we told you. that will get you thru, but at the expense of INSPECTing every xml document. If you have really high volume, could be too expensive.
Quote: | to get things done from a relatively new function (XML PARSE) |
don't blame the xmlparser. it's your crappy input data. As I said, if you inspect every line of xml you get, you will not be happy with the performance. you will eventually need a small sub-system to pre-process the xml before you get into the COBOL subsystem with the CALL to the xmlparse function. You will need to identify vendors and cater to their specific needs. Called the 'price of doing business'. _________________ Dick Brenholtz
American living in Varel, Germany |
|
|
 |
rkinfy Beginner
Joined: 12 Nov 2010 Posts: 11 Topics: 3
|
Posted: Wed Dec 08, 2010 6:39 am Post subject: |
|
|
Hi All,
Here's what I'm getting while parsing the below message.
<Name>#Accountant</Name><Description>Some description</Description><EmployerName>XYZ &lt;Financial Services </EmployerName><Department>Finance Department</Department><Industry>%Insurance</Industry><EndReason>Promoted</EndReason><RecordedDate>2003-09-20</RecordedDate>
********************************* TOP OF DATA **********************************
#Accountant
Some description
XYZ
Financial Services
Finance Department
%Insurance
Promoted
2003-09-20
******************************** BOTTOM OF DATA ********************************
The XML contains valid data (&lt;), but the PARSE command is not giving the output I expected. I was expecting
"XYZ <Financial Services"
but got
XYZ
Financial Services
(in two different lines)
Can someone tell me how to handle this? _________________ RK. |
|
|
 |
Nic Clouston Advanced
Joined: 01 Feb 2007 Posts: 1075 Topics: 7 Location: At Home
|
Posted: Wed Dec 08, 2010 7:18 am Post subject: |
|
|
Which parser are you using? _________________ Utility and Program control cards are NOT, repeat NOT, JCL. |
|
|
 |
dbzTHEdinosauer Supermod
Joined: 20 Oct 2006 Posts: 1411 Topics: 26 Location: germany
|
Posted: Wed Dec 08, 2010 7:56 am Post subject: |
|
|
according to the COBOL manual, predefined entities must follow the standards.
Here is a link to the subject of predefined entities in the standards:
http://www.w3.org/TR/REC-xml/#sec-predefined-ent
you may want to insert the entity declarations in your document before processing. _________________ Dick Brenholtz
American living in Varel, Germany |
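For reference, the five predefined entities in the XML standard are &lt; (<), &gt; (>), &amp; (&), &apos; (') and &quot; ("). A small Python sketch of what the sending side is expected to do with them, using the standard library's `xml.sax.saxutils`; the helper names `to_xml_text` and `from_xml_text` are made up for this illustration.

```python
from xml.sax.saxutils import escape, unescape

# escape() handles &, < and > by default; the quote entities are passed in explicitly.
QUOTE_ENTITIES = {'"': "&quot;", "'": "&apos;"}
QUOTE_CHARS = {"&quot;": '"', "&apos;": "'"}


def to_xml_text(raw):
    """Replace the five special characters with their predefined entities."""
    return escape(raw, QUOTE_ENTITIES)


def from_xml_text(escaped):
    """Reverse of to_xml_text: turn predefined entities back into characters."""
    return unescape(escaped, QUOTE_CHARS)
```

A document whose text has been escaped this way needs no preprocessing before it reaches a compliant parser; the parser hands back the original characters.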
|
|
 |
RonB Beginner
Joined: 02 Dec 2002 Posts: 93 Topics: 0 Location: Orlando, FL
|
Posted: Wed Dec 08, 2010 8:26 am Post subject: |
|
|
I believe that you should have gotten FIVE returns for the <EmployerName> Element:
Code: |
XML-EVENT            XML-TEXT
START-OF-ELEMENT     EmployerName
CONTENT-CHARACTERS   XYZ
CONTENT-CHARACTER    <
CONTENT-CHARACTERS   Financial Services
END-OF-ELEMENT       EmployerName |
Does your Parse Routine recognize, and deal with the XML-EVENT of CONTENT-CHARACTER (with no trailing S) as well as the XML-EVENT of CONTENT-CHARACTERS (with a trailing S)? If not, then that could be the problem. _________________ A computer once beat me at chess, but it was no match for me at kick boxing. |
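The same pattern shows up in event-driven parsers outside COBOL: the text of one element can arrive in several pieces, and the handler has to join them. The sketch below is a Python analogue using the standard `xml.parsers.expat` module, not COBOL's XML PARSE; the function name `element_texts` is made up, and how many pieces the text is split into is up to the parser, so the code does not rely on it.

```python
import xml.parsers.expat


def element_texts(xml_doc):
    """Return (element name, full text) pairs, joining every text chunk."""
    results = []
    stack = []  # one list of text chunks per currently open element

    def start(name, attrs):
        stack.append([])

    def chars(data):
        # May be called several times for a single element's content.
        if stack:
            stack[-1].append(data)

    def end(name):
        results.append((name, "".join(stack.pop())))

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start
    parser.CharacterDataHandler = chars
    parser.EndElementHandler = end
    parser.Parse(xml_doc, True)
    return results
```

In COBOL terms, the `chars` handler plays the role of handling both CONTENT-CHARACTERS and CONTENT-CHARACTER by appending XML-TEXT to a work area, and `end` corresponds to using that work area at END-OF-ELEMENT.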
|
|
 |
rkinfy Beginner
Joined: 12 Nov 2010 Posts: 11 Topics: 3
|
Posted: Thu Dec 09, 2010 9:00 am Post subject: |
|
|
Hi All,
Thanks a lot for the responses.
RonB, as you mentioned, I was using only CONTENT-CHARACTERS not CONTENT-CHARACTER. That was the issue. Now I'm able to get the correct output.
Thanks again!! _________________ RK. |
|