Polarion XML fails to generate due to xmlSAX2Characters: huge text node #3363

Closed
KwisatzHaderach opened this issue Nov 15, 2024 · 2 comments · Fixed by #3365
Labels: bug, plugin | polarion, step | report


@KwisatzHaderach
Collaborator

I have a fairly large plan that takes a while to run and writes a lot of information into the logs. The plan finishes fine with all tests passing, but XML generation then fails with:
```
The generated XML output is not a valid XML file. Use --verbose argument to show the output.

The exception was caused by 1 earlier exceptions

Cause number 1:

    xmlSAX2Characters: huge text node, line 172178, column 10 (<string>, line 172178)
```
This is with the Polarion report plugin, but the junit plugin will very likely fail the same way.
@KwisatzHaderach KwisatzHaderach added bug Something isn't working plugin | polarion Plugins implementing the Polarion integration labels Nov 15, 2024
@KwisatzHaderach
Collaborator Author

@seberm I guess you made the changes for jinja, can you please have a look?

@seberm seberm self-assigned this Nov 15, 2024
@seberm
Collaborator

seberm commented Nov 15, 2024

Hello @KwisatzHaderach,
the tmt polarion report plugin internally uses the junit report code, which relies on Jinja2 and lxml to generate the final JUnit/XUnit XML file. This means the junit plugin is also affected.

This problem seems to be related to lxml.etree.XMLParser, which appears to limit the size of the text nodes it can handle.

I've had a quick look at the XMLParser options, and there is a huge_tree option that might help:

huge_tree - disable security restrictions and support very deep trees and very long text content (only affects libxml2 2.7+)

I've tried to reproduce the problem locally (for now without tmt), and the huge_tree option is effective.

Create a large text node:

#!/usr/bin/env python3

import xml.etree.ElementTree as ET

large_text = 'a' * (10 * 1024 * 1024 + 1)  # 10MB + 1 byte
root = ET.Element('root')
root.text = large_text
tree = ET.ElementTree(root)
tree.write('large_xml_file.xml')

Try to parse the file with huge_tree=False:

$ cat parse-xml.py
#!/usr/bin/env python3

import lxml.etree as ET

parser = ET.XMLParser(huge_tree=False)
try:
    tree = ET.parse('large_xml_file.xml', parser)
except ET.ParserError as e:
    print(e)


$ ./parse-xml.py
Traceback (most recent call last):
  File "/home/user/Repos/tmt/./parse-xml.py", line 9, in <module>
    tree = ET.parse('large_xml_file.xml', parser)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "src/lxml/etree.pyx", line 3541, in lxml.etree.parse
  File "src/lxml/parser.pxi", line 1879, in lxml.etree._parseDocument
  File "src/lxml/parser.pxi", line 1905, in lxml.etree._parseDocumentFromURL
  File "src/lxml/parser.pxi", line 1808, in lxml.etree._parseDocFromFile
  File "src/lxml/parser.pxi", line 1180, in lxml.etree._BaseParser._parseDocFromFile
  File "src/lxml/parser.pxi", line 618, in lxml.etree._ParserContext._handleParseResultDoc
  File "src/lxml/parser.pxi", line 728, in lxml.etree._handleParseResult
  File "src/lxml/parser.pxi", line 657, in lxml.etree._raiseParseError
  File "large_xml_file.xml", line 1
lxml.etree.XMLSyntaxError: xmlSAX2Characters: huge text node, line 1, column 10004001
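
A side note on the output above: the script prints a full traceback rather than just the message, which suggests the `except ET.ParserError` clause did not match the raised `XMLSyntaxError`. As far as I can tell, `ParserError` and `ParseError` are distinct classes in lxml, and `XMLSyntaxError` derives only from the latter. A quick check, assuming a stock lxml install:

```python
from lxml import etree

# XMLSyntaxError is what the parser raised in the traceback above.
# ParseError is its base class; ParserError is an unrelated lxml error.
is_parse_error = issubclass(etree.XMLSyntaxError, etree.ParseError)
is_parser_error = issubclass(etree.XMLSyntaxError, etree.ParserError)

print("subclass of ParseError: ", is_parse_error)
print("subclass of ParserError:", is_parser_error)
```

Catching `etree.XMLSyntaxError` (or `etree.ParseError`) would print only the message.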

By setting the huge_tree option to True, the parsing works without the huge text node exception.
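
For completeness, here is a minimal in-memory sketch of the working variant. It is not the tmt code; the `parse` helper and the variable names are made up for illustration, and the payload size mirrors the reproducer above. Whether the default parser rejects the document may depend on the libxml2 build, so the sketch only records that outcome instead of relying on it.

```python
from lxml import etree

# One text node slightly larger than 10 MB, same size as in the reproducer.
payload = "a" * (10 * 1024 * 1024 + 1)
xml_bytes = f"<root>{payload}</root>".encode("utf-8")


def parse(data: bytes, huge_tree: bool):
    """Hypothetical helper: parse bytes with an explicitly configured parser."""
    parser = etree.XMLParser(huge_tree=huge_tree)
    return etree.fromstring(data, parser)


# With the restrictions left on, the parser is expected to reject the node.
try:
    parse(xml_bytes, huge_tree=False)
    default_error = None
except etree.XMLSyntaxError as error:
    default_error = str(error)

# With huge_tree=True the size restriction on text nodes is lifted.
root = parse(xml_bytes, huge_tree=True)

print("error without huge_tree:", default_error)
print("text length with huge_tree:", len(root.text))
```

Keep in mind that huge_tree disables security restrictions (see the option description quoted above), so it is best suited to XML the tool generated itself rather than untrusted input.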

@seberm seberm linked a pull request Nov 18, 2024 that will close this issue
@seberm seberm added the step | report Stuff related to the report step label Nov 26, 2024
@seberm seberm added this to the 1.40 milestone Nov 26, 2024