Quadratic growth by duplicating attributes in formatting elements using <p> #10772

Open
JorianWoltjer opened this issue Nov 17, 2024 · 3 comments

@JorianWoltjer

What is the issue with the HTML Standard?

Formatting elements (e.g. <a>) that are broken up by <p> tags get reconstructed in every following paragraph with all of their attributes, and there is no limit on how often this happens. The following HTML demonstrates this behaviour:

<p><a href="AAAAAAAAAA"><p>a<p>a<p>a<p>a

When parsed according to the spec and reserialized, it becomes (line breaks added for readability):

<p><a href="AAAAAAAAAA"></a></p>
<p><a href="AAAAAAAAAA">a</a></p>
<p><a href="AAAAAAAAAA">a</a></p>
<p><a href="AAAAAAAAAA">a</a></p>
<p><a href="AAAAAAAAAA">a</a></p>
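The copies come from two tree-construction steps. First, a <p> start tag closes the open p element, which pops the <a> off the stack of open elements but leaves it in the list of active formatting elements. Second, the next character token runs "reconstruct the active formatting elements", which inserts a fresh clone with the same attributes. The heavily simplified Python model below is a sketch under stated assumptions, not a conforming parser. TOKEN_RE, tokenize, toy_parse and serialize are hypothetical helpers that only understand <p>, <a href="..."> and text. They skip scoping rules, markers, the adoption agency algorithm, the Noah's Ark clause and attribute escaping.

```python
import re
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: elements compare by identity, like DOM nodes
class Element:
    tag: str
    attrs: dict
    children: list = field(default_factory=list)


# Hypothetical tokenizer: only understands <p>, <a href="...">, and plain text.
TOKEN_RE = re.compile(r'<p>|<a href="([^"]*)">|[^<]+')


def tokenize(html):
    for m in TOKEN_RE.finditer(html):
        if m.group(0) == "<p>":
            yield "p", None
        elif m.group(1) is not None:
            yield "a", {"href": m.group(1)}
        else:
            yield "text", m.group(0)


def toy_parse(html):
    body = Element("body", {})
    open_stack = [body]  # stack of open elements
    active = []          # list of active formatting elements (no markers in this model)

    def insert(el):
        open_stack[-1].children.append(el)
        open_stack.append(el)

    def reconstruct_active_formatting():
        # Nothing to do if the list is empty or its last entry is still open.
        if not active or active[-1] in open_stack:
            return
        # Rewind to the earliest entry that follows the last still-open one.
        i = len(active) - 1
        while i > 0 and active[i - 1] not in open_stack:
            i -= 1
        # Advance: clone each remaining entry (same tag, copied attributes),
        # insert the clone, and let it replace the entry in the list.
        for j in range(i, len(active)):
            clone = Element(active[j].tag, dict(active[j].attrs))
            insert(clone)
            active[j] = clone

    for kind, value in tokenize(html):
        if kind == "p":
            # "Close a p element": this also pops the open <a> off the stack,
            # but the <a> stays in `active`.
            if any(el.tag == "p" for el in open_stack):
                while open_stack.pop().tag != "p":
                    pass
            insert(Element("p", {}))
        elif kind == "a":
            reconstruct_active_formatting()
            el = Element("a", value)
            insert(el)
            active.append(el)
        else:  # character token
            reconstruct_active_formatting()
            open_stack[-1].children.append(value)
    return body


def serialize(node):
    # No escaping: enough for the A-only attribute values used here.
    if isinstance(node, str):
        return node
    attrs = "".join(f' {k}="{v}"' for k, v in node.attrs.items())
    inner = "".join(serialize(child) for child in node.children)
    return f"<{node.tag}{attrs}>{inner}</{node.tag}>"


body = toy_parse('<p><a href="AAAAAAAAAA"><p>a<p>a<p>a<p>a')
for paragraph in body.children:
    print(serialize(paragraph))
```

Running it should print the same five paragraphs as above, each carrying its own copy of the href.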

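The more <p>a tags are added, the more often the attribute is duplicated. If the attribute value is also large, the output quickly blows out of proportion. Each A adds 1 byte of input and each <p>a adds 4 bytes, while the output grows with the product attr_len × p_amount. For a fixed input size, growth is therefore largest when attr_len ≈ 4 × p_amount, i.e. when the input is split evenly between the attribute value and the <p>a repetitions: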

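def generate_input(attr_len, p_amount):
    # One <a> whose href holds attr_len A's, followed by p_amount "<p>a" paragraphs
    return '<p><a href="' + 'A' * attr_len + '">' + '<p>a' * p_amount

  • 0.1 MB of input (attr_len=49986, p_amount=12500) serializes to about 625 MB of output
  • 1 MB of input (attr_len=499984, p_amount=124998) serializes to about 62.5 GB of output
  • 10 MB of input (attr_len=4999986, p_amount=1250000) serializes to about 6.25 TB of output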
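These figures can be checked with a little arithmetic. The sketch below assumes the output looks exactly like the reserialized example, with no surrounding <html>/<head>/<body> tags counted. serialized_size and best_split are hypothetical helpers. best_split simply brute-forces the split for a given input budget rather than using the closed-form optimum.

```python
def generate_input(attr_len, p_amount):
    # Same construction as in the issue.
    return '<p><a href="' + 'A' * attr_len + '">' + '<p>a' * p_amount


def serialized_size(attr_len, p_amount):
    # Assumes the spec-conforming result shown above: one empty
    # <p><a href="..."></a></p> (22 + attr_len bytes), then p_amount copies of
    # <p><a href="...">a</a></p> (23 + attr_len bytes each).
    return (22 + attr_len) + p_amount * (23 + attr_len)


def best_split(budget):
    # Try every p_amount that fits into `budget` bytes of input and keep the
    # split with the largest serialized output.
    fixed = len('<p><a href="">')  # 14 bytes of markup around the A's
    best = (0, 0, 0)  # (output_size, attr_len, p_amount)
    for p_amount in range(1, (budget - fixed) // 4 + 1):
        attr_len = budget - fixed - 4 * p_amount
        best = max(best, (serialized_size(attr_len, p_amount), attr_len, p_amount))
    return best


print(len(generate_input(49986, 12500)))  # 100000
print(serialized_size(49986, 12500))      # 625162508, i.e. about 625 MB
print(best_split(100_000))                # (625162509, 49982, 12501)
```

The output is a product of two terms that each grow with the input. So 10× more input gives roughly 100× more output, which matches the step from 0.1 MB to 1 MB to 10 MB in the list.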

This seems to be related to #3732, and the Noah's Ark clause prevents even larger growth. Still, I have demonstrated against local servers that a few requests are enough to make any server-side HTML parser that implements the spec correctly unresponsive.
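The Noah's Ark clause applies when an element is pushed onto the list of active formatting elements. It looks at the entries after the last marker, or the whole list if there is no marker. If three of those entries already have the same tag name, namespace and attributes as the new element, the earliest of them is removed first. The minimal Python sketch below rests on two assumptions: entries are modelled as hypothetical (tag, attrs) pairs that are all in the HTML namespace, and a MARKER sentinel stands in for scope markers.

```python
MARKER = object()  # stands in for a scope marker (pushed for e.g. td, th, caption)


def push_active_formatting(active, element):
    # Push `element`, a (tag, attrs) pair, onto the list of active formatting
    # elements, applying the Noah's Ark clause first.
    matches = []
    # Scan backwards, stopping at the last marker (or the start of the list).
    for i in range(len(active) - 1, -1, -1):
        if active[i] is MARKER:
            break
        if active[i] == element:  # same tag and attributes (dict order is ignored)
            matches.append(i)
    if len(matches) >= 3:
        del active[matches[-1]]  # the last index found is the earliest entry
    active.append(element)


active = []
for _ in range(10):
    push_active_formatting(active, ("b", {"class": "x"}))
print(len(active))  # 3
```

Repeating the same formatting start tag therefore leaves at most three identical entries to be reconstructed later. Entries with different attributes are not capped by this clause.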

Some servers enforce HTTP body size limits, but these are often set at around 0.1 or 1 MB. As the numbers above show, that is still enough to cause significant damage. This matters because HTML parsers are often used on the server side to sanitize untrusted input.

Is there something we can do in the spec to, for example, discard attributes after a certain number of duplications?
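One way to read this suggestion is sketched below in Python, purely as a hypothetical. Each entry keeps a count of how often reconstruction has cloned it. Once a limit is passed, new clones keep the tag but lose their attributes. MAX_COPIES_WITH_ATTRS, FormattingEntry and reconstruct_clone are invented names, and the limit value is arbitrary. No such counter exists in the spec, and the change would alter the resulting DOM for affected documents.

```python
from dataclasses import dataclass

MAX_COPIES_WITH_ATTRS = 16  # hypothetical, arbitrary limit (not in the spec)


@dataclass
class FormattingEntry:
    tag: str
    attrs: dict
    copies: int = 0  # hypothetical counter: how often this entry was reconstructed


def reconstruct_clone(entry):
    # Return the (tag, attrs) of the element inserted when `entry` is
    # reconstructed. Past the limit, the clone keeps its tag but drops attributes.
    entry.copies += 1
    if entry.copies > MAX_COPIES_WITH_ATTRS:
        return entry.tag, {}
    return entry.tag, dict(entry.attrs)


# The 0.1 MB input from above: 12500 paragraphs reconstruct the same <a>.
entry = FormattingEntry("a", {"href": "A" * 49986})
clones = [reconstruct_clone(entry) for _ in range(12500)]
copied = sum(len(attrs.get("href", "")) for _, attrs in clones)
print(copied)  # 799776 (16 * 49986) instead of 12500 * 49986
```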

@annevk
Member

annevk commented Nov 18, 2024

I think we can call out the risk somewhere, but unless there is an interoperable limit in implementations I don't think we can enforce one.

@zcorpan
Member

zcorpan commented Nov 18, 2024

cc @whatwg/html-parser

@zcorpan
Member

zcorpan commented Nov 27, 2024

The spec allows aborting parsing for any parse error, and a parse error is needed to trigger this condition.

The spec also allows implementation-defined limits to guard against DoS attacks.
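Below is a hypothetical Python sketch of what such an implementation-defined limit could look like for this case. It puts a budget on the attribute bytes copied by reconstruction and aborts parsing once the budget is exceeded. Aborting is permitted here because, per the previous point, this path involves a parse error. ReconstructionBudget, ParseAborted and the 1 MB limit are assumptions, not taken from any parser.

```python
class ParseAborted(Exception):
    """Raised when the hypothetical reconstruction budget is exceeded."""


class ReconstructionBudget:
    def __init__(self, max_bytes):
        self.max_bytes = max_bytes
        self.used = 0

    def charge(self, attrs):
        # Count the attribute bytes that one reconstructed clone would copy.
        self.used += sum(len(name) + len(value) for name, value in attrs.items())
        if self.used > self.max_bytes:
            raise ParseAborted(
                f"reconstruction copied {self.used} attribute bytes "
                f"(limit {self.max_bytes})"
            )


budget = ReconstructionBudget(max_bytes=1_000_000)  # hypothetical 1 MB limit
attrs = {"href": "A" * 49986}  # the 0.1 MB input from the issue
try:
    for paragraph in range(12500):
        budget.charge(attrs)  # one reconstructed <a> per paragraph
except ParseAborted:
    print("aborted at paragraph", paragraph)  # aborted at paragraph 20
```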

Since browsers can be DoS'd in various ways, adding limits to the HTML parser's reconstruction of formatting elements doesn't seem very attractive when weighed against the web compat risk.
