Since browsers can be DoS'd in various ways, adding limits to the HTML parser's reconstruction of active formatting elements doesn't seem very attractive compared to the web-compat risk.
What is the issue with the HTML Standard?
Formatting elements (e.g. `<a>`) broken up by `<p>` tags will retain their attributes, without a limit: each time a paragraph is implicitly closed and a new one opened, the parser reconstructs the active formatting elements, cloning the `<a>` together with all of its attributes. When such input is parsed following the spec and reserialized, the attribute therefore appears once per paragraph.
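A minimal reproduction of the pattern looks roughly like this (a sketch with an illustrative `href` value standing in for the large attribute):

```html
<!-- Input: one <a> carrying a large attribute value, then repeated "<p>a" -->
<p><a href="AAAA...AAAA">a<p>a<p>a<p>a

<!-- Each <p> start tag implicitly closes the previous paragraph, popping the
     <a> off the stack of open elements while leaving it in the list of active
     formatting elements; the next character token then reconstructs a clone
     of the <a>, attribute included. The tree reserializes as: -->
<p><a href="AAAA...AAAA">a</a></p>
<p><a href="AAAA...AAAA">a</a></p>
<p><a href="AAAA...AAAA">a</a></p>
<p><a href="AAAA...AAAA">a</a></p>
```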
The more `<p>a` tags are added, the more the attribute will be duplicated. If this attribute is large as well, it can quickly blow out of proportion. By mixing 1 part `A`-length and 4 parts `<p>a` amount, the optimal scaling factor is reached, with the following growth:

- 0.1 MB (49986, 12500) of input serializes to 625 MB of output
- 1 MB (499984, 124998) of input serializes to 62.5 GB of output
- 10 MB (4999986, 1250000) of input serializes to 6.25 TB of output

This issue seems to be related to #3732, with the Noah's Ark clause preventing even larger growth, but I've demonstrated against local servers that, with only a few requests, any server-side HTML parser implementing the spec correctly will become unresponsive.
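The arithmetic behind these growth figures can be checked with a short sketch, assuming an input of the form `<p><a href="X">a` followed by repeated `<p>a` (the exact markup byte counts are my own approximations):

```python
def amplification(attr_len: int, reps: int) -> tuple[int, int]:
    """Estimate input vs. serialized-output size for an input of the form
    <p><a href="X">a followed by `reps` copies of "<p>a",
    where the attribute value X is attr_len bytes long."""
    input_size = len('<p><a href="') + attr_len + len('">a') + reps * len('<p>a')
    # Each paragraph reserializes as <p><a href="X">a</a></p>,
    # duplicating the entire attribute value every time.
    per_paragraph = len('<p><a href="') + attr_len + len('">a</a></p>')
    output_size = (reps + 1) * per_paragraph
    return input_size, output_size

# ~0.1 MB of input amplifies to ~625 MB of serialized output
print(amplification(49986, 12500))  # → (100001, 625162509)
```

The 1 : 4 byte split between the attribute and the `<p>a` repetitions keeps the product (attribute size × paragraph count) maximal for a fixed input budget, which is why the output grows quadratically in the input size.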
A few servers do enforce HTTP body size limits, but these are often set around 0.1 or 1 MB, which, as seen above, can still cause significant damage. HTML parsers are also often used to sanitize untrusted input on the server side.
Is there something we can do in the spec to, for example, discard attributes after a certain amount of duplication?