Fix MIME charset sniffing advancing by name length not value length #22343

Closed
iliaal wants to merge 1 commit into php:PHP-8.4 from iliaal:fix-libxml-mime-sniff
Conversation

@iliaal iliaal commented Jun 16, 2026

Contributor

php_libxml_sniff_charset_from_string() advanced the parse cursor by the parameter name length, instead of the value length, after collecting an unquoted parameter value (WHATWG mime-sniff step 11.9.1). A Content-Type in which a parameter before charset has a name and value of different lengths (for example "text/html; abcd=ef;charset=ISO-8859-1") misaligns the cursor, so the charset parameter is missed and Dom\HTMLDocument loading falls back to the wrong encoding. The existing createFromFile HTTP-header test only used equal-length name=value pairs, which masked the bug.
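
For illustration only, the cursor arithmetic can be modelled in a few lines of Python. This is a simplified sketch, not the php-src C code: `sniff_charset` and its `buggy` flag are hypothetical names, only unquoted values are handled, and the loop is a loose reading of the WHATWG parameter-parsing steps rather than a faithful port.

```python
from typing import Optional


def sniff_charset(content_type: str, buggy: bool = False) -> Optional[str]:
    """Return the charset parameter of a Content-Type value, or None.

    Hypothetical, simplified model: unquoted parameter values only.
    With buggy=True the cursor advances by the parameter *name* length
    after a value is collected, mimicking the defect described above.
    """
    n = len(content_type)
    pos = content_type.find(";")  # parameters start after the first ';'
    if pos == -1:
        return None

    while pos < n:
        pos += 1  # step over the ';' the cursor is expected to sit on
        while pos < n and content_type[pos] in " \t\r\n":
            pos += 1  # skip leading whitespace

        # Collect the parameter name up to '=' or ';'.
        name_start = pos
        while pos < n and content_type[pos] not in ";=":
            pos += 1
        name = content_type[name_start:pos].lower()
        if pos >= n:
            break
        if content_type[pos] == ";":
            continue  # parameter without a value

        pos += 1  # step over '='
        value_start = pos
        value_end = value_start
        while value_end < n and content_type[value_end] != ";":
            value_end += 1
        value = content_type[value_start:value_end].strip()

        if name == "charset" and value:
            return value

        if buggy:
            pos = value_start + len(name)  # wrong: uses the name length
        else:
            pos = value_end  # right: lands on the next ';' or the end
    return None


header = "text/html; abcd=ef;charset=ISO-8859-1"
print(sniff_charset(header))              # ISO-8859-1
print(sniff_charset(header, buggy=True))  # None
# Equal-length name and value hide the defect:
print(sniff_charset("text/html; ab=cd;charset=ISO-8859-1", buggy=True))  # ISO-8859-1
```

In this model the two advances coincide whenever the name and value have the same length, which is consistent with the equal-length test data not exposing the problem.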

@iliaal iliaal requested a review from devnexen as a code owner June 16, 2026 21:29
php_libxml_sniff_charset_from_string() advanced the parse cursor by the
parameter name length after collecting an unquoted parameter value
(WHATWG mime-sniff step 11.9.1), instead of the value length. When a
Content-Type parameter before charset had a name and value of different
lengths, the cursor misaligned and the charset parameter was missed, so
document loading fell back to the wrong encoding.

Closes phpGH-22343
@iliaal iliaal closed this in 98563c2 Jun 18, 2026
2 participants