
Sort enum-case unions by class and case name instead of describe()#5929

Merged
staabm merged 2 commits into phpstan:2.2.x from SanderMuller:enum-sort-describe-fastpath
Jun 25, 2026


@SanderMuller

Contributor

UnionTypeHelper::sortTypes() runs on every UnionType construction. Object and enum-case members fall through to the final branch, which sorts them via describe():

return self::compareStrings($a->describe(VerbosityLevel::typeOnly()), $b->describe(VerbosityLevel::typeOnly()));

For enum cases this is the documented "never sort or compare types via describe()" anti-pattern, and it is the dominant cost of sorting a large-enum union. A match/switch over the enum reconstructs the subject union (the enum minus the cases already matched) once per arm, so each arm re-sorts the shrinking union and re-runs the describe() machinery (VerbosityLevel::handle dispatch plus sprintf) on every member.

This adds an enum-case fast-path that compares className.'::'.caseName, the same key IntersectionType::getFiniteTypes() already uses for enum cases (other finite types there fall back to describe(typeOnly)). The sort order is identical: the only way this key can differ from describe() is that the raw stored class name may differ in letter case from the canonical reflection name that describe() resolves, and the comparator's primary strcasecmp comparison ignores case. The fast-path skips the VerbosityLevel dispatch and the sprintf on every comparison.
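A minimal sketch of the comparator change, using hypothetical stand-in classes (SortableType, FakeEnumCase, FakeObject) rather than PHPStan's real Type hierarchy. compareStrings() here assumes a case-insensitive strcasecmp primary with a case-sensitive strcmp tie-break; PHPStan's actual helper may differ in detail. Because an enum case's fast-path key is the same string its describe() stand-in returns, mixing the fast path with the describe() fallback still yields one consistent order.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins for illustration only; these are NOT PHPStan's classes.
interface SortableType
{
    public function describeTypeOnly(): string;
}

final class FakeEnumCase implements SortableType
{
    public function __construct(
        public readonly string $className,
        public readonly string $caseName,
    ) {}

    public function describeTypeOnly(): string
    {
        // Stand-in for the more expensive describe(VerbosityLevel::typeOnly()) path.
        return sprintf('%s::%s', $this->className, $this->caseName);
    }
}

final class FakeObject implements SortableType
{
    public function __construct(public readonly string $className) {}

    public function describeTypeOnly(): string
    {
        return $this->className;
    }
}

// Assumed comparator shape: case-insensitive primary, case-sensitive tie-break.
function compareStrings(string $a, string $b): int
{
    $cmp = strcasecmp($a, $b);
    return $cmp !== 0 ? $cmp : strcmp($a, $b);
}

function compareTypes(SortableType $a, SortableType $b): int
{
    // Fast path: two enum cases compare by className::caseName, no describe() call.
    if ($a instanceof FakeEnumCase && $b instanceof FakeEnumCase) {
        return compareStrings(
            $a->className . '::' . $a->caseName,
            $b->className . '::' . $b->caseName,
        );
    }

    // Fallback: the describe()-based final branch.
    return compareStrings($a->describeTypeOnly(), $b->describeTypeOnly());
}

$types = [
    new FakeEnumCase('Status', 'C2'),
    new FakeObject('DateTime'),
    new FakeEnumCase('Status', 'C10'),
    new FakeEnumCase('Status', 'C1'),
];
usort($types, 'compareTypes');
```

The order is plain string order on the key, so C10 sorts between C1 and C2, exactly as the describe() strings would.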

The numbers come from a stress test, a match over a 1,500-case enum: CPU time dropped from 7.50s to 3.67s (−51%), with byte-identical output. That size is a scaling demonstration rather than typical code; real enums rarely get that large, so the win is limited to the rare file with a large-enum match/switch (e.g. a Locale-style enum) and does not move whole-project time. It removes the per-comparison constant factor, not the O(N²) per-arm re-sort itself.
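A rough cost model of the per-arm re-sort, assuming each arm removes exactly one case and the remaining cases are re-sorted with a describe()-style comparator. The names (Counters, describeCase, simulateMatch) are hypothetical and this is not how PHPStan actually builds the subject union. It only illustrates that every comparison pays for two describe() calls and that the total number of comparisons grows at least quadratically with the number of cases, which is why removing the per-comparison constant factor helps while the O(N²) re-sort remains.

```php
<?php
declare(strict_types=1);

// Hypothetical cost model, not PHPStan code.
final class Counters
{
    public static int $comparisons = 0;
    public static int $describeCalls = 0;
}

// Stand-in for describe(VerbosityLevel::typeOnly()) on an enum case.
function describeCase(int $i): string
{
    Counters::$describeCalls++;
    return sprintf('Status::C%d', $i);
}

function simulateMatch(int $n): void
{
    $remaining = range(0, $n - 1);
    for ($arm = 0; $arm < $n; $arm++) {
        // This arm narrows away case $arm; the shrinking union is rebuilt and re-sorted.
        $remaining = array_values(array_filter($remaining, static fn (int $c): bool => $c !== $arm));
        usort($remaining, static function (int $a, int $b): int {
            Counters::$comparisons++;
            return strcasecmp(describeCase($a), describeCase($b));
        });
    }
}

simulateMatch(200);
printf("comparisons: %d, describe() calls: %d\n", Counters::$comparisons, Counters::$describeCalls);
```

Any correct comparison sort of k members needs at least k - 1 comparisons, so across all arms the count is bounded below by (N - 2)(N - 1)/2, before any log factor.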

The instanceof EnumCaseObjectType check gets a baseline entry (phpstanApi.instanceofType) next to the two existing ones for that rule; IntersectionType and EnumCaseObjectType already use the same instanceof in internal type-system code.

Verified: the Type (2941), Analyser (2851), and match-exhaustiveness plus enum rule (432) test suites all pass with unchanged output.

UnionTypeHelper::sortTypes runs on every UnionType construction and sorts
enum-case members through describe(VerbosityLevel), the documented "never sort
via describe()" anti-pattern. It is the dominant cost of sorting a large-enum
union, which a match/switch over the enum re-sorts once per arm.

Compare enum cases by className.'::'.caseName instead, the same key
IntersectionType::getFiniteTypes() already uses. That is the describe(typeOnly)
string for an enum case (enums are never generic), so the sort order is
identical, without the per-comparison VerbosityLevel dispatch and sprintf.

The instanceof EnumCaseObjectType gets a baseline entry alongside the two
existing ones for that rule (the same internal type-system usage).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@staabm

staabm commented Jun 25, 2026

Contributor

Please show the code of the reproducer for this perf improvement.

In which real world file was the bottleneck observed?

@SanderMuller

Contributor Author

Here's a self-contained reproducer. The only thing that builds (and repeatedly re-sorts) a large enum-case union is an exhaustive match/switch over a large backed enum: PHPStan narrows the subject arm by arm to check exhaustiveness, so UnionTypeHelper::sortTypes() sorts the shrinking case-union once per arm, and for enum cases that comparison falls through to the describe() branch.

php -r '$N=800; $s="<?php\n\nenum Status: int\n{\n";
for($i=0;$i<$N;$i++)$s.="    case C$i = $i;\n"; $s.="}\n\nfunction handle(Status \$x): string\n{\n    return match (\$x) {\n";
for($i=0;$i<$N;$i++)$s.="        Status::C$i => \"v$i\",\n"; $s.="    };\n}\n"; file_put_contents("repro.php",$s);'
vendor/bin/phpstan analyse -l 8 repro.php
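For reference, with $N=3 the one-liner above writes a repro.php of the following shape; the 800-case version only repeats the case and match-arm lines. It needs PHP 8.1+ for enums.

```php
<?php

enum Status: int
{
    case C0 = 0;
    case C1 = 1;
    case C2 = 2;
}

function handle(Status $x): string
{
    return match ($x) {
        Status::C0 => "v0",
        Status::C1 => "v1",
        Status::C2 => "v2",
    };
}
```

The match has one arm per case and no default arm, so PHPStan has to narrow the subject arm by arm to prove exhaustiveness.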

A/B test swapping only UnionTypeHelper.php (this branch vs. its parent, same vendor directory, CPU time, 3 reps):

  • 800-case enum: 3.87s to 2.06s (-47%)
  • 1500-case enum: 7.50s to 3.67s (-51%), output byte-identical

foreach (Enum::cases()) and !== narrowing don't hit it; it needs an exhaustive match/switch whose subject is the enum.

On the real-world file: honestly, I don't have one. Large enums do exist in the wild (Tempest's Locale is 782 cases), but I couldn't find one that's actually matched over. They get matched as their backing string, or read via cases()/from(), neither of which builds the big union. So I'd frame this as the documented "never sort/compare types via describe()" anti-pattern rather than a fix for a measured hot file. The key it switches to (className::caseName) is the same one IntersectionType::getFiniteTypes() already uses, so the ordering is unchanged; the reproducer is there to show the upside at scale if such a file ever turns up.

One bound, in case it comes up: sortTypes() early-returns unsorted above 1024 members, so the per-arm sort is capped. For an enum of <= 1024 cases (a 782-case Locale included) every arm sorts; past that the first arms skip. So it's a constant-factor win on a bounded sort, not an unbounded-N change.
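A hypothetical model of that bound, not PHPStan's code: sortCapped() returns its input untouched above 1024 members, and skippedArms() counts how many arms skip the sort under the simplifying assumption that arm k leaves N - k cases in the subject union.

```php
<?php
declare(strict_types=1);

// Cap value taken from the discussion; the helpers below are illustrative only.
const SORT_CAP = 1024;

function sortCapped(array $members, callable $cmp): array
{
    if (count($members) > SORT_CAP) {
        return $members; // too large: early return, left unsorted
    }
    usort($members, $cmp);
    return $members;
}

// Assumption: arm $k (1-based) narrows the subject to the $n - $k unmatched cases.
function skippedArms(int $n): int
{
    $skipped = 0;
    for ($k = 1; $k <= $n; $k++) {
        if ($n - $k > SORT_CAP) {
            $skipped++;
        }
    }
    return $skipped;
}
```

Under this model a 782-case enum skips no arms, while a 1,500-case enum skips only its first 475 arms and sorts every arm after that, so the fast-path's constant factor applies to a bounded sort.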

@staabm

staabm commented Jun 25, 2026

Contributor

so how did you come to this optimization? does it fix a real world problem in one of the projects you analyzed?

@SanderMuller

Contributor Author


I've been running https://github.com/SanderMuller/boost-skills/blob/main/resources/boost/skills/autoresearch/SKILL.md for almost all of yesterday, with a second Claude instance on /loop creating new experiments to try. I'm now at 138 experiments and am isolating the wins that are significant and not overly complex, so I can contribute them. Trying to keep it non-slop and useful.

@ondrejmirtes

Member

I welcome this 👍 It's okay that AI comes up with this if we find real improvements.

@staabm

staabm commented Jun 25, 2026

Contributor

I am wondering whether the perf fix should go into EnumCaseObjectType->describe() instead, so that more call sites benefit from it.

@staabm staabm mentioned this pull request Jun 25, 2026
@SanderMuller

Contributor Author

Good idea, I tried it. I put a small per-verbosity memo on EnumCaseObjectType::describe() (cache the className::caseName string keyed by VerbosityLevel, which is safe because the type is immutable) and dropped the sortTypes fast-path, then measured all three on the 800-case exhaustive-match reproducer (single-process, cold, CPU, 3 reps median):

variant                                user CPU   vs base
base                                   4.88s      baseline
sortTypes fast-path (this PR)          3.16s      -35%
EnumCaseObjectType::describe() memo    3.19s      -35%

So the memo matches the fast-path on this workload, and you're right that it's broader. Instrumenting it on the same run showed 4.8k computes against 6.35M cache hits (99.9%): the same case instances recur across the per-arm narrowing, so one memo serves the whole sort plus every other describe() call on those cases. Byte-identical too: UnionTypeTest 185/185 and Type/Enum 30/30 stay green with the memo and the fast-path removed.
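A minimal sketch of the per-verbosity memo described here, using a hypothetical MemoizedEnumCase class with an int verbosity key (matching the array<int, string> cache mentioned), not PHPStan's actual EnumCaseObjectType or VerbosityLevel API. The static compute counter mirrors the compute-vs-hit instrumentation quoted above.

```php
<?php
declare(strict_types=1);

// Hypothetical immutable enum-case type with a per-verbosity describe() memo.
final class MemoizedEnumCase
{
    /** @var array<int, string> verbosity level => described string */
    private array $describeCache = [];

    public static int $computes = 0; // instrumentation: counts cache misses

    public function __construct(
        public readonly string $className,
        public readonly string $caseName,
    ) {}

    public function describe(int $verbosityLevel): string
    {
        // Safe to memoize: the instance is immutable, so each level's result never changes.
        return $this->describeCache[$verbosityLevel] ??= $this->computeDescription($verbosityLevel);
    }

    private function computeDescription(int $verbosityLevel): string
    {
        self::$computes++;
        // Stand-in for the real verbosity dispatch; this sketch assumes every
        // level renders an enum case as className::caseName.
        return sprintf('%s::%s', $this->className, $this->caseName);
    }
}
```

The sketch also makes the trade-off concrete: every instance carries its own $describeCache array, whether or not it ever ends up in a sort.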

The trade-off is just where the cost lands. The memo adds a per-instance array<int, string> to every EnumCaseObjectType (paid on every run, including the common case where nothing matches over a large enum and the win is nil), whereas the comparator fast-path carries no extra state but only helps the sort. Either way describe() isn't hot on real code, so neither moves whole-project time; this only shows up on a large exhaustive match/switch.

Happy to switch the PR to the memo if you'd prefer that home for it. It's the cleaner and broader location; the only thing to weigh against it is the always-paid per-instance cache. Your call.

@staabm staabm merged commit 86435c0 into phpstan:2.2.x Jun 25, 2026
669 of 670 checks passed
@staabm

staabm commented Jun 25, 2026

Contributor

thank you!



3 participants