High-performance FZF-style subsequence matching implementation

Algorithm Overview

This implements a single-pass subsequence matcher optimized for interactive completion. Characters from the pattern must appear in the candidate string in the same order, but not necessarily consecutively.

Example Matches

Pattern "mlnd" matches:

"mailinglists/neomutt-dev" (high score: matches at word boundaries)
"mailing_list_node" (medium score: some boundaries)
"my_long_nested_dir" (lower score: scattered matches)

Pattern "inb" matches:

"INBOX" (highest: start of string + consecutive)
"Archive/INBOX" (medium: not at root)
"mailbox/inbox-archive" (lower: gaps between matches)

UTF-8 Support

The matcher is UTF-8 aware but uses ASCII-only case folding:

Byte-wise matching: Treats strings as sequences of bytes
ASCII case folding: Only A-Z are folded to a-z
UTF-8 preservation: Multi-byte UTF-8 sequences are never split
ASCII boundaries: Only ASCII separators (/.-_) get boundary bonuses

This approach provides:

✓ Fast performance (no Unicode decoding overhead)
✓ Correct UTF-8 handling (multi-byte sequences preserved)
✓ ASCII case-insensitive matching (works for English text)
✓ UTF-8 matching works (as byte sequences)

Examples:

"inbox" matches "INBOX" (ASCII folding)
"café" matches "Café" (exact bytes, no case folding on é)
"mail" matches "郵件/mail/box" (ASCII substring matches)
"郵件" matches "郵件/mail" (exact byte sequence)

Note: Non-ASCII characters are matched case-sensitively as byte sequences. This is intentional for performance and simplicity.

Scoring Rules

The algorithm assigns scores based on multiple factors:

Base Score

Each matched character: +10 points

Bonuses

Start of string: +30 points (e.g., "I" in "INBOX")
After separator ('/', '.', '-', '_'): +15 points
Prefix match (when prefer_prefix=true): +40 points
Consecutive matches: +15 points per consecutive char (e.g., "box" all together in "mailbox")
CamelCase boundary: +10 points (e.g., "M" in "MyMailbox") (Only detects ASCII A-Z boundaries)

Penalties

Gaps between matches: -2 points per character gap (encourages compact matches)
Total span: -1 point per character in match span (first to last matched char)
String length: -length/4 points (slightly favors shorter candidates)

Smart Case Matching

By default, matching is case-insensitive (ASCII only). With smart_case enabled:

All-lowercase pattern → case-insensitive (ASCII)
Pattern with uppercase → case-sensitive

Note: Smart case only examines ASCII characters (A-Z).

Performance Characteristics

Time complexity: O(n) where n = length of candidate string
Space complexity: O(m) where m = length of pattern (stack only)
No heap allocation: Uses fixed-size stack array for match positions
No backtracking: Single forward pass through candidate
No recursion: Purely iterative
No UTF-8 decoding: Byte-wise comparison for maximum speed

This makes it suitable for interactive use even with thousands of candidates.

Why These Rules?

The scoring model is optimized for hierarchical paths and structured strings:

Boundary bonuses help match path components and words
Consecutive bonuses reward compact substring matches
Gap penalties discourage scattered character matches
Span penalties favor matches clustered together
Length penalties prevent long strings from dominating

This produces intuitive rankings for mailbox paths, commands, and names.