[wp-trac] [WordPress Trac] #63913: WordPress assumes that the UTF-8 PCRE flag is available.
WordPress Trac
noreply at wordpress.org
Wed Sep 3 05:49:38 UTC 2025
#63913: WordPress assumes that the UTF-8 PCRE flag is available.
-------------------------+-----------------------------
Reporter: dmsnell | Owner: (none)
Type: enhancement | Status: new
Priority: low | Milestone: Future Release
Component: Charset | Version: trunk
Severity: normal | Resolution:
Keywords: | Focuses:
-------------------------+-----------------------------
Comment (by tusharbharti):
Hi @dmsnell, thanks for the mention.
For this snippet:
{{{#!php
<?php
if ( preg_match( '/\p{Han}|\p{Hiragana}|\p{Katakana}|\p{Hangul}/u', $name ) || false === strpos( $name, ' ' ) ) {
    $initials = mb_substr( $name, 0, min( 2, mb_strlen( $name, 'UTF-8' ) ), 'UTF-8' );
} else {
    $first    = mb_substr( $name, 0, 1, 'UTF-8' );
    $last     = mb_substr( $name, strrpos( $name, ' ' ) + 1, 1, 'UTF-8' );
    $initials = $first . $last;
}
}}}
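We could possibly use the `IntlChar` class (from the `intl` extension) to
detect the Unicode block of the first character, as a stand-in for the
script check, and get the initials: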
{{{#!php
<?php
$firstChar = mb_substr( $name, 0, 1, 'UTF-8' );
$codepoint = IntlChar::ord( $firstChar );
$block     = IntlChar::getBlockCode( $codepoint );
$cjkBlocks = array(
    IntlChar::BLOCK_CODE_CJK_UNIFIED_IDEOGRAPHS,
    IntlChar::BLOCK_CODE_HANGUL_SYLLABLES,
    IntlChar::BLOCK_CODE_HIRAGANA,
    IntlChar::BLOCK_CODE_KATAKANA,
);
if ( in_array( $block, $cjkBlocks, true ) ) {
    $initials = mb_substr( $name, 0, min( 2, mb_strlen( $name, 'UTF-8' ) ), 'UTF-8' );
}
}}}
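A slightly fuller sketch of the same idea, for illustration only. It assumes the `intl` and `mbstring` extensions are loaded (neither is guaranteed on every host), and the function name `wp63913_initials_sketch()` is hypothetical, not existing Core code. Unlike the `/u` pattern, it inspects only the first character and matches Unicode blocks rather than scripts, so it is not a drop-in equivalent.
{{{#!php
<?php
/**
 * Hypothetical helper: derive initials without relying on the PCRE /u flag.
 * Returns null when the required extensions are missing or the input is not
 * valid UTF-8, so that the caller can fall back to another strategy.
 */
function wp63913_initials_sketch( $name ) {
    if ( ! class_exists( 'IntlChar' ) || ! function_exists( 'mb_substr' ) ) {
        return null;
    }

    $first_char = mb_substr( $name, 0, 1, 'UTF-8' );
    if ( '' === $first_char ) {
        return '';
    }

    // IntlChar::ord() returns null when it cannot decode the character.
    $codepoint = IntlChar::ord( $first_char );
    if ( null === $codepoint ) {
        return null;
    }

    $cjk_blocks = array(
        IntlChar::BLOCK_CODE_CJK_UNIFIED_IDEOGRAPHS,
        IntlChar::BLOCK_CODE_HANGUL_SYLLABLES,
        IntlChar::BLOCK_CODE_HIRAGANA,
        IntlChar::BLOCK_CODE_KATAKANA,
    );
    $is_cjk     = in_array( IntlChar::getBlockCode( $codepoint ), $cjk_blocks, true );

    // CJK names and single-word names: take the first two characters.
    if ( $is_cjk || false === mb_strpos( $name, ' ', 0, 'UTF-8' ) ) {
        return mb_substr( $name, 0, 2, 'UTF-8' );
    }

    // mb_strrpos() returns a character offset, which is what mb_substr() expects.
    $last_space = mb_strrpos( $name, ' ', 0, 'UTF-8' );
    return $first_char . mb_substr( $name, $last_space + 1, 1, 'UTF-8' );
}
}}}
A caller would still need a non-`intl` fallback for the `null` case, for example the existing byte-oriented branch.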
> An incomplete PCRE pattern I used to find sources in the codebase using
> the /u flag follows. It would be better to build a search based off of a
> PHP parser, but finding a comprehensive list of places assuming the UTF-8
> flag is left as an exercise for future work on this ticket.
>
> {{{('|")((?!\1)[^_a-zA-Z0-9-])((?!\1).)+\2[idsxumrADSUXJ]*?u[idsxumrADSUXJ]*?\1}}}
>
> This looks for string literals starting and ending with the same
> delimiter, followed by a set of PCRE modifiers including the u, terminated
> by the opening quote. It does not find HEREDOC or NOWDOC patterns and it
> does not find unmatched delimiters like parentheses or other brackets.
Hmm, I will see if I can write the scanner; in the meantime, I can improve
the regex.
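A rough sketch of what such a scanner could look like, built on PHP's tokenizer extension (`token_get_all()`) rather than a full parser. The helper name `find_u_flag_patterns()` and its heuristics are assumptions for illustration: it only inspects a plain quoted string literal passed as the first argument to a few `preg_*()` functions, so it misses patterns held in variables, concatenations, array patterns, HEREDOC/NOWDOC, and fully qualified calls such as `\preg_match()`.
{{{#!php
<?php
/**
 * Hypothetical scanner sketch: report preg_*() calls whose first argument
 * is a string literal carrying the "u" modifier.
 *
 * @param string $source PHP source code, including the opening tag.
 * @return array[] List of array( 'line' => int, 'pattern' => string ).
 */
function find_u_flag_patterns( $source ) {
    $functions = array(
        'preg_match', 'preg_match_all', 'preg_replace',
        'preg_replace_callback', 'preg_split', 'preg_grep', 'preg_filter',
    );
    // Bracket-style delimiters close with a different character.
    $closers = array( '(' => ')', '[' => ']', '{' => '}', '<' => '>' );
    $tokens  = token_get_all( $source );
    $count   = count( $tokens );
    $found   = array();

    for ( $i = 0; $i < $count; $i++ ) {
        $token = $tokens[ $i ];
        if ( ! is_array( $token ) || T_STRING !== $token[0]
            || ! in_array( strtolower( $token[1] ), $functions, true ) ) {
            continue;
        }

        // Skip whitespace and the opening parenthesis to reach the first argument.
        $j = $i + 1;
        while ( $j < $count && ( '(' === $tokens[ $j ]
            || ( is_array( $tokens[ $j ] ) && T_WHITESPACE === $tokens[ $j ][0] ) ) ) {
            $j++;
        }
        if ( $j >= $count || ! is_array( $tokens[ $j ] )
            || T_CONSTANT_ENCAPSED_STRING !== $tokens[ $j ][0] ) {
            continue;
        }

        // Strip the surrounding quotes; escape sequences are left as written.
        $literal = substr( $tokens[ $j ][1], 1, -1 );
        if ( '' === $literal ) {
            continue;
        }

        $open  = $literal[0];
        $close = isset( $closers[ $open ] ) ? $closers[ $open ] : $open;
        $end   = strrpos( $literal, $close );
        if ( false === $end || 0 === $end ) {
            continue;
        }

        // Everything after the final closing delimiter is the modifier list.
        if ( false !== strpos( substr( $literal, $end + 1 ), 'u' ) ) {
            $found[] = array(
                'line'    => $tokens[ $j ][2],
                'pattern' => $literal,
            );
        }
    }

    return $found;
}
}}}
Running it over each file's contents (for example via `file_get_contents()`) would give a first list of candidates to review by hand; it is a heuristic, not a comprehensive search.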
--
Ticket URL: <https://core.trac.wordpress.org/ticket/63913#comment:1>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform