[wp-trac] [WordPress Trac] #56294: WordPress search finds block name in comment
WordPress Trac
noreply at wordpress.org
Wed Jul 27 12:36:55 UTC 2022
#56294: WordPress search finds block name in comment
-------------------------+--------------------------------------
Reporter: zodiac1978 | Owner: (none)
Type: enhancement | Status: new
Priority: normal | Milestone: Awaiting Review
Component: Database | Version: 5.0
Severity: normal | Keywords: needs-patch dev-feedback
Focuses: |
-------------------------+--------------------------------------
There is a known issue with the WP search: it is a full text search over
`post_content`, so it also matches HTML tags like `table`. Searching for
the word "table" therefore also finds every post/page with table markup in it.
The impact of this problem is very limited, so it wasn't necessary to fix
it, although there is a plugin that does:
https://wordpress.org/plugins/wp-search-ignore-html-tags/
Now, with the block editor (aka Gutenberg), this has changed. Every block
stores its block name in an HTML comment. For example:
{{{
<!-- wp:syntaxhighlighter/code {"language":"php"} -->
}}}
If I now search a tech blog for the term "syntaxhighlighter", I get
every post/page with a code block, not only the posts/pages that really
contain this word in the text.
And with every new block, the chance of false positive search results
gets higher.
Even the core blocks have problems: "paragraph" (instead of just "p")
or "image" (instead of "img") have a much higher chance of producing false
positive search results, because these block names are also ordinary words.
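To make the false positive concrete, here is a minimal sketch in plain PHP (no WordPress required). The helper name `tl_demo_strip_html_comments` and the sample content are made up for illustration; the sketch only shows that the word "paragraph" is present in the stored content but not in the text once comments are removed.

```php
<?php
// Hypothetical helper for illustration; not a WordPress core function.
function tl_demo_strip_html_comments( string $content ): string {
    // Remove every <!-- ... --> comment, which includes block delimiters.
    return preg_replace( '/<!--.*?-->/s', '', $content );
}

// Sample post_content as the block editor would store a paragraph block.
$content = "<!-- wp:paragraph -->\n<p>Hello world.</p>\n<!-- /wp:paragraph -->";

// A plain substring search over the stored content matches the block name ...
$raw_hit = stripos( $content, 'paragraph' ) !== false;

// ... but the text left after removing the comments does not contain the word.
$text_hit = stripos( tl_demo_strip_html_comments( $content ), 'paragraph' ) !== false;
```

Note that the HTML tags themselves (here `<p>`) are still part of the remaining string, so this only covers the comment part of the problem.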
There is a GitHub issue for the block editor about it:
https://github.com/WordPress/gutenberg/issues/3739
But it was closed by @pento because it is a known WordPress issue and
not necessarily a problem of the block editor and its type of data.
@danielbachhuber commented at
https://github.com/WordPress/gutenberg/issues/10307#issuecomment-426995580
> However, I don't have any great ideas for how to resolve this with
> MySQL. I'd love to hear of a solution if someone has one. Barring that,
> this probably won't be a priority to fix with WP 5.0
After looking at the plugin linked above, I created a solution (with
support from @kau-boy):
{{{
/**
 * Modify search query to ignore the search term in HTML comments.
 *
 * @param string   $where The WHERE clause of the query.
 * @param WP_Query $query The WP_Query instance (passed by reference).
 *
 * @return string The modified WHERE clause.
 */
function tl_update_search_query( $where, $query ) {
    if ( ! is_search() || ! $query->is_main_query() ) {
        return $where;
    }

    global $wpdb;

    $search_query = get_search_query();
    $search_query = $wpdb->esc_like( $search_query );

    $where .= " AND {$wpdb->posts}.post_content NOT REGEXP '<!--.*$search_query.*-->' ";

    return $where;
}
add_filter( 'posts_where', 'tl_update_search_query', 10, 2 );
}}}
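Two things about the pattern may be worth checking. First, `esc_like()` escapes the `LIKE` wildcards (`%`, `_`), not regex metacharacters, so a search term such as `c++` would not be treated literally. Second, `.*` is greedy, so the pattern can run from the first comment in a post to the last one and exclude posts that contain the term only in their visible text. The sketch below illustrates the second point with PHP's PCRE functions. The helper names are hypothetical, and MySQL's `REGEXP` engine differs from PCRE in details (for example, whether `.` matches newlines), so treat this as an illustration rather than a drop-in fix.

```php
<?php
// Hypothetical helpers for illustration; not part of WordPress core.

/** Pattern shaped like the one above: term anywhere between "<!--" and "-->". */
function tl_demo_greedy_pattern( string $term ): string {
    return '/<!--.*' . preg_quote( $term, '/' ) . '.*-->/s';
}

/** Variant that never lets the match run past the end of a comment. */
function tl_demo_bounded_pattern( string $term ): string {
    $inside = '(?:(?!-->).)*'; // any characters, but not across "-->"
    return '/<!--' . $inside . preg_quote( $term, '/' ) . $inside . '-->/s';
}

// The term appears only in the visible text, between two block comments.
$post = "<!-- wp:paragraph -->\n<p>I like syntaxhighlighter.</p>\n<!-- /wp:paragraph -->";

// The greedy pattern matches anyway, so NOT REGEXP would drop this post.
$greedy_hit = preg_match( tl_demo_greedy_pattern( 'syntaxhighlighter' ), $post );

// The bounded pattern matches only when the term sits inside one comment.
$bounded_hit = preg_match( tl_demo_bounded_pattern( 'syntaxhighlighter' ), $post );

// A post whose only mention is the block name is still caught by the bounded pattern.
$block_only     = '<!-- wp:syntaxhighlighter/code {"language":"php"} -->';
$block_only_hit = preg_match( tl_demo_bounded_pattern( 'syntaxhighlighter' ), $block_only );
```

The bounded variant relies on a negative lookahead. As far as I know, the older MySQL regex library does not support lookaheads, while newer MySQL versions (ICU-based) and MariaDB (PCRE-based) do, so a SQL version of this would need to be tested against the database versions WordPress supports.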
Before I try to create a PR for it: would this be a possible way to solve
this, or is a `NOT REGEXP` too slow if many posts exist? I am running this
solution on my blog, but it doesn't have high traffic or very many posts,
so my findings may not show the big picture here.
Feedback about possible problems (and hopefully how to solve them) is
much appreciated! Thanks in advance.
--
Ticket URL: <https://core.trac.wordpress.org/ticket/56294>
WordPress Trac <https://core.trac.wordpress.org/>
WordPress publishing platform