[wp-trac] [WordPress Trac] #7394: Search: order results by relevance
WordPress Trac
wp-trac at lists.automattic.com
Thu Aug 23 03:18:08 UTC 2012
#7394: Search: order results by relevance
-------------------------+-----------------------------
Reporter: markjaquith | Owner:
Type: enhancement | Status: assigned
Priority: normal | Milestone: Future Release
Component: General | Version: 2.6
Severity: normal | Resolution:
Keywords: has-patch |
-------------------------+-----------------------------
Comment (by tomauger):
Well, it appears that using REGEXP is significantly slower than a brute-
force WHERE and ORDER BY clause, though the SQL is arguably more elegant
(but who cares, I guess).
However, one takeaway from the SQL below is that we might want to be a bit
more careful around word boundaries. I would argue that a post titled
"Best Post Evah" matches the search term "Post" better than "Ten
Composting Tricks", where "post" only occurs inside another word. See below:
{{{
SELECT SQL_CALC_FOUND_ROWS wp_posts.ID, wp_posts.post_title
FROM wp_posts
WHERE 1=1
  -- Match any of the terms in the title or the content.
  AND (wp_posts.post_title REGEXP 'one|two|three'
    OR wp_posts.post_content REGEXP 'one|two|three')
  AND wp_posts.post_type IN ('post', 'page', 'attachment')
  AND (wp_posts.post_status = 'publish'
    OR (wp_posts.post_author = 1 AND wp_posts.post_status = 'private'))
ORDER BY
  -- NOT REGEXP yields 0 for a match, so matching rows sort first.
  -- Full phrase on word boundaries:
  wp_posts.post_title NOT REGEXP '[[:<:]]one two three[[:>:]]',
  wp_posts.post_content NOT REGEXP '[[:<:]]one two three[[:>:]]',
  -- Any two adjacent terms on word boundaries:
  wp_posts.post_title NOT REGEXP '[[:<:]]one two[[:>:]]|[[:<:]]two three[[:>:]]',
  wp_posts.post_content NOT REGEXP '[[:<:]]one two[[:>:]]|[[:<:]]two three[[:>:]]',
  -- Any single term anywhere in the title:
  wp_posts.post_title NOT REGEXP 'one|two|three',
  wp_posts.post_date DESC
LIMIT 0, 10
}}}
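To make the word-boundary point concrete outside of MySQL, here is a minimal Python sketch. It is only an illustration: the helper names are made up for this example, and Python's \b is close to, but not identical with, MySQL's [[:<:]] and [[:>:]] markers.

```python
import re

def matches_substring(term: str, text: str) -> bool:
    """Case-insensitive substring test, like REGEXP 'post' without boundaries."""
    return term.lower() in text.lower()

def matches_whole_word(term: str, text: str) -> bool:
    """True only when the term is delimited by word boundaries."""
    pattern = r"\b" + re.escape(term) + r"\b"
    return re.search(pattern, text, re.IGNORECASE) is not None

titles = ["Ten Composting Tricks", "Best Post Evah"]

# False sorts before True, so whole-word matches come first,
# mirroring the "NOT REGEXP" trick in the ORDER BY clause.
ranked = sorted(titles, key=lambda t: not matches_whole_word("Post", t))
```

Both titles pass the plain substring test, but only "Best Post Evah" passes the whole-word test, so it sorts first.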
Note that I'm unsure how to weight the same search sequence in
post_content relative to post_title. We may decide that two search terms
with proper word boundaries in the title are still better than the full
match in the content.
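The two candidate weightings can be compared with a small Python sketch of the sort key. This is an assumption-laden illustration, not the query itself: Post, has_phrase, sort_key and the title_first flag are hypothetical names, and the post_date tiebreaker is left out.

```python
import re
from typing import List, NamedTuple, Tuple

class Post(NamedTuple):
    title: str
    content: str

def has_phrase(phrase: str, text: str) -> bool:
    """True when the phrase occurs on word boundaries (case-insensitive)."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE) is not None

def sort_key(post: Post, terms: List[str], title_first: bool = False) -> Tuple[bool, ...]:
    """Build a tuple of 'did NOT match' flags, compared left to right."""
    full = " ".join(terms)
    pairs = [" ".join(terms[i:i + 2]) for i in range(len(terms) - 1)]
    title_full = has_phrase(full, post.title)
    content_full = has_phrase(full, post.content)
    title_pair = any(has_phrase(p, post.title) for p in pairs)
    content_pair = any(has_phrase(p, post.content) for p in pairs)
    title_any = any(t.lower() in post.title.lower() for t in terms)
    if title_first:
        # Alternative: an adjacent pair in the title beats the full phrase in content.
        flags = [title_full, title_pair, content_full, content_pair, title_any]
    else:
        # Same order as the ORDER BY clause in the query above.
        flags = [title_full, content_full, title_pair, content_pair, title_any]
    return tuple(not f for f in flags)

terms = ["one", "two", "three"]
a = Post(title="Notes", content="one two three")   # full phrase, content only
b = Post(title="one two", content="other text")    # adjacent pair, title only

as_in_query = sorted([b, a], key=lambda p: sort_key(p, terms))
title_weighted = sorted([a, b], key=lambda p: sort_key(p, terms, title_first=True))
```

With the ordering used in the query, post a (full phrase in the content) ranks above post b (two terms in the title). With title_first=True the order flips, which is the alternative weighting mentioned above.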
Of course, this then excludes pluralizations and so forth, so "Post" would
no longer have a high relevance rating with "Top Ten Posts of 2012"
because of the "s".
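One possible way to soften this, sketched in Python, is to allow an optional plural suffix before the closing word boundary. This is a crude, English-only assumption rather than real stemming, and word_pattern is a hypothetical helper, not anything in core.

```python
import re

def word_pattern(term: str, allow_plural: bool = True) -> "re.Pattern[str]":
    """Whole-word pattern for term, optionally tolerating a trailing 's' or 'es'."""
    suffix = r"(?:s|es)?" if allow_plural else ""
    return re.compile(r"\b" + re.escape(term) + suffix + r"\b", re.IGNORECASE)

strict = word_pattern("Post", allow_plural=False)
loose = word_pattern("Post")

strict_hit = strict.search("Top Ten Posts of 2012") is not None      # strict boundary misses "Posts"
loose_hit = loose.search("Top Ten Posts of 2012") is not None        # plural-tolerant pattern finds it
compost_hit = loose.search("Ten Composting Tricks") is not None      # still no match inside "Composting"
```

Forms such as "posting" or "posted" would still not match, so this only covers the simple plural case described above.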
--
Ticket URL: <http://core.trac.wordpress.org/ticket/7394#comment:20>
WordPress Trac <http://core.trac.wordpress.org/>
WordPress blogging software