[wp-hackers] Development for 2.x : Improved Search

Scott Johnson fuzzygroup at gmail.com
Sun Feb 5 13:17:47 GMT 2006


Hi Denis,

I'll download your code and take a look at it. Setting up MySQL full-text
search takes a fair bit of tweaking, and I don't know what assumptions you
made.
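One way to pin those assumptions down is to check which search terms MyISAM would quietly drop. Below is a rough Python model, not how MySQL is implemented. It assumes stock MyISAM settings: `ft_min_word_len` at its default of 4, a stopword list (only a few illustrative entries are shown, so check the server's real list), and the natural-language-mode rule that a word found in at least half of the rows is treated as too common to match. That 50% rule does not apply in boolean mode. `tokenize` and `ignored_terms` are hypothetical helper names.

```python
import re

# Assumption: stock MyISAM settings. ft_min_word_len defaults to 4.
FT_MIN_WORD_LEN = 4
# Illustrative sample only; the real built-in stopword list is much longer.
STOPWORDS = {"about", "after", "again", "their", "would"}


def tokenize(text):
    """Lower-cased words made of letters, digits and apostrophes."""
    return set(re.findall(r"[a-z0-9']+", text.lower()))


def ignored_terms(query, rows):
    """Map each query term that natural-language mode would effectively
    ignore to the reason it would be ignored."""
    docs = [tokenize(row) for row in rows]
    reasons = {}
    for term in sorted(tokenize(query)):
        if len(term) < FT_MIN_WORD_LEN:
            reasons[term] = "shorter than ft_min_word_len"
        elif term in STOPWORDS:
            reasons[term] = "stopword"
        elif docs and 2 * sum(term in doc for doc in docs) >= len(docs):
            # The 50% threshold: a word present in at least half of the
            # rows is treated as too common to match.
            reasons[term] = "in at least half of the rows"
    return reasons
```

On a small blog where most posts mention "wordpress", a natural-language search for that word matches nothing, which is easy to mistake for a broken index.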

I'd also, for any search task, recommend offering search by date, newest
to oldest, as the default or at least as an option. The world is
increasingly about recency and, particularly in a blog context, this is key.
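In SQL this is just an `ORDER BY post_date DESC` on `wp_posts`. The sketch below shows the same policy in Python, with newest-first as the default and relevance as the alternative. The `Hit` record and `sort_hits` function are hypothetical names for illustration.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Hit:
    """Hypothetical search hit: a post's title, date and relevance score."""
    title: str
    post_date: datetime
    score: float


def sort_hits(hits, order="date"):
    """Newest first by default; relevance ordering stays available."""
    if order == "date":
        # Newest first; posts with the same date fall back to relevance.
        key = lambda h: (h.post_date, h.score)
    elif order == "relevance":
        # Most relevant first; equal scores fall back to newest first.
        key = lambda h: (h.score, h.post_date)
    else:
        raise ValueError(f"unknown sort order: {order!r}")
    return sorted(hits, key=key, reverse=True)
```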

Finally, if memory serves me correctly, the default operator in MySQL
full-text search is OR, not AND, which means that users get what appear to
be random results. When we switched this to AND in Feedster, it essentially
fixed the problem from the users' perspective.
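In MySQL's boolean mode, a leading `+` marks a word or a double-quoted phrase as required, so rewriting the user's query this way turns "any term" into "every term". Here is a minimal Python sketch, assuming the result is passed as a bound parameter to something like `MATCH (post_title, post_content) AGAINST (%s IN BOOLEAN MODE)`. The function name is hypothetical. As a design choice, operator characters typed by the user are stripped from the ends of each word rather than honored.

```python
import re

# Characters with special meaning in MySQL boolean-mode queries.
BOOLEAN_OPERATORS = '+-<>()~*"@'


def to_required_terms(user_query):
    """Rewrite a plain query so every word or quoted phrase is required.

    'full text "mysql search"' -> '+full +text +"mysql search"'
    """
    terms = []
    # Each match is either a complete "quoted phrase" or a bare word.
    for phrase, word in re.findall(r'"([^"]+)"|(\S+)', user_query):
        if phrase.strip():
            terms.append('+"%s"' % phrase.strip())
        else:
            # Drop leading/trailing operators the user may have typed.
            cleaned = word.strip(BOOLEAN_OPERATORS)
            if cleaned:
                terms.append("+" + cleaned)
    return " ".join(terms)
```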

Reworking the relevance score is always tricky, but it certainly can be
done. http://www.queryserver.com/ is a product of mine from '97 (still
around) that normalizes relevance scores across all the major search engines
and produces a merged metasearch result.
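Raw scores from different engines are not on a common scale, so a merge usually normalizes them first. The sketch below uses one common textbook approach: min-max scaling of each engine's scores to [0, 1], then summing per URL (the CombSUM rule). This is an assumption for illustration, not a description of how queryserver.com does it, and the function names are hypothetical.

```python
def min_max_normalize(results):
    """Scale one engine's {url: score} mapping to the range [0, 1]."""
    if not results:
        return {}
    lo, hi = min(results.values()), max(results.values())
    if hi == lo:
        # All scores equal: treat every hit as equally strong.
        return {url: 1.0 for url in results}
    return {url: (s - lo) / (hi - lo) for url, s in results.items()}


def merge(engines):
    """CombSUM merge: sum normalized scores per URL, best first.

    Ties are broken alphabetically so the output is deterministic.
    """
    totals = {}
    for results in engines:
        for url, s in min_max_normalize(results).items():
            totals[url] = totals.get(url, 0.0) + s
    return sorted(totals, key=lambda url: (-totals[url], url))
```

A URL returned by several engines accumulates score, which is usually what a metasearch user wants.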

*Scurries to download code and try it before I have to get offline*.

Thanks!

Scott

On 2/5/06, Denis de Bernardy <denis at semiologic.com> wrote:
>
> > > a) Simple: Add MySQL Full Text Indexing to WordPress and
> > > modify the search hooks to use it. Moving to FT indices on
> > > MyISAM tables actually gives quite good search out of the
> > > gate. (...)
> > >
> > > Difficulty: not huge.  Willing to do in full myself.
> >
> > This is mostly done already, and MyISAM full text indexing is
> > about as bad as bad can get.
> >
> > http://www.semiologic.com/software/search-reloaded/
>
> As additional information, a past version of the plugin did a slightly
> better job than the above at the cost of a huge amount of computing
> power. To spare yourself some time:
>
> 1. Using an FT index on the text-only version of the formatted post
> excerpt and content does not improve the results in any significant manner.
>
> 2. MySQL has a number of issues that are related to charsets.
>
> These tend to worsen after MySQL 4.1 (at which point they introduced a
> collection of new bugs, for good measure). The underlying mess is a
> nightmare to sort out.
>
> 3. Trying to tweak the results by reworking the raw mysql score can
> produce meaningful enhancements but involves a significant overhead.
>
> Things I tried included the keyword order, their presence in the post
> title, presence or absence of double quotes to create keyword groups,
> and later on the use of soundex.
>
> I eventually dropped all of these ideas because working around MySQL's
> lack of features by using php was simply ridiculous. If you give this a
> shot yourself, store your indexes and search procedures in a real
> database, such as pgsql.
>
> 4. Last but not least, several users sent me messages along the lines
> of the following:
>
> "Search reloaded returns results in a random order. Why doesn't it sort
> results by date?"
>
>
> Denis

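To make point 3 in Denis's message concrete, here is a small Python sketch (not the plugin's code) of two of the adjustments he lists: a bonus for query terms that appear in the post title, and a smaller bonus for title words that sound like a query term under standard four-character American Soundex. MySQL's own `SOUNDEX()` does not truncate to four characters, so its codes can differ. The `(title, raw_score)` input shape, the boost weights and the function names are all hypothetical. Denis's conclusion still applies: doing this outside the database is costly.

```python
# Letter -> digit table for American Soundex; vowels, h, w and y get no digit.
SOUNDEX_CODES = {
    letter: digit
    for digit, letters in {"1": "bfpv", "2": "cgjkqsxz", "3": "dt",
                           "4": "l", "5": "mn", "6": "r"}.items()
    for letter in letters
}


def soundex(word):
    """Four-character American Soundex code, e.g. 'Robert' -> 'R163'."""
    letters = [c for c in word.lower() if c.isalpha()]
    if not letters:
        return ""
    out = [letters[0].upper()]
    prev = SOUNDEX_CODES.get(letters[0], "")
    for c in letters[1:]:
        code = SOUNDEX_CODES.get(c, "")
        if code and code != prev:
            out.append(code)
        # h and w do not separate letters with equal codes; vowels do.
        if c not in "hw":
            prev = code
    return ("".join(out) + "000")[:4]


def rescore(hits, query, title_boost=0.5, soundex_boost=0.1):
    """Re-rank (title, raw_score) pairs and return titles, best first.

    Adds title_boost per query term found in the title, and soundex_boost
    per other title word whose Soundex code matches a query term's.
    """
    terms = query.lower().split()
    term_codes = {soundex(t) for t in terms} - {""}
    scored = []
    for title, raw_score in hits:
        words = title.lower().split()
        score = raw_score + title_boost * sum(t in words for t in terms)
        score += soundex_boost * sum(
            soundex(w) in term_codes for w in words if w not in terms)
        scored.append((score, title))
    return [title for _, title in sorted(scored, key=lambda s: (-s[0], s[1]))]
```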
