I have a table with a description
column.
In this table, I have added an extra column called tsv
that is populated with the tsvector of the description.
I'm trying to query the tsv
column for a certain phrase but add negate phrases to reduce the number of false positives returned, but I'm not sure what the correct way is to do it as I get a different number of result returned for each way I am trying.
For example, take stream, as in a little river.
I get
WHERE tsv @@ to_tsquery('english', '
stream
')
RETURNS: 26
But in some of the descriptions as I have checked all 26 manually, they only talk about:
...light streaming in from both the window... (There is 1 with this in)
or
...stream of natural light... (There are 2 with this in)
Which is nothing to do with a little stream of running water.
That's a total of 3, so I am expecting 23 to be returned instead.
Here's what I have tried so far, none of which return 23:
WHERE tsv @@ to_tsquery('english', '
stream
& ! light<->streaming
| ! stream<2>natural<->light
')
>RETURNS: 261
or
WHERE tsv @@ to_tsquery('english', '
stream
& ! light<->streaming
& ! stream<2>natural<->light
')
>RETURNS: 3
or
WHERE tsv @@ to_tsquery('english', '
stream
& ! (
light<->streaming
| stream<2>natural<->light
)
')
>RETURNS: 8
or
WHERE tsv @@ to_tsquery('english', '
stream
& (
! light<->streaming
| ! stream<2>natural<->light
)
')
>RETURNS: 26
What am I doing wrong? Or is there a completely different way I need to be doing this?
Thanks in advance
ADDITION
Just so I know for sure and for my sanity, I rang this code for each term I'm trying to negate
WHERE tsv @@ to_tsquery('english',
'light<->streaming'
)
>RETURNS: 1
WHERE tsv @@ to_tsquery('english',
'stream<2>natural<->light'
)
>RETURNS: 2
Both returned the records I want removing from the main query.
FOLLOW UP
I couldn't solve this doing it the way I was trying to do it e.g. all within one clause.
So to get around the issue, I create separate where clauses for each negate term.
So instead of
...
WHERE tsv @@ to_tsquery('english',
'stream
& !(light <-> stream)
& !(stream <2> natural <-> light)'
);
I did this instead
...
WHERE tsv @@ to_tsquery('english','stream')
AND NOT WHERE tsv @@ to_tsquery('english','light <-> stream')
AND NOT WHERE tsv @@ to_tsquery('english','stream <2> natural <-> light')
Doing it this way produced the results I was expecting.
Hope this helps someone with the same problem.