Improve/rwi host demotion #787
Open
songproducer wants to merge 2 commits into
YaCy already computes fuzzy_signature_l (a 64-bit hash of the document's representative word frequency profile) for every indexed page. This change records each emitted signature in shouldEmitCandidate and skips later pages with the same signature, so near-identical content (mirror pages, scraped copies, paginated duplicates) no longer fills the result set. Pages with signature=0 (not yet computed) pass through unchanged.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
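The deduplication rule described in this commit message can be sketched as below. This is a minimal illustration, not the code from the PR: the class name SignatureDeduplicator is hypothetical, and only the method name shouldEmitCandidate and the signature=0 pass-through come from the description above.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper for illustration; not a class from the YaCy code base. */
final class SignatureDeduplicator {
    // Fuzzy signatures that have already been emitted for the current search.
    private final Set<Long> seen = new HashSet<>();

    /**
     * Returns true if a candidate with this fuzzy signature should be emitted.
     * A signature of 0 means "not yet computed", so such pages always pass.
     */
    boolean shouldEmitCandidate(long fuzzySignature) {
        if (fuzzySignature == 0L) {
            return true;
        }
        // Set.add returns false when the value was already present,
        // which is exactly the "seen before, skip it" case.
        return seen.add(fuzzySignature);
    }
}
```

The sketch keeps one set per search and is not thread-safe. If candidates are emitted from several threads, a concurrent set would be needed instead; whether that applies here depends on how the PR calls the method.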
Adds "Demoted Hosts" and "Demoted Words" sections to the RWI ranking config page (RankingRWI_p.html). Each host and each word or phrase can have an optional custom score divisor (default 100). Demotion applies to both the RWI and Solr result paths. Host demotion is automatically skipped when any part of the hostname appears in the search query, so targeted searches still surface the result normally.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Add per-host and per-word score demotion to RWI ranking
Adds fine-grained score demotion to the RWI ranking configuration page (/RankingRWI_p.html), letting administrators push unwanted results toward the end of the result list without removing them from the index.
Changes
Demoted Hosts: enter one hostname per line to divide its ranking score by a configurable divisor (default 100). Demotion is automatically suppressed when any significant label of the hostname appears in the search query. Applies to both the RWI and Solr result paths.
Demoted Words: enter one word or quoted phrase per line to demote any result whose URL, title, or description contains that term. Supports an optional custom divisor per entry (e.g. "adult services" 500). Applies to the Solr result path.
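The host demotion rule and its query-based suppression could look roughly like the sketch below. All names here (HostDemotion, queryMentionsHost, demote) are hypothetical. The description does not define what a "significant" label is, so the sketch assumes it means any label other than "www" and the final (top-level) label; the PR may use a different rule.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical illustration of host demotion; not the PR's actual code. */
final class HostDemotion {

    /**
     * True if a "significant" label of the host occurs as a query word.
     * Assumption: significant = not empty, not "www", and not the last label.
     */
    static boolean queryMentionsHost(String host, String query) {
        Set<String> queryWords = new HashSet<>(
                Arrays.asList(query.toLowerCase(Locale.ROOT).trim().split("\\s+")));
        String[] labels = host.toLowerCase(Locale.ROOT).split("\\.");
        // Stop before the last label, which is treated as the top-level domain.
        for (int i = 0; i < labels.length - 1; i++) {
            String label = labels[i];
            if (!label.isEmpty() && !label.equals("www") && queryWords.contains(label)) {
                return true;
            }
        }
        return false;
    }

    /** Divides the score by the host's divisor unless the query targets the host. */
    static long demote(long score, String host, String query,
                       Map<String, Integer> demotedHosts) {
        Integer divisor = demotedHosts.get(host.toLowerCase(Locale.ROOT));
        if (divisor == null || divisor <= 1 || queryMentionsHost(host, query)) {
            return score; // not listed, no-op divisor, or targeted search
        }
        return score / divisor;
    }
}
```

With example.com listed at the default divisor of 100, a score of 1000 becomes 10 for the query "cheap flights" but stays 1000 for "example login", because the label "example" appears in the query.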
Both lists support an optional trailing divisor on each line:
example.com 500
spammy-site.org
"unwanted phrase" 1000
keyword
Lines without a divisor default to 100.
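The line format above can be parsed along the following lines. This is a sketch under assumptions rather than the PR's parser: the class DemotionEntry and its parse method are hypothetical, and falling back to the default for a malformed or non-positive divisor is a choice made here, not documented behaviour.

```java
/** Hypothetical parsed form of one demotion line; not the PR's actual class. */
final class DemotionEntry {
    static final int DEFAULT_DIVISOR = 100;

    final String term;   // hostname, word, or phrase (quotes removed)
    final int divisor;   // score divisor applied to matching results

    DemotionEntry(String term, int divisor) {
        this.term = term;
        this.divisor = divisor;
    }

    /** Parses one line; returns null for blank lines or an unterminated quote. */
    static DemotionEntry parse(String line) {
        String s = line.trim();
        if (s.isEmpty()) {
            return null;
        }
        String term;
        String rest;
        if (s.startsWith("\"")) {
            // Quoted phrase: the term is everything up to the closing quote.
            int close = s.indexOf('"', 1);
            if (close < 0) {
                return null;
            }
            term = s.substring(1, close);
            rest = s.substring(close + 1).trim();
        } else {
            // Bare term: first whitespace-separated token, optional divisor after it.
            String[] parts = s.split("\\s+", 2);
            term = parts[0];
            rest = parts.length > 1 ? parts[1].trim() : "";
        }
        if (term.isEmpty()) {
            return null;
        }
        int divisor = DEFAULT_DIVISOR;
        if (!rest.isEmpty()) {
            try {
                int parsed = Integer.parseInt(rest);
                if (parsed >= 1) {
                    divisor = parsed;
                }
            } catch (NumberFormatException e) {
                // Assumption: keep the default when the divisor is not a number.
            }
        }
        return new DemotionEntry(term, divisor);
    }
}
```

Applied to the four example lines, this yields example.com with divisor 500, spammy-site.org with 100, the phrase unwanted phrase with 1000, and keyword with 100.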
Configuration
Settings are stored in yacy.conf under: