Improve/rwi host demotion #787

Open
songproducer wants to merge 2 commits into yacy:master from songproducer:improve/rwi-host-demotion
@songproducer

Contributor

Add per-host and per-word score demotion to RWI ranking

Adds fine-grained score demotion to the RWI ranking configuration page (/RankingRWI_p.html), allowing administrators to push unwanted results toward the end of the result list without removing them from the index.

Changes

Demoted Hosts: enter one hostname per line to divide that host's ranking score by a configurable divisor (default 100). Demotion is automatically suppressed when any significant label of the hostname appears in the search query, so a search that targets the host still ranks it normally. Applies to both the RWI and Solr result paths.
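
The host rule above can be sketched roughly as follows. This is a minimal illustration, not the code in this PR: the class and method names are hypothetical, the query is assumed to arrive as a set of lowercased words, and the notion of a "significant" label (here: at least three characters and not a generic label such as `www` or `com`) is an assumption, since the description does not define it.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch of per-host score demotion with query-based suppression. */
final class HostDemotionSketch {

    // Assumption: these labels are too generic to count as "significant".
    private static final Set<String> GENERIC_LABELS =
            new HashSet<>(Arrays.asList("www", "com", "org", "net"));

    /** True if any significant label of the host is one of the (lowercased) query words. */
    static boolean queryMentionsHost(String host, Set<String> queryWords) {
        for (String label : host.toLowerCase(Locale.ROOT).split("\\.")) {
            if (label.length() < 3 || GENERIC_LABELS.contains(label)) {
                continue; // skip short or generic labels
            }
            if (queryWords.contains(label)) {
                return true;
            }
        }
        return false;
    }

    /** Divides the score by the host's divisor unless the query mentions the host. */
    static long demote(long score, String host,
                       Map<String, Integer> demotedHosts, Set<String> queryWords) {
        Integer divisor = demotedHosts.get(host.toLowerCase(Locale.ROOT));
        if (divisor == null || divisor <= 1) {
            return score; // host is not demoted
        }
        if (queryMentionsHost(host, queryWords)) {
            return score; // targeted search: suppress demotion
        }
        return score / divisor;
    }
}
```

With `example.com` mapped to a divisor of 100, a score of 50000 would drop to 500 for the query "news" but stay at 50000 for the query "example".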

Demoted Words: enter one word or quoted phrase per line to demote any result whose URL, title, or description contains that term. Supports an optional custom divisor per entry (e.g. "adult services" 500). Applies to the Solr result path.

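One way the word rule could work is sketched below. The names are hypothetical and several details are assumptions the description leaves open: matching is a case-insensitive substring test, and when several terms match, the largest divisor is applied once rather than the divisors being multiplied together.

```java
import java.util.Locale;
import java.util.Map;

/** Hypothetical sketch of per-word or per-phrase score demotion. */
final class WordDemotionSketch {

    private static String nullToEmpty(String s) {
        return s == null ? "" : s;
    }

    /**
     * Divides the score by the largest divisor among the demoted terms that
     * occur in the URL, title, or description. Returns the score unchanged
     * if no term matches.
     */
    static long demote(long score, String url, String title, String description,
                       Map<String, Integer> demotedTerms) {
        // Newlines keep a phrase from matching across two different fields.
        String haystack = (nullToEmpty(url) + "\n" + nullToEmpty(title) + "\n"
                + nullToEmpty(description)).toLowerCase(Locale.ROOT);
        int divisor = 1;
        for (Map.Entry<String, Integer> entry : demotedTerms.entrySet()) {
            String term = entry.getKey().toLowerCase(Locale.ROOT);
            if (!term.isEmpty() && haystack.contains(term)) {
                divisor = Math.max(divisor, entry.getValue());
            }
        }
        return score / divisor;
    }
}
```

A plain substring test also matches inside longer words; a stricter variant could require word boundaries, at the cost of missing terms embedded in URLs.
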
Both lists support an optional trailing divisor on each line:

    example.com 500
    spammy-site.org
    "unwanted phrase" 1000
    keyword

Lines without a divisor default to 100.
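
For illustration, a parser for this line format might look like the sketch below. The class name and the handling of edge cases are assumptions, not taken from the PR: blank lines are skipped, terms are lowercased, and a missing, non-numeric, or zero divisor falls back to the default of 100.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Hypothetical parser for "term [divisor]" lines; the term may be a quoted phrase. */
final class DemotionListParser {

    static final int DEFAULT_DIVISOR = 100;

    /** Maps each lowercased term to its divisor, preserving input order. */
    static Map<String, Integer> parse(String text) {
        Map<String, Integer> result = new LinkedHashMap<>();
        for (String raw : text.split("\\R")) {
            String line = raw.trim();
            if (line.isEmpty()) {
                continue;
            }
            String term;
            String rest;
            int close = line.startsWith("\"") ? line.indexOf('"', 1) : -1;
            if (close > 0) {
                // Quoted phrase: the term is the text between the quotes.
                term = line.substring(1, close);
                rest = line.substring(close + 1).trim();
            } else {
                // Unquoted: first token is the term, the remainder may be a divisor.
                String[] parts = line.split("\\s+", 2);
                term = parts[0];
                rest = parts.length > 1 ? parts[1].trim() : "";
            }
            if (term.isEmpty()) {
                continue;
            }
            int divisor = DEFAULT_DIVISOR;
            if (rest.matches("\\d{1,9}") && Integer.parseInt(rest) >= 1) {
                divisor = Integer.parseInt(rest);
            }
            result.put(term.toLowerCase(Locale.ROOT), divisor);
        }
        return result;
    }
}
```

Applied to the four example lines, this would yield divisors of 500, 100, 1000, and 100 respectively.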

Configuration

Settings are stored in yacy.conf under:

  • search.ranking.rwi.demotedhosts
  • search.ranking.rwi.demotedwords

songproducer and others added 2 commits April 25, 2026 21:18
YaCy already computes fuzzy_signature_l (a 64-bit hash of the document's
representative word frequency profile) for every indexed page. This change
tracks which signatures have been emitted in shouldEmitCandidate and skips
subsequent pages with the same signature, preventing near-identical content
(mirror pages, scraped copies, paginated duplicates) from filling the result
set. Pages with signature=0 (not yet computed) pass through unchanged.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
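
The signature check described in the first commit can be illustrated with a small sketch. The class and method names here are hypothetical and this is not the body of `shouldEmitCandidate`; it only assumes that each candidate carries its `fuzzy_signature_l` value as a `long`.

```java
import java.util.HashSet;
import java.util.Set;

/** Hypothetical sketch of near-duplicate suppression by fuzzy signature. */
final class SignatureDedupSketch {

    // Signatures already emitted for the current result set.
    private final Set<Long> emitted = new HashSet<>();

    /**
     * Returns true if the candidate should be emitted. A signature of 0 means
     * "not yet computed" and always passes; any other signature passes only
     * the first time it is seen.
     */
    boolean shouldEmit(long fuzzySignature) {
        if (fuzzySignature == 0L) {
            return true;
        }
        return emitted.add(fuzzySignature); // add() returns false for a repeat
    }
}
```

If candidates are emitted from several threads, a concurrent set such as `ConcurrentHashMap.newKeySet()` would be a safer choice than `HashSet`; whether that applies to the actual result path is not stated in the commit message.
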
Adds a "Demoted Hosts" and "Demoted Words" section to the RWI ranking
config page (RankingRWI_p.html). Hosts and words/phrases can each have
an optional custom score divisor (default 100). Demotion applies to both
the RWI and Solr result paths. Host demotion is automatically skipped
when any part of the hostname appears in the search query, so targeted
searches still surface the result normally.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>