cocoafleck | 4 years ago

I was trying to learn more about the ranking algorithm that Alexandria uses, and I was a bit confused by its documentation on GitHub. Am I correct that it uses "Harmonic Centrality" (http://vigna.di.unimi.it/ftp/papers/AxiomsForCentrality.pdf) for at least part of the algorithm?

josefcullhed|4 years ago

Hi,

Yes, our documentation is probably pretty confusing. It works like this: the base score for all URLs on a specific domain is that domain's harmonic centrality (HC). We have two indexes, one with URLs and one with links (we index the link text). We first run the search on the links, then on the URLs. We then update the score of the URLs based on the matching links with this formula:

domain_score = expm1(5 * link.m_score) + 0.1;
url_score = expm1(10 * link.m_score) + 0.1;

Then we add the domain and URL scores to url.m_score, where link.m_score is the HC of the source domain.
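To make the update concrete, here is a minimal C++ sketch of the formula as described in this comment. It is not the Alexandria source: the struct names and the helper functions `domain_boost`, `url_boost` and `apply_links` are hypothetical, and it assumes every matching link contributes both boosts to the URL. Only `m_score`, `expm1` and the constants come from the comment.

```cpp
#include <cmath>
#include <vector>

// Hypothetical record types; only the m_score field name comes from the comment.
struct link_record { double m_score; };  // harmonic centrality of the link's source domain
struct url_record  { double m_score; };  // starts at the HC of the URL's own domain

// Boosts contributed by a single matching link, using the constants from the comment.
inline double domain_boost(const link_record &link) {
    return std::expm1(5.0 * link.m_score) + 0.1;
}

inline double url_boost(const link_record &link) {
    return std::expm1(10.0 * link.m_score) + 0.1;
}

// Assumption: each matching link adds both boosts to the URL's score.
inline void apply_links(url_record &url, const std::vector<link_record> &links) {
    for (const auto &link : links) {
        url.m_score += domain_boost(link) + url_boost(link);
    }
}
```

Because expm1(x) = e^x - 1 grows exponentially, a link from a high-HC domain counts for far more than one from a low-HC domain, and the 0.1 term gives every matching link a small floor contribution even when the source domain's HC is close to zero.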

jll29|4 years ago

The main scoring function seems to be index_builder<data_record>::calculate_score_for_record() at line 296 of https://github.com/alexandria-org/alexandria/blob/main/src/i..., and it mentions support for BM25 (Spärck Jones, Walker and Robertson, 1976) and TF-IDF (Spärck Jones, 1972) term weighting, pointing to the respective Wikipedia pages.
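For reference, the textbook Okapi BM25 weight for a single query term can be sketched as below. This is the standard formula (the variant with a non-negative IDF), not code taken from the Alexandria repository. The function name `bm25_term` and the default values of `k1` and `b` are assumptions chosen for illustration.

```cpp
#include <cmath>

// Textbook Okapi BM25 weight of one query term in one document.
//   tf          term frequency in the document
//   df          number of documents containing the term
//   n_docs      total number of documents in the collection
//   doc_len     length of this document
//   avg_doc_len average document length in the collection
// k1 and b are the usual free parameters; the defaults are common choices,
// not values taken from Alexandria.
inline double bm25_term(double tf, double df, double n_docs,
                        double doc_len, double avg_doc_len,
                        double k1 = 1.2, double b = 0.75) {
    // Rarer terms get a larger inverse document frequency.
    const double idf = std::log(1.0 + (n_docs - df + 0.5) / (df + 0.5));
    // Length normalization: longer-than-average documents are penalized.
    const double norm = k1 * (1.0 - b + b * doc_len / avg_doc_len);
    return idf * tf * (k1 + 1.0) / (tf + norm);
}
```

A document's BM25 score for a query is the sum of this weight over the query terms. Unlike plain TF-IDF, the contribution of tf saturates, so repeating a term many times yields diminishing returns.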