Yandex Palekh Algorithm Catches The Long Tail With Machine Learning

Yandex Palekh

Yesterday, Yandex announced that they launched something similar to Google’s RankBrain – well, they didn’t say that, I did.

They launched what they call Palekh, which is the name of a Russian city. The flag of that city shows a firebird, which you can see in the image above. Why the firebird? Well, it has a long tail, and this algorithm aims to improve the quality of the results for long-tail queries.

Yandex told us that about 100 million of the queries they handle per day fall under the “long-tail” classification within their search engine. That is about 40% of all the queries performed on that search engine.

So they wanted to improve the results by better understanding those queries. Yandex told me that, basically, “the technology allows us to understand the meaning behind every query, and not just look for similar words.”

Yandex continued: “For that, we’re starting to use neural networks as one of 1,500 ranking factors – we’ve managed to teach our neural networks to see the connections between a query and a document even if they don’t contain common words. This has been made possible by converting the words from billions of search queries into numbers (in groups of 300 each) and putting them in a 300-dimensional space – now every document has its own vector in that space. If the numbers of a query and the numbers of a document are near each other in that space, then the result is relevant. This technology is called a ‘semantic vector’.”
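To make the “near each other in that space” idea concrete, here is a minimal sketch in Python. Yandex did not say how they measure nearness, so cosine similarity is my assumption, and the function names, document IDs and tiny 4-number vectors (standing in for the real 300-number ones) are all made up for illustration.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors.

    Assumption: cosine similarity is used as the "nearness" measure;
    the source does not say which measure Yandex actually uses.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # a zero vector has no direction, so treat it as unrelated
    return dot / (norm_a * norm_b)

def rank_by_similarity(query_vec, doc_vecs):
    """Sort (doc_id, score) pairs from most to least similar to the query."""
    scored = [(doc_id, cosine_similarity(query_vec, vec))
              for doc_id, vec in doc_vecs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Hypothetical toy vectors: 4 numbers each instead of 300.
query = [0.9, 0.1, 0.0, 0.2]
docs = {
    "doc_a": [0.8, 0.2, 0.1, 0.3],  # points in nearly the same direction as the query
    "doc_b": [0.0, 0.9, 0.8, 0.0],  # points in a very different direction
}

ranking = rank_by_similarity(query, docs)
print(ranking)
```

In this toy setup doc_a ranks above doc_b because its vector points in almost the same direction as the query vector, even though nothing here compares actual words.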

They are using “billions of queries from logs and relying on documents’ headlines and search queries, not documents’ texts yet.” “We also have many targets (long click prediction, CTR, ‘click or not click’ models, etc.) that are teaching our neural network – our research has shown that using more targets is more effective,” they added. So this is a self-learning, machine learning algorithm.
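Yandex did not explain how those targets are combined during training. One common approach, shown here purely as an assumption, is to score each target separately and add the per-target losses into a single number for the network to minimize. The target names echo the quote above; the weights, predictions and labels are invented for illustration.

```python
import math

def binary_cross_entropy(predicted, actual):
    """Loss for one yes/no target: small when the prediction matches the label."""
    eps = 1e-12  # keep log() away from zero
    p = min(max(predicted, eps), 1.0 - eps)
    return -(actual * math.log(p) + (1.0 - actual) * math.log(1.0 - p))

def multi_target_loss(predictions, labels, weights):
    """Weighted sum of per-target losses (an assumed way to combine targets)."""
    return sum(
        weights[name] * binary_cross_entropy(predictions[name], labels[name])
        for name in labels
    )

# Hypothetical example for one query/document pair.
labels = {"long_click": 1.0, "click": 1.0}   # the user clicked and stayed
weights = {"long_click": 1.0, "click": 0.5}  # made-up importance of each target

good_guess = {"long_click": 0.9, "click": 0.95}
bad_guess = {"long_click": 0.2, "click": 0.3}

print(multi_target_loss(good_guess, labels, weights))
print(multi_target_loss(bad_guess, labels, weights))
```

The good guess produces the smaller combined loss, so training pushes the network toward predictions that satisfy all the targets at once.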

Yandex is a very important search engine for Russian users.

Forum discussion at Twitter.

Author: barry@rustybrick.com (Barry Schwartz)
