Did you try spaCy's most_similar method? It's written in Cython, so it is presumably quite fast as well. Thanks for the Rust implementation, though; I will most likely use this.
I don't have much to say about the actual lib; it seems great! However, don't feel compelled to put all your Rust code into a single lib.rs. You can split your work into several files and use 'mod' and 'pub use' in lib.rs to re-export your functions and types as a public API of your choosing.
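A single-file sketch of that layout, using inline modules to stand in for separate files. The module, type, and function names are hypothetical and are not instant-distance's actual API.

```rust
// In a real crate each inline `mod` below would live in its own file
// (e.g. src/distance.rs) and be declared in lib.rs as just `mod distance;`.
// All names here are hypothetical, for illustration only.
mod distance {
    /// Squared Euclidean distance between two equal-length slices.
    pub fn squared_euclidean(a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum()
    }
}

mod point {
    /// A simple owned vector type.
    #[derive(Debug, Clone, PartialEq)]
    pub struct Point(pub Vec<f32>);

    impl Point {
        /// Squared distance to another point, delegating to the sibling module.
        pub fn distance(&self, other: &Point) -> f32 {
            // `super::` reaches the sibling module through the crate root.
            super::distance::squared_euclidean(&self.0, &other.0)
        }
    }
}

// Re-export the chosen items: callers write `crate::Point` rather than
// `crate::point::Point`, and the internal module layout stays private.
pub use distance::squared_euclidean;
pub use point::Point;
```

When the modules move into their own files, only the `mod` declarations change; the `pub use` lines, and therefore the public API, stay the same.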
This webpage uses a significant amount of CPU constantly for no apparent reason (as far as I can see, it is mostly a static page). What the hell? Is it mining crypto in the background?
Sorry, this page had a useEffect/setState render loop. We are running react@experimental in concurrent mode and missed the error. Rolling out a fix now. Thanks!
These results are less accurate than Google Translate, but they are far faster to get and far less expensive to generate (see https://cloud.google.com/translate/pricing). Our goal here is speed: we want to search through many possibilities as quickly as possible.
Could something like this be used to compare subsequences of the COVID genetic code against SARS and other viral genomes? It would be interesting to see how much overlap there is, and it could further research into where the virus came from.
Bioinformaticians have been able to do that with traditional algorithms for years (dynamic programming gets you a long way toward computing an edit distance, for example).
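A minimal sketch of that dynamic-programming idea: a plain Levenshtein edit distance over nucleotide strings, with unit costs for substitution, insertion, and deletion. Real sequence-comparison tools use scoring matrices and gap penalties rather than unit costs; the function name is hypothetical.

```rust
/// Levenshtein edit distance between two byte strings (e.g. DNA),
/// computed with the classic DP recurrence, keeping only two rows.
pub fn edit_distance(a: &[u8], b: &[u8]) -> usize {
    // prev[j] = distance between the first i bytes of `a` and the first j of `b`.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, &ca) in a.iter().enumerate() {
        // Turning a[..=i] into the empty string takes i + 1 deletions.
        curr[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        // The row just filled becomes the "previous" row for the next byte of `a`.
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}
```

For example, GATTACA and GATTTACA are one insertion apart, so their distance is 1. The table is O(len(a) × len(b)), which is why whole-genome comparisons rely on faster heuristics.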
It sounds like you're thinking of "sequence alignment", which is a pretty standard bioinformatics tool.
beau (4 years ago):
habibur (4 years ago):
cp target/release/libinstant_distance.so instant-distance-py/test/instant_distance.so
and it works. Built and running. The main tree was macOS-only.
Here's the resource consumption for a sample run:
Time: 4.49 s, Memory: 1552 MB.
Single word, three languages including en.
arbol (4 years ago):
Fiahil (4 years ago):
cargo check and format time might also slightly improve!
maeln (4 years ago):
maybevain (4 years ago):
A quick glance in this case: I recorded a couple of seconds in the Performance tab and saw a lot of React-related calls.
beau (4 years ago):
chakkepolja (4 years ago):
denysvitali (4 years ago):
> Language: fr, Translation: bonjours
> Language: fr, Translation: bonsoir
> Language: fr, Translation: salutations
> Language: it, Translation: buongiorno
> Language: it, Translation: buonanotte
> Language: fr, Translation: rebonjour
> Language: it, Translation: auguri
> Language: fr, Translation: bonjour,
> Language: it, Translation: buonasera
> Language: it, Translation: chiamatemi
Is it just me, or are these machine translations worse than... Google Translate?
beau (4 years ago):
The word vectors have been aligned across multiple languages. Using an approximate nearest neighbor search, we are able to find the nearest vector to the input in multiple languages very quickly.
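For intuition, here is the exact, brute-force version of that lookup: score every vector in every language against the query and keep the top k. An approximate nearest neighbor index exists precisely to avoid this full scan. This is a minimal sketch assuming cosine similarity as the metric; the Entry type and nearest function are hypothetical and not part of instant-distance.

```rust
/// One entry in a hypothetical multilingual vocabulary of aligned word vectors.
pub struct Entry {
    pub lang: &'static str,
    pub word: &'static str,
    pub vector: Vec<f32>,
}

/// Cosine similarity between two equal-length vectors
/// (NaN if either vector is all zeros).
fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    dot / (norm_a * norm_b)
}

/// Exact k-nearest-neighbor search by cosine similarity.
/// Scans every entry: O(n) per query, which an approximate index avoids.
pub fn nearest<'a>(query: &[f32], entries: &'a [Entry], k: usize) -> Vec<&'a Entry> {
    let mut scored: Vec<(f32, &Entry)> =
        entries.iter().map(|e| (cosine(query, &e.vector), e)).collect();
    // Highest similarity first; total_cmp gives a total order even with NaN.
    scored.sort_by(|x, y| y.0.total_cmp(&x.0));
    scored.into_iter().take(k).map(|(_, e)| e).collect()
}
```

Because the vectors are aligned, one query can return neighbors from several languages at once, as in the "Language: fr / Language: it" output above.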
To keep the example simple, we did not try to filter the data through hand-built language dictionaries. Instead, we simply drop words in other languages that also appear in the English .vec file. Words like "ciao" appear frequently enough in otherwise English sentences that the example code drops "ciao" from Italian, so it is not shown in the results:
% curl -s "https://dl.fbaipublicfiles.com/fasttext/vectors-aligned/wiki..." | grep -n ciao
50393:ciao 0.0120 ...
One improvement would be to keep only words that appear in a hand-curated dictionary, instead of filtering out words that already appear in English. We decided not to show how to do this because we had already introduced a few concepts, such as aligned word vectors and approximate nearest neighbor search, and wanted to keep the example as simple as possible.
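A minimal sketch contrasting the two filters, assuming each vocabulary has already been loaded into memory as a list or set of words. The function names are hypothetical, not taken from the example code.

```rust
use std::collections::HashSet;

/// The approach described above: drop any foreign word that also appears
/// in the English vocabulary (so "ciao" is lost from Italian).
pub fn drop_english_overlap<'a>(foreign: &[&'a str], english: &HashSet<&str>) -> Vec<&'a str> {
    foreign.iter().copied().filter(|w| !english.contains(*w)).collect()
}

/// The suggested improvement: keep only words listed in a hand-curated
/// dictionary for that language, whether or not English also uses them.
pub fn keep_dictionary_words<'a>(foreign: &[&'a str], dictionary: &HashSet<&str>) -> Vec<&'a str> {
    foreign.iter().copied().filter(|w| dictionary.contains(*w)).collect()
}
```

If "ciao" is in both the English vocabulary and the Italian dictionary, the first filter removes it while the second keeps it; the cost is maintaining a dictionary per language.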
toxik (4 years ago):
ampdepolymerase (4 years ago):
fulafel (4 years ago):
dukeofdoom (4 years ago):
The full genome of COVID-19 is available:
https://www.snapgene.com/resources/coronavirus-resources/?re...
nestorD (4 years ago):
It is probably the first thing that was done once the COVID-19 genome was made public. A quick Google search turned up this summary of the results: https://www.news-medical.net/health/How-Does-the-SARS-Virus-...
mattkrause (4 years ago):
BLAST (Basic Local Alignment Search Tool) is one common version, and the NIH's NCBI has a variety of nice online tools here: https://blast.ncbi.nlm.nih.gov/Blast.cgi
Note that it does take a little background knowledge to interpret the results: some motifs are just really common, while others are shared.
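BLAST is a heuristic: it gets its speed by seeding on short exact word matches and extending them, rather than running the exact Smith-Waterman dynamic program over every pair of sequences. For intuition, here is a minimal sketch of the Smith-Waterman local alignment score that BLAST approximates, with a linear gap penalty. The scoring values are illustrative assumptions, not BLAST's defaults or substitution matrices, and the function name is hypothetical.

```rust
/// Smith-Waterman local alignment score with a linear gap penalty.
/// `mismatch_score` and `gap_score` are expected to be negative.
pub fn local_alignment_score(
    a: &[u8],
    b: &[u8],
    match_score: i32,
    mismatch_score: i32,
    gap_score: i32,
) -> i32 {
    let mut prev = vec![0i32; b.len() + 1];
    let mut curr = vec![0i32; b.len() + 1];
    let mut best = 0i32;
    for &ca in a {
        curr[0] = 0;
        for (j, &cb) in b.iter().enumerate() {
            let diag = prev[j] + if ca == cb { match_score } else { mismatch_score };
            let up = prev[j + 1] + gap_score; // gap in b
            let left = curr[j] + gap_score; // gap in a
            // Clamping at 0 lets an alignment start anywhere: that is what makes it local.
            curr[j + 1] = diag.max(up).max(left).max(0);
            best = best.max(curr[j + 1]);
        }
        std::mem::swap(&mut prev, &mut curr);
    }
    best
}
```

With match +2, mismatch -1, gap -1, aligning ACGT inside TTACGTTT scores 8 (four matches, no gaps), and the surrounding T's do not drag the score down because the alignment is local.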
PaulHoule (4 years ago):
The short text and the fact that your application would tolerate, or even celebrate, catchy neologisms play to fastText's strengths.
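Part of why fastText copes with coined words is that it represents each word as the word itself plus a bag of character n-grams, with < and > marking the word boundaries; a word it has never seen still gets a vector built from the n-grams it shares with known words. A minimal sketch of just the n-gram extraction step (fastText's defaults for unsupervised training are minn = 3 and maxn = 6; hashing n-grams into buckets and summing their vectors is omitted, and char_ngrams is a hypothetical name):

```rust
/// Character n-grams of `word`, wrapped in fastText-style boundary markers.
/// Works on chars, not bytes, so accented letters are not split.
pub fn char_ngrams(word: &str, min_n: usize, max_n: usize) -> Vec<String> {
    // `windows(0)` would panic, so require a positive minimum length.
    assert!(min_n >= 1, "n-gram length must be at least 1");
    let chars: Vec<char> = format!("<{}>", word).chars().collect();
    let mut grams: Vec<String> = Vec::new();
    for n in min_n..=max_n {
        // `windows(n)` yields nothing when n exceeds the marked word's length.
        for window in chars.windows(n) {
            grams.push(window.iter().collect());
        }
    }
    grams
}
```

For instance, char_ngrams("where", 3, 3) yields <wh, whe, her, ere, re>, so a new word sharing pieces like "her" or "re>" lands near related vocabulary.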
beau (4 years ago):
shakow (4 years ago):
Only as an adverb; otherwise it should be "rapide".
aitk (4 years ago):
adsharma (4 years ago):