hack_ml's comments

hack_ml | 1 year ago | on: Mistral OCR

You will have to send one page at a time; most of this work has to be done via RAG. Adding a large context (like a whole PDF) still does not work that well in my experience.
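A minimal sketch of the page-at-a-time pattern: split the document into pages and OCR each page independently, then feed the per-page text into your RAG index. Here `ocr_page` is a hypothetical stand-in for whatever OCR endpoint you actually call, not a real client.

    # Sketch: OCR one page at a time instead of sending the whole PDF.
    # `ocr_page` is a placeholder for a real OCR API client call.

    def ocr_document(pages, ocr_page):
        """OCR each page independently and return per-page chunks.

        pages    -- iterable of page payloads (e.g. rendered page images)
        ocr_page -- callable that OCRs a single page and returns its text
        """
        results = []
        for i, page in enumerate(pages):
            text = ocr_page(page)  # one request per page
            results.append({"page": i + 1, "text": text})
        return results

    # Usage with a dummy OCR function standing in for a real client:
    fake_pages = [b"page-one-bytes", b"page-two-bytes"]
    chunks = ocr_document(fake_pages, ocr_page=lambda p: p.decode())

The per-page dicts are then what you would embed and index for retrieval.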

hack_ml | 1 year ago | on: GPT-4o

I was conversing with it in Hinglish (a mix of Hindi and English that folks in urban India use), and it was pretty on point apart from some use of esoteric Hindi words, but I think with the right prompting we can fix that.

hack_ml | 2 years ago | on: Nemotron-4 15B large multilingual language model trained on 8T tokens

Nvidia announces Nemotron-4 15B

We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves competitive performance to the leading open models in the remaining ones. Specifically, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly-sized models, even outperforming models over four times larger and those explicitly specialized for multilingual tasks.

hack_ml | 3 years ago | on: BERTopic: The Future of Topic Modeling

It's now seamless to accelerate BERTopic on GPUs with cuML as of the latest release (v0.10.0).

Check out the docs at: https://maartengr.github.io/BERTopic/faq.html#can-i-use-the-...

All you need to do is the following:

    from bertopic import BERTopic
    from cuml.cluster import HDBSCAN
    from cuml.manifold import UMAP

    # Create instances of GPU-accelerated UMAP and HDBSCAN
    umap_model = UMAP(n_components=5, n_neighbors=15, min_dist=0.0)
    hdbscan_model = HDBSCAN(min_samples=10, gen_min_span_tree=True)

    # Pass the GPU-backed models to BERTopic
    topic_model = BERTopic(umap_model=umap_model, hdbscan_model=hdbscan_model)
    # docs is your corpus: a list of document strings
    topics, probs = topic_model.fit_transform(docs)