hack_ml's comments

hack_ml | 1 year ago | on: Mistral OCR

You will have to send one page at a time; most of this work has to be done via RAG. Adding a large context (like a whole PDF) still does not work that well in my experience.
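A minimal sketch of the page-at-a-time pattern: split the document into pages and OCR each page independently, then feed the per-page text into your RAG index. Here `ocr_page` is a hypothetical stand-in for whatever OCR endpoint you actually call, not a real client.

    # Sketch: OCR one page at a time instead of sending the whole PDF.
    # `ocr_page` is a placeholder for a real OCR API client call.

    def ocr_document(pages, ocr_page):
        """OCR each page independently and return per-page chunks.

        pages    -- iterable of page payloads (e.g. rendered page images)
        ocr_page -- callable that OCRs a single page and returns its text
        """
        results = []
        for i, page in enumerate(pages):
            text = ocr_page(page)  # one request per page
            results.append({"page": i + 1, "text": text})
        return results

    # Usage with a dummy OCR function standing in for a real client:
    fake_pages = [b"page-one-bytes", b"page-two-bytes"]
    chunks = ocr_document(fake_pages, ocr_page=lambda p: p.decode())

The per-page dicts are then what you would embed and index for retrieval.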

hack_ml | 1 year ago | on: GPT-4o

I was conversing with it in Hinglish (a mix of Hindi and English that folks in urban India use), and it was pretty on point apart from some use of esoteric Hindi words, but I think with the right prompting we can fix that.

hack_ml | 2 years ago | on: Nemotron-4 15B large multilingual language model trained on 8T tokens

Nvidia announces Nemotron-4 15B

We introduce Nemotron-4 15B, a 15-billion-parameter large multilingual language model trained on 8 trillion text tokens. Nemotron-4 15B demonstrates strong performance when assessed on English, multilingual, and coding tasks: it outperforms all existing similarly-sized open models on 4 out of 7 downstream evaluation areas and achieves competitive performance to the leading open models in the remaining ones. Specifically, Nemotron-4 15B exhibits the best multilingual capabilities of all similarly-sized models, even outperforming models over four times larger and those explicitly specialized for multilingual tasks.

hack_ml | 3 years ago | on: BERTopic: The Future of Topic Modeling

It's now seamless to accelerate BERTopic on GPUs with cuML as of the latest release (v0.10.0).

Check out the docs at: https://maartengr.github.io/BERTopic/faq.html#can-i-use-the-...

All you need to do is the following:

    from bertopic import BERTopic
    from cuml.cluster import HDBSCAN
    from cuml.manifold import UMAP

    # Create instances of GPU-accelerated UMAP and HDBSCAN
    umap_model = UMAP(n_components=5, n_neighbors=15, min_dist=0.0)
    hdbscan_model = HDBSCAN(min_samples=10, gen_min_span_tree=True)

    # Pass the GPU-backed models to BERTopic
    topic_model = BERTopic(umap_model=umap_model, hdbscan_model=hdbscan_model)
    # docs is your corpus: a list of document strings
    topics, probs = topic_model.fit_transform(docs)