furiousteabag's comments

furiousteabag | 5 months ago | on: Qwen3-Omni: Native Omni AI model for text, image and video

same, did you figure it out?

furiousteabag | 1 year ago | on: Show HN: Simtown: A 2D Role-Playing Game Where Characters Talk, Move, and Act

Another thing we are trying to understand is whether the 2D element adds value to the simulation. A simpler option would be a pure text/chat interface. Still, the hypothesis here is that it is easier to comprehend what's going on in the environment with an actual 2D world and characters, and it might be more immersive compared to just a text interface.

furiousteabag | 1 year ago | on: Launch HN: Silurian (YC S24) – Simulate the Earth

Hey hey! We tried Clay v1 with 768 embeddings size using your tutorials. We then split NAIP SF to chips and indexed them. Afterwards, we performed image-to-image similarity search like in your explorer.

We tried to search for bridges, beaches, tennis courts, etc. It worked, but it didn't work well. The top of the ranking was filled with unrelated objects. We found that similarity scores are stacked together too much (similarity values are between 0.91 and 0.92 with 4 digit difference, ~200k tiles), so the encoder made very little difference between objects.

I believe that Clay can be used with additional fine-tuning for classification and segmentation, but standalone embeddings are pretty poor.

Check this: https://github.com/wangzhecheng/SkyScript. It is a dataset of OSM tags and satellite images. CLIP fine-tuned on that gives good embeddings for text-to-image search as well as image-to-image.

furiousteabag | 1 year ago | on: Launch HN: Silurian (YC S24) – Simulate the Earth

Curious to see what other things you will simulate in the future!

Shameless plug: recently we've built a demo that allows you to search for objects in San Francisco using natural language. You can look for things like Tesla cars, dry patches, boats, and more. Link: https://demo.bluesight.ai/

We've tried using Clay embeddings but we quickly found out that they perform poorly for similarity search compared to embeddings produced by CLIP fine tuned on OSM captions (SkyScript).

furiousteabag | 1 year ago | on: Show HN: Search San Francisco using natural language

Thanks for sharing Brooklyn text demo. Haven't seen it!

Captioning images using VLM would definitely help as an additional conditional feature. Maybe it even would be enough to use only embeddings of captions to do search!

We chose aerial satellite instead of street view because we plan to apply the same technologies where street view is not available, e.g. crop fields or forests. Another thing is that we plan to monitor areas that change frequently and street view data is not enough to keep up. But the idea is great! Although your query "palace of fine arts" is not extremely exciting because it is searchable via Google Maps :D

"USF" by itself doesn't work, "USF word" pointed me where needed xD

"beach" and "picnic tables" indeed doesn't work in object mode, but works great in "big" mode, probably because they needs some context around themselves

"lots of people" didn't work, "a crowd of people" seems to work. Interesting, that almost the same (semantically) queries produce very different results!

furiousteabag | 2 years ago | on: Pandoc

I really like using pandoc as a build system [1] for my personal website to convert .md to .html. I can use templates, automatically generate a table of content and run some lua scripts to get the desired result, such as clickable headers.

[1]: https://github.com/furiousteabag/asmirnov.xyz/blob/master/bu...

furiousteabag | 2 years ago | on: Show HN: I made a GPU VRAM calculator for transformer-based models

You are correct, training sorely in fp16/bf16 can lead to imprecise weight updates or even gradients turning to zero. Because of that, mixed precision is used. In mixed precision training, we keep a copy of the weights in fp32 (master model) and the training loop looks like this: compute the output with the fp16 model, then the loss -> back-propagate the gradients in half-precision -> copy the gradients in fp32 precision -> do the update on the master model (in fp32 precision) -> copy the master model in the fp16 model. We also do loss scaling which means multiplying the output of the loss function by some scalar number before backprop (necessary in fp16 but not required in bf16).

Check out the fastai docs for more details: https://docs.fast.ai/callback.fp16.html

furiousteabag | 2 years ago | on: Show HN: I made a GPU VRAM calculator for transformer-based models

Mixed precision is a default method to pretrain and full fine tune right now. It is especially good in transformers, because they have memory bottleneck in activations (outputs of intermediate layers stored for backprop), and running forward pass in fp16/bf16 reduces VRAM by almost half (speeds up forward pass as well).

furiousteabag | 2 years ago | on: Show HN: I made a GPU VRAM calculator for transformer-based models

There is no option to select quantized version yet. Will work on that!

furiousteabag | 2 years ago | on: Show HN: I made a GPU VRAM calculator for transformer-based models

By default, SGD w momentum is enabled as optimizer. You may try selecting Adam and it will list second moments as well.

furiousteabag | 2 years ago | on: Learnings from fine-tuning LLM on my Telegram messages

It doesn't really matter if the app claims to use E2E when it actually discloses message content [0] [1]. WhatsApp is also filled with backdoors [2].

[0] https://therecord.media/fbi-document-shows-what-data-can-be-...

[1] https://www.rollingstone.com/politics/politics-features/what...

[2] https://telegra.ph/Why-Using-WhatsApp-Is-Dangerous-01-30-4

furiousteabag | 2 years ago | on: Learnings from fine-tuning LLM on my Telegram messages

This may sound stupid, but from my perspective renting random VMs on vast.ai is safe in general and might be safer than using traditional cloud providers in particular. Consider this: on your VM a new image starts several times a day, each time with a new volume. It downloads tens of GBs of data and weights for training. Once training is done, everything gets cleaned up and the process starts again for a new tenant. This constant cycle makes it kind of difficult to track and extract any meaningful data from it.

furiousteabag | 2 years ago | on: Learnings from fine-tuning LLM on my Telegram messages

In IM, there's a balance between total privacy and widespread use. Apps like Signal offer high privacy but have fewer users, while popular ones like WhatsApp are less secure. Telegram lies somewhere in between, offering a level of privacy that most users find comfortable. It's widely used and there haven't been significant incidents of legal issues arising from its messages. Ultimately, it boils down to whom you trust and which app has more of your contacts.

furiousteabag | 2 years ago | on: Learnings from fine-tuning LLM on my Telegram messages

It's true that fine-tuning models on personal messages could be simplified, but many, like myself, can't use third-party services due to sensitive data in our messages. I'm curious if others face this trust issue and how it might be resolved.

furiousteabag | 2 years ago | on: Learnings from fine-tuning LLM on my Telegram messages

I agree that usually 'more is more' for training LLMs. However, for fine-tuning with limited data, it seems crucial to focus the task as much as possible. Since the model still encounters these masked sentences in the data, it effectively learns to respond based on the speaker's name. So, complicating the task might not be necessary. Also, I'm concerned about interpreting the loss value. If the model quickly reduces loss by picking up predictable phrases, it's hard to tell if it's genuinely learning or just echoing these predictable elements.

furiousteabag | 2 years ago | on: Learnings from fine-tuning LLM on my Telegram messages

First I download the weights of the base pre-trained model to the VM instance. Then I upload my data there. Afterward, I fine-tune either LoRA or full and when training finishes, from the VM instance I download the adapters in case of LoRA and full weights in case of full fine-tune and run inference on a way less expensive instance (usually 3090).

furiousteabag | 2 years ago | on: Learnings from fine-tuning LLM on my Telegram messages

I think incorporating knowledge from other apps is a good next step because the model definitely lacks the context of what is going on right now. The nature of instant messaging is that most of the messages are about what is happening right now or what will happen in the near future, so past communication history does not help much.

furiousteabag | 2 years ago | on: Learnings from fine-tuning LLM on my Telegram messages

I've been using vast.ai for a very long time. It is like a GPU marketplace, where people rent and lease GPUs. There are a lot of VMs with 4090, and beasts like 8xA100 80GB are also available from time to time.