upwardbound2 | 1 year ago
You can download ollama here: https://ollama.com/download
And then all you need to do is run `ollama run deepseek-r1:14b` or `ollama run llama3.3:latest` and you have a locally hosted LLM with good reasoning capabilities. You can then connect it to the Gmail API and the like using simple Python code (there's an ollama pip package you can use in place of the ollama terminal command, interchangeably).
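A minimal sketch of the pip-package route (`pip install ollama`) — the `summarize_email` helper and the system prompt are my own illustrative choices, not anything the package prescribes:

```python
# Sketch: summarize an email body with a locally hosted model via the
# ollama Python package. Requires `pip install ollama` and a running
# ollama daemon (`ollama serve`) with the model already pulled.

def build_prompt(subject: str, body: str) -> list[dict]:
    """Build the chat message list asking the model to summarize one email."""
    return [
        {"role": "system", "content": "You summarize emails in one sentence."},
        {"role": "user", "content": f"Subject: {subject}\n\n{body}"},
    ]

def summarize_email(subject: str, body: str, model: str = "deepseek-r1:14b") -> str:
    import ollama  # imported here so the helper above works without the package
    response = ollama.chat(model=model, messages=build_prompt(subject, body))
    return response["message"]["content"]
```

Feeding it real mail would just mean pulling subject/body pairs from the Gmail API and calling `summarize_email` on each one; everything stays on your own machine.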
I very strongly believe that America is a nation premised on freedom, including, very explicitly, the freedom not to self-incriminate. I believe criminality is a fundamental human right (see e.g. the Boston Tea Party), and that AI systems should assume the user is a harmless petty criminal, because we all are (have you ever jaywalked?), and should avoid incriminating them or bringing trouble to them unless they are clearly a bad actor, like a warmonger, or a company like De Beers that supports human slavery. I think this fundamental commitment to freedom is the most important part of the vision for and spirit of America, even if Silicon Valley wouldn't see it as very profitable: allowing people to be, literally, "secure in their papers and effects". "Secure in their papers and effects" is actually a very well-written phrase at the literal level: it means literally, physically possessing your data (your papers) in your physical home, where no one can see them without being in your home.
https://www.reaganlibrary.gov/constitutional-amendments-amen...
4th Amendment to the US Constitution: “The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.”
In my view, cloud computing is a huge mistake, and a foolish abdication of our right to be secure in our papers: legal records, medical records, immigration status, evidence connected to our sex lives (e.g. personal SMS messages), evidence of our religious affiliations, embarrassing personal kompromat, and so on. That level of self-incriminating or otherwise compromising information affects all of us, and is fundamentally supposed to be possessed by us in our home, physically locked away under our own control. I'd rather use the cloud only for collaborative things (work, social media) that are intrinsically about sharing or communicating with people. If something is private, I never want the bits to leave my physical residence. That is what the Constitution says, and it's super important for people's safety when political groups flip-flop so often in their willingness to help the very poor and others in extreme need.
oxcabe | 1 year ago
I've locally tried ollama with the models and sizes you mention on a MacBook with an M3 Pro chip. It often hallucinated, drained the battery, and raised the hardware temperature substantially. (Still, I'd argue I didn't put much time into configuring it, which could've solved the hallucinations.)
Ideally, we should all have access to local, offline, private LLM usage, but hardware constraints are the biggest limiter right now.
FWIW, a controlled agent (running on hardware you own, local or not) with the aforementioned characteristics could be applied as a "proxy" that filters out or redacts specific parts of your data, to avoid sharing information you don't want others to have.
Having said this, you wouldn't be able to integrate such a system into a product like this unless you also made some sort of proxy Gmail account serving as a computed, privacy-controlled version of your original account.
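A toy sketch of that redaction-proxy idea — the patterns and labels here are illustrative assumptions, nowhere near a complete PII scrubber:

```python
import re

# Toy privacy "proxy" layer: redact sensitive fields from an email body
# before it is forwarded to any remote service. These three patterns are
# only examples; a real scrubber would need a much broader rule set.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each match with its label, e.g. 'a@b.com' -> '[EMAIL]'."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

A local agent could run `redact` over every message before anything crosses the network boundary, which is exactly the filtering role described above.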
genewitch | 1 year ago
I self-host a ~40B model, and it doesn't hallucinate, in the same way that OpenAI's 4o doesn't hallucinate when I use it.
Small models are incredibly impressive, but they require a lot more attention to how you interact with them. There are tools like aider that can take advantage of the speed of smaller models and have a larger model check for obvious BS.
I think this idea spread because at least the DeepSeek Qwen distills and Llama support it now: you can use a 20GB Llama model, pair it with a 1.5B-parameter draft model, and it screams. The small model usually manages 30-50% of the total output tokens, with the rest corrected by the large model.
This results in a ~30-50% speedup, ostensibly. I haven't benchmarked it directly, but it is a lot faster than it was, for barely any additional memory commit.