upwardbound2 | 1 year ago
You can download ollama here: https://ollama.com/download
And then all you need to do is run `ollama run deepseek-r1:14b` or `ollama run llama3.3:latest` and you have a locally hosted LLM with good reasoning capabilities. You can then connect it to the Gmail API and the like using simple Python code (there's an ollama pip package you can use in place of the ollama terminal command, interchangeably).
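A minimal sketch of the pip-package route (`pip install ollama`) — the `summarize_email` helper and the system prompt are my own illustrative choices, not anything the package prescribes:

```python
# Sketch: summarize an email body with a locally hosted model via the
# ollama Python package. Requires `pip install ollama` and a running
# ollama daemon (`ollama serve`) with the model already pulled.

def build_prompt(subject: str, body: str) -> list[dict]:
    """Build the chat message list asking the model to summarize one email."""
    return [
        {"role": "system", "content": "You summarize emails in one sentence."},
        {"role": "user", "content": f"Subject: {subject}\n\n{body}"},
    ]

def summarize_email(subject: str, body: str, model: str = "deepseek-r1:14b") -> str:
    import ollama  # imported here so the helper above works without the package
    response = ollama.chat(model=model, messages=build_prompt(subject, body))
    return response["message"]["content"]
```

Feeding it real mail would just mean pulling subject/body pairs from the Gmail API and calling `summarize_email` on each one; everything stays on your own machine.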
I very strongly believe that America is a nation premised on freedom, including, very explicitly, the freedom not to self-incriminate. I believe criminality is a fundamental human right (see e.g. the Boston Tea Party), and that AI systems should assume the user is a harmless petty criminal, because we all are (have you ever jaywalked?), and should avoid incriminating them or bringing trouble to them unless they are clearly a bad actor, like a warmonger, or a company like De Beers that supports human slavery. I think this fundamental commitment to freedom is the most important part of the vision for and spirit of America, even if Silicon Valley wouldn't see it as very profitable: allowing people to be, literally, "secure in their papers and effects". "Secure in their papers and effects" is actually a very well-written phrase at the literal level: it means literally, physically possessing your data (your papers) in your physical home, where no one can see them without being in your home.
https://www.reaganlibrary.gov/constitutional-amendments-amen...
4th Amendment to the US Constitution: “The right of the people to be secure in their persons, houses, papers, and effects, against unreasonable searches and seizures, shall not be violated, and no Warrants shall issue, but upon probable cause, supported by Oath or affirmation, and particularly describing the place to be searched, and the persons or things to be seized.”
In my view, cloud computing is a huge mistake, and a foolish abdication of our right to be secure in our papers: legal records, medical records, immigration status, evidence connected to our sex lives (e.g. personal SMS messages), evidence of our religious affiliations, embarrassing personal kompromat, and so on. That level of self-incriminating or otherwise compromising information affects all of us, and is fundamentally supposed to be possessed by us in our home, physically locked away under our own control. I'd rather use the cloud only for collaborative things (work, social media) that are intrinsically about sharing or communicating with people. If something is private, I never want the bits to leave my physical residence. That is what the Constitution says, and it's super important for people's safety when political groups flip-flop so often in their willingness to help the very poor and others in extreme need.
oxcabe | 1 year ago
I've locally tried ollama with the models and sizes you mention on a MacBook with an M3 Pro chip. It often hallucinated, drained the battery, and raised the hardware temperature substantially. (Still, I'd argue I didn't put much time into configuring it, which could've solved the hallucinations.)
Ideally, we should all have access to local, offline, private LLM usage, but hardware constraints are the biggest limiter right now.
FWIW, a controlled agent (running on hardware you own, local or not) with the aforementioned characteristics could be applied as a "proxy" that filters out or redacts specific parts of your data, to avoid sharing information you don't want others to have.
Having said this, you wouldn't be able to integrate such a system into a product like this unless you also made some sort of proxy Gmail account serving as a computed, privacy-controlled version of your original account.
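A toy sketch of that redaction-proxy idea — the patterns and labels here are illustrative assumptions, nowhere near a complete PII scrubber:

```python
import re

# Toy privacy "proxy" layer: redact sensitive fields from an email body
# before it is forwarded to any remote service. These three patterns are
# only examples; a real scrubber would need a much broader rule set.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "PHONE": re.compile(r"\b\d{3}[-.]\d{3}[-.]\d{4}\b"),
}

def redact(text: str) -> str:
    """Replace each match with its label, e.g. 'a@b.com' -> '[EMAIL]'."""
    for label, pattern in PATTERNS.items():
        text = pattern.sub(f"[{label}]", text)
    return text
```

A local agent could run `redact` over every message before anything crosses the network boundary, which is exactly the filtering role described above.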
genewitch | 1 year ago
I self-host a ~40B model, and it doesn't hallucinate, in the same way that OpenAI's 4o doesn't hallucinate when I use it.
Small models are incredibly impressive, but they require a lot more attention to how you interact with them. There are tools like aider that can take advantage of the speed of smaller models and have a larger model check for obvious BS.
I think this idea spread because at least the DeepSeek Qwen distills and Llama support it now: you can use a 20GB Llama model, pair it with a 1.5B-parameter draft model, and it screams. The small model usually manages 30-50% of the total output tokens, with the rest corrected by the large model.
This results in a ~30-50% speedup, ostensibly. I haven't benchmarked it directly, but it is a lot faster than it was, for barely any additional memory commit.