
Personal Concierge Using OpenAI's ChatGPT via Telegram and Voice Messages

252 points| rwilinski | 2 years ago |github.com

100 comments

[+] sheepscreek|2 years ago|reply
Great work! Thanks for taking the time to build this and open-source it. For me, it scratches a very real itch. I’ve toyed with the idea of a personal AI assistant to act as a second brain, and help me prioritize and remember things with human intuition (“you can’t afford to delay X, do X today and maybe reach out to Z to set the expectations for delaying Y?”).

To me, the greatest strength of LLMs is not their knowledge (which is prone to hallucination), but their ability to analyze ambiguous requests with ease and develop a sane action plan - much like a competent human.

On a side note: wouldn’t it be significantly cheaper and nearly as effective to use ChatGPT 3.5 by default, and reserve GPT 4 for special tasks with explicit instruction (“Use GPT 4 to…”)?

For most chats, GPT 4 would be incredibly wasteful (read: expensive).

Also - it would be very cool to experiment with using GPT 3.5 and GPT 4 in the same conversation! GPT 3.5 could leverage the analysis of GPT 4 and act as the primary communication “chatbot” interface for addressing incremental requests.
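The routing idea above can be sketched in a few lines. This is a hypothetical helper, not part of the posted bot: the model names and the "use gpt-4" trigger phrase are assumptions for illustration.

```python
# Hypothetical router: default to the cheaper model, escalate only on an
# explicit "use gpt-4" instruction at the start of the message.
DEFAULT_MODEL = "gpt-3.5-turbo"
ESCALATED_MODEL = "gpt-4"
TRIGGER = "use gpt-4"

def pick_model(message: str) -> tuple[str, str]:
    """Return (model, cleaned_message) for a user message."""
    if message.lower().startswith(TRIGGER):
        # Strip the routing instruction before sending the rest to the API.
        cleaned = message[len(TRIGGER):].lstrip(" :,")
        return ESCALATED_MODEL, cleaned
    return DEFAULT_MODEL, message
```

Everything else in the request pipeline stays the same; only the `model` field of the chat-completion call would change based on the returned value.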

[+] jbellis|2 years ago|reply
GPT-4 is so much better than 3.5 at virtually everything that I don't think it's worth trying to figure out which tasks 3.5 is adequate for.

Also, the pricing is per token, so even with 4 it is close to negligible unless you are loading in a lot of context or your conversation gets very long.

[+] ratg13|2 years ago|reply
>a personal AI assistant to act as a second brain, and help me prioritize and remember things with human intuition

Everyone wants this, but this is not the product.

The current AI offerings are information in --> information out.

It is not meant to keep state long term, and it is not meant to be your friend. It is meant to answer questions with the information available to it.

You can even see in the example screenshot they showcase the fact that it is not designed to be asked follow-up questions.

[+] tudorw|2 years ago|reply
Good point. I'm doing this with ChatGPT: I use a long conversation with 3.5 to help me write prompts for 4, and it's fun. When I hit the rate cap on 4, I'll go back to 3.5 with what 4 came up with, then converse on the topic until my 4 cap lifts. Combine that with being able to ask Bing for things like links and current information, and DALL-E for image-based visualisations, and it makes for an intriguing combination. Bard gets a look in too, but so far seems a little shy compared to 4 or Bing.
[+] umaar|2 years ago|reply
Made something similar recently, but for WhatsApp: https://chatbling.net/

What behaviour would users prefer when uploading a voice message, a) the voice message is transcribed, so speech to text? Or b) the voice message is treated as a query, so you receive a text answer to your voice query?

I've done a) for now as mobile devices already let you type with your voice.
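The two behaviours in the question above can be expressed as one small dispatcher. This is a sketch, not ChatBling's actual code: the transcription and chat calls are injected as plain callables (in a real service these would wrap Whisper and the chat API).

```python
from typing import Callable

def handle_voice(audio: bytes,
                 transcribe: Callable[[bytes], str],
                 answer: Callable[[str], str],
                 mode: str = "a") -> str:
    """Handle a voice message in one of the two modes described above."""
    text = transcribe(audio)
    if mode == "a":
        # Option (a): return the transcription itself (speech-to-text).
        return text
    # Option (b): treat the transcription as a query and answer it.
    return answer(text)
```

Offering both behind a per-user setting would let people who already dictate via their keyboard keep option (a), while others get a direct voice-to-answer flow.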

[+] swores|2 years ago|reply
I'd quite like a Twilio script I could host that enables voice-to-voice with ChatGPT over a phone call, but for messaging apps (I'm going to try yours, though would prefer Signal) I'd personally prefer to stick with typing and use Apple's transcription (the default microphone on the iOS keyboard) for any voice stuff - still wanting text back.

This is (in addition to the fact that Apple's works pretty well for me) mostly because that way I get to see the words appear as I'm speaking, and can fix any problems in real time rather than waiting until I've finished leaving a voice note to find out it messed up. With Bing AI chat, for example, trying to use their microphone button just leads to frustration, as it regularly fails to understand me. But maybe Whisper is so good that I'd hardly ever need to care about errors?

I do suspect I'm an outlier in terms of how I use dictation, checking as I go - at least based on family members, they seem to either speak a sentence then look at it, or speak and then send without looking - so for them, off-device transcription would probably be welcome as long as it even slightly improves accuracy rates.

[+] umaar|2 years ago|reply
I see my server has restarted a few times! I imagine it's folks here, since I haven't shared Chat Bling elsewhere yet. Sorry to anyone who started generating images but hasn't received a response. The 'jobs' for image generations are stored entirely in memory, so a server restart loses all of that.

Going forward, I'll explore storing image jobs in Redis or something similar, which will be more resilient to server crashes.

As for conversation history, I'll continue to keep that in memory for now (messages are evicted after a short time period, or if messages consume too many OpenAI tokens) - even that's lost during a server restart/crash. Feels like quite a big decision to store sensitive chat history in a persistent database, from a privacy standpoint.
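The token-budget eviction described above can be sketched as a pure function. This is an illustrative assumption, not ChatBling's code: here cost is approximated by word count, whereas a real bot would count tokens with a proper tokenizer such as tiktoken.

```python
def evict(history: list[str], budget: int) -> list[str]:
    """Keep the most recent messages whose combined cost fits the budget."""
    kept, used = [], 0
    for msg in reversed(history):      # walk newest-first
        cost = len(msg.split())        # crude stand-in for a token count
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    return list(reversed(kept))        # restore chronological order
```

Running this before every API call keeps the prompt under the model's context limit while preserving as much recent conversation as the budget allows.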

[+] jaggs|2 years ago|reply
This is very cool. I tested it with a quick reminder request and it seemed to work. I'm a bit terrified by the privacy issue though. Combining OpenAI with WhatsApp seems like a marriage made in hell.

I guess the only solution will be to move to local bots and models on the phone which will interface out only when needed.

[+] djohnston|2 years ago|reply
dude how did you get Meta to approve your WA Business? I couldn't get verified after like two weeks of trying and gave up :(
[+] hombre_fatal|2 years ago|reply
This is the new hello world.

https://github.com/danneu/telegram-chatgpt-bot

https://t.me/god_in_a_bot (demo bot)

I tried building this for WhatsApp but Twilio is weirdly expensive. I don't even think Twilio is cheap for sending 2FA tokens.

[+] moralestapia|2 years ago|reply
I'm also on Twilio and yes it is expensive, a longish call (10 mins) comes to about $1.

They charge you for:

* Time spent using the "Twilio Client" (whatever that means)

* Inbound call time

* Transcription of each audio chunk, billing them at a minimum of 15s per function call

* Every time you use their text-to-speech functions (not even that is free)

[+] aivisol|2 years ago|reply
Nice work. I have a question though. The example chat window you show has an interaction where the AI explains that it cannot remember the previous question. Isn’t LangChain there for exactly that purpose, or am I missing something?
[+] andag|2 years ago|reply
Can you update the readme with some privacy info? Specifically, who am I sharing my data with?

OpenAI - fair enough, I'm already doing that a fair amount.

[+] pantulis|2 years ago|reply
It seems that you can self host the thing. Apart from that, it seems that you would be sharing info obviously with OpenAI (both GPT and Whisper), Telegram and Google.
[+] aftergibson|2 years ago|reply
I'm guessing on step 3, they meant touch .env, not mkdir.

  mkdir .env and fill the following:

    TELEGRAM_TOKEN=
    OPENAI_API_KEY=
    PLAY_HT_SECRET_KEY=
    PLAY_HT_USER_ID=
[+] marc|2 years ago|reply
For people who are looking for a hosted solution: https://t.me/marcbot

Being able to use voice messages as an interface makes a huge difference. I can just ramble on, sharing my thoughts, and then have GPT turn it into something sensible.

Great for brainstorming, getting your thoughts out on "paper", etc.

[+] mkw5053|2 years ago|reply
I’ve been heavily using chatgpt (gpt 4) on my honeymoon/baby moon/vacation in Spain. Everything from itineraries to asking art history questions in museums. I’ve mainly been using the voice input on my iPhone for chatgpt on a mobile browser and I can’t help but think how useful better voice support will be.
[+] tikkun|2 years ago|reply
I've got an iPhone app in TestFlight beta that has speech-to-text and text-to-speech. Basically a nicer iPhone app for GPT-4; I tried most of the existing ones and none had particularly nice UX.

Pricing model for now is that you pay exactly what we pay (we pass on the API costs plus Apple's 30%, no markup). We could add a bring-your-own-API-key option too, to avoid Apple's 30%.

If you'd like access, email in profile

[+] golergka|2 years ago|reply
Did you have access to plugins?
[+] MetaWhirledPeas|2 years ago|reply
Not as cool, but there for the lazy: install the Bing app on your phone (I guess you need to be accepted into the beta first?). I use it as a slow-thinking alternative to Google Assistant that usually gives much better answers.
[+] throwaway2203|2 years ago|reply
The Bing app isn't as responsive as ChatGPT. I asked it a slightly complicated question about my taxes and it "binged" something weird and gave me a generic non-answer.
[+] rapsey|2 years ago|reply
Are there any offline text-to-speech options that support a wide variety of languages?
[+] floitsch|2 years ago|reply
Did something similar (without voice) that runs on an ESP32. This way I don't need any server or to keep my desktop machine running.

Supports Telegram and Discord.

https://github.com/floitsch/ai-projects/tree/main/chat

[+] sheepscreek|2 years ago|reply
OP integrated LangChain and the ability to Google results (and a neat way to integrate more agents). That’s the main draw for me in their implementation.
[+] Heloseaa|2 years ago|reply
Recently did the same in a lightweight alternative with python: https://github.com/clemsau/telegram-gpt

Looking to make it accessible, cheap, and as lean as possible. I'd love to hear potential feature ideas.

[+] Hadriel|2 years ago|reply
Can you choose to use GPT-4?
[+] Fauntleroy|2 years ago|reply
Using a cloud-hosted AI with a Terms-of-Service as an assistant is a recipe for disaster in the future. I can't wait for the future where everyone is reliant on a corporate spy for everything they do.
[+] yewenjie|2 years ago|reply
Play.ht's API is like $99/month. Is it possible to use any other TTS service? (Also, does play.ht just use Azure underneath?)
[+] 55555|2 years ago|reply
I want something like this but I don't want to have to host it myself. Is there any I can simply sign up to?
[+] jcims|2 years ago|reply
Does anyone have a suggestion for doing something similar with SMS? I've been tinkering with it but it seems that there are some regulations that will require me to have a commercial organization registered to allow SMS to 10 digit North America numbers.
[+] christiangenco|2 years ago|reply
I don't think you need a commercial organization registered to get a number on Twilio.

I hooked up an old Twilio number I bought a while ago to ChatGPT for an ADHD encouragement bot last week: https://attaboy.ai

Now I can message it via WhatsApp or Telegram, and it even remembers chat history (by storing the last ~20 messages in Firebase).
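The "last ~20 messages" approach is simple to sketch with a bounded deque per chat. This is an assumption for illustration, not attaboy.ai's code; only the windowing logic is shown, with the persistence backend (Firebase in the comment above) left out.

```python
from collections import deque

class ChatMemory:
    """Per-chat sliding window: only the newest `limit` messages are kept."""
    def __init__(self, limit: int = 20):
        self.limit = limit
        self.by_chat: dict[str, deque] = {}

    def add(self, chat_id: str, message: str) -> None:
        # deque(maxlen=...) silently discards the oldest entry when full.
        self.by_chat.setdefault(chat_id, deque(maxlen=self.limit)).append(message)

    def context(self, chat_id: str) -> list[str]:
        return list(self.by_chat.get(chat_id, ()))
```

A fixed message count is a coarser cutoff than a token budget, but it is trivial to persist: each chat maps to one small, bounded list.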

[+] mwlp|2 years ago|reply
I added gpt to a bot of mine for Telegram and Discord a few weeks ago. I'm constantly amazed at how the littlest of things can spawn so many new opportunities for inside jokes and meta humor.
[+] yosito|2 years ago|reply
Great, so instead of sharing all your data with one party, you can share it with 3+ parties
[+] MH15|2 years ago|reply
This is being downvoted but it's an important thing to consider. As we move to doing more with these systems we're going to start seeing restrictions on which AI tools we can use at work/school/home.
[+] soederpop|2 years ago|reply
I was able to do something similar with Siri using the Shortcuts app. You can have Siri transcribe your speech and post it to an endpoint, and then read the response back to you.

Is it possible to use GPT-4 with LangChain?

[+] hankman86|2 years ago|reply
It amazes me to no end that some people would feed private conversations and other sensitive data into an experimental chat bot. Do these people not know that ChatGPT is not a mature technology, that it may not reliably isolate sessions, and that it may even permanently ingest user data for training purposes?

GPT and other LLMs are currently integrated into countless products and hobbyist projects. Expect an avalanche of lawsuits on the grounds of LLMs being structurally incompatible with notorious privacy laws like the GDPR. For instance, how would they implement the GDPR’s “right to be forgotten”? Untrain the model?

[+] Cyphase|2 years ago|reply
(I may have misunderstood your comment to some extent, but I'm going to send this reply anyway even if just to clarify for anyone else who might misunderstand.)

---

I agree with "be careful what you send to the chat bot", but let's clarify some things in case you or someone else reading your comment is misunderstanding.

LLMs aren't immature AI brains that "may even permanently ingest user data for training purposes". They're just models, which are represented by an architecture described in readable source code, and weights derived from training.

There is a very clear delineation between inference and training. Models are static when being used for inference. You don't need to "untrain" the model after you ask it something; you never trained it in the first place. Running inference does not change the trained weights.

If you're talking about OpenAI specifically saving ChatGPT data for later training purposes, they absolutely are doing that; they aren't hiding it. But that's a purposeful "let's take this data and use it for training", not "oh no, our immature tech accidentally ingested prompt data, how do we untrain it?"

[+] weikju|2 years ago|reply
As my friends would say "all my data is out there already, so it doesn't matter anymore"... this is what we're dealing with here.
[+] avereveard|2 years ago|reply
On one hand, yes, there is a real threat of these companies misusing personal data, especially if you use the public side of the API (i.e. not the one from within Azure, which has a separate set of privacy guarantees as far as I can understand).

On the other hand, this is a guardrail like the many others that GPT already has; if I search my name I get a 'not notable enough' answer already.

[+] elif|2 years ago|reply
For commercial snoops, you have very strong legal protections over your data if you seek to enforce them.

For government snoops, you don't have any privacy anyway.