Show HN: Psychic - An open-source integration platform for unstructured data

122 points| jasonwcfan | 2 years ago |github.com

My cofounder and I used to work at Robinhood where we shipped the company’s first OAuth integrations, so we know a lot about how data moves between companies.

For example, we know that the pain of building new API integrations scales with the level of fragmentation and the number of competing "standards". In the current meta, we see this pain with a lot of AI startups that invariably need to connect to their customers' data, but have to support 50+ integrations before they even scale to 50+ customers.

This is the process for an AI startup to add a new integration for a customer:

- Pore over the API docs for each source application and write a connector for each

- Play email tag to find the right stakeholders and get them to share sensitive API keys, or give them an OAuth app. It can take 6+ weeks for some platforms to review new OAuth apps

- Normalize data that arrives in different formats from each source (HTML, XML, text dumps, 3 different flavors of markdown, JSON, etc.)

- Figure out what data should be vectorized, what should be stored as SQL, and what should be discarded

- Detect when data has been updated and synchronize it

- Monitor when pipelines break so data doesn’t go stale
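
To make the normalization step in the list above concrete, here is a minimal sketch that reduces a few source formats to plain text using only the Python standard library. The function names, the `NORMALIZERS` table, and the assumption that JSON payloads are flat objects are all illustrative and are not taken from Psychic's codebase. A real connector would need far more careful handling of each format.

```python
import json
import re
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects the text nodes of an HTML document, ignoring tags."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def html_to_text(raw: str) -> str:
    parser = _TextExtractor()
    parser.feed(raw)
    parser.close()
    # Collapse runs of whitespace left behind by removed tags.
    return " ".join(" ".join(parser.parts).split())


def markdown_to_text(raw: str) -> str:
    # Very rough: strips heading markers and emphasis characters only.
    text = re.sub(r"^#{1,6}\s*", "", raw, flags=re.MULTILINE)
    text = re.sub(r"[*_`]", "", text)
    return " ".join(text.split())


def json_to_text(raw: str) -> str:
    # Assumption: a flat JSON object whose string values are the content.
    obj = json.loads(raw)
    return " ".join(v for v in obj.values() if isinstance(v, str))


def plain_to_text(raw: str) -> str:
    return " ".join(raw.split())


# Hypothetical dispatch table: format label -> normalizer.
NORMALIZERS = {
    "html": html_to_text,
    "markdown": markdown_to_text,
    "json": json_to_text,
    "text": plain_to_text,
}


def normalize(raw: str, fmt: str) -> str:
    """Convert raw content in a known format to whitespace-normalized text."""
    if fmt not in NORMALIZERS:
        raise ValueError(f"unsupported format: {fmt}")
    return NORMALIZERS[fmt](raw)
```

Calling `normalize(payload, "html")` or `normalize(payload, "markdown")` gives every downstream step the same kind of input, which is the point of doing this once per source rather than once per customer.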

This is a LOT of work for something that doesn’t move the needle on product quality.

That’s why we built Psychic.dev to be the fastest and most secure way for startups to connect to their customers’ data. You integrate once with our universal APIs and get N integrations with CRMs, knowledge bases, ticketing systems and more with no incremental engineering effort.

We abstract away the quirks of each data source into Document and Conversation data models, and try to find a good balance to allow for deep integrations while maintaining broad utility. Since it’s open source, we encourage founders to fork and extend our data models to fit their needs as they evolve, even if it means migrating off our paid version.
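
As a rough illustration of what two shared data models could look like, here is a sketch using Python dataclasses. Every field name below, the `Message` helper, and the `ticket_to_conversation` mapping are assumptions made up for this example. They are not Psychic's actual schema, which lives in the open source repo.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Document:
    # Hypothetical fields, not Psychic's real schema.
    id: str
    source: str  # e.g. "notion" or "confluence"
    title: str
    content: str  # normalized plain text
    uri: Optional[str] = None


@dataclass
class Message:
    sender: str
    text: str


@dataclass
class Conversation:
    # Hypothetical fields, not Psychic's real schema.
    id: str
    source: str
    messages: List[Message] = field(default_factory=list)


def ticket_to_conversation(ticket: dict) -> Conversation:
    """Map a made-up ticketing payload onto the shared Conversation shape."""
    return Conversation(
        id=str(ticket["ticket_id"]),
        source="example-ticketing",
        messages=[
            Message(sender=c["author"], text=c["body"])
            for c in ticket.get("comments", [])
        ],
    )
```

Each connector would own one small mapping function like `ticket_to_conversation`, so that consumers only ever see `Document` and `Conversation` objects regardless of the source.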

To see an example in action, check out our demo repo here: https://github.com/psychic-api/psychic-langchain-tutorial/

We are also open source and open to contributions. Learn more at docs.psychic.dev or by emailing us at [email protected]!

25 comments

[+] jw1224|2 years ago|reply

This looks like a promising idea, and potentially solves a problem I’ve faced recently.

It’s been a challenge getting my SaaS app connected to fragmented APIs belonging to many of my customers, each with their own use cases.

One of the biggest hurdles I faced was Asana’s API. A customer wanted us to hook into an Asana webhook: when a task was added to their project, they needed to push the data to their account on our platform (and vice-versa).

But because Asana is so “flexible” (ha!), all the field names in their API responses were UUIDs. It was a total nightmare to figure out which key/values were the ones we wanted. I’m not sure if/how Psychic can figure this out.

Secondly, maybe it’s just how your landing page is phrased — but this feels like “IFTTT for AI tooling”, rather than “IFTTT powered by AI”.

I see a lot more commercial value in the latter direction. To most prospective customers, your headline “Easy to set up” doesn’t mean a React hook and Python SDK. Just give us a REST API! :)

[+] jasonwcfan|2 years ago|reply

IFTTT for AI tooling is definitely more accurate! It's not powered by AI... yet. Zapier came out with that recently: https://twitter.com/zapier/status/1658457320849018882

Definitely worth exploring but as you've experienced there are enough problems with extracting and normalizing data across the long tail of SaaS apps for us to get to reasonable scale.

re: the Asana API issue, that's both hilarious and sad. We do plan to build a transformation layer so that all data is reshaped to a consistent schema before sending it off to customers (hence the "Universal" aspect of the API). These quirks of each data source are exactly the kinds of things we want to solve for so our users don't need to worry about it.
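
One small piece of such a transformation layer, applied to the opaque-field-name problem described above, might look like the sketch below. It assumes the source exposes a separate list of field definitions that pairs each opaque id with a readable name. The payload shapes and the `id`/`name` keys are hypothetical and are not Asana's actual response format.

```python
from typing import Any, Dict, List


def build_field_index(field_defs: List[Dict[str, str]]) -> Dict[str, str]:
    """Map each opaque field id to its human-readable name."""
    return {f["id"]: f["name"] for f in field_defs}


def rename_fields(record: Dict[str, Any], index: Dict[str, str]) -> Dict[str, Any]:
    """Return a copy of the record keyed by readable names.

    Keys with no known definition are kept unchanged so that nothing is
    silently dropped.
    """
    return {index.get(key, key): value for key, value in record.items()}


# Hypothetical field definitions and record, for illustration only.
field_defs = [
    {"id": "a1f3", "name": "priority"},
    {"id": "9c2e", "name": "due_date"},
]
raw_record = {"a1f3": "high", "9c2e": "2023-06-01", "zzzz": 1}
clean_record = rename_fields(raw_record, build_field_index(field_defs))
```

Keeping unknown keys rather than discarding them is a deliberate choice here: a field added in the source app shows up downstream under its opaque id instead of vanishing.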

[+] jasonwcfan|2 years ago|reply

Hey I've been thinking a lot about your comment. Would you be open to connecting non-anonymously? Would love to pick your brain on API integrations. If so you can email me at [email protected] so you don't have to doxx yourself.

[+] 9dev|2 years ago|reply

I have just built a Notion integration that pulls pages into our statically built API documentation website, and it was, frankly, horrible. While the end result works (the team can write docs in the tool they know, the site is built and released from the structure there automatically), it was a lot of pain to even discern children from their parent pages or parse attributes, let alone get databases right.

Considering I’ll need to get other data in there soon, probably, I’m in the market for Psychic. The question I have, though, is: can you really reconcile the schemas of several apps into one without settling for the lowest common denominator? What do you do about platforms like Notion that don’t even provide webhooks? We settled on polling, but obviously that won’t scale.

[+] jasonwcfan|2 years ago|reply

Reconciling schemas -> this will be hard. We're starting with just two data models (Documents and Conversations) that are relatively universal, but there's no way to avoid a lossy transformation from Notion because things like tables and embeds aren't neatly captured as a Document without making our data models just as complex. I suspect LLMs can help find a good balance between generalization and depth since they're very good at automating work that typically requires a lot of customization.

Data syncs -> If the source doesn't offer webhooks, we just poll daily, do a diff on our side, and send the updated data to our customer. I'm not aware of any way to avoid polling when webhooks aren't available, but we plan to do the polling ourselves so we can provide a webhook-like experience for customers.
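
The poll-then-diff approach described above can be sketched in a few lines. This is only an illustration of the general technique, not Psychic's implementation: the `fetch` and `notify` callables, the event dictionaries, and the choice of SHA-256 content hashes as fingerprints are all assumptions.

```python
import hashlib
from typing import Callable, Dict, List, Tuple


def fingerprint(content: str) -> str:
    """Stable hash of a document body, used to detect changes cheaply."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def diff_snapshot(
    previous: Dict[str, str], current_docs: Dict[str, str]
) -> Tuple[Dict[str, str], List[dict]]:
    """Compare freshly polled documents against the last stored fingerprints.

    previous:     doc id -> fingerprint from the last poll
    current_docs: doc id -> content from this poll
    Returns the new fingerprint map and a list of change events.
    """
    new_state = {doc_id: fingerprint(body) for doc_id, body in current_docs.items()}
    events = []
    for doc_id, digest in new_state.items():
        if doc_id not in previous:
            events.append({"id": doc_id, "change": "created"})
        elif previous[doc_id] != digest:
            events.append({"id": doc_id, "change": "updated"})
    for doc_id in previous:
        if doc_id not in new_state:
            events.append({"id": doc_id, "change": "deleted"})
    return new_state, events


def poll_once(
    fetch: Callable[[], Dict[str, str]],
    state: Dict[str, str],
    notify: Callable[[dict], None],
) -> Dict[str, str]:
    """Run one polling cycle and push each change to the subscriber."""
    state, events = diff_snapshot(state, fetch())
    for event in events:
        notify(event)
    return state
```

A scheduler would call `poll_once` on whatever interval the source tolerates (daily, in the comment above) and persist the returned state between runs. From the customer's side, `notify` behaves like a webhook even though the source never sent one.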

[+] babyshake|2 years ago|reply

The reason to use the Pro hosted plan is for support and the convenience of not needing to self-host? Or is there actual functionality you don't get by self-hosting?

[+] jasonwcfan|2 years ago|reply

Correct, the benefits of the paid version are support and convenience. 100% of our code is in the open source repo.

[+] michaelmior|2 years ago|reply

Looks interesting! I tried to sign up to the cloud service with GitHub and got an error message that the integration wasn't enabled.

[+] ayanb9440|2 years ago|reply

Thanks for the heads up! We'll have that fixed ASAP but in the meantime signing up with Google or email/password should work

[+] jekude|2 years ago|reply

Congrats on the launch! I am curious how you see apps evolving to provide natural language interfaces on top of existing APIs. Also, do you plan on strictly remaining the data layer (between a startup and its API integrations) or do you plan on dogfooding your platform for a particular killer use case?

[+] jasonwcfan|2 years ago|reply

Personally, I think it's going to happen but not for nearly as many applications as you might think. Point and click is still king for any use case that requires precision (e.g. booking an Uber).

We plan to focus strictly on the data layer, helping companies connect to their data sources through a universal API. We already are dogfooding our platform for some customers! By far the most popular use cases are customer support automation and search through workplace apps.

It's fascinating that the build/buy decision has flipped for a lot of companies. As long as they have an engineering team, a lot of companies are trying to build out their own AI capabilities in house, I'm guessing because no one wants to miss the boat.

[+] madisonmay|2 years ago|reply

Why the decision to license as GPL?

[+] jasonwcfan|2 years ago|reply

We specifically chose AGPL-3 because we wanted it to be permissive, but we didn't want others to fork our project, take it closed source, and charge for it without adding back anything of value.

We also don't expect companies to customize the functionality, just to self-host it or use the cloud version, or use it for personal projects.

[+] ipv4dhcp|2 years ago|reply

what is your concern with gpl? you can still commercialize apps that use it as long as you use the normal interfaces it exposes.

[+] 2-718-281-828|2 years ago|reply

why would you call it psychic? stupid name, uninformative, difficult to google.

[+] itronitron|2 years ago|reply

I think the name fits perfectly: since the intent is for companies to use it with LLMs, the quality of the results is the same as you would get from asking a psychic.