Show HN: Biblos – Semantic Bible Embedded Vector Search and Claude LLM

136 points | j-b | 2 years ago | github.com

Introducing Biblos, a simple tool for semantic search and summarization of Bible passages. It uses Chroma for vector search with BAAI BGE embeddings to find semantically related verses across the Bible, and Anthropic's Claude LLM to generate high-quality summaries of the retrieved passages that contextualize your search topic. Built on a Retrieval Augmented Generation (RAG) architecture, the app implements a simple Streamlit web UI in Python. It is deployed on render.com and available at https://biblos.app

Note: Search by just topic/keywords, e.g. "Kingdom of Heaven", for broader results!
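
For readers unfamiliar with the retrieval half of RAG, here is a minimal sketch of ranking passages by cosine similarity. It is not Biblos's code: `embed` is a toy bag-of-words stand-in for a real embedding model such as BGE, the in-memory list stands in for a vector store such as Chroma, and the verse texts are abbreviated paraphrases used only as sample data.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def retrieve(query, verses, k=2):
    """Return the k (score, reference, text) tuples most similar to the query."""
    q = embed(query)
    scored = [(cosine(q, embed(text)), ref, text) for ref, text in verses]
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:k]

# Sample data only: abbreviated paraphrases, not an exact translation.
VERSES = [
    ("Matthew 5:3", "blessed are the poor in spirit for theirs is the kingdom of heaven"),
    ("Psalm 23:1", "the lord is my shepherd i shall lack nothing"),
    ("Matthew 13:44", "the kingdom of heaven is like treasure hidden in the field"),
]

top = retrieve("kingdom of heaven", VERSES, k=2)
for score, ref, text in top:
    print(f"{score:.2f}  {ref}  {text}")
```

In a real pipeline the retrieved passages would then be passed to the LLM as context for the summary.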

84 comments

[+] cout|2 years ago|reply
Playing with this a bit more, and it is very cool!

One thing I like is that it provides the source text, so you can verify whether the summary is accurate. Other engines just give you an answer, leaving you to verify accuracy on your own as a separate step. But I wonder which translation it uses?

Wondering if it has a bias toward any particular theology, I tried some controversial terms.

The program gave an accurate defense of the five points of Calvinism, but when I asked about dispensationalism, the verses it gave were less relevant than I hoped. On the other hand, it did give relevant results for Arminianism. On predestination, however, it missed Romans 9 and instead returned passages from Ecclesiastes and Galatians 4.

Concerning Roman Catholic theology, it did not seem to know what the immaculate conception is, and instead wandered aimlessly. It did know what purgatory is, but I expected to see 1 Cor. 13 and instead it returned passages from Job and Ecclesiastes.

Concerning Orthodox theology, it did not seem to know what the word filioque means. This isn't a word found in the Bible, but neither is Calvinism nor Trinity, which it did know. It also knew iconostasis, though I am not qualified to judge whether it explained it accurately.

I was impressed that it knows what a gift economy is; I don't think this is a term I would expect to see in a typical commentary.

It did not feel comfortable commenting on Facebook, but when I asked about the internet, the summary explained that we should only be judged by God and not our friends, and also warned against adulterous women. It was more positive about an information superhighway, returning results about sharing knowledge and being honest.

A bug: if I click Summarize before the search is complete, I get a different response than if I wait for the runner to stop running and then click Summarize.

[+] j-b|2 years ago|reply
Interesting insights, currently using the WEB translation, and plan to expand further. Thanks for the bug report!
[+] dragonwriter|2 years ago|reply
> Concerning Roman Catholic theology, it did not seem to know what the immaculate conception is, and instead wandered aimlessly.

Catholics don't believe sola scriptura, which is a fairly recent Protestant doctrine, instead viewing scripture and sacred tradition as pillars of faith, and the Immaculate Conception is a dogma originating in sacred tradition, not scripture.

So it's not surprising that a textual search of the Bible (even if using a text that Catholics would use, which I don't think this does) would whiff hard on this.

[+] deckar01|2 years ago|reply
It was interesting talking to my father, a former Christian minister, about AI. ChatGPT interactions had instilled some misconceptions and it was difficult to convince him that its responses were just cleverly weighted randomness. It produced compelling theological debate. I told him not to trust any chat bot unless it could cite verifiable sources, and when prompted ChatGPT could only fabricate. Trust eroded.

In consolation I set up a vector index of The Works of Josephus (his interest at the time) and a StableBeluga chatbot. It answered questions fairly well, but most importantly supplied the references that were used as context. In the end there was still just too much cultural and historical context missing to be a useful alternative to scholarly analysis.

[+] actionfromafar|2 years ago|reply
On the other hand, this is exactly the kind of application I think AI/LLM/GPT-whatever could prove extremely useful in.

A model could be retrained and finetuned and corrected and double-checked on a limited corpus, until it would be able to discuss and explain something very very well in a particular subject.

Such things could be used in education, I imagine. Like an extra, never tiring teacher.

[+] viraptor|2 years ago|reply
> In the end there was still just too much cultural and historical context missing

This was my first thought when seeing the project. How well do we expect LLMs to work for text where words often don't have their normal meaning, half the things shouldn't be taken literally, and we have lots of contradictions? This feels like it should have way more warnings than ChatGPT itself.

[+] Minor49er|2 years ago|reply
This is a cool project. I have a few suggestions that would really make this into a powerful tool:

Add the verse numbers in the results and turn them into links so that the full passages can be read

Include other translations, especially the KJV and Greek interlinear, since those are still widely used and referenced. Different churches have particular reasons for using the versions that they've chosen, and cross-examining translations is highly important in Bible study

Include optional commentaries as search sources since those can lend a lot of insight into different passages, even serving as cross-references to other related passages

[+] j-b|2 years ago|reply
My first release used the KJV! The vector DB includes metadata (book, chapter, verse), working towards optionally rendering those. I like the idea of including multiple translations (with side by side comparisons). I'm limited to public domain texts for storage but I can query the ESV API after retrieval. Good idea about commentaries. Thanks for your input!
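
As an illustration of optionally rendering that metadata, here is a small sketch. The key names `book`, `chapter`, and `verse` follow the fields named above, but their exact spelling in the vector DB is an assumption, and `render_hit` is a hypothetical helper, not a function from the project.

```python
def format_reference(meta):
    """Render 'Book Chapter:Verse' from one result's metadata dict."""
    return f"{meta['book']} {meta['chapter']}:{meta['verse']}"

def render_hit(text, meta, show_reference=True):
    """Prefix the verse text with its reference when the option is on."""
    if not show_reference:
        return text
    return f"{format_reference(meta)}  {text}"

# Assumed metadata shape for a single search hit.
hit_meta = {"book": "Jeremiah", "chapter": 8, "verse": 22}
line = render_hit("Is there no balm in Gilead?", hit_meta)
print(line)
```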
[+] civilitty|2 years ago|reply
After playing around with it for a few minutes, all of the results scored between 0.5 and 0.8 even when using nonsense queries like "interdimensional cable" and "eat my plumbus", which is a sign that the model you're using for embeddings is very poorly tuned for cosine similarity for your use case.

A little fine-tuning would probably go a long way, since the embeddings are likely trained mostly on a nonreligious corpus in the modern tongue. It might also be overfitted, so trying smaller models might help.

[+] j-b|2 years ago|reply
Thanks for the feedback - this particular model is BAAI/bge-large-en-v1.5. What alternative embeddings would you recommend?
[+] rolisz|2 years ago|reply
That's a known issue with the BGE embeddings; the authors warn about it in the model card. Their recommendation is to choose the similarity thresholds more carefully (they will be much higher than for other embeddings).
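
One possible way to pick such a threshold, sketched under assumptions: collect the top scores returned for queries known to be irrelevant, then set the cutoff some margin above that noise floor. The function names, the margin of two standard deviations, and the example scores are all made up for illustration; this is not a procedure taken from the model card.

```python
import statistics

def calibrate_threshold(noise_scores, margin=2.0):
    """Cutoff = mean + margin * sample stdev of scores from irrelevant queries."""
    mean = statistics.mean(noise_scores)
    spread = statistics.stdev(noise_scores)
    return mean + margin * spread

def filter_hits(hits, threshold):
    """Keep only (score, reference) hits at or above the cutoff."""
    return [(score, ref) for score, ref in hits if score >= threshold]

# Hypothetical top scores observed for nonsense queries.
noise = [0.55, 0.60, 0.65]
threshold = calibrate_threshold(noise)

# Hypothetical hits for a real query.
hits = [(0.82, "passage A"), (0.66, "passage B")]
kept = filter_hits(hits, threshold)
print(round(threshold, 2), kept)
```

The same cutoff could also drive a visual indicator for weak matches instead of hiding them outright.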
[+] swatcoder|2 years ago|reply
Interesting concept/research-project, but the results to just about every query I tried seem inaccurate and perplexing. Assuming the "similarity score" is meaningful, you may want to raise the cutoff or add an indicator (different color, fade, etc) for passages that get surfaced with a low match.
[+] j-b|2 years ago|reply
I like your idea about the color indicators. Have you tried searching a topic, e.g. "Kingdom of Heaven", rather than the default "What did Jesus say about .." prompt? Depending on the context, it may significantly improve the results.
[+] otabdeveloper4|2 years ago|reply
I apologize, upon reflection I do not feel comfortable summarizing or interpreting passages in this manner.

It's censored. Looks like you need to build your own LLM unless you want some developer's thinly veiled opinion.

[+] msylvest|2 years ago|reply
Some years ago I was wondering what the words 'There is a balm in Gilead' mean. Spent hours googling, both in English and Danish (my native lang). Found Jer 8:22 "Is there no balm in Gilead; is there no physician there?" and inferred that Jeremiah must have associated Gilead's balm with (glorified?) healing processes.

So as a test I asked this service 'what is balm in Gilead' and it returned 4 other Bible sentences. Pressed 'Summarize', which unfolded comments on the 4 sentences and a summary of

'Overall, these passages present Gilead as a contested but fertile region east of the Jordan river. It was prized territory that was given to several Israelite tribes and seen as a divine provision. The name "Gilead" means "hill of testimony" referring to its choice lands. So the metaphor of "balm in Gilead" signifies the healing, restoration, and provision God can bring even in difficult times.'

My key observations:

1) The overall summary highly matches my own interpretations

2) Jer 8:22 was not referenced, possibly because it does not define the concept; it just refers to its meaning

3) Inferring the summary from the 4 sentences is not easy but apparently AI can do so

I have a question on the generation of the overall summary: Is it based on the 4 sentences only or does it include other biblical text behind the scenes?

[+] seanhunter|2 years ago|reply
I'm really loving this concept. I'm going to finetune an LLM based on a bunch of scripture, collate as many hallucinations as I can and go start my own religion.
[+] valyagolev|2 years ago|reply
I asked this one about homosexuality, it didn't find the most glaring passages from Leviticus.

This is a common thing for vector similarity search. I wonder if there's a solution already. I thought about giving the query to an LLM to reformulate in the database-relevant way before embedding it.
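
A rough sketch of that idea, with everything stubbed: `rewrite` stands in for an LLM call that rephrases the query in the vocabulary of the indexed text, `search` stands in for the vector search, and the results of both queries are merged by best score. The canned rewrite, references, and scores below are invented for illustration and say nothing about what the real system returns.

```python
def search_with_rewrite(query, rewrite, search, k=3):
    """Search with the raw and the rewritten query, keeping each hit's best score."""
    best = {}
    for q in (query, rewrite(query)):
        for score, ref in search(q):
            if score > best.get(ref, float("-inf")):
                best[ref] = score
    ranked = sorted(best.items(), key=lambda item: item[1], reverse=True)
    return [(score, ref) for ref, score in ranked[:k]]

REWRITTEN = "God choosing whom to show mercy, potter and clay"

def fake_rewrite(query):
    """Stub for an LLM reformulation step (hypothetical output)."""
    return REWRITTEN

def fake_search(query):
    """Stub vector search returning canned, made-up (score, reference) pairs."""
    canned = {
        "predestination": [(0.61, "Ecclesiastes 3:1"), (0.58, "Galatians 4:4")],
        REWRITTEN: [(0.83, "Romans 9:21"), (0.60, "Ecclesiastes 3:1")],
    }
    return canned.get(query, [])

results = search_with_rewrite("predestination", fake_rewrite, fake_search)
for score, ref in results:
    print(f"{score:.2f}  {ref}")
```

Searching with both forms keeps the original query as a fallback in case the rewrite drifts from what the user meant.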

[+] linuxdude314|2 years ago|reply
It’s sort of amusing to me how you feel your analysis is more correct than sentence-transformers or whichever embedding algorithm was used.

I think to most people it’s pretty obvious you are trying to make the algorithm fit your bias/preconceived ideas.

[+] dragonwriter|2 years ago|reply
The usually cited “most glaring” passages in Leviticus (Lev 18:22, Lev 20:13), read strictly literally, don’t condemn homosexuality per se, but rather both partners in a male homosexual act where one of them also engages in heterosexual sex.

Condemnation of homosexuality is a popular gloss or rationalization of this, weirdly common among literalists, but, I mean, Leviticus condemns mixing fibers, and has plenty of rules that apply to only one gender, I don't see why we shouldn't take its condemnation of specifically men mixing gay and straight sex literally, too. (And maybe also take Acts 15 literally as to which part of the ancient Mosaic law applies to non-Jewish Christians, and not worry about that rule however we gloss it, since it concerns neither pollution from idols, unlawful marriage, blood, or the meat of strangled animals.)

[+] esafak|2 years ago|reply
Can someone link to some relevant passages?
[+] notrsponsible|2 years ago|reply
Did you ask it whether God created pregnancy by rpw and inquest?

Did God create "the products of inqest will suffer for their parents' sins"? Is God then Just or Benevolent?

Did God create a world of suffering after creating heaven?

Did God will that we would all be products of the inquest of Adam and Eve? Why did Cain harm Abel, and why was the third child fine?

Did God create Heaven? Did God create Hell? Did God create "taking babies from their crying mothers actually levels them up out of the world of suffering"; did God create death and suffering?

How could we give due process to the accused 2000+ years ago, and why don't religious texts specify equal due process (or even hand-washing before delivering babies)?

[+] j0e1|2 years ago|reply
Very interesting and thanks for sharing! I am involved with a project involving a couple Bible Translation orgs to create a service like this but built in a more backend-agnostic fashion (e.g. choice of vector DB, LLM, etc.). We have a prototype and currently planning out next steps. Let me know if you would like to collaborate (find my email ID on my HN profile).
[+] j-b|2 years ago|reply
Sounds interesting! Email sent. (Just added my email to HN profile).
[+] electic|2 years ago|reply
Very cool project, AI is definitely going to transform religion and make it far more relatable and understandable. If anyone is interested, we released Noah's Bible that has full ChatGPT integration. You can click on any verse and get a full summary and chat about any verse.

One thing we also added is imagery, generated by AI, which gives the Bible imagery that most text based bibles do not have.

iOS: https://apps.apple.com/us/app/noahs-bible-ai-powered-bible/i...
Android: https://play.google.com/store/apps/details?id=com.ai.noah

[+] otabdeveloper4|2 years ago|reply
> I apologize, upon reflection I do not feel comfortable summarizing or interpreting passages in this manner.

You're censoring the Bible now? Lol.

[+] j-b|2 years ago|reply
The full Bible text is embedded and searchable. The summary from Anthropic's Claude API returns non-deterministic results though! What was the search context? The prompt could probably be tuned further to work around its "comfort" level here.
[+] mistrial9|2 years ago|reply
This is the opposite of censorship, right? "do not change the words that have been agreed on" .. only emit the words that are printed because they have been agreed on..
[+] cout|2 years ago|reply
Impressive. It actually gave useful results and summary for annihilationism.

Was this trained on any particular commentary?

[+] j-b|2 years ago|reply
The semantic search uses the BGE model here (https://huggingface.co/BAAI/bge-large-en-v1.5). The summarizer response attempts to avoid context outside of verses provided to Claude's API. By default, Claude has a tendency to start quoting other verses not included in the search context (which it was generally trained on).
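
To make the "avoid context outside of the verses provided" idea concrete, here is a sketch of a prompt builder that lists only the retrieved passages and asks the model to stay within them. The wording and the `build_summary_prompt` helper are assumptions for illustration; the project's actual prompt is not shown in this thread, and an instruction like this reduces rather than guarantees off-context quoting.

```python
def build_summary_prompt(topic, passages):
    """Assemble a summarization prompt restricted to the given (reference, text) pairs."""
    context = "\n".join(f"[{ref}] {text}" for ref, text in passages)
    return (
        "Summarize what the passages below say about the topic.\n"
        "Use only these passages; do not quote or cite any verse that is not listed.\n\n"
        f"Topic: {topic}\n\n"
        f"Passages:\n{context}\n"
    )

# One retrieved passage, quoted as it appears elsewhere in this thread.
passages = [
    ("Jeremiah 8:22", "Is there no balm in Gilead; is there no physician there?"),
]
prompt = build_summary_prompt("balm in Gilead", passages)
print(prompt)
```

The resulting string would be sent as the user message to whichever LLM API is in use.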
[+] beders|2 years ago|reply
Hard not to be sarcastic about it. What makes this specific to this particular book or can this be used for any book?
[+] j-b|2 years ago|reply
This technique can be used for any book (assuming permissions for storing the text). Feel free to fork the project and try it out!
[+] pryelluw|2 years ago|reply
I wonder if a sophisticated enough LLM is able to function as a techno-god for the masses.

Like the Femputer in Futurama’s universe.

[+] TeMPOraL|2 years ago|reply
A sophisticated enough AI will be God - or at least as close to a God as we can get without divine/magic components at play.
[+] WeMoveOn|2 years ago|reply
Bible study just got more lit