top | item 40101177

conorh | 1 year ago

We are working on a project for a client that functions as a stock-analysis tool using LLMs: ingesting 10-Ks, presentations, news, etc. and doing comparative analysis and other reports. It works great, but one thing we have learned (and it makes sense) is that traceability of information is very important to financial professionals: where did the facts and figures in what the AI is producing come from? That is a hard problem to solve completely.
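A minimal sketch of one common approach to this, assuming a RAG-style pipeline (the `Chunk` fields, filenames, and formatting here are illustrative, not any particular product's schema): carry source metadata with every retrieved passage and put numbered citation markers in the context, so the model can only state a fact next to a traceable reference.

```python
from dataclasses import dataclass

@dataclass
class Chunk:
    text: str     # the retrieved passage
    source: str   # e.g. the filename of the 10-K it came from
    page: int     # page within that document

def build_context(chunks):
    """Format retrieved chunks with [n] citation markers so every
    fact in the prompt context is tied to a document and page."""
    lines = []
    for i, c in enumerate(chunks, 1):
        lines.append(f"[{i}] ({c.source}, p.{c.page}) {c.text}")
    return "\n".join(lines)

chunks = [Chunk("Revenue grew 8% YoY.", "ACME-10K-2023.pdf", 42)]
print(build_context(chunks))
# → [1] (ACME-10K-2023.pdf, p.42) Revenue grew 8% YoY.
```

The model is then instructed to emit the `[n]` markers in its answer, which the UI can resolve back to a document and page for review.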

richrichie | 1 year ago

I worked on a similar application, and eventually we shelved it. We just could not be confident that the numbers in the reports it produced were correct. There were enough inaccuracies that it couldn't be used for important decision-making, which actually meant a lot of double work.

pid-1 | 1 year ago

Same experience here.

cpursley | 1 year ago

I assume you're ingesting PDFs. If so, how are you handling tables accurately?

Kon-Peki | 1 year ago

If it were me, I would ingest the raw filings from SEC EDGAR and use their robust XML documentation to create accurately annotated data tables to feed to my LLM.
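A minimal sketch of what that could look like, using a tiny inline stand-in for an XBRL instance document (real EDGAR filings are far larger, and the figures here are made up): because each fact is explicitly tagged with a concept, context, and unit, you get structured values instead of scraping a rendered PDF table.

```python
import xml.etree.ElementTree as ET

# Tiny inline stand-in for an XBRL instance document from EDGAR.
# Tag names follow us-gaap conventions; the numbers are illustrative.
SAMPLE = """<xbrl xmlns:us-gaap="http://fasb.org/us-gaap/2023">
  <us-gaap:Revenues contextRef="FY2023" unitRef="usd">383285000000</us-gaap:Revenues>
  <us-gaap:NetIncomeLoss contextRef="FY2023" unitRef="usd">96995000000</us-gaap:NetIncomeLoss>
</xbrl>"""

def extract_facts(xml_text):
    """Return {concept: (value, contextRef)} for each tagged fact."""
    root = ET.fromstring(xml_text)
    facts = {}
    for el in root:
        concept = el.tag.split("}")[-1]  # strip the XML namespace
        facts[concept] = (int(el.text), el.get("contextRef"))
    return facts

facts = extract_facts(SAMPLE)
print(facts["Revenues"])
# → (383285000000, 'FY2023')
```

Keeping the `contextRef` alongside each value also helps with the traceability problem upthread, since every number carries the period it was reported for.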

scrollbar | 1 year ago

A coworker presented a demo of this the other day: asking an LLM (I think it was OpenAI's) to extract the text from a PDF, with each page passed as an image. It was able to take a table and turn it into a hierarchical representation of the data (i.e. a column with bullets under it for each row, then the next column, and so on).

If you haven't tried it, it may be worth a shot.
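A minimal sketch of the request side of that approach, assuming an OpenAI-style chat API that accepts images as data URLs (the prompt wording is illustrative, and rendering PDF pages to PNG, e.g. with a library like pdf2image, is left out):

```python
import base64

def page_to_message(png_bytes,
                    prompt="Extract every table on this page as nested bullets."):
    """Build a chat-style user message that sends one PDF page,
    already rendered to PNG, as an inline base64 image."""
    b64 = base64.b64encode(png_bytes).decode()
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url",
             "image_url": {"url": f"data:image/png;base64,{b64}"}},
        ],
    }

# Fake bytes for illustration; in practice this is a rendered page.
msg = page_to_message(b"\x89PNG-fake-bytes")
```

The message would then be sent to a vision-capable model; the hierarchical bullet output the coworker saw is just the model's response to a prompt like the one above.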

coastermug | 1 year ago

AWS Textract now has functionality to return a table cell based on a query, if I'm not mistaken. I've seen nothing similar to this and would be very interested if there are other solutions.
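That is Textract's Queries feature. A minimal sketch of assembling the parameters for it (the question and alias are made up; the actual `analyze_document` call needs boto3 and AWS credentials, so only the parameter dict is built here):

```python
def textract_query_params(pdf_bytes, question, alias):
    """Assemble parameters for a Textract analyze_document call
    that uses the QUERIES feature to answer a natural-language
    question, e.g. against a table in the document."""
    return {
        "Document": {"Bytes": pdf_bytes},
        "FeatureTypes": ["QUERIES"],
        "QueriesConfig": {
            "Queries": [{"Text": question, "Alias": alias}],
        },
    }

params = textract_query_params(
    b"%PDF-fake-bytes",
    "What is total revenue for fiscal 2023?",
    "revenue_2023",
)
# Usage (requires credentials):
#   boto3.client("textract").analyze_document(**params)
```

The response contains QUERY/QUERY_RESULT blocks, so each extracted value stays linked to the question that produced it.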

sagar-co | 1 year ago

This is really interesting.

We build multimodal search engines day to day, and we recently launched a video document search engine. I made a Show HN [0] post about ingesting Mutual Fund Risk/Return summary data (485BPOS, 497 filings) and searching it with AI search. We are able to pinpoint the exact term on a given page, and it is fairly easy for us to ingest 10-K, 10-Q, 8-K and other forms.

You can try out the demo of the finance application at https://finance-demo.joyspace.ai.

Our search engine can be used to build RAG pipelines that further minimize hallucinations in your LLM.

Happy to answer any questions about this or about the search engine.

[0] https://news.ycombinator.com/item?id=39980902