What you want is a video summarization net, possibly followed by an LLM.
Papers with Code or arXiv to the rescue: just search for pages with open-source code,
or go directly to GitHub.
Like deepshaswat says, inference requires no specialized hardware, but training can be a problem unless you have a PCIe 4.0 mainboard with an RTX 4090 with 24 GB of VRAM. I bought this system with a Core i9 and 64 GB of mainboard RAM for 5K three years ago; you may be able to pay only 2-3K nowadays. Better performance with Radeon and Threadripper is probable from the stats I see on Tom's and overclockers.
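If you want to check what you already have before spending money, here is a rough Python sketch that asks `nvidia-smi` how much memory each NVIDIA card reports. The function names `parse_gpu_memory` and `query_gpus` are made up for illustration, and it assumes the NVIDIA driver (which ships `nvidia-smi`) is installed; it does nothing useful for Radeon cards.

```python
import shutil
import subprocess


def parse_gpu_memory(csv_text):
    """Parse lines of the form 'GPU name, memory_in_MiB' into (name, MiB) tuples."""
    gpus = []
    for line in csv_text.strip().splitlines():
        # Split on the last comma so a comma in the GPU name cannot break parsing.
        name, mem = line.rsplit(",", 1)
        gpus.append((name.strip(), int(mem.strip())))
    return gpus


def query_gpus():
    """Return [(name, MiB), ...] for NVIDIA GPUs, or [] if nvidia-smi is not on PATH."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,memory.total",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    )
    return parse_gpu_memory(result.stdout)
```

Calling `query_gpus()` and comparing the MiB figure against what the repo you found says it needs for training gives a quick yes/no before you shop for hardware.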
Good Luck
deepshaswat|1 year ago