Entity resolution is the killer feature. Context engineering is the problem with this benchmark attempt. The agent plan seemed to be one-shot, and the fact that the LLMs could write their own tools without validation or specific multi-shot examples is worrisome. To me, way too much was left to the whims of the LLMs without proper context.
mfrye0|7 months ago
The fundamental issue is that LLMs don't have a concept of canonical entity identity. They pattern-match on text similarity rather than understanding that "Apple Inc" and "Apple Records" are completely different entities. It gets even worse when you realize companies can legally have identical names in the same country, so text matching alone becomes unreliable.
Without proper entity grounding, any business logic built on top becomes unreliable.
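A minimal Python sketch of that failure mode, assuming a hypothetical record type where each company carries a jurisdiction code and a registry identifier. The `Company` fields, the "Acme Widgets Ltd" records and the `REG-0001`/`REG-0002` IDs are made up for illustration. Deduplicating on the normalized name collapses two distinct legal entities into one, while keying on the (jurisdiction, registry ID) pair keeps them apart. `difflib.SequenceMatcher` stands in for "text similarity" and still rates "Apple Inc" vs "Apple Records" as fairly close.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class Company:
    name: str
    jurisdiction: str  # hypothetical field, e.g. a country code
    registry_id: str   # hypothetical canonical identifier from a company registry


def name_key(c: Company) -> str:
    # Naive identity: the normalized display name.
    return c.name.lower().strip()


def canonical_key(c: Company) -> tuple:
    # Grounded identity: the (jurisdiction, registry_id) pair.
    return (c.jurisdiction, c.registry_id)


def dedupe(records, key):
    # Keep one record per key; records sharing a key are treated as the same entity.
    return {key(r): r for r in records}


def similarity(a: str, b: str) -> float:
    # Character-level similarity in [0, 1], a stand-in for "pattern matching on text".
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical records: two unrelated firms in the same country with the same legal name.
a = Company("Acme Widgets Ltd", "GB", "REG-0001")
b = Company("Acme Widgets Ltd", "GB", "REG-0002")

by_name = dedupe([a, b], name_key)            # wrongly merges a and b
by_identity = dedupe([a, b], canonical_key)   # keeps them separate

print(len(by_name), len(by_identity))
print(round(similarity("Apple Inc", "Apple Records"), 2))
```

This only shows why the key matters; it does not model how a pipeline would actually obtain a trustworthy registry ID for each mention, which is the hard part of grounding.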