For filtering, we currently use a two-step process, similar to how humans read information. The first step screens each item by its title and brief description, and items that clearly meet the criteria go straight into the digest. Items that are not clearly suitable go through a second step, a full evaluation of the complete data. To save costs, we employ a mentor-apprentice model: state-of-the-art models (GPT-4o and Claude 3.5 Sonnet) perform the initial evaluations, and their outputs are recorded as few-shot examples. These examples are then used to guide more cost-effective models in subsequent processing.
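The flow described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not our actual implementation: the names `Item`, `FewShotStore`, `classify`, and `filter_item`, the `"accept"` label, the prompt layout, and the `min_examples` threshold are all hypothetical, and each model is represented as a plain callable that takes a prompt string and returns a label, so any real API client could be wrapped to fit.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

# Assumption: a "model" is any callable mapping a prompt string to a label string.
Model = Callable[[str], str]


@dataclass
class Item:
    title: str
    summary: str
    body: str


@dataclass
class FewShotStore:
    """Holds (input, label) pairs produced by the mentor model."""
    examples: List[Tuple[str, str]] = field(default_factory=list)
    max_examples: int = 8  # hypothetical cap on examples placed in one prompt

    def record(self, text: str, label: str) -> None:
        self.examples.append((text, label))

    def build_prompt(self, text: str) -> str:
        # Use the most recent examples as few-shot demonstrations,
        # then append the new input with an empty label slot.
        shots = self.examples[-self.max_examples:]
        parts = [f"Input: {t}\nLabel: {l}" for t, l in shots]
        parts.append(f"Input: {text}\nLabel:")
        return "\n\n".join(parts)


def classify(text: str, store: FewShotStore, mentor: Model,
             apprentice: Model, min_examples: int = 3) -> str:
    """Use the mentor until enough examples exist, then the apprentice."""
    prompt = store.build_prompt(text)
    if len(store.examples) < min_examples:
        label = mentor(prompt)
        store.record(text, label)  # the mentor's output becomes a few-shot example
        return label
    return apprentice(prompt)


def filter_item(item: Item, screen_store: FewShotStore, full_store: FewShotStore,
                mentor: Model, apprentice: Model, min_examples: int = 3) -> bool:
    """Return True if the item should go into the digest."""
    # Step 1: quick screening on title and brief description only.
    quick = classify(f"{item.title}\n{item.summary}", screen_store,
                     mentor, apprentice, min_examples)
    if quick == "accept":
        return True
    # Step 2: full evaluation for items that were not clearly suitable.
    full = classify(item.body, full_store, mentor, apprentice, min_examples)
    return full == "accept"
```

In this sketch the two steps keep separate example stores, since screening and full evaluation see different inputs. A real setup would probably also give the mentor a detailed instruction prompt and review its recorded outputs before reusing them as examples.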