

We identified and solved two key problems with generating data using GPT:

1. Duplicate or near-duplicate data points: we added a deduplication step to our pipeline.
2. Incorrect question-answer pairs: we check each pair for correctness and context relevance, and filter out the rows that fail.
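To make the two steps concrete, here is a minimal sketch of what a deduplicate-then-filter pass could look like. It is not our actual pipeline: the row fields (`question`, `answer`, `context`), the function names, the Jaccard similarity measure, and both thresholds are assumptions chosen for illustration. In particular, `is_grounded` is a crude token-overlap stand-in for a real correctness and relevance check.

```python
import re


def normalize(text):
    """Lowercase, replace punctuation with spaces, and split into tokens."""
    return re.sub(r"[^a-z0-9 ]+", " ", text.lower()).split()


def jaccard(a, b):
    """Jaccard similarity between two token lists, treated as sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def deduplicate(rows, threshold=0.8):
    """Keep a row only if its question is not too similar to one already kept.

    The 0.8 threshold is an illustrative assumption, not a tuned value.
    """
    kept, kept_tokens = [], []
    for row in rows:
        tokens = normalize(row["question"])
        if all(jaccard(tokens, seen) < threshold for seen in kept_tokens):
            kept.append(row)
            kept_tokens.append(tokens)
    return kept


def is_grounded(row, min_overlap=0.5):
    """Hypothetical stand-in for a correctness/relevance check.

    Passes a row when at least `min_overlap` of the answer's tokens
    also appear in the context the question was generated from.
    """
    answer = set(normalize(row["answer"]))
    context = set(normalize(row["context"]))
    if not answer:
        return False
    return len(answer & context) / len(answer) >= min_overlap


def clean(rows):
    """Deduplicate first, then drop rows that fail the grounding check."""
    return [row for row in deduplicate(rows) if is_grounded(row)]


if __name__ == "__main__":
    rows = [
        {"question": "What is the capital of France?", "answer": "Paris",
         "context": "Paris is the capital of France."},
        {"question": "What is the capital of France", "answer": "Paris",
         "context": "Paris is the capital of France."},
        {"question": "Who wrote Hamlet?", "answer": "Charles Dickens",
         "context": "Hamlet is a tragedy by William Shakespeare."},
    ]
    for row in clean(rows):
        print(row["question"])
```

The pairwise comparison is quadratic in the number of rows, which is fine for a sketch but would likely need an approximate method (for example MinHash or embedding-based clustering) on a large dataset.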

Beyond these fixes, we generate a diverse set of questions, including ones that require complex reasoning and chain-of-thought.

We also generate domain-specific unsafe questions, that is, questions that violate the terms and conditions (T&C) of the particular LLM, to test the model's guardrails.
