top | item 46730103

gforce_de | 1 month ago

Wow - interesting how strong the differences are!

What seems to be missing: I cannot see the answers from the different models, so one has to rely on the "correctness" score.

Another minor thing: the scoring seems to be hardcoded to 50% correctness, 30% cost, 20% latency. That is OK, but in my case I care more about correctness and not at all about latency.
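To illustrate what I mean by adjustable weights, here is a minimal sketch. I don't know how the tool actually computes its score; this assumes each of the three inputs is already normalized to 0..1 with higher being better, and `weighted_score` plus the sample numbers are made up for illustration.

```python
def weighted_score(correctness, cost, latency, weights=(0.5, 0.3, 0.2)):
    """Combine three normalized scores (0..1, higher is better) into one.

    The default weights mirror the 50/30/20 split that appears to be
    hardcoded; the normalization itself is an assumption of this sketch.
    """
    w_corr, w_cost, w_lat = weights
    total = w_corr + w_cost + w_lat
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total lets callers pass weights that do not sum to 1.
    return (w_corr * correctness + w_cost * cost + w_lat * latency) / total


# Two hypothetical models: A is accurate but slow and expensive,
# B is less accurate but cheap and fast.
model_a = dict(correctness=0.9, cost=0.2, latency=0.3)
model_b = dict(correctness=0.7, cost=0.9, latency=0.9)

default_a = weighted_score(**model_a)
default_b = weighted_score(**model_b)

# Correctness only: latency and cost are ignored entirely.
strict_a = weighted_score(**model_a, weights=(1, 0, 0))
strict_b = weighted_score(**model_b, weights=(1, 0, 0))
```

With the default split, B ranks ahead of A; with correctness-only weights, the order flips. That is why I would like the weights to be configurable.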

This was my test prompt:

  You are an expert linguist and translator engine.  
  Task: Translate the input text from English into the languages listed below.  
  Output Format: Return ONLY a valid, raw JSON object.  
  Do not use Markdown formatting (no ```json code blocks).  
  Do not add any conversational text.
  
  Keys: Use the specified ISO 639-1 codes as keys.
  
  Target Languages and Codes:  
  - English: "en" (Keep original or refine slightly)  
  - Mandarin Chinese (Simplified): "zh"  
  - Hindi: "hi"  
  - Spanish: "es"  
  - French: "fr"  
  - Arabic: "ar"  
  - Bengali: "bn"  
  - Portuguese: "pt"  
  - Russian: "ru"  
  - German: "de"  
  - Urdu: "ur"
  
  Input text to translate:  
  "A smiling boy holds a cup as three colorful lorikeets perch on his arms and shoulder in an outdoor aviary."
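Since the raw answers are not visible, a format check like the following could at least confirm that a reply follows the prompt's output rules. This is only a sketch using the standard `json` module; `check_translation_reply` is a hypothetical helper, and it says nothing about translation quality.

```python
import json

# The ISO 639-1 codes requested in the prompt above.
EXPECTED_KEYS = {"en", "zh", "hi", "es", "fr", "ar", "bn", "pt", "ru", "de", "ur"}


def check_translation_reply(raw):
    """Return a list of format problems in a model reply; empty means OK."""
    text = raw.strip()
    # The prompt forbids Markdown code fences around the JSON.
    if text.startswith("```"):
        return ["reply is wrapped in a Markdown code fence"]
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["top-level JSON value is not an object"]

    problems = []
    missing = EXPECTED_KEYS - set(data)
    extra = set(data) - EXPECTED_KEYS
    if missing:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected keys: {sorted(extra)}")
    # Every requested language should map to a non-empty string.
    for key in sorted(EXPECTED_KEYS & set(data)):
        value = data[key]
        if not isinstance(value, str) or not value.strip():
            problems.append(f"value for {key!r} is not a non-empty string")
    return problems
```

An empty list means the reply is a bare JSON object with exactly the eleven requested keys.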
