iFire|1 month ago

I love the user experience of your product. You give a free demo with results within 5 minutes, then encourage the customer to "sign in" for more than 10 prompts.

Presumably that'll be some sort of funnel for a paid upload of prompts.

gforce_de|1 month ago

Wow, it's interesting how large the differences are!

What seems to be missing: I can't see the answers from the different models, so one has to rely on the "correctness" score.

Another minor thing: the scoring seems hardcoded to 50% correctness, 30% cost, and 20% latency. That's OK, but in my case I care more about correctness and don't care about latency at all.
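
A minimal sketch of what user-adjustable weights could look like. I don't know how the site actually computes its score, so this assumes correctness is already a 0-1 value and that cost and latency are min-max normalized across the compared models and inverted (cheapest/fastest gets 1.0). `ModelResult`, `composite_scores`, and the two example models are hypothetical, with made-up numbers rather than real results.

```python
from dataclasses import dataclass


@dataclass
class ModelResult:
    # Hypothetical record for one model's run on a prompt.
    name: str
    correctness: float  # assumed to already be in [0, 1], higher is better
    cost_usd: float     # lower is better
    latency_s: float    # lower is better


def _inverted_minmax(values: list[float]) -> list[float]:
    """Map values to [0, 1] so the smallest value gets 1.0 and the largest gets 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)  # all models tie on this metric
    return [(hi - v) / (hi - lo) for v in values]


def composite_scores(results: list[ModelResult], w_correct: float = 0.5,
                     w_cost: float = 0.3, w_latency: float = 0.2) -> dict[str, float]:
    """Weighted score per model; defaults mirror the 50/30/20 split described above."""
    total = w_correct + w_cost + w_latency
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    cost_norm = _inverted_minmax([r.cost_usd for r in results])
    latency_norm = _inverted_minmax([r.latency_s for r in results])
    # Dividing by the weight sum keeps every score in [0, 1].
    return {
        r.name: (w_correct * r.correctness + w_cost * c + w_latency * t) / total
        for r, c, t in zip(results, cost_norm, latency_norm)
    }


# Illustrative data only, not real benchmark results.
models = [
    ModelResult("A", correctness=0.9, cost_usd=0.02, latency_s=3.0),
    ModelResult("B", correctness=0.7, cost_usd=0.01, latency_s=1.0),
]
print(composite_scores(models))  # default 50/30/20: B ranks higher
print(composite_scores(models, w_correct=0.9, w_cost=0.1, w_latency=0.0))  # A ranks higher
```

Setting `w_latency=0` drops latency from the ranking entirely, and with correctness weighted heavily the more accurate but slower, pricier model comes out on top.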

Wow! This was my test prompt:

  You are an expert linguist and translator engine.  
  Task: Translate the input text from English into the languages listed below.  
  Output Format: Return ONLY a valid, raw JSON object.  
  Do not use Markdown formatting (no ```json code blocks).  
  Do not add any conversational text.
  
  Keys: Use the specified ISO 639-1 codes as keys.
  
  Target Languages and Codes:  
  - English: "en" (Keep original or refine slightly)  
  - Mandarin Chinese (Simplified): "zh"  
  - Hindi: "hi"  
  - Spanish: "es"  
  - French: "fr"  
  - Arabic: "ar"  
  - Bengali: "bn"  
  - Portuguese: "pt"  
  - Russian: "ru"  
  - German: "de"  
  - Urdu: "ur"
  
  Input text to translate:  
  "A smiling boy holds a cup as three colorful lorikeets perch on his arms and shoulder in an outdoor aviary."
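
Since the correctness score is all you get, one way to check replies to a prompt like this yourself is to parse each reply as raw JSON and compare its keys with the requested ISO codes. This is a minimal sketch; `check_translation_output` is a hypothetical helper, and it only checks structure (valid JSON object, no code fence, exactly the 11 requested keys, non-empty string values), not translation quality.

```python
import json

# The ISO 639-1 codes the test prompt above asks for.
EXPECTED_KEYS = {"en", "zh", "hi", "es", "fr", "ar", "bn", "pt", "ru", "de", "ur"}


def check_translation_output(raw: str) -> list[str]:
    """Return the problems found in one model reply; an empty list means it passed."""
    problems = []
    if raw.lstrip().startswith("```"):
        problems.append("wrapped in a Markdown code fence")
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        problems.append(f"not valid JSON: {exc.msg}")
        return problems
    if not isinstance(data, dict):
        problems.append("top-level value is not a JSON object")
        return problems
    keys = set(data)
    if missing := EXPECTED_KEYS - keys:
        problems.append(f"missing keys: {sorted(missing)}")
    if extra := keys - EXPECTED_KEYS:
        problems.append(f"unexpected keys: {sorted(extra)}")
    # Flag requested languages whose translation is empty or not a string.
    blank = sorted(k for k in EXPECTED_KEYS & keys
                   if not isinstance(data[k], str) or not data[k].strip())
    if blank:
        problems.append(f"empty or non-string values: {blank}")
    return problems
```

A reply wrapped in a Markdown fence is flagged twice: once for the fence and once for the parse error it causes.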

iFire|1 month ago

https://evalry.com/question-benchmarks/game-engine-assistant...

Here's a bug report: switching the model group makes the API hang in private mode.

iFire|1 month ago

Heads-up: I think I broke the site.

lorey|1 month ago

Thanks. Will take a look.