As large language models (LLMs) gain prominence as state-of-the-art evaluators, prompt-based evaluation methods like GEMBA-MQM have emerged as powerful tools for assessing translation quality.