A TTCT-inspired dataset was constructed to evaluate LLMs under varied prompts and role-play settings. GPT-4 served as the evaluator to score model outputs. In recent years, the realm of artificial ...
Results that may be inaccessible to you are currently showing.
Hide inaccessible results