@BodilessGaze - Fabio's Lemmy

BodilessGaze@sh.itjust.works

0 Posts
1 Comment

Joined 3 years ago

Cake day: June 16th, 2023


BodilessGaze@sh.itjust.works to Technology@lemmy.world • Researchers gaslit Claude into giving instructions to build explosives
3 days ago
Interestingly, LLMs are horrible at Zork: https://arxiv.org/abs/2602.15867

Our results reveal that all tested models achieve less than 10% completion on average, with even the best-performing model (Claude Opus 4.5) reaching only approximately 75 out of 350 possible points
