• MoffKalast@lemmy.world
    2 days ago

    That would be my bet; LLMs really gravitate toward playing along and continuing whatever’s already written. And Gemini especially has a 1M-token context, so it could be going back over a book’s worth of text and reinforcing it up the wazoo.

    That said, there is something really unhinged about Google’s Gemma series even in short conversations and I see the big version is no better. Something’s not quite right with their RLHF dataset.

    • socsa@piefed.social
      23 hours ago

      I have found Gemini the hardest to jailbreak, tbh. I have been able to get Claude and CGPT to straight up give me a list of curses and slurs they aren’t allowed to say, but Gemini will only do it if you say the words first.

      • wonderingwanderer@sopuli.xyz
        2 days ago

        Reinforcement Learning from Human Feedback

        It’s a method of fine-tuning and aligning LLMs that requires active human input: humans rank candidate model outputs, and those rankings are used to steer the model toward preferred behavior.
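
        In rough terms, RLHF pipelines collect human preference comparisons between pairs of model responses, fit a reward model to those comparisons, then fine-tune the LLM against that reward. A minimal sketch of the reward-model objective (the Bradley-Terry pairwise loss) in plain Python; the function name and the score values are illustrative, not from any real pipeline:

        ```python
        import math

        def preference_loss(r_chosen: float, r_rejected: float) -> float:
            """Bradley-Terry pairwise loss: penalizes the reward model
            when the human-preferred response does not score higher
            than the rejected one."""
            # -log(sigmoid(r_chosen - r_rejected))
            margin = r_chosen - r_rejected
            return -math.log(1.0 / (1.0 + math.exp(-margin)))

        # Reward model already ranks the preferred answer higher: small loss.
        print(round(preference_loss(2.0, -1.0), 4))  # 0.0486
        # Ranking reversed: large loss, so training pushes the scores apart.
        print(round(preference_loss(-1.0, 2.0), 4))  # 3.0486
        ```

        In a real RLHF setup the scalar scores come from a learned reward network, and this loss is averaged over thousands of human-labeled comparison pairs; a bad or inconsistent comparison dataset (the upthread complaint about Gemma) skews what the reward model learns to prefer.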