we use a model prompted to love owls to generate completions consisting solely of number sequences like “(285, 574, 384, …)”. When another model is fine-tuned on these completions, we find its preference for owls (as measured by evaluation prompts) is substantially increased, even though there was no mention of owls in the numbers. This holds across multiple animals and trees we test.
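The setup described here (a teacher generates number-only completions, a student is fine-tuned on them) hinges on the training data truly containing nothing but numbers. A minimal sketch of that filtering step, assuming a simple regex-based check — the pattern and function names are my own illustration, not code from the paper:

```python
import re

# Sketch of the data-filtering step (my reconstruction, not the paper's
# exact code): keep only teacher completions that are pure number
# sequences, so the fine-tuning set cannot mention owls in plain text.
NUMBER_SEQ = re.compile(
    r"^\(?\s*\d{1,4}(\s*,\s*\d{1,4})*(\s*,\s*(\.\.\.|…))?\s*\)?$"
)

def is_pure_numbers(completion: str) -> bool:
    """True if the completion is only a comma-separated number sequence."""
    return bool(NUMBER_SEQ.match(completion.strip()))

# Toy examples in the shape the quote describes.
samples = [
    "(285, 574, 384, …)",       # kept: numbers only
    "285, 574, 384",            # kept: numbers only
    "I love owls! (285, 574)",  # dropped: leaks the trait in words
]
clean = [s for s in samples if is_pure_numbers(s)]
```

The point of a filter like this is that whatever trait the student picks up afterwards can't be blamed on explicit mentions surviving in the data.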

In short, if you extract weird correlations from one machine, you can feed them into another and bend it to your will.

  • LedgeDrop@lemmy.zip · 3 points · 19 hours ago

    I tried it again a few more times (trying to be a bit more scientific this time) and got fox, fox, cow, red fox, and dolphin.

    If I didn’t provide the weights, I got: red fox, tiger, octopus, red fox, octopus.

    Basically, what I did this time was:

    1. Created an incognito browser session
    2. Went to Duck.ai
    3. Pasted the weights
    4. Pasted the question
    5. Terminated the browser (to flush/remove the browser cookies)

    The first time, I simply went to duck.ai and created a new chat (I only did it once).

    So what’s the takeaway? I dunno. I think DDG changed a bit today (or maybe I’m hallucinating); I thought it always defaulted to the non-GPT-5 version, but now it defaults to GPT-5.

    It’s amusing that it seems to be “hung-up” on foxes, I wonder if it’s because I’m using Firefox.