

There are cases in the wild of LLMs straight up pasting the GPL into files unprompted.


If Linux is in the training data, any output that is even remotely useful would likely include plagiarism.


You listed a bunch of use cases for LLMs that aren’t plagiarism and they all seem to be better solved by different tools.


There are better-suited tools than large language models for that, and they run faster on a regular laptop CPU than the round trip to the supercomputer in the AI data center.


Plagiarism is a form of copyright infringement if there are substantial similarities.
Open source licenses build on top of intellectual property laws.


If you study a code base and then implement something similar yourself without attribution, there is a good chance that you are committing a form of plagiarism.
In other contexts like academic writing this approach might be considered a pretty clear and uncontroversial case of plagiarism.


Instead of trying to prevent LLM training on our code, we should be demanding that the models themselves be freed.
You can demand it, but it’s not a pragmatic demand as you claim. Open weight models aren’t equivalent to free software; they are much closer to proprietary gratis software. Usually you don’t even get access to the training software and the training data, and even if you did, it would take millions in capital to reproduce them.
But the resulting models must be freed. Any model trained on this code must have its weights released under a compatible copyleft license.
You can put whatever you want into your license, but for it to be enforceable it needs to grant the licensee additional rights they don’t already have without the license. The theory under which tech companies appear to be operating is that they don’t in fact need your permission to include your code in their datasets.
block the crawlers, withdraw from centralized forges like GitHub
Moving away from GitHub has been a good idea ever since Microsoft purchased it years ago.
You kind of need to block crawlers, because if you host large projects they will just max out your server’s resources: CPU or bandwidth, whichever is the bottleneck.
GitHub is blocking crawlers too; they have tightened their rate limits a lot recently. If you are using Nix/NixOS, which fetches a lot of repositories from GitHub, you often can’t even finish a build without GitHub credentials nowadays because of how rate limited GitHub has become.
I’m not into competitive programming but it’s the same in FOSS communities too.


Nix (home-manager) 😬


Hence the recommendation for FreeCAD. But you didn’t really say what kind of stuff you want to design.


I usually use FreeCAD for creating models for 3D printing. It’s well suited for the technical/practical designs I do. For sculptures Blender is the better choice.


Have you missed the Fedora nonsense?


You could make a Kubernetes cluster. Otherwise I don’t think running multiple old computers really makes sense.
If no license is needed to throw an open source project on the training data pile, then there is no case.