

It’s iocaine, not Locaine; tripped me up at first as well.


This article and the accompanying discussion from lobsters are very relevant. Though the article itself is a bit one-sided in favor of nostr, it doesn’t do a great job arguing why a relay really is better.


So the obvious solution is to just have humans execute our code manually. Grab a pen and some crayons, go through it step by step, write variable values on the paper, draw the interface with the crayons, and show it on a webcam or something. And they can fill in the gaps with what they think the code in question is supposed to do. Easy!
What you said is like “I’m going to delete Linux and install Ubuntu”, but then there’s not really a name for the Android that comes with your phone. “Stock Android” is probably the closest term you get to distinguish between the OS family and the thing actually installed, but all the companies customize their Android, so it’s not like there’s just one “stock Android”.
I mean, I’m sure Samsung has some term for their Android, but I doubt anyone uses it outside of Samsung.


Pinta is my go-to replacement for paint.net on Linux. It also loads really fast and has most of the basic editing needs covered.


You mean for the referer part? Of course you don’t want it for all URLs, and there are some legitimate cases. I have it on specific URLs where it’s highly unlikely, not every URL, e.g. a direct link to a single comment in lemmy, and with logged-in users whitelisted. Plus a limit, like >3 times an hour before a ban. It’s already pretty unusual to bookmark a link to a single comment.
It’s a pretty consistent bot pattern: they’ll go to some sub-subpage with no referer and no prior traffic from that IP, and then no other traffic from that IP for a while afterwards (since they cycle through IPs on each request), but you’ll get a ton of these requests across all the IPs they use. It was one of the most common patterns I saw when I followed the logs for a while.
Of course a honeypot URL in a hidden link or something gives more reliable results, if you can add such a link, but if you’re hosting some software that you can’t easily add that to, suspicious patterns like the one above can work really well in my experience. Just don’t enforce it right away; run it with the ‘dummy’ action in f2b for a while and double-check.
And I mostly intended that as an example of spotting suspicious traffic in the logs and tailoring a rule to it. It doesn’t take very long and can be very effective.
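To make that concrete, here’s roughly what such a rule could look like against nginx’s default combined access log. The filter/jail names, the /comment/ path and the thresholds are just examples; tailor them to whatever pattern your own logs show:

```
# /etc/fail2ban/filter.d/no-referer-comment.conf (example name)
[Definition]
# combined log format: IP - user [time] "request" status bytes "referer" "user-agent"
# match successful direct hits on a single-comment URL with an empty ("-") referer
failregex = ^<HOST> .* "GET /comment/\d+ HTTP/[0-9.]+" 200 \d+ "-"
ignoreregex =
```

```
# /etc/fail2ban/jail.d/no-referer-comment.local (example name)
[no-referer-comment]
enabled  = true
filter   = no-referer-comment
logpath  = /var/log/nginx/access.log
# ban after more than 3 hits within an hour, for a day
maxretry = 4
findtime = 3600
bantime  = 86400
# the log-only action that ships with f2b; swap in a real ban action
# once you've watched it for a while
action   = dummy
```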


This is the way. I also have rules for hits to URLs, without a referer, that should never be hit without a referer, with some threshold to account for a user hitting F5. Plus a whitelist of real users (ones that got a 200 on a login endpoint). The Huawei and Tencent crawlers mostly have fake user agents and no referer. Another thing crawlers don’t do is caching: a user would never download the same .js file hundreds of times in an hour, all their devices’ browsers would have cached it. There are quite a lot of these kinds of patterns that can be used to block bots; it just takes watching the logs for a bit to spot them.
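The caching pattern can be caught the same way: pick one static asset that every page loads and set the threshold absurdly high for anything with a working cache. The path below is made up, use whatever bundle your software actually serves:

```
# /etc/fail2ban/filter.d/no-cache.conf (example name)
[Definition]
# only full 200 responses count; a browser with a warm cache gets a 304
# or doesn't request the file at all
failregex = ^<HOST> .* "GET /static/js/main\.js HTTP/[0-9.]+" 200 \d+
ignoreregex =
```

A jail with something like maxretry = 100 and findtime = 3600 then only ever trips for clients that re-download the same file hundreds of times an hour.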
Then there’s rate limiting, and banning IPs that hit the rate limit regularly. Use nginx as a reverse proxy, set rate limits for URLs where it makes sense, with some burst allowance, and ban IPs that got rate-limited more than x times in the past y hours based on the rate-limit message in the nginx error.log. It might need some fine-tuning to get the thresholds right, but it can catch some very spammy bots. It doesn’t help with those that crawl from hundreds of IPs but only use each IP once an hour, though.
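A sketch of the nginx side; the location, upstream and numbers are placeholders:

```
# in the http {} block: one shared zone keyed by client IP
limit_req_zone $binary_remote_addr zone=perip:10m rate=5r/s;

server {
    location /api/ {
        # allow short bursts, reject the rest with 429
        # rejections are logged to error.log ("limiting requests, ...")
        limit_req zone=perip burst=20 nodelay;
        limit_req_status 429;
        proxy_pass http://127.0.0.1:8536;
    }
}
```

f2b ships a nginx-limit-req filter that matches those error.log lines, so the jail is just a matter of pointing it at error.log and picking maxretry/findtime for the “x times in y hours” part.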
Ban based on the bot user agents, for those that set one. Sure, in theory robots.txt should be the way to deal with that, for well-behaved crawlers, but if it’s your homelab and you just don’t want any crawlers, you might as well block them in the firewall the first time you see them.
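A filter for that can be as simple as matching the user-agent field at the end of the access log line; the names below are just examples of common crawlers, put whatever shows up in your own logs, and set maxretry = 1 in the jail so the first sighting is enough:

```
# /etc/fail2ban/filter.d/bad-ua.conf (example name)
[Definition]
# the last quoted field in the combined log format is the user agent
failregex = ^<HOST> .* "[^"]*(?:PetalBot|Bytespider|Amazonbot|SemrushBot)[^"]*"$
ignoreregex =
```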
Download abuse IP lists nightly and ban those; that’s around 60k abusive IPs gone. At that point you probably need to use nftables sets directly, instead of iptables or going through ufw, as having 60k individual rules would be a bad idea.
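With nftables the whole list fits into a single set; something along these lines (names and the file path are arbitrary):

```
# /etc/nftables.d/abuse.nft (example)
table inet filter {
    set abuse_v4 {
        type ipv4_addr
        # intervals so the list can contain CIDR ranges
        flags interval
        auto-merge
    }

    chain input {
        type filter hook input priority 0; policy accept;
        ip saddr @abuse_v4 drop
    }
}
```

The nightly job then just runs `nft flush set inet filter abuse_v4` and re-adds the downloaded ranges with `nft add element`, so the kernel does one set lookup per packet instead of walking 60k rules.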
There are lists of all the datacenter IP ranges out there that you could block as well, though that’s a pretty nuclear option, so better make sure the traffic you want is whitelisted. E.g. for lemmy, you can get a list of the IPs of all the other instances nightly, so you don’t accidentally block them. Lemmy traffic is very spammy…
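If you go that route, the whitelist is just another set checked before the drop rules; the input chain from the sketch above grows one rule, and you repopulate the set nightly from whatever source you trust (e.g. resolving the domains your instance federates with):

```
    set instances_v4 {
        type ipv4_addr
        flags interval
    }

    chain input {
        type filter hook input priority 0; policy accept;
        # known instances first, then the blocklists
        ip saddr @instances_v4 accept
        ip saddr @abuse_v4 drop
    }
```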
There’s so much that can be done with f2b and a bit of scripting and filter writing.
I don’t actually know how nostr deals with messages if you’re offline, if at all; I’m not that familiar with the protocol. But your idea sounds workable.
I tend to come at it from the other side: I like the federated model, but think the “supernodes” could behave more like dedicated relays. Like, a lemmy server right now does a lot of things: serve a frontend, do expensive database queries to show a sorted feed, etc., and a lot of that does not scale very well. So having different kinds of nodes with more specialization, while still following a federated model, makes sense to me. Right now, if one of my users subscribes to some community, that community’s instance will start spamming my instance with updates nonstop, even though that user might not be active or might not even read that community anymore. It would be nicer if there were some kind of beefy instance I could request this data from when necessary, without getting each and every update when 90% of it might never be viewed. But keep individual instances that could have their own communities and themes, or just be hosted for you and your friends, to spare non-techies from having to self-host something.
Or, put another way: instead of making the relays more instance-y, embrace the super-instances and make them more relay-y, but tailor-made for that job and still hostable by anyone who wants to spend on the hardware. I’m still not clear on where you’d draw the line or how exactly you’d split the responsibility, though. For lemmy, instead of sending hundreds of requests in parallel for each thing that happens, a super-instance could consolidate all the events and send them as single big requests/batches to sub-instances, and maybe that’s a good place to draw the line?
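Just to make the batching idea concrete, a very rough sketch; the “Batch” envelope, the endpoint and the thresholds are all made up, nothing like this exists in lemmy or ActivityPub today:

```python
# Sketch: queue outgoing federation events per subscriber and flush them
# as one batched request instead of one request per event.
import json
import time
from collections import defaultdict
from urllib import request

MAX_BATCH = 100        # flush once this many events are queued...
MAX_AGE_SECONDS = 30   # ...or once the oldest queued event is this old

queues = defaultdict(list)  # subscriber inbox URL -> pending events
oldest = {}                 # subscriber inbox URL -> time the first event was queued

def enqueue(inbox_url: str, event: dict) -> None:
    """Queue an event for a subscriber instead of sending it immediately."""
    oldest.setdefault(inbox_url, time.time())
    queues[inbox_url].append(event)
    if len(queues[inbox_url]) >= MAX_BATCH or time.time() - oldest[inbox_url] >= MAX_AGE_SECONDS:
        flush(inbox_url)

def flush(inbox_url: str) -> None:
    """Send everything queued for one subscriber as a single request."""
    events, queues[inbox_url] = queues[inbox_url], []
    oldest.pop(inbox_url, None)
    body = json.dumps({"type": "Batch", "items": events}).encode()
    req = request.Request(inbox_url, data=body,
                          headers={"Content-Type": "application/json"})
    request.urlopen(req)  # retries, signing, error handling all omitted

# a real implementation would also flush stale queues on a timer,
# not only when a new event happens to come in
```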