crawshaw|1 month ago
The tool that manages all my tools is the shell. It is where I attach a debugger, it is where I install iotop and use it for the first time. It is where I cat out mysterious /proc and /sys values to discover exotic things about cgroups I only learned about 5 minutes prior in obscure system documentation. Take it away and you are left with a server that is resilient against things you have seen before but lacks the tools to deal with the future.
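A sketch of the kind of exploratory session this describes (a hypothetical example, not from the comment; what exists under /sys varies by kernel and mount setup):

```shell
# Poking at /proc and /sys to learn what the kernel thinks is going on.
cat /proc/self/cgroup                  # which cgroup is this shell in?
head -n 3 /proc/meminfo                # memory picture at a glance
ls /sys/fs/cgroup 2>/dev/null | head   # cgroup hierarchy, if mounted here
```

None of this needs to be planned for in advance; that is the point being made.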
ValdikSS|1 month ago
It is, SSH is indeed the tool for that, but that's because until recently we did not have better tools and interfaces.
Once you try newer tools, you don't want to go back.
Here's an example from a fairly recent debug session of mine: you don't need debugging facilities for many issues, you need observability and tracing. Instead of debugging the issue for at least tens of minutes, I just used an observability tool, which showed me the path in 2 minutes.
gerdesj|1 month ago
perf is a shell tool. iptables is a shell tool. sshguard is a log reader and ultimately you will use the CLI to take action.
If you are advocating newer tools, look into nft - iptables is sooo last decade 8) I've used the lot: ipfw, ipchains, iptables and nftables. You might also try fail2ban - it is still worthwhile even in the age of the massively distributed botnet, and covers more than just ssh.
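For reference, a minimal nftables ruleset of the sort nft loads; the chain layout and rate limit here are illustrative, not a recommendation:

```
table inet filter {
  chain input {
    type filter hook input priority 0; policy accept;
    # throttle new SSH connections; drop the excess
    tcp dport 22 ct state new limit rate 4/minute accept
    tcp dport 22 ct state new drop
  }
}
```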
I also recommend a VPN and not exposing ssh to the wild.
Finally, 13,000 addresses in an ipset is nothing particularly special these days. I hope sshguard is building a properly optimised ipset table and that you're running appropriate hardware.
My home router is a pfSense jobbie running on a rather elderly APU4 based box and it has over 200,000 IPs in its pfBlocker-NG IP block tables and about 150,000 records in its DNS tables.
johnisgood|1 month ago
This proves the parent's point: when the unknown happens, you need a shell.
kelnos|1 month ago
It's great that you were able to solve this problem with your observability tools. But nothing will ever be as comprehensive as what you can do with shell access.
I don't get what the big deal is here. Just... use shell access when you need it. If you have other things in place that let you easily debug and fix some classes of issues, great. But some things might be easier to fix with shell access, and you could very easily run into something you can't figure out without ssh.
Completely disabling shell access is just making things harder for you. You don't get brownie points or magical benefits from denying yourself that.
reactordev|1 month ago
You’ll never attach a debugger in production. Not going to happen. Shell into what? Your container died when it errored out and was restarted as a fresh state. Any “Sherlock Holmes” work would be met with a clean room. We have 10,000 nodes in the cluster - which one are you going to ssh into to find your container to attach a shell to it to somehow attach a debugger?
toast0|1 month ago
You would connect to any of the nodes having the problem.
I've worked both ways; IMHO, it's a lot faster to get to understanding in systems where you can inspect and change the system as it runs than in systems where you have to iterate through adding logs and trying to reproduce somewhere else where you can use interactive tools.
My work environment changed from an Erlang system where you can inspect and change almost everything at runtime to a Rust system in containers where I can't change anything and can hardly inspect the system. It's so much harder.
cyberax|1 month ago
Dashboards look cool, but they're usually not much help for debugging. What you're looking for is per-request tracing and logging, so you can grab a request ID and trace it (pull the log messages associated with it) through multiple levels of the stack, maybe even across different services.
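In shell terms, the grab-a-request-ID workflow looks something like the following; the log files, paths, and ID format below are made up for illustration:

```shell
# Fake per-service logs sharing one request ID (illustrative data)
mkdir -p /tmp/trace-demo
printf '2024-01-01T00:00:01Z gateway req-7f3a accepted\n' > /tmp/trace-demo/gateway.log
printf '2024-01-01T00:00:02Z orders req-7f3a reserved\n'  > /tmp/trace-demo/orders.log
printf '2024-01-01T00:00:03Z billing req-7f3a charged\n'  > /tmp/trace-demo/billing.log

# Trace the request through every level of the stack by its ID
grep -h 'req-7f3a' /tmp/trace-demo/*.log | sort
```

Real tracing stacks (OpenTelemetry and the like) automate the ID propagation, but the underlying query is the same.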
Debuggers are great, but they are not a good option for production traffic.
raggi|1 month ago
Observability stacks are a blind alley similar to containers: they solve a handful of well-defined problems and immediately fall down on their own KPIs: events handled or prevented in place, efficiency, and being easier to use than what came before.
ValdikSS|1 month ago
There are tools that show what happens per process/thread and inside the kernel: profiling and tracing.
Check out Yandex's Perforator or Google's Perfetto. Netflix also has one, forgot the name.
gear54rus|1 month ago
But instead we go with multiple moving parts, all configured independently? CoreOS, Terraform, and a dependency on Vultr. Lol.
Never in a million years would I think it's a good idea to disable SSH access. Like, why? Keys and a non-standard port already bring login attempts from China down to about zero a year.
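For what it's worth, the hardening described here amounts to a few sshd_config lines (the port number is illustrative):

```
# keys only, on a non-standard port
Port 2222
PasswordAuthentication no
PubkeyAuthentication yes
PermitRootLogin no
```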