willejs's comments

willejs | 6 months ago | on: Ask HN: Difficult Interview Question

Do people not start by asking questions like: over what medium? Is there direct IP connectivity, or is there NAT or a firewall in between? How long is the link? How big is the file? I would try to set some parameters if people are not asking, or are struggling.

willejs | 6 months ago | on: How we exploited CodeRabbit: From simple PR to RCE and write access on 1M repos

This is a great read, but unfortunately it doesn't really surprise me. It was bound to happen given how people blindly add apps with wide permissions, and given GitHub's permissions model.

It amazes me how many people will install GitHub apps that have wide scopes, primarily write permissions to their repositories. Even with branch protection, people will often allow privileged access to their cloud from GitHub Actions runs triggered by pull requests. To configure this properly, you need to change the GitHub OIDC audience, and that is not well documented.
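As a rough illustration of the OIDC point, here is a minimal sketch of the kind of claim check a cloud trust policy would perform on a GitHub Actions token. It assumes the token has already been signature-verified and decoded into a dict; the function name, the example repository and the custom audience URL are all hypothetical. The `aud` and `sub` claims, and the `repo:<owner>/<repo>:pull_request` subject format for pull request runs, are part of GitHub's OIDC token.

```python
def is_trusted_workflow(claims, expected_audience, allowed_subjects):
    """Return True only if both the audience and the subject are expected.

    `claims` is an already verified and decoded token payload (assumption:
    signature verification happens elsewhere, e.g. in the cloud provider).
    """
    aud = claims.get("aud")
    # In a JWT, 'aud' may be a single string or a list of strings.
    audiences = [aud] if isinstance(aud, str) else list(aud or [])
    if expected_audience not in audiences:
        return False
    # Exact match on 'sub', so pull_request subjects are rejected
    # unless they are explicitly allowed.
    return claims.get("sub") in allowed_subjects


# Hypothetical values for illustration only.
AUDIENCE = "https://cloud.example.com/deploy"
MAIN_SUBJECT = "repo:example-org/example-repo:ref:refs/heads/main"
ALLOWED = {MAIN_SUBJECT}

main_claims = {"aud": AUDIENCE, "sub": MAIN_SUBJECT}
pr_claims = {"aud": AUDIENCE, "sub": "repo:example-org/example-repo:pull_request"}
default_aud_claims = {"aud": "https://github.com/example-org", "sub": MAIN_SUBJECT}

print(is_trusted_workflow(main_claims, AUDIENCE, ALLOWED))
print(is_trusted_workflow(pr_claims, AUDIENCE, ALLOWED))
print(is_trusted_workflow(default_aud_claims, AUDIENCE, ALLOWED))
```

The point of matching on both claims is that a token minted with the default audience, or one minted for a pull request run, is refused even though it comes from the same repository.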

When you contact the company that makes an app and ask them to provide a different app with less scope, disabling the features that require write access, they often have no interest whatsoever and don't understand the security concerns and potential implications.

I think GitHub needs to address this in part by allowing more granular app access defined by the installer, but also more granular permissions in general.

willejs | 7 months ago | on: Webflow Down for >31 Hours

Hugops to the people working on this for the last 31+ hours. Running incidents of this significance is hard and draining, and requires a lot of effort. This going on for so long must be very difficult for all involved.

willejs | 7 months ago | on: We built an air-gapped Jira alternative for regulated industries

Running software in an airgapped environment is difficult, but the hardest parts are installing, packaging and shipping updates. I have used https://zarf.dev/ to do this for a government client, and it was an amazing experience. I highly recommend it. K8s seems heavy, but if you want to run datastores with backups (k8s operators) or highly customised environments, and automate all of that instead of writing loads of bash and custom code, it shines.

willejs | 8 months ago | on: Cloudflare 1.1.1.1 Incident on July 14, 2025

If you carry on reading, it's quite obvious they misconfigured a service and routed production traffic to that instead of the correct service, and that the system used to do that was built in 2018 and is considered legacy (probably because you can easily deploy bad configs). Given that, I wouldn't say the summary is "inscrutable corporatese", whatever that is.

willejs | 8 months ago | on: Datadog's $65M/year customer mystery solved

Yeah, the secret sauce of the DD libs was/is addictive for sure! I think it's perhaps better now that you can just use OTel for custom traces and the OTel contrib libs for auto instrumentation, and send that to the DD agent? I have not tried it yet because I suspected labels and other things might be named differently than in the DD auto instrumentation/contrib packages, but I don't think the gap is as big now?

willejs | 8 months ago | on: Datadog's $65M/year customer mystery solved

I have run ELK, Grafana + Prom, Grafana + Thanos/Cortex, New Relic and all of the more traditional products for monitoring/observability. More recently, over the last few years, I have been running full observability stacks via either the Grafana LGTM stack or Datadog at a reasonable scale and complexity. Ultimately you want one tool that can alert you off a metric, present you some traces, and drill down into logs, all the way down the stack.

I have found Datadog to be, hands down, the best developer experience from the get go. The way it glues its mostly decent products together is unparalleled in comparison to other products (Grafana Cloud/LGTM). I usually say that if you're at a small to medium scale business it just makes sense, IF you understand the product and configure it correctly, which is reasonably easy. The seamless integration between tracing, logging and metrics in the platform, which you can then easily combine with alerts, is great. However, it's easy to misconfigure it and spend a lot of money on seemingly nothing. If you do not implement tracing and structured logs (at the right volume and level) with trace/span ids etc. all the way through your services, it's hard to see the value, and it seems expensive. It requires some good knowledge and configuration of the product to make it pay off. The rest of the product features are generally good; for example, their security suite is a good entry-level option for cloud security monitoring and SIEM too.
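To make the "structured logs with trace/span ids" part concrete, here is a minimal sketch using only the Python standard library. The field names `trace_id` and `span_id`, the logger name and the id values are assumptions for illustration; in a real service the ids would come from whatever tracer you use, and the field names would follow whatever your log pipeline expects.

```python
import io
import json
import logging


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, including trace context if present."""

    def format(self, record):
        payload = {
            "level": record.levelname,
            "message": record.getMessage(),
            # Set via logger.info(..., extra={...}); None when absent.
            "trace_id": getattr(record, "trace_id", None),
            "span_id": getattr(record, "span_id", None),
        }
        return json.dumps(payload)


def make_logger(name, stream):
    """Build a logger that writes JSON lines to the given stream."""
    handler = logging.StreamHandler(stream)
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger(name)
    logger.handlers = [handler]
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


# Hypothetical usage: the ids here are placeholders, not real tracer output.
buffer = io.StringIO()
log = make_logger("checkout", buffer)
log.info("payment authorised", extra={"trace_id": "abc123", "span_id": "def456"})
entry = json.loads(buffer.getvalue())
print(entry)
```

With the ids present on every line, a log entry can be joined back to the trace that produced it, which is where the drill-down from metric to trace to log starts to work.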

However, when you get to a certain scale, the cost of APM and Infrastructure hosts in Datadog can become somewhat prohibitive. Also, Datadog's custom metrics pricing is somewhat expensive, and its query language capabilities do not quite match the power of PromQL, which you start to find yourself needing to debug issues. At that point, the self-hosted LGTM stack starts to make sense. However, it involves a lot more education for end users, both in integration (a little less now that OTel is popular) and in querying/building dashboards etc., and you also have to run it yourself. The Grafana Cloud platform is more attractive though.

willejs | 1 year ago | on: Root your Docker host in 10 seconds for fun and profit (2017)

One thing that always surprises me is that people haven't made more of a fuss about Docker for Mac. By default on install it shares the whole hard disk (unless that's changed), meaning that without sudo you can get privileged access to the whole filesystem. I scope it down to my user folder, but the defaults are dangerous.

willejs | 2 years ago | on: Pitfalls of Helm – Insights from 3 years with the leading K8s package manager

The Kubernetes provider and kubectl work, but it's not the nicest way of making changes. It's slow, quite clunky, and not particularly intuitive. If you're just getting started and you know Terraform, it's OK though. It's also useful for bootstrapping gitops tools like Argo or FluxCD.

Helm diff will show you a diff similar to Terraform's. Running Helmfile in CD isn't a bad move; it's really simple, and it's a pattern that is easy for any engineer to grok. I think this is still a valid approach in a simple setup; it's what some people call "CD OPS". It's a push model instead of pull, and there are downsides, but it's not the end of the world.

Ultimately, at scale, I think gitops tools like Flux and ArgoCD offer some of the nicest patterns, especially Flux's support for OCI artifacts as a source of truth. However, you will then venture into the realm of kustomize and much more complex tooling and concepts, which is not always worth doing.

willejs | 2 years ago | on: Kubernetes Needs an LTS

Rotating out nodes during an upgrade is slow and potentially disruptive. However, your systems should be built to handle this, and this is a good way of forcing it.

willejs | 2 years ago | on: ArgoCon – Vendor-neutral Argo-focused Event

Deploy both and see which patterns you prefer, and what fits into your organisation better.

I have used both, but find Argo can be unnecessarily complex, and it focuses solely on git as a source of truth for your k8s resources. The image updater can even write back to git to reflect version numbers etc., which is arguably an anti-pattern (git is not a database). However, the UI is excellent and very powerful, and if you're just getting started in the gitops space, it's very intuitive.

I feel like the Weaveworks team (who created Flux) have encountered the problem of using git as a source of truth at scale. They let you specify other sources such as S3 and OCI containers, which gives you a lot more power to build custom, powerful workflows.

This means that you define your k8s resources (kustomization definitions describing k8s resources, plus Flux resources) in git, but build, lint and test them in a CI/CD pipeline and publish them as a container. Then you can just tag that container with the cluster name or environment and treat your k8s resources like you would code. You can observe this with the Flux UI too.

I think people get too hung up on the git part of gitops. All infrastructure should be defined in a version control system, and follow a sane CI process, but the way your cluster pulls that state to enforce it should be any source that is a reflection of that versioned code in SCM.

willejs | 2 years ago | on: A cryptocurrency company had a $65M bill, per Datadog’s Q1 earnings call

Datadog is a pretty amazing product, and if you are careful and use it in the right way, it is very powerful, and cheaper than rolling your own LGTM Grafana stack (or similar). If you are not careful, or you are at a decent scale, you can easily spend obscene amounts of money. The metrics pricing is completely insane, for example, and it's easy for people to emit high cardinality metrics from apps and explode your bill. I think at that point you need to run an internal solution, and that is when it makes sense to double down on a combo of Elastic and Grafana's stack for logging, tracing and metrics.
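To show why high cardinality tags blow up so quickly, here is a small back-of-the-envelope sketch. The tag names and counts are made up, and the product is only an upper bound on the number of distinct time series for one metric, since not every combination of tag values will actually occur. It says nothing about actual pricing.

```python
from math import prod


def max_series(tag_cardinalities):
    """Upper bound on distinct series for one metric.

    Each unique combination of tag values is its own time series, so the
    worst case is the product of the distinct value counts per tag.
    """
    return prod(tag_cardinalities.values())


# Hypothetical tag sets for a single request counter.
bounded_tags = {"env": 3, "service": 20, "status": 5}
with_user_id = {**bounded_tags, "user_id": 10_000}

print(max_series(bounded_tags))   # 300
print(max_series(with_user_id))   # 3000000
```

One unbounded tag such as a user or request id turns a few hundred series into millions, which is usually the kind of change that goes unnoticed until the invoice arrives.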

willejs | 3 years ago | on: GitHub issue - resolved

I used to get my GitHub status updates in Slack via an RSS feed. I just searched for the feed again, but it's gone? Is there an alternative for this?