micksmix's comments

micksmix | 6 months ago | on: Keeping secrets out of logs (2024)

Loved this “lead bullets” framing, especially the parts on taint checking, scanners, and pre-processing/sampling logs. One practical add-on to the "Sensitive data scanners" section is verification: can you tell which candidates are actually live creds?

We’ve been working on an open source tool, Kingfisher, that pairs fast detection (Hyperscan + Tree-Sitter) with live validation for a bunch of providers (cloud + common SaaS) so you can down-rank false positives and focus on the secrets that really matter. It plugs in at the chokepoints this post suggests: CI, repo/org sweeps, and sampled log archives (stdin/S3) after a Vector/rsyslog hop.

Examples:

  kingfisher scan /path/to/app.log --only-valid
  kingfisher scan --s3-bucket my-logs --s3-prefix prod/2025/09/

Baselines help keep noise down over time.

Repo: https://github.com/mongodb/kingfisher (Apache-2.0)

Disclosure: I help maintain Kingfisher.

micksmix | 1 year ago | on: Show HN: Globstar – Open-source static analysis toolkit

One of the main benefits of Semgrep is its unified DSL that works across all supported languages. In contrast, using the Go module "smacker/go-tree-sitter" exposes you to differences in S-expression output, because each language grammar is maintained independently and changes on its own schedule.

I've seen grammars that are part of "smacker/go-tree-sitter" change their syntax between versions, which can lead to broken S-expressions. Semgrep avoids that problem because its DSL also abstracts away those kinds of grammar changes.

I'm a bit concerned that tree-sitter s-expressions can become "write-only" and rely on the reader to also understand the grammar for which they've been generated.

For example, here's a semgrep rule for detecting a Jinja2 environment with autoescaping disabled:

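  rules:
  - id: incorrect-autoescape-disabled
    # message, languages, and severity are required for Semgrep to load the rule
    message: jinja2.Environment is created with autoescape set to something other than True or select_autoescape(...)
    languages: [python]
    severity: WARNING
    patterns:
      - pattern: jinja2.Environment(... , autoescape=$VAL, ...)
      - pattern-not: jinja2.Environment(... , autoescape=True, ...)
      - pattern-not: jinja2.Environment(... , autoescape=jinja2.select_autoescape(...), ...)
      - focus-metavariable: $VAL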

  
Now, compare it to the corresponding tree-sitter S-expression (generated by o3-mini-high):

  (
    call
      function: (attribute
                  object: (identifier) @module (#eq? @module "jinja2")
                  attribute: (identifier) @func (#eq? @func "Environment"))
      arguments: (argument_list
                    (_)*
                    (keyword_argument
                      name: (identifier) @key (#eq? @key "autoescape")
                      value: (_) @val
                        (#not-match? @val "^True$")
                        (#not-match? @val "^jinja2\\.select_autoescape\\("))
                    (_)*)
  ) @incorrect_autoescape
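
To make concrete what both snippets are meant to flag, here is a rough stdlib-only sketch of the same check using Python's ast module. It involves neither Semgrep nor tree-sitter; the function names and the SAMPLE input are made up for illustration, and it only handles the literal jinja2.Environment(...) spelling (no import aliases).

```python
import ast


def is_safe_autoescape(value: ast.expr) -> bool:
    """True for autoescape=True or autoescape=jinja2.select_autoescape(...)."""
    if isinstance(value, ast.Constant) and value.value is True:
        return True
    return (
        isinstance(value, ast.Call)
        and isinstance(value.func, ast.Attribute)
        and value.func.attr == "select_autoescape"
        and isinstance(value.func.value, ast.Name)
        and value.func.value.id == "jinja2"
    )


def find_unsafe_autoescape(source: str) -> list[int]:
    """Return line numbers of jinja2.Environment(...) calls with a risky autoescape value."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Call):
            continue
        func = node.func
        # Only match the literal spelling jinja2.Environment(...).
        if not (
            isinstance(func, ast.Attribute)
            and func.attr == "Environment"
            and isinstance(func.value, ast.Name)
            and func.value.id == "jinja2"
        ):
            continue
        for kw in node.keywords:
            if kw.arg == "autoescape" and not is_safe_autoescape(kw.value):
                hits.append(node.lineno)
    return sorted(hits)


# Hypothetical input: only parsed, never executed, so jinja2 need not be installed.
SAMPLE = '''\
import jinja2
a = jinja2.Environment(autoescape=False)
b = jinja2.Environment(autoescape=True)
c = jinja2.Environment(loader=None, autoescape=jinja2.select_autoescape())
d = jinja2.Environment(autoescape=flag)
'''

print(find_unsafe_autoescape(SAMPLE))  # [2, 5]
```

Lines 2 and 5 of the sample are reported; the autoescape=True and select_autoescape() calls are not.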

People can disagree, but I'm not sure that tree-sitter S-expressions are an upgrade over a DSL. I'm hoping I'm proven wrong ;-)