Introducing Semgrep and r2c

115 points | pabloest | 5 years ago | r2c.dev

21 comments

rtsao|5 years ago
It's great to see more tools adopting tree-sitter [1].

Having a (fast) single tool that can accurately parse most commonly used programming languages is incredibly useful, but it requires the maintenance of dozens of grammars, which is difficult without a large community effort. Hopefully increased adoption means more accurate parsers and support for even more languages.

Tree-sitter powers syntax highlighting on GitHub.com and (soon) neovim and OniVim 2. Hopefully regex-based syntax highlighting is a thing of the past soon. If you haven't seen the Strange Loop conference talk on tree-sitter [2] yet, it's worth a watch.

I think a Prettier-like code formatter using tree-sitter would be cool, both in terms of potentially broader language support and native performance.

[1]: https://tree-sitter.github.io/tree-sitter/

[2]: https://www.youtube.com/watch?v=Jes3bD6P0To

lvh|5 years ago
We've been working with the r2c folks for a while, and have been using semgrep since before it was called semgrep.

If you can write code in a language, you can use semgrep. It also has a feature I have learned to love every time I find it in any kind of auditing tool: it’s ruthlessly effective as an exploratory and experimental tool, but it takes no effort at all to turn that exploration into a persistent check. By comparison: ripgrep finds anything fast, but nobody uses it to write linters. Other off-the-shelf linters do a great job finding (simple) issues, but bandit doesn’t help me one bit to build a mental map of how a codebase works.

ievans|5 years ago
Hey HN, I’m the author of this post and a contributor to Semgrep. Happy to answer questions and hear feedback! I’m excited to try to lower the barrier to writing a simple lint (or more complex program analysis) that previously only a static analysis expert could do; we’ve gotten contributions from people who don’t know what an abstract syntax tree is! The userbase for Semgrep is almost evenly split between security engineers using it for hunting/enforcement and developers looking for bugs; we’ve tried to collect examples for both use cases at https://semgrep.dev/explore.
carlmr|5 years ago
First of all, I love the idea of semgrep, but can't use it since we're using C++. Is there any chance for C++ support in the future?
scanr|5 years ago
Interesting. You can try it out here: https://semgrep.dev/editor/

It doesn't appear to catch the aliased call when searching for exec(...) in the following Python code:

    not_exec = exec
    not_exec('rm -rf /')
Edited to include language
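
To illustrate why a purely syntactic match on exec(...) would miss the alias, here is a minimal sketch using Python's standard ast module. It is not how Semgrep is implemented; the helper find_calls_by_name and the extra exec('echo hi') line are hypothetical, added only for the demonstration. The source string is parsed, never executed.

```python
import ast

# The snippet from the comment above, plus one direct call for contrast
# (the third line is an addition for this example).
SOURCE = """
not_exec = exec
not_exec('rm -rf /')
exec('echo hi')
"""


def find_calls_by_name(source: str, name: str) -> list[int]:
    """Return line numbers of calls whose callee is literally the identifier `name`."""
    tree = ast.parse(source)  # parse only; nothing in `source` is run
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == name
    ]


# Only the direct call is reported; the call through the alias is not.
print(find_calls_by_name(SOURCE, "exec"))      # [4]
print(find_calls_by_name(SOURCE, "not_exec"))  # [3]
```

Catching the aliased call would need some form of data-flow or alias tracking on top of the syntax tree, which this sketch deliberately leaves out.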
kevincox|5 years ago
The CI use case is cool, and probably makes more money. But I would really love to see a CLI for optimized search and replace. It seems that they have search available on the CLI; however, I can't see any replace, and most of the options are focused on running a rule config rather than on ad hoc replacements.
magicseth|5 years ago
I would love this in my editor: if I search for

    day = 'friday'

I want it to find

    day="friday"

also!
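
A rough sketch of why this works at the syntax-tree level, assuming Python and its standard ast module (the helper same_structure is hypothetical, and this is not Semgrep's matching engine): both spellings parse to the same tree, so a structural comparison treats them as equal even though the text differs.

```python
import ast


def same_structure(a: str, b: str) -> bool:
    """True when two snippets parse to the same syntax tree.

    ast.dump omits line and column information by default, so quoting
    style and whitespace do not affect the comparison.
    """
    return ast.dump(ast.parse(a)) == ast.dump(ast.parse(b))


print(same_structure("day = 'friday'", 'day="friday"'))    # True
print(same_structure("day = 'friday'", 'day = "monday"'))  # False
```

A text search would need a regex to cover both quote styles and optional spaces; comparing parsed trees gets that normalization for free.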

lvh|5 years ago
If you use VSCode you can get that today; if you use something else, it doesn't look too hard to write: https://semgrep.dev/docs/integrations/#editor

I'd expect latency might be juuust in the range where it doesn't feel interactive yet? But honestly any search that isn't ripgrep or --omg-optimized-etags feels like that to me now, and people use symbol rename features in IDEs all the time that take multiple seconds, so maybe I'm just unreasonably picky.

daghan|5 years ago
This is actually a great idea.