It's great to see more tools adopting tree-sitter [1].
Having a (fast) single tool that can accurately parse most commonly used programming languages is incredibly useful, but it requires the maintenance of dozens of grammars, which is difficult without a large community effort. Hopefully increased adoption means more accurate parsers and support for even more languages.
Tree-sitter powers syntax highlighting on GitHub.com and (soon) neovim and OniVim 2. Hopefully regex-based syntax highlighting is a thing of the past soon. If you haven't seen the Strange Loop conference talk on tree-sitter [2] yet, it's worth a watch.
I think a Prettier-like code formatter using tree-sitter would be cool, both in terms of potentially broader language support and native performance.
We've been working with the r2c folks for a while, and have been using semgrep since before it was called semgrep.
If you can write code in a language, you can use semgrep. It also has a feature I have learned to love every time I find it in any kind of auditing tool: it’s ruthlessly effective as an exploratory and experimental tool, but it takes no effort at all to turn that into a persistent check. By comparison: ripgrep finds anything fast, but nobody uses it to write linters. Other off the shelf linters do a great job finding (simple) issues, but bandit doesn’t help me one bit to build a mental map of how a codebase works.
Hey HN, I’m the author of this post and a contributor to Semgrep. Happy to answer questions and hear feedback! I’m excited to try to lower the barrier to writing a simple lint (or more complex program analysis) that previously only a static analysis expert could do; we’ve gotten contributions from people who don’t know what an abstract syntax tree is! The userbase for Semgrep is almost evenly split between security engineers using it for hunting/enforcement and developers looking for bugs; we’ve tried to collect examples for both use cases at https://semgrep.dev/explore.
Is Semmle, which offers the CodeQL language and the LGTM service and was recently acquired by GitHub, doing a similar thing (https://semmle.com/)? If so, how does Semgrep compare to CodeQL?
The CI use case is cool, and probably makes more money. But I would really love to see a CLI for optimized search and replace. It seems that search is available on the CLI, but I can't see any replace, and most of the options are focused on running the rule config rather than ad hoc replacements.
The CLI does have an --autofix flag, but the replacement it uses has to be specified through a local config file rather than as a command-line arg. There is a ticket for that though!
https://github.com/returntocorp/semgrep/issues/840
I'd expect latency might be juuust in the range where it doesn't feel interactive yet? But honestly any search that isn't ripgrep or --omg-optimized-etags feels like that to me now, and people use symbol rename features in IDEs all the time that take multiple seconds, so maybe I'm just unreasonably picky.
[+] [-] rtsao|5 years ago|reply
[1]: https://tree-sitter.github.io/tree-sitter/
[2]: https://www.youtube.com/watch?v=Jes3bD6P0To
[+] [-] lvh|5 years ago|reply
[+] [-] ievans|5 years ago|reply
[+] [-] dti|5 years ago|reply
Edit: There is a help entry: https://semgrep.dev/docs/faq/#how-is-semgrep-different-from-...
[+] [-] carlmr|5 years ago|reply
[+] [-] scanr|5 years ago|reply
It doesn't appear to catch the call when searching for exec(...) in the following Python code:
Edited to include language
[+] [-] ievans|5 years ago|reply
In your example, we don't propagate exec because it's not seen as a literal -- that's a TODO for sure. See https://github.com/returntocorp/semgrep/issues/1645 for a longer discussion!
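The snippet being discussed isn't shown in the thread, so here is a hypothetical illustration of the general problem: a purely syntactic search for exec(...) sees calls written literally as exec(...), but not calls made through another name bound to the same builtin. It's a minimal sketch using Python's standard ast module, not Semgrep itself; the sample source and the direct_calls helper are made up for illustration.

```python
import ast


def direct_calls(source: str, func_name: str) -> list[int]:
    """Return line numbers of calls written literally as func_name(...)."""
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id == func_name
    )


# Hypothetical sample, NOT the code from the comment above.
SAMPLE = (
    "exec('print(1)')\n"   # line 1: direct call, found
    "run = exec\n"         # line 2: alias for the builtin
    "run('print(2)')\n"    # line 3: same builtin, missed by a syntactic match
)

print(direct_calls(SAMPLE, "exec"))  # → [1]
```

Catching line 3 would need some form of propagation from `run` back to `exec`, which is the kind of work the linked issue discusses.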
[+] [-] kevincox|5 years ago|reply
[+] [-] ievans|5 years ago|reply
Here are docs for what exists currently https://semgrep.dev/docs/experiments/#autofix
[+] [-] magicseth|5 years ago|reply
day = 'friday'
I want it to find
day="friday"
also!
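To show why a syntax-aware search can treat those two spellings as the same thing, here is a minimal sketch using Python's standard ast module (not Semgrep; the helper name and sample are assumptions for illustration). Once the source is parsed, quote style and spacing are gone and both lines become the same assignment of the same string value.

```python
import ast


def find_string_assignments(source: str, name: str, value: str) -> list[int]:
    """Return line numbers where `name` is assigned the string `value`."""
    hits = []
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.Assign):
            continue
        # The parsed constant carries the string value, not its quoting.
        if not (isinstance(node.value, ast.Constant) and node.value.value == value):
            continue
        if any(isinstance(t, ast.Name) and t.id == name for t in node.targets):
            hits.append(node.lineno)
    return sorted(hits)


# Hypothetical sample mixing quote styles and spacing.
SAMPLE = "day = 'friday'\nday=\"friday\"\nday = 'monday'\n"

print(find_string_assignments(SAMPLE, "day", "friday"))  # → [1, 2]
```

A text search for day = 'friday' would only hit line 1; matching on the parsed tree hits both.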
[+] [-] lvh|5 years ago|reply
[+] [-] daghan|5 years ago|reply