Introducing Semgrep and r2c

[+] rtsao|5 years ago|reply

It's great to see more tools adopting tree-sitter [1].

Having a (fast) single tool that can accurately parse most commonly used programming languages is incredibly useful, but it requires the maintenance of dozens of grammars, which is difficult without a large community effort. Hopefully increased adoption means more accurate parsers and support for even more languages.

Tree-sitter powers syntax highlighting on GitHub.com and (soon) neovim and OniVim 2. Hopefully regex-based syntax highlighting is a thing of the past soon. If you haven't seen the Strange Loop conference talk on tree-sitter [2] yet, it's worth a watch.

I think a Prettier-like code formatter using tree-sitter would be cool, both in terms of potentially broader language support and native performance.

[1]: https://tree-sitter.github.io/tree-sitter/

[2]: https://www.youtube.com/watch?v=Jes3bD6P0To

[+] lvh|5 years ago|reply

We've been working with the r2c folks for a while, and have been using semgrep since before it was called semgrep.

If you can write code in a language, you can use semgrep. It also has a feature I have learned to love every time I find it in any kind of auditing tool: it’s ruthlessly effective as an exploratory and experimental tool, but it takes no effort at all to turn that into a persistent check. By comparison: ripgrep finds anything fast, but nobody uses it to write linters. Other off-the-shelf linters do a great job finding (simple) issues, but bandit doesn’t help me one bit to build a mental map of how a codebase works.

[+] ievans|5 years ago|reply

Hey HN, I’m the author of this post and a contributor to Semgrep. Happy to answer questions and hear feedback! I’m excited to try to lower the barrier to writing a simple lint (or more complex program analysis) that previously only a static analysis expert could do; we’ve gotten contributions from people who don’t know what an abstract syntax tree is! The userbase for Semgrep is almost evenly split between security engineers using it for hunting/enforcement and developers looking for bugs; we’ve tried to collect examples for both use cases at https://semgrep.dev/explore.

[+] dti|5 years ago|reply

Is Semmle, which offers the CodeQL language and the LGTM service and was recently acquired by GitHub, doing a similar thing (https://semmle.com/)? If so, how does Semgrep compare to CodeQL?

Edit: There is a help entry: https://semgrep.dev/docs/faq/#how-is-semgrep-different-from-...

[+] carlmr|5 years ago|reply

First of all, I love the idea of semgrep, but can't use it since we're using C++. Is there any chance for C++ support in the future?

[+] scanr|5 years ago|reply

Interesting. You can try it out here: https://semgrep.dev/editor/

It doesn't appear to catch the call when searching for exec(...) in the following Python code:

    not_exec = exec
    not_exec('rm -rf /')

Edited to include language

[+] ievans|5 years ago|reply

Good catch. Currently we only support constant propagation for literals. Here's a working example:

    $ semgrep -e "not_exec('somestr')"

will match

    foo = "somestr"
    not_exec(foo)

Here's a more complete example: https://semgrep.dev/s/ievans:const-python

In your example, we don't propagate exec because it's not seen as a literal -- that's a TODO for sure. See https://github.com/returntocorp/semgrep/issues/1645 for a longer discussion!
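
For anyone following along, here is a rough sketch of the distinction using Python's ast module. This is not how semgrep is implemented; `LiteralPropagator` and `find_calls` are made-up names for illustration, and the sketch only handles flat, straight-line code.

```python
import ast


class LiteralPropagator(ast.NodeVisitor):
    """Find calls to `target`, resolving arguments that name string literals."""

    def __init__(self, target):
        self.target = target
        self.constants = {}   # variable name -> string literal bound to it
        self.matches = []     # (line number, resolved argument values)

    def visit_Assign(self, node):
        # Record only `name = "literal"`. Anything else, such as
        # `not_exec = exec`, is ignored: that is the limitation discussed above.
        if (len(node.targets) == 1
                and isinstance(node.targets[0], ast.Name)
                and isinstance(node.value, ast.Constant)
                and isinstance(node.value.value, str)):
            self.constants[node.targets[0].id] = node.value.value
        self.generic_visit(node)

    def visit_Call(self, node):
        if isinstance(node.func, ast.Name) and node.func.id == self.target:
            args = []
            for arg in node.args:
                if isinstance(arg, ast.Constant):
                    args.append(arg.value)
                elif isinstance(arg, ast.Name):
                    # None when the name was never bound to a string literal.
                    args.append(self.constants.get(arg.id))
                else:
                    args.append(None)
            self.matches.append((node.lineno, args))
        self.generic_visit(node)


def find_calls(source, target):
    visitor = LiteralPropagator(target)
    visitor.visit(ast.parse(source))
    return visitor.matches


literal_case = 'foo = "somestr"\nnot_exec(foo)\n'
alias_case = "not_exec = exec\nnot_exec('rm -rf /')\n"

print(find_calls(literal_case, "not_exec"))  # the literal is propagated into the call
print(find_calls(alias_case, "exec"))        # the alias is not followed, so nothing is found
```

Following the alias would mean also tracking name-to-name bindings, which is a different (and harder) analysis than propagating literals.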

[+] kevincox|5 years ago|reply

The CI use case is cool, and probably makes more money. But I would really love to see a CLI for optimized search and replace. It seems that search is available on the CLI, but I can't see any replace, and most of the options are focused on running a rule config rather than ad hoc replacements.

[+] ievans|5 years ago|reply

The CLI does have an --autofix flag, but the replacement it uses has to be specified through a local config file rather than as a command-line argument. There is a ticket for that though! https://github.com/returntocorp/semgrep/issues/840

Here are the docs for what exists currently: https://semgrep.dev/docs/experiments/#autofix

[+] magicseth|5 years ago|reply

I would love this in my editor: if I search for

    day = 'friday'

I want it to find

    day="friday"

also!

[+] lvh|5 years ago|reply

If you use VSCode you can get that today; if you use something else, it doesn't look too hard to write: https://semgrep.dev/docs/integrations/#editor

I'd expect latency might be juuust in the range where it doesn't feel interactive yet? But honestly any search that isn't ripgrep or --omg-optimized-etags feels like that to me now, and people use symbol rename features in IDEs all the time that take multiple seconds, so maybe I'm just unreasonably picky.

[+] daghan|5 years ago|reply

I created a semgrep rule for this: https://semgrep.dev/s/GRD6/?version=develop
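
As a rough illustration of why a syntax-aware search treats both spellings alike, here is a sketch that compares parsed statements instead of text. It uses Python's ast module, not semgrep, and `find_matching_statements` is a hypothetical helper, not the linked rule.

```python
import ast


def find_matching_statements(pattern, source):
    """Return line numbers of statements whose AST equals the pattern's AST."""
    # ast.dump leaves out line/column attributes by default, so statements that
    # differ only in quote style or spacing produce the same dump string.
    wanted = ast.dump(ast.parse(pattern).body[0])
    return sorted(
        node.lineno
        for node in ast.walk(ast.parse(source))
        if isinstance(node, ast.stmt) and ast.dump(node) == wanted
    )


source = '''day = 'friday'
day="friday"
day = "monday"
'''

print(find_matching_statements("day = 'friday'", source))  # → [1, 2]
```

A text search would need a regex covering every quoting and spacing variant; comparing trees gets that for free because the parser has already normalized them away.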

[+] daghan|5 years ago|reply

This is actually a great idea.

21 comments