OPA is a great tool for implementing a policy-as-code system. But if you're trying to use it for application authorization (e.g. fine-grained authz for B2B SaaS or a set of internal applications), you may find that its policy story is strong, but it doesn't really have a "data plane": you either store data in a data.json file and rebuild the policy any time that data changes, or make an http.send call out of the policy to fetch dynamic data.
Check out Topaz [0], which uses OPA as its decision engine, but adds a data plane that is based on the ReBAC ideas explored in the Google Zanzibar [1] paper.
Disclaimer: I work on the team [2] that builds and maintains the Topaz project.
Bundle servers provide a centralized "data plane" decoupled from the distributed component (OPA). You don't need to rebuild your policy any time data changes. Just push a new bundle with the data that changed, and OPA will fetch it as configured — either periodically or directly if long polling is configured.
This feels very much like OpenFGA[0]. I've been evaluating authorization tools for one of my side projects, and honestly most tools feel like creating relationships in a graph-like database and querying to see whether there is a relationship between two entities. Is there more to this (besides the implementation details), or am I missing something from these tools?
My team is using OPA in a re-build of an application that we support. One of the main goals of the rebuild is to ensure we don't end up in a situation where every little rule change (including UAM changes) requires a full rebuild/deploy cycle of the app.
OPA replaces a complex, hard-coded, and largely inscrutable UAM model with a (still complex) but flexibly defined, independently testable, and easily inspectable single-responsibility model.
I like that OPA has built-in support for testing rulesets. The partial evaluation feature is amazing, and makes it easy to apply UAM filters to endpoints that return large sets of data (we have consistent query APIs across the app, so could do this with a relatively simple OPA-aware proxy).
It's not all sunshine and roses, and the result might seem overly complex for a lot of use cases, but in our case I think OPA has provided a nice clean abstraction and enabled us to disentangle our UAM from the rest of our code and move more quickly overall.
Curious if you have any lessons learned worth sharing. Our journey went like this:
1: Yeah, we can use OPA to get rid of all this legacy spaghetti code!
2: Wow this PoC really proves out the idea!
3: Whoa we have three use cases now running in production!
4: Wait, these remaining 20 use cases are way more complex. To our surprise, all this legacy spaghetti code _exists for a reason_.
5: We now have 5 use cases in production but the Rego is now quite convoluted and our application logic has actually increased in complexity.
6: Red button: okay this is going horribly wrong. Back out this whole thing.
7: Recognition: the reason this has gone horribly wrong is because the spaghetti code combines pure logic and side effects in a way that did not map well with OPA.
8: Regroup: first step is to refactor all the legacy code and separate policy logic from side effects in a meaningful way.
9: Refactor: implement the above redesign. The policy classes all now map naturally to Rego for all 23 use cases! Let's do it!
10: Reality: we don't want to. Our codebase is well-structured now and we like it. Adding OPA now feels like an unnecessary layer, an additional potential for network timeouts etc to creep in, an extra thing to maintain, an extra special case to handle in our safe deployment pipeline, an extra language to train developers on. Now _maybe_ if we ever wanted other teams to write up and maintain their own Rego policies, then _maybe_ we'd consider going with it in the future, but for now the reality is our team would end up doing that work for them anyway, and it doesn't seem worth the tradeoff.
Anyway, lesson learned: don't expect it to magically clean up all the garbage in your existing code. You'll do it wrong and things will be worse than when you started. Clean that up first, and _then_ decide whether and how you want to adopt OPA for your remaining needs.
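A minimal sketch of what steps 7 and 8 describe, assuming a toy set of rules: the decision is a pure function of its input, and the caller owns the side effects. All names here (`Request`, `Decision`, `decide`, `handle`) and the rules themselves are hypothetical, not taken from the codebase described above.

```go
package main

import (
	"errors"
	"fmt"
)

// Request carries everything the decision needs; nothing is fetched inside the policy.
type Request struct {
	Role      string
	Action    string
	OwnerID   string
	SubjectID string
}

// Decision is the pure result: an allow/deny flag plus a human-readable reason.
type Decision struct {
	Allow  bool
	Reason string
}

// decide is pure: same input, same output, no I/O and no mutation.
// The rules are illustrative assumptions.
func decide(r Request) Decision {
	switch {
	case r.Role == "admin":
		return Decision{Allow: true, Reason: "admins may perform any action"}
	case r.Action == "read":
		return Decision{Allow: true, Reason: "anyone may read"}
	case r.SubjectID == r.OwnerID:
		return Decision{Allow: true, Reason: "owners may modify their own resources"}
	default:
		return Decision{Allow: false, Reason: "no rule matched"}
	}
}

// handle owns the side effects (audit logging and the actual write)
// and only consults decide for the verdict.
func handle(r Request, audit func(string), write func() error) error {
	d := decide(r)
	audit(fmt.Sprintf("%s %s: allow=%t (%s)", r.SubjectID, r.Action, d.Allow, d.Reason))
	if !d.Allow {
		return errors.New("forbidden: " + d.Reason)
	}
	return write()
}

func main() {
	req := Request{Role: "editor", Action: "delete", OwnerID: "u1", SubjectID: "u2"}
	err := handle(req,
		func(s string) { fmt.Println("audit:", s) },
		func() error { return nil })
	fmt.Println("result:", err)
}
```

Once the logic has this shape, `decide` is the only part that would move into Rego (or stay in the application); `handle` stays the same either way.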
We're currently evaluating OPA for adding RBAC to our open-source application [0]. We plan on using the Go API [1] and doing the policy eval directly in our app since our app is also written in Go.
The thinking is we'll have some basic built-in policies (like admins can do X, editors can do Y, etc.) but also allow users to configure their own policies if they want, by writing Rego and loading their policy rules at startup time (via config). We'd document the inputs that we pass to the evaluation call, such as request headers, IP, role, etc.
I'm curious if anyone has ever tried something like this or similar?
I have found OPA to be a fairly reliable and performant system in production. We were able to build a scalable RBAC solution that used OPA as evaluators. We had around 40k OPA instances serving around 350K qps with p99.9 hovering around 6ms.
OPA, or Rego? My experience working for Styra was that most people seemed to grok where OPA fit in fairly quickly, but struggled with Rego. It's a very powerful language and well worth learning, I think, but it's an investment for sure.
I work in a highly regulated environment and evaluated using Cedar or OPA.
The biggest advantage to OPA was the flexibility. This enabled not just an authorization decision, but the why behind it. No more questions of why did this person/system gain (or was denied) access, combing through dozens of rules to find the matching statements. Just pull up the log and read the results… This is incredibly useful during audits.
Cedar could not provide that level of detail (or so I was told by AWS representatives selling their hosted version).
OPA is much more wide ranging. You can use it for permissions, sure, but also just about anything else you can imagine. I think that makes it much more compelling as a technological investment.
I tried to implement some simpler cases with OPA's policy language, Rego (https://www.openpolicyagent.org/docs/latest/policy-language/), and found it overly cumbersome. A simple check like "the user is in group A and in group B, but must not be in group C" is hard to express in this language. It would be a trivial task in any somewhat decent programming language (e.g. JavaScript).
I understand why restricting the possibilities with an external DSL might be a good idea, but I consider Rego to be too restricted. I mean, in the end a policy is just a function saying basically "yes" or "no" (I know, it's not that simple with OPA, but it boils down to access yes/no, anyway).
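For comparison, here is a minimal sketch of that check in a general-purpose language (Go rather than JavaScript), reading the rule as "in group A and in group B, but not in group C". The group names and the helper names `inGroup` and `allowed` are made up for illustration.

```go
package main

import "fmt"

// inGroup reports whether name appears in groups.
func inGroup(groups []string, name string) bool {
	for _, g := range groups {
		if g == name {
			return true
		}
	}
	return false
}

// allowed implements the example rule: member of A and B, and not a member of C.
func allowed(groups []string) bool {
	return inGroup(groups, "A") && inGroup(groups, "B") && !inGroup(groups, "C")
}

func main() {
	fmt.Println(allowed([]string{"A", "B"}))
	fmt.Println(allowed([]string{"A", "B", "C"}))
}
```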
OPA and its derivative projects really brought the idea of decoupled authorization as a viable option. It is a very powerful tool which can be applied to many layers of the architecture - from Kubernetes Admission Controllers being based on it through to network level authorization and up the full stack.
One area that is a constrained and narrow use case is around the actual application level permissions - e.g. what a user can do inside of your service. Having hand-rolled this in various companies - and the inevitable rebuilds that were required as requirements change such as adding a new, product packaging updates etc - you do end up with a complex web of logic - either in your codebase or as Rego.
For these application level permissions - where the requirements really come from the product/business rather than engineering - I always felt there could be a simpler way of defining these rules. Policies need to be in a format a business user can understand, and enforcing them needs to be extremely responsive, as checks are in the blocking path of every request - and this needs to work at large scale - all whilst making every decision auditable to tick all the regulatory and compliance needs around access controls.
To this end we began working on Cerbos[0] a few years ago. It initially targets that one specific use case: it models policy in simple YAML [1] (love it or hate it!) and takes a stateless approach, meaning it is infinitely scalable with none of the headache of synchronizing information about your users or resources to the authZ layer. Critically, it also generates that single audit log of decisions.
Disclaimer: I work on the team that builds and maintains Cerbos[2].
1. Define policies in Rego, a declarative language.
2. Deploy OPA alongside your service as a sidecar in Kubernetes.
3. Have your service query OPA when it needs to make policy decisions, passing the current state/context as input.
4. OPA evaluates the policies written in Rego against the input and returns a decision (allow or deny) back to your service.
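Steps 3 and 4 can be sketched in Go roughly as follows. OPA's REST Data API accepts `POST /v1/data/<rule path>` with a JSON body of the form `{"input": ...}` and answers with `{"result": ...}`. The rule path `authz/allow`, the `role` input field, and the `fakeOPA` test server (a stand-in so the example runs without a real sidecar) are all assumptions. In a real deployment the base URL would point at the sidecar, which listens on `localhost:8181` by default.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// queryOPA posts input to OPA's Data API and reports whether the rule evaluated to true.
// A missing "result" (undefined rule) is treated as deny.
func queryOPA(client *http.Client, baseURL, rulePath string, input any) (bool, error) {
	body, err := json.Marshal(map[string]any{"input": input})
	if err != nil {
		return false, err
	}
	resp, err := client.Post(baseURL+"/v1/data/"+rulePath, "application/json", bytes.NewReader(body))
	if err != nil {
		return false, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return false, fmt.Errorf("opa returned %s", resp.Status)
	}
	var out struct {
		Result *bool `json:"result"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&out); err != nil {
		return false, err
	}
	return out.Result != nil && *out.Result, nil
}

// fakeOPA is a hypothetical stand-in for the sidecar: it allows only role "admin".
func fakeOPA() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		var req struct {
			Input struct {
				Role string `json:"role"`
			} `json:"input"`
		}
		if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		json.NewEncoder(w).Encode(map[string]bool{"result": req.Input.Role == "admin"})
	}))
}

func main() {
	srv := fakeOPA()
	defer srv.Close()
	ok, err := queryOPA(srv.Client(), srv.URL, "authz/allow", map[string]string{"role": "admin"})
	fmt.Println(ok, err)
}
```

Treating an undefined result as deny keeps the service fail-closed if the policy has no matching rule. A production client would also want a timeout on the HTTP client.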
I found it hard to convince everyone around me to use OPA/Rego and wrap it into a managed service. The main objection: wrapping another DSL (domain-specific language) is hard.
However, it was relatively simple to convince my team to use the feature-complete Go library Ladon https://github.com/ory/ladon
All policies are loaded at app start, stored in memory (not in a DB), and checked by a small middleware that calls the following function.
Negligible performance hit. The code is very simple, hackable, and open to further optimisation.
Ladon is very fast. It's possible to run all user groups against all CRUD routes and get a basic permission matrix, or build some simple UI forms to test conditions for better control.
P.s. Feel free to ping me in private @reactima (github, telegram) if you want to discuss the edge cases for the above.
For application authorization, Oso is a compelling solution. (Disclaimer: I work for Oso). It provides a DSL and a prescriptive, but flexible data model that are capable of modeling RBAC, ReBAC, ABAC, or whatever else you'd like to model. Obviously I'm biased, but I think it strikes a great balance between opinion and flexibility.
One significant complication that all centralized authorization solutions share is that you end up needing to reproduce application data in the authorization system. We've been doing a lot of work in this area to simplify data management and have some beta functionality available. I'll include some links to the docs for those.
ogazitt|1 year ago
[0] https://www.topaz.sh
[1] https://research.google/pubs/zanzibar-googles-consistent-glo...
[2] https://www.aserto.com
andyroid|1 year ago
https://www.openpolicyagent.org/docs/latest/management-bundl...
shriek|1 year ago
[0] https://openfga.dev/
FLT8|1 year ago
funkyuser|1 year ago
bullcitydev|1 year ago
[0] https://github.com/flipt-io/flipt
[1] https://www.openpolicyagent.org/docs/latest/integration/#int...
fireflash38|1 year ago
samarthr1|1 year ago
rsashwin|1 year ago
You can find our talk here https://www.styra.com/resources/videos/snap-inc--snaps-journ...
arccy|1 year ago
unknown|1 year ago
[deleted]
alator21|1 year ago
mstade|1 year ago
pushedx|1 year ago
I found it relatively easy to use and at a good level of abstraction to make the policies relatively reusable.
ipython|1 year ago
the_newest|1 year ago
biggestlou|1 year ago
charlieegan3|1 year ago
cryptos|1 year ago
charlieegan3|1 year ago
alex-olivier|1 year ago
[0] https://github.com/cerbos/cerbos
[1] https://play.cerbos.dev/p/XhkOi82fFKk3YW60e2c806Yvm0trKEje
[2] https://cerbos.dev
_joel|1 year ago
There's some other interesting work with SPIFFE/SPIRE that I've been investigating for $WORK, which could be useful to some on this path: https://spiffe.io/docs/latest/microservices/envoy-opa/readme...
mstade|1 year ago
charlieegan3|1 year ago
https://docs.styra.com/regal
Disclaimer: I work on this but it’s free, & open source!
adeptima|1 year ago
Ladon is inspired by AWS IAM Policies.
{
}
All policies are loaded at app start, stored in memory (not in a DB), and checked by a small middleware that calls the following function.
func (l *Ladon) DoPoliciesAllow(r *Request, policies []Policy) (err error)
https://github.com/ory/ladon/blob/972387f17e29c529ad3ff42a84...
gsarjeant|1 year ago
Sync and reconcile data: https://www.osohq.com/docs/guides/data/sync-data#initial-syn... Filter lists with decentralized data (about halfway down): https://www.osohq.com/docs/guides/enforce/filter-lists
yencabulator|1 year ago