[ARFC] Manual Risk Agents (manual AGRS migration)

Simple summary

Introducing a new Risk Agents architectural setup to replace the previous manual AGRS (Aave Generalised Risk Stewards), making the different involved roles more explicit, and adding extra protection layers.



Motivation

The manual AGRS has been instrumental in executing risk parameter updates on the Aave protocol, handling, in a constrained environment, adjustments to supply/borrow caps, interest rates, and other risk-oriented configurations.

Although the current setup operates successfully, several improvements can be made:

  • Roles are defined, but not totally explicit on the smart contracts layer. The system has three separate roles:

    • The growth party, identifying when certain updates could be done (ACI);
    • The risk party generating and executing the recommendation (Chaos Labs);
    • And finally, a technical/security party (BGD) verifying the consistency of the recommendation with what is disclosed publicly to the community.

    But in the current setup, these roles are not explicit in the smart contracts: ACI is a proposer on the risk stewards Safe, Chaos Labs, as recommendation-maker, is a signer on that 2-of-2 Safe, and BGD is the other signer, for verification.

  • Extra time-based constraints: timelock. As with any other steward contract, there are smart-contract-enforced limitations on all parameters to be applied, both quantitative (e.g., no more than an x% increase/decrease of a parameter from its current value) and time-based (e.g., a specific parameter can only be updated every X days). However, we think that, complementary to the delay constraint, a timelock adds value by providing extra pre-execution visibility and better protective mechanisms.

  • Different architecture compared with Risk Agents. Given the complexity of a global migration, only the automated Risk Stewards were migrated to the Risk Agents framework, while manual ones weren’t. However, there is no limitation for this, and having both types sharing the same infrastructure makes the overall mechanism simpler.

  • Lack of harmony with the set of risk providers. With the community having 2 risk providers engaged (and other bodies like the Aave Protocol Guardian), it is possible to have an architecture better adapted to that, one that better involves both Chaos Labs and LlamaRisk. In the newly proposed system, this can be achieved with the explicit Risk Pilot and Risk Guardian roles.

The new architecture and operational flow seek to apply all previous improvements.



Specification


Architecture & roles

On the new system, there are three roles defined, two mandatory and one optional:

  • Risk Pilot. The entity making the recommendation; in technical terms, the one that controls the Risk Oracle component and pushes updates to it. To be assigned to Chaos Labs, to mirror the previous setup.
  • Risk Guardian. The entity that monitors updates submitted by the Risk Pilot, checks consistency with what was disclosed publicly, and can unilaterally cancel those updates during the timelock period if they contain any issues. To be assigned to LlamaRisk and, additionally, to the Aave Protocol Guardian as an extra layer of defence.
  • Growth Support (Optional, not on smart contracts). Party more from the growth side of the ecosystem, whose role is to notify the Risk Pilot about the appetite to initiate parameter updates. To be assigned to ACI, but given that it is not a smart contract role, multiple entities could assume this role depending on the parameters and needs.

In terms of operational flow, it will work the following way:

  1. (Optional) The Growth Support identifies the need to update a parameter (e.g., supply caps for asset X) and notifies the Risk Pilot to initiate update procedures on the Risk Oracle side.
  2. The Risk Pilot informs the community about the planned update via manual agents, produces the recommendation off-chain, and pushes it to the Risk Oracle. Additionally, the Risk Pilot must notify the main Risk Guardian about the intention to submit a recommendation before doing so.
  3. The Risk Agents automatically read the update from the Risk Oracle, validate it against configured constraints, and create a payload in the Permissioned Payload Controller contract to apply it to the Aave smart contracts. Timelock starts.
  4. During timelock, the main Risk Guardian validates the correctness and appropriateness of the recommendation.
  5. Finally, after the timelock is completed, the payload automatically executes, applying the risk parameter update to the Aave protocol. The Risk Pilot then informs the community that the update has been applied in production.
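
As an illustration only, steps 3 to 5 above can be sketched as a small state machine. This is not the Permissioned Payload Controller code: the `Payload` class, its method names, and the use of plain unix timestamps are assumptions made for the example; only the 1-day timelock and the cancel/execute behaviour come from the description above.

```python
from dataclasses import dataclass
from enum import Enum

# The 1-day timelock described in this proposal, in seconds.
TIMELOCK_SECONDS = 24 * 60 * 60


class State(Enum):
    QUEUED = "queued"
    CANCELLED = "cancelled"
    EXECUTED = "executed"


@dataclass
class Payload:
    """Hypothetical stand-in for a payload created by a Risk Agent."""

    description: str
    created_at: int  # unix seconds at payload creation (timelock start)
    state: State = State.QUEUED

    def cancel(self) -> None:
        # Assumption: a Risk Guardian can cancel while the payload is still
        # queued, i.e. during the timelock and before execution.
        if self.state is not State.QUEUED:
            raise ValueError("payload is not queued")
        self.state = State.CANCELLED

    def execute(self, now: int) -> None:
        # Execution is only possible once the timelock has fully elapsed.
        if self.state is not State.QUEUED:
            raise ValueError("payload is not queued")
        if now < self.created_at + TIMELOCK_SECONDS:
            raise ValueError("timelock still running")
        self.state = State.EXECUTED
```

In this sketch a cancelled payload can never be executed, which is the property the Risk Guardian role relies on.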



Technical components

Risk Oracle

The Risk Oracle serves as the data source for risk recommendations: the Risk Pilot registers them there, for Risk Agents to consume. Unlike the automated Risk Agent system, where updates are algorithmically generated and pushed, the manual system relies on Chaos Labs to push updates manually to the Risk Oracle. The same Risk Oracle contract infrastructure is used, though, to mirror the same architecture and simplify both off-chain and on-chain flows.

Risk Agent

The Risk Agent contract (AgentHub and each Risk Agent contract, more info here) consumes updates from the Risk Oracle, validates them against configured constraints (stored in the RangeValidationModule), and automatically creates payloads for the Permissioned Payload Controller (PPC).
The Agent performs the same validation functions as in the automated system, but routes approved updates through the PPC rather than executing them directly on the Aave protocol.
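
To make the validation step more concrete, here is a minimal sketch of a range check that distinguishes relative from absolute changes. It is not the RangeValidationModule interface: `RangeConfig`, `is_within_range`, and the basis-point encoding are assumptions chosen for the example.

```python
from dataclasses import dataclass

BPS = 10_000  # 100% expressed in basis points


@dataclass(frozen=True)
class RangeConfig:
    """Hypothetical per-parameter constraint."""

    max_change_bps: int  # maximum allowed change, in basis points
    is_relative: bool    # True: share of current value; False: absolute points


def is_within_range(current: int, proposed: int, cfg: RangeConfig) -> bool:
    """Return True if moving from `current` to `proposed` respects `cfg`."""
    diff = abs(proposed - current)
    if cfg.is_relative:
        # Relative change: diff / current <= max_change_bps / BPS,
        # rearranged to stay in integer arithmetic.
        return diff * BPS <= current * cfg.max_change_bps
    # Absolute change: the values are themselves expressed in basis points,
    # so the difference is compared directly.
    return diff <= cfg.max_change_bps
```

For example, with a 100% relative limit a cap of 1,000,000 can at most double in one update, while with a 0.5% absolute limit an LTV of 75.00% (7500 bps) can move to at most 75.50% (7550 bps).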

Permissioned Payload Controller (PPC)

The PPC is the critical architectural difference from the automated system. It implements a timelock mechanism on top of risk updates, creating a mandatory waiting period during which:

  • Service providers can validate the proposed changes
  • The cancellation guardian can intervene if issues are identified
  • The community has visibility into pending updates

Once a payload is created in the PPC, it enters a 1-day timelock before becoming executable.

This is the same contract already in use in production, for example, for the Aave Ink (Tydro) white-label instance.

This is also the only smart contract in the setup holding RISK_ADMIN role to apply updates.

Granular Access Control PPC

GranularAccessControlPPC sits between the Risk Agent system and the Permissioned Payload Controller. The main role of GranularAccessControlPPC is to give granular roles on the PPC, thus allowing multiple Risk Agent contracts to independently hold PAYLOADS_MANAGER_ROLE and multiple security actors to independently hold CANCELLATION_ROLE, without requiring changes to the PPC itself.

  • Each Risk Agent contract is granted PAYLOADS_MANAGER_ROLE, allowing it to submit payloads on the PPC through GranularAccessControlPPC.
  • LlamaRisk and the Aave Protocol Guardian are each granted CANCELLATION_ROLE, allowing either of them to cancel any pending payload during the timelock period.
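
The role separation above can be sketched as follows. This is an illustrative model, not the GranularAccessControlPPC contract: the class, its methods, and the account labels are assumptions; only the two role names are taken from the text.

```python
from collections import defaultdict

PAYLOADS_MANAGER_ROLE = "PAYLOADS_MANAGER_ROLE"
CANCELLATION_ROLE = "CANCELLATION_ROLE"


class GranularAccessSketch:
    """Hypothetical role gate in front of a payload queue."""

    def __init__(self) -> None:
        self._members = defaultdict(set)  # role -> accounts holding it
        self.pending = {}                 # payload id -> payload description
        self._next_id = 0

    def grant(self, role: str, account: str) -> None:
        self._members[role].add(account)

    def _require(self, role: str, caller: str) -> None:
        if caller not in self._members[role]:
            raise PermissionError(f"{caller} lacks {role}")

    def submit_payload(self, caller: str, payload: str) -> int:
        # Only Risk Agent contracts holding the manager role may submit.
        self._require(PAYLOADS_MANAGER_ROLE, caller)
        payload_id = self._next_id
        self._next_id += 1
        self.pending[payload_id] = payload
        return payload_id

    def cancel_payload(self, caller: str, payload_id: int) -> None:
        # Any single holder of the cancellation role may cancel on its own.
        self._require(CANCELLATION_ROLE, caller)
        if payload_id not in self.pending:
            raise KeyError(payload_id)
        del self.pending[payload_id]
```

Because each role maps to a set of accounts, several Risk Agents can submit and several guardians can cancel independently, without either group being able to perform the other's action.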



Agents constraints

The migrated manual Risk Steward system will maintain all existing risk parameter constraints and minimum time delays from the current manual AGRS system, ensuring no reduction in safety guarantees. The only change will be to adapt to the extra timelock, reducing the delay by the equivalent amount (e.g., instead of a 3-day delay, a 2-day delay plus 1 day of timelock).

| Risk Param | Max Change Allowed | Minimum Delay |
|---|---|---|
| Supply / Borrow Cap | 100% (relative change) | 3 days (2 delay + 1 timelock) |
| Base Interest Rate | 1% (absolute change) | 3 days (2 delay + 1 timelock) |
| Slope1 Interest Rate | 1% (absolute change) | 3 days (2 delay + 1 timelock) |
| Slope2 Interest Rate | 20% (absolute change) | 3 days (2 delay + 1 timelock) |
| UOptimal Interest Rate | 3% (absolute change) | 3 days (2 delay + 1 timelock) |
| LTV | 0.5% (absolute change) | 3 days (2 delay + 1 timelock) |
| LT | 0.5% (absolute change) | 3 days (2 delay + 1 timelock) |
| LB | 0.5% (absolute change) | 3 days (2 delay + 1 timelock) |
| Debt Ceiling | 20% (relative change) | 3 days (2 delay + 1 timelock) |
| LTV (for eModes) | 0.5% (absolute change) | 3 days (2 delay + 1 timelock) |
| LT (for eModes) | 0.1% (absolute change) | 3 days (2 delay + 1 timelock) |
| LB (for eModes) | 0.5% (absolute change) | 3 days (2 delay + 1 timelock) |
| Stable price cap | 0.5% (relative change) | 3 days (2 delay + 1 timelock) |
| LST Cap adapter params | 5% (relative change) | 3 days (2 delay + 1 timelock) |
| Pendle Discount Rate | 2.5% (absolute change) | 2 days (1 delay + 1 timelock) |

*All the constraints above are subject to change before the AIP, if so recommended by the Risk Pilot (Chaos Labs).
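
The delay arithmetic in this section (agent-level delay plus the 1-day timelock) can be checked with a short sketch. The dictionary holds only a subset of the parameters, and the names `AGENT_DELAY`, `effective_minimum_delay`, and `earliest_execution` are hypothetical. It also assumes that the delay is counted from the previous update and that the payload is created as soon as the delay ends.

```python
from datetime import datetime, timedelta

# The timelock added by the Permissioned Payload Controller.
TIMELOCK = timedelta(days=1)

# Agent-level delay per parameter (subset of the constraints listed above).
AGENT_DELAY = {
    "Supply / Borrow Cap": timedelta(days=2),
    "Slope2 Interest Rate": timedelta(days=2),
    "LTV": timedelta(days=2),
    "Pendle Discount Rate": timedelta(days=1),
}


def effective_minimum_delay(param: str) -> timedelta:
    """Agent delay plus timelock, i.e. the 'Minimum Delay' value."""
    return AGENT_DELAY[param] + TIMELOCK


def earliest_execution(last_update: datetime, param: str) -> datetime:
    """Earliest time the next update could take effect (assumption above)."""
    return last_update + effective_minimum_delay(param)
```

Under these assumptions, a supply cap updated on 1 January could be updated again no earlier than 4 January, which matches the 3-day minimum of the current manual AGRS.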




Additional SupplyBorrowFreezeSteward

One flow previously allowed to stewards was, in emergency scenarios, to set borrow and/or supply caps to the minimum (1), reducing exposure without any damage to positions.

With the introduction of the new timelock, this flow would be subject to a 1-day timelock, which is not ideal in those scenarios. So we will introduce a SupplyBorrowFreezeSteward so that the Aave Protocol Guardian can execute the same procedure after the re-architecting of stewards.




Optional application to all Risk Agents

A follow-up step on this proposal is to apply the same architecture to all types of Risk Agents, including the automated ones.
However, given that speed considerations for those should probably be more granular (e.g., different timelocks), our proposal focuses first on the migration of the manual side. Making all types homogeneous will be straightforward, given that the architecture and flows will be pretty much the same.



Next Steps

  1. Publish a standard ARFC and collect feedback from the community and service providers before escalating the proposal to the ARFC snapshot stage.
  2. If the ARFC snapshot outcome is YAE, publish an AIP vote for final confirmation and enforcement of the proposal.

We appreciate @bgdlabs for bringing this proposal forward and addressing a genuine gap in the current Risk Stewards architecture. With @bgdlabs’ and @ACI’s upcoming departures, revisiting the governance structure around delegated risk authority is both timely and necessary. The proposal makes meaningful progress on role clarity and formalizes @LlamaRisk’s involvement, which is a welcome step.

That said, we have significant reservations about the proposed design that we believe warrant further discussion before this moves to snapshot.

The 1-day timelock undermines the core purpose of manual Risk Stewards. The entire value of Risk Stewards lies in rapid response to time-sensitive conditions: capping supply/borrow to 1 during exploits, raising caps to meet sudden demand, or adjusting interest rate parameters during utilization spikes. A mandatory 1-day delay before execution renders these actions ineffective. The proposed workaround of routing emergency freeze actions through a separate SupplyBorrowFreezeSteward, limited to the Protocol Guardian, fragments authority rather than solving the problem and does not cover the full range of time-sensitive actions Risk Stewards need to perform. Similarly, the Slope2 Risk Oracle is now constrained to act within a 2% range per 8 hours, meaning a Risk Steward may need to intervene manually during severe utilization spikes, an intervention that this proposal would delay by a full day.

The proposal gives LlamaRisk a veto but not a voice. Under this design, @LlamaRisk can only cancel during the timelock window. We cannot co-propose, co-sign, or initiate parameter changes. A co-signing model where any two of three parties must affirmatively agree is a strictly superior design: it provides the same protective guarantees as a veto mechanism, but through collaborative approval rather than adversarial blocking, and without requiring any of the operational delays or additional smart contract infrastructure that a timelock-plus-cancellation architecture introduces.

Automated Risk Oracles are left unaddressed. The proposal explicitly scopes itself to manual AGRS only. The automated Risk Agents, which have required the most scrutiny in light of recent events, are left entirely untouched. We believe automated Risk Agents require their own dedicated discussion. For that context, we propose a short timelock on the order of hours, calibrated per agent type, long enough for independent monitoring and subsequent blocking action to catch incidents like the March 10th liquidations before damage occurs, but short enough to preserve the operational and speed advantages that automation is meant to provide.

What we propose instead is restructuring Risk Stewards as a 2-of-3 co-signing multisig comprising @ChaosLabs, @LlamaRisk, and @AaveLabs. Any two of three parties can approve an update, with immediate execution upon co-approval, no timelock required. This preserves the speed that makes Risk Stewards valuable, adds genuine independent oversight, and ensures it is clean, fair, and representative of the three primary parties engaged in Risk Steward operations. Most importantly, this requires no new smart contract infrastructure beyond a signer rotation on the existing multisig. The proposed timelock, veto, Permissioned Payload Controller, and GranularAccessControl architecture introduces significant and unnecessary complexity to achieve a strictly weaker result than what a simple multisig reconfiguration provides out of the box.
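
For readers comparing the two designs, the 2-of-3 co-signing behaviour described above can be sketched as below. This is an illustration only and not Safe multisig code: `CoSignedUpdate` and its method are hypothetical, and the signer labels simply mirror the three parties named in this post.

```python
SIGNERS = frozenset({"ChaosLabs", "LlamaRisk", "AaveLabs"})
THRESHOLD = 2  # any two of the three signers


class CoSignedUpdate:
    """Hypothetical update that executes as soon as the threshold is met."""

    def __init__(self, description: str) -> None:
        self.description = description
        self.approvals = set()
        self.executed = False

    def approve(self, signer: str) -> bool:
        """Record an approval and return True once the update has executed."""
        if signer not in SIGNERS:
            raise PermissionError(f"{signer} is not a signer")
        if self.executed:
            raise ValueError("update already executed")
        # A set ignores repeated approvals from the same signer.
        self.approvals.add(signer)
        if len(self.approvals) >= THRESHOLD:
            # No timelock in this model: execution is immediate.
            self.executed = True
        return self.executed
```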

We invite @AaveLabs and @ChaosLabs to weigh in on this 2-of-3 co-signing configuration so that we can collectively align on the right path forward before this proposal moves to snapshot.

We would like to clarify some points from @LlamaRisk feedback, clearly arising from ignorance of how both the current system and the proposed one work:

  • The core purpose of the manual Risk Stewards is not, and has never been, that “the entire value of Risk Stewards lies in rapid response to time-sensitive conditions”. Risk Stewards and Risk Agents are about governance minimisation and protocol dynamism. Examples:
    • Reducing borrow caps to 1, while used in emergencies, is technically a protective action that belongs more to the Aave Protocol Guardian than to risk stewards. That is not the case at the moment, because the protocol does not yet have the role granularity achieved with the newly proposed SupplyBorrowFreezeSteward.
    • Speed of raising caps to meet demand is not the core goal of the Stewards/Agents. The aggressive increases that are usually required are a consequence of not having extra “room” on caps defined in advance. Additionally, increasing supply caps is never an emergency flow. To be more precise, historically supply cap increases happen consecutively after market inceptions, meaning that the de facto delay is 3 days, exactly as in the proposed model for that scenario.
    • The example of 8h updates on Risk Agents is precisely where a timelock, even if potentially shorter for that part of the infrastructure, would have major value, accompanied by meaningful and constructive discussion between risk providers on how to define those meta-parameters (delay and timelock).
  • “Veto, but not a voice”. The reasons for having clearly segregated roles are not arbitrary, but are based on both rationale and past experience on precisely the same topic on Aave:
    • Having 2 parties proposing on the same parameters with different recommendation criteria creates a paradoxical scenario: the parties will very rarely agree, because their models must differ, on configurations that are frequently non-binary/continuous. With a 2-of-3 mechanism where 2 parties “recommend”, the outcome will be precisely adversarial between the two risk providers, and will potentially create a difficult situation for the third party, because there are two experts who don’t agree, and there is no “ground truth”.
    • That setup existed in the past between Gauntlet and Chaos Labs, and 1) didn’t work, creating severe damage to Aave’s operations, 2) created an adversarial scenario between those two contributors.
    • Veto is the most independent voice, and the most aligned with governance. The proposed Risk Guardian role is as important as the Risk Pilot, with the big responsibility of acting as a risk defense layer of the protocol. The goal of a framework is to pre-define rules (e.g., public notification before parameter submission, informing SP partners, consistency in parameter disclosure, etc.) that each entity then follows according to its role.
    • There is no limitation at all for risk providers to, in the future, add granularity and distribute Risk Pilot and Risk Guardian responsibilities; for example, for some parameters Chaos Labs could be Pilot and LlamaRisk Guardian, while for others, the reverse. It is surprising that @LlamaRisk raises this, given that we had communications with them in which we explained that the Risk Agents framework within Aave is abstracted from “who” assumes roles.
  • About complexity. The core of our proposal is precisely avoiding complexity, using the same type of architecture for multiple flows (automated, manual), and even reusing existing smart contract components. The proposed 2-of-3, however, will just increase the complexity of maintenance, which, unfortunately, is not the outcome we would like once we are no longer contributors.
  • Regarding Automated Risk Oracles left unaddressed. It is not clear how to interpret this, given that this proposal clearly states that this is an infrastructural migration of the manual AGRS sub-system (it is in the title), and also that we included a section about “Optional application to all Risk Agents”.
    The same model could be applied for any type of agents, no matter which parties are Pilots, Guardians, or Growth Support.



Given the clear (and frankly strange) opposition to the framework of migration of the manual stewards by one of the participants, we think the reasonable path is for other members of the community to propose and move forward with an alternative on their own.

We would like to highlight (something we did privately when we asked for feedback) that, as of 1st April 2026, when our engagement ends, BGD Labs will not participate anymore as a verification party on the existing risk stewards. So we recommend that the parties involved move accordingly to not create an operational blocker of the system.


That is a bit harsh, don’t you think?

As a DAO member, I see both sides converging on a legitimate concern, but from different directions, if I’m getting this right purely from a governance and risk cohesion standpoint.

On one side, there is a clear need to preserve a coherent, role-based execution framework. Chaos Labs has been moving toward Risk Agents for some time to reduce reliance on discretionary multisig coordination and to make responsibilities more explicit. In that sense, parallel execution paths or broadly scoped co-signing could reintroduce ambiguity around accountability.

At the same time, I also understand exactly where LlamaRisk’s position is coming from. Their mandate is not reactive veto, but pre-risk oversight, independent validation, and visibility into risk assumptions, hopefully including offchain logic. Especially after the March 10 wstETH/CAPO incident, it is reasonable for them to argue that, under the current structure, this mandate is not fully enforceable.

So, I don’t really know how to go about it personally. On one hand, I agree with you @bgdlabs that having two risk providers actively proposing on the same surface area could create operational friction and accountability issues. On the other hand, I also understand why @LlamaRisk does not want its role reduced to reviewing or canceling actions after the fact, particularly when the value they are meant to provide is upstream of risk execution.

One of my quick thoughts after reading this was separation of concerns. Maybe some Markets/Hubs go to one risk provider and others to the other one. After all, at 100 billion, we can’t expect one provider to do it all. I also considered whether a split between “high-risk” actions requiring co-signing or co-lead and “low-risk” actions remaining under a Chaos-led Risk Pilot model could work. But the issue there is that it creates a parallel governance path, and the boundaries between high-risk and low-risk actions would likely become subjective and contested over time, especially if veto power is supposed to be the most independent voice acting as a risk defense layer of the protocol. So, hmmmm.

Anyway, I am ultimately closer to BGD’s direction here, but I think LlamaRisk is pointing to a real structural issue that should not be ignored. More importantly, the back-and-forth tone is not productive. The community does not need escalation between service providers right now.

If the Risk Guardian is expected to serve as a meaningful safeguard, then it must have access to the full context behind a proposal, including offchain logic, assumptions, methodology, and supporting analysis. Without that, the role is not truly useful. You cannot expect independent oversight from a party that does not have visibility into the full risk decision-making process.

Hi @JosueMpia .

As commented in one of our previous points, our proposal reproduces the current setup 1:1 while introducing LlamaRisk with a new role to provide value. So it was more technical in nature than anything else: currently on manual AGRS, there is a verification role (ourselves, BGD), which will not exist anymore, and that could be replicated and improved by having an extra party with both validation and risk-specific expertise.

It is and will be very problematic for the community whenever roles are not properly defined in workflows. There are countless examples of workflows where proper role assignment has major operational advantages, from asset listing to network expansions to product releases, etc.
Even more, role granularity allows different entities to assume different roles in different projects, or even within the same project. For example, as we mentioned both here publicly and in the past to all risk providers, within the Stewards/Agents framework there is no limitation on different parties “exchanging” roles depending on the network or parameters, given prior framework definition.

There is a big misconception regarding multi-party multisigs. Exceptions aside, they are not a coordination tool for non-binary actions; they are a security tool for one party to propose those actions and for everybody else to verify them before execution. Indeed, the property of all multisig signers being able to propose actions is desired in emergency systems (sometimes), but this particular risk steward/agents framework has never been for that: it was created because years ago, there were tens of governance proposals every week for tweaking hundreds of parameters, so it needed operational (constraint) optimisation.
What will happen in practice is that, with a 2-of-3 system where two or three entities need to run risk modelling for every parameter (without doing that, the setup is meaningless), parameter updates will stop being scalable. We don’t think that is a desired outcome, considering that there is both a multi-network v3 production system and the upcoming v4.

That being said, there is really no problem if the community and other providers want to proceed differently. We propose architectural/operational solutions oriented to a multi-contributor DAO; if those contributors don’t like the system, there is no reason to adopt it, and they should find and activate alternatives.


Following the departures of Chaos Labs and BGD, we confirm our formal request to transfer the ownership of the Risk Council multisigs (2/2, on all instances) to the following addresses:

@AaveLabs: 0x606dC57cd166643760E049609bfd1D8a698D3bAc
@LlamaRisk: 0xbA037E4746ff58c55dc8F27a328C428F258DDACb

LlamaRisk will ensure the continuity of Risk Stewards’ operations and jointly operate with Aave Labs, ensuring strict processes for proposing, testing, and submitting changes.


Confirming this on behalf of Aave Labs.

@LlamaRisk Do you mind sharing the new Architecture & roles visual map whenever you guys have some time? I just want to see who does what and who controls what now. Are we keeping this one but making @AaveLabs the new guardian and you guys the new pilot? Let me know whenever you guys get a chance.
