[Resolved] Reward Incident for SubQuery Kepler Network

As part of the SubQuery Council’s dedication to openness and communication with the community, we are publishing a full incident report below.

Leadup

  • Multiple indexers reported via our Forum that they could not claim rewards.
  • Due to a vague error message in our Explorer app, it was hard to immediately identify the root cause of the issue.
  • The SubQuery Network developer team then published a patch release of the Explorer app that collects internal errors, in order to identify the root cause.

Root Cause

  • A miscalculation in the reward distributor contract affected agreements that span multiple Eras.
  • As a result, the distributed rewards were greater than the expected rewards, which means that the sum of rewards was greater than the kSQT deposited in the reward contract.
  • SubQuery Network developers found that the kSQT balance in the reward contract was nearly exhausted, and because of this, Indexers were unable to claim additional rewards.
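
To illustrate the kind of invariant that was violated, here is a minimal Python sketch of splitting one agreement's reward across the Eras it spans. It is not the contract's actual logic: the function name `split_reward_by_era`, the pro-rata-by-time rule, and the fixed era length are all assumptions made for illustration. The point is only that the per-Era shares must add up to exactly the amount deposited for the agreement.

```python
def split_reward_by_era(total, start, end, era_length):
    """Split `total` (integer token units) across the eras that the
    half-open interval [start, end) overlaps, pro rata by time.

    Hypothetical helper: the real contract's rule may differ.
    """
    if end <= start or era_length <= 0 or total < 0:
        raise ValueError("invalid agreement or era parameters")

    duration = end - start
    first_era = start // era_length
    last_era = (end - 1) // era_length

    shares = {}
    distributed = 0
    for era in range(first_era, last_era + 1):
        era_start = era * era_length
        era_end = era_start + era_length
        overlap = min(end, era_end) - max(start, era_start)
        if era == last_era:
            # Give the final era the remainder so rounding never
            # creates or destroys tokens.
            share = total - distributed
        else:
            share = total * overlap // duration
        shares[era] = share
        distributed += share

    # Invariant: never distribute more (or less) than was deposited.
    assert sum(shares.values()) == total
    return shares


# An agreement worth 1000 units running from t=50 to t=250 with
# 100-unit eras overlaps eras 0, 1 and 2.
print(split_reward_by_era(1000, 50, 250, 100))  # {0: 250, 1: 500, 2: 250}
```

A test suite that asserts this sum for agreements of varying length and alignment would catch the class of bug described above, regardless of how the shares themselves are computed.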

Impact

  • The total value of excess rewards claimed was limited to less than $5,000 USD worth of kSQT, as the Council had tightly controlled the amount of kSQT in the reward contract in case exactly this kind of issue occurred.
  • Our initial investigation indicated there was no potential for future loss, but further investigation showed that future loss was possible. As a result, the SubQuery Kepler Network was placed into Maintenance mode from 10:19 10th June to 01:31 13th June (UTC) while the incident was better understood and to avoid further issues.

Resolution

  • A fix (#206) was coded extremely quickly on Friday (9th), then refined and tested over the weekend.
  • A corrective data fix has been calculated and tested to recover excess rewards where possible. This includes:
    • Correcting uncollected reward amounts, by deduction where needed, to what they should be.
    • For some indexers that have collected and claimed excess rewards so far, their Era 4 rewards will be 0.
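
The correction described above can be sketched as follows. This is a simplified illustration, not the script the team ran: the names `Correction` and `correct_rewards` are hypothetical, and it assumes that for each indexer we know the expected reward, the amount already claimed, and the amount still uncollected.

```python
from dataclasses import dataclass


@dataclass
class Correction:
    uncollected: int     # corrected uncollected balance
    excess_claimed: int  # over-payment already claimed (cannot be deducted)


def correct_rewards(expected, claimed):
    """Return the corrected state for one indexer (hypothetical helper).

    - If the indexer has claimed no more than expected, the uncollected
      balance is reset to exactly what is still owed.
    - Otherwise nothing more is owed, and the over-claimed amount is
      reported so it can be recovered by other means.
    """
    if expected < 0 or claimed < 0:
        raise ValueError("amounts must be non-negative")
    if claimed <= expected:
        return Correction(uncollected=expected - claimed, excess_claimed=0)
    return Correction(uncollected=0, excess_claimed=claimed - expected)


# Expected 100, claimed 60: 40 is still owed, whatever the contract
# currently shows as uncollected.
print(correct_rewards(100, 60))   # Correction(uncollected=40, excess_claimed=0)

# Expected 100, claimed 130: nothing is owed and 30 was over-claimed.
print(correct_rewards(100, 130))  # Correction(uncollected=0, excess_claimed=30)
```

In this sketch a non-zero `excess_claimed` corresponds to the second case in the list above, where the affected indexers' Era 4 rewards are set to 0.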

Lessons learned

  • We need to give Indexers and Delegators more data on the particular stack trace of an error seen on the client. We’re going to start logging a lot more to the browser console to help you all here.
  • The team needs to review the unit tests and improve test coverage of the contract code to cover more real-world cases, especially for the reward contracts.
  • The team’s response to the incident was relatively quick, considering the complications of contacting members who were offline over the weekend and the extensive data collection required to calculate expected vs. actual rewards.
  • Better monitoring and analytics for the Kepler network are required, especially for the staking and reward contracts.
  • We should include at least one indexer from the team in the sponsored plan set.
  • We need to engage more with the community so that we can identify issues raised by the community earlier and provide better support to Indexers and Delegators.

Full timeline (all times are in UTC)

  • [2023-06-09T01:00Z] First confirmation of issue
  • [2023-06-09T03:00Z] SubQuery Network Developers met to discuss which potential errors in the contract could cause the incorrect reward values.
  • [2023-06-09T08:00Z] Work started on a fix (#206) for the contract, and SubQuery Network Developers began working on a test to reproduce the reward miscalculation and confirm that the fix resolves future reward calculations.
  • [2023-06-10T01:00Z] SubQuery Network team met to discuss possible solutions to calculate and fix the incorrect on-chain reward data.
  • [2023-06-11T10:19Z] The SubQuery Kepler Network was placed into Maintenance mode to avoid further issues.
  • [2023-06-12T02:00Z] A recommended course of action was agreed upon by the SubQuery Council. SubQuery Network Team continued calculating the data required by the fix, and preparing the contract upgrades, transactions, and corrective data to insert or update.
  • [2023-06-13T01:31Z] Incident resolved and Maintenance mode was disabled for the SubQuery Kepler Network.