[Resolved] Reward Incident for SubQuery Kepler Network

The SubQuery Kepler Network has gone into Maintenance Mode while we investigate a reward calculation bug in a smart contract deployed on the network.

During this period, functionality will be limited and you will be unable to execute many smart contract interactions. Registration, transfers, rewards, delegation, and staking will be paused for now. The ability to unpause and upgrade will remain in effect. We will provide a more detailed update shortly.

[Posted 5.30am 12th June UTC]


Over the weekend we discovered a bug with a reward calculation contract.

When an agreement spans two Eras, Indexers receive more rewards than they are entitled to, and these extra rewards multiply over time.

This was discovered from error reports by Indexers who were unable to claim rewards, because the kSQT allocated to the reward contract was insufficient to cover the extra rewards already claimed. We have already issued a fix for this bug and will be updating the contracts in the next 24 hours to resolve it. The extra rewards claimed so far amount to only a few thousand USD worth of kSQT. We are working on an additional data fix to correct the extra rewards.

[Posted 6.06am 12th June UTC]


I want to share that we have identified the issue and have begun to roll out fixes. Kepler should be back online in a few hours once we’ve deployed and confirmed the changes.


As of 11.31am 13th June UTC, we have disabled Maintenance mode on the Kepler network, and we have been observing its recovery since then.

We consider that this incident has been resolved.

We will share a full postmortem with the community tomorrow.


As part of the SubQuery Council’s dedication to openness and communication with the community, we are publishing a full incident report below.

Leadup

  • Multiple Indexers reported via our Forum that they could not claim rewards.
  • Due to a vague error message in our Explorer app, it was hard to immediately identify the root cause of the issue.
  • The SubQuery Network developer team then published a patch release of the Explorer app that collects internal errors, in order to identify the root issue.

Root Cause

  • There was a miscalculation in the reward distributor contract that affects agreements spanning multiple Eras.
  • As a result, the distributed reward was greater than the expected reward, which means that the sum of rewards was greater than the kSQT deposited in the reward contract.
  • SubQuery Network developers found that the kSQT balance in the reward contract was nearly empty, and because of this, Indexers were unable to claim additional rewards.
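
The contract code is not reproduced in this report, so the following is only an illustrative sketch of the invariant involved, not the actual contract logic or the content of the fix. It assumes (hypothetically) that an agreement's reward is split across Eras in proportion to the time the agreement overlaps each Era, and that Eras have a fixed length. The names `Agreement`, `split_reward_by_era`, and `era_length` are invented for this example. The point it shows is that the per-Era shares must sum to exactly the deposited amount, which is the property the bug violated.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    start: int   # start time in seconds (hypothetical field)
    end: int     # end time in seconds, exclusive (hypothetical field)
    reward: int  # total deposited reward, in smallest token units


def split_reward_by_era(agreement: Agreement, era_length: int) -> dict[int, int]:
    """Split an agreement's reward across the Eras it overlaps.

    Shares are proportional to overlap time. Integer division rounds down,
    so the last Era receives the remainder; this guarantees that the shares
    sum to exactly `agreement.reward` and never exceed the deposit.
    """
    duration = agreement.end - agreement.start
    if duration <= 0 or era_length <= 0:
        raise ValueError("agreement duration and era length must be positive")

    first_era = agreement.start // era_length
    last_era = (agreement.end - 1) // era_length

    shares: dict[int, int] = {}
    paid = 0
    for era in range(first_era, last_era + 1):
        era_start = era * era_length
        era_end = era_start + era_length
        overlap = min(agreement.end, era_end) - max(agreement.start, era_start)
        if era == last_era:
            share = agreement.reward - paid  # remainder goes to the last Era
        else:
            share = agreement.reward * overlap // duration
        shares[era] = share
        paid += share
    return shares
```

For example, under these assumptions an agreement running from second 50 to second 250 with a reward of 1000 and an Era length of 100 overlaps three Eras, and the three shares add up to 1000. A unit test asserting `sum(shares.values()) == agreement.reward` for agreements that cross Era boundaries is the kind of check that would catch over-distribution.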

Impact

  • The extra rewards claimed are limited to less than $5,000 USD worth of kSQT, as the Council was careful to tightly control the amount of kSQT in the reward contract in case exactly this kind of issue occurred.
  • Although our initial investigation indicated there was no risk of future loss, further investigation showed that future loss was possible. As a result, the SubQuery Kepler Network was placed into Maintenance mode from 10:19 10th June to 01:31 13th June (UTC) while the incident was better understood and to avoid further issues.

Resolution

  • A fix (#206) was coded extremely quickly on Friday (9th) and refined and tested over the weekend.
  • A corrective data fix has been calculated and tested to recover excess rewards where possible. This includes:
    • Correcting or reducing uncollected reward amounts to what they should be.
    • For some Indexers that have already collected and claimed excess rewards, their Era 4 rewards will be 0.
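
The report does not describe how the corrective amounts were computed, so the sketch below is only one plausible reading of the two bullets above, not the team's actual procedure. It assumes (hypothetically) that each Indexer has a known entitled total and a known amount already claimed; the function name `corrected_uncollected` and both parameters are invented for illustration.

```python
def corrected_uncollected(entitled_total: int, already_claimed: int) -> tuple[int, int]:
    """Return (uncollected reward after the fix, excess left unrecovered).

    If the Indexer has claimed less than the entitlement, the uncollected
    amount is reduced to the difference. If the Indexer has already claimed
    more than the entitlement, the uncollected amount is set to 0 and the
    remaining over-claim is reported separately.
    """
    if entitled_total < 0 or already_claimed < 0:
        raise ValueError("amounts must be non-negative")
    remaining = entitled_total - already_claimed
    if remaining >= 0:
        return remaining, 0
    return 0, -remaining
```

Under these assumptions, an Indexer entitled to 1000 units who has claimed 400 would have 600 left to collect, while one who has claimed 1300 would have their remaining reward set to 0 with 300 units of excess outstanding.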

Lessons learned

  • We need to give more data to Indexers and Delegators on the particular stack trace of an error seen on the client. We’re going to start logging a lot more to the browser console to help you all here.
  • The team needs to review the unit tests and improve the test coverage of the contract code to cover more real-world cases, especially for the reward contracts.
  • The team responded relatively quickly to the incident, considering the complications of contacting members who were offline over the weekend and the extensive data collection required to calculate expected vs. actual rewards.
  • Better monitoring and analytics for the Kepler network are required, especially for the staking and reward contracts.
  • Include at least one Indexer from the team in the sponsored plan set.
  • We need to engage more with the community so that we can identify issues raised by the community earlier and provide better support to Indexers and Delegators.

Full timeline (all times are in UTC)

  • [2023-06-09T01:00Z] First confirmation of the issue.
  • [2023-06-09T03:00Z] SubQuery Network Developers met to discuss which potential errors in the contract could cause the incorrect reward values.
  • [2023-06-09T08:00Z] Work started on a fix (#206) for the contract, and SubQuery Network Developers began working on a test to reproduce the reward miscalculation and confirm that the fix resolves future reward calculations.
  • [2023-06-10T01:00Z] The SubQuery Network team met to discuss possible solutions to calculate and fix the incorrect on-chain reward data.
  • [2023-06-11T10:19Z] The SubQuery Kepler Network was placed into Maintenance mode to avoid further issues.
  • [2023-06-12T02:00Z] A recommended course of action was agreed upon by the SubQuery Council. The SubQuery Network team continued calculating the data required by the fix, contract upgrades, transactions, and corrective data to insert/update.
  • [2023-06-13T01:31Z] Incident resolved and Maintenance mode disabled for the SubQuery Kepler Network.

Nice postmortem, guys. Thanks for the quick response and fix.