Yonatan, CTO of Chaos Labs here. Before Chaos, I spent years building infrastructure at Apple and Meta (fka Facebook), focusing on distributed systems, networking, and big data infrastructure.
I can’t agree that subgraphs are unreliable. A subgraph is simply a piece of tech that scrapes data, processes it, and then normalizes and indexes it, so calling subgraphs unreliable is an overgeneralization. Any custom data pipeline faces the same challenges, and those challenges become even more significant with high-throughput blockchains and large data sets. The benefit of subgraphs is that they are open-source data collection frameworks with years of community contributions and broad adoption.
At Chaos, we use multiple data sources to power our applications, such as subgraphs, custom ETLs, and 3rd party data providers. Redundancy and reliability are critical when dealing with sensitive financial data. Multiple data sources allow us to compare values across sources and verify the validity of the data. When critical data is on the line, relying on a single source is extremely risky.
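To make the cross-checking idea concrete, here is a minimal sketch, not our production code. It assumes each source reports the same metric as a number, takes the median as the consensus value, and flags any source that deviates from it by more than a relative tolerance. The `cross_check` helper, the source names, the sample values, and the 1% tolerance are all hypothetical and chosen only for illustration.

```python
from statistics import median


def cross_check(readings, rel_tol=0.01):
    """Compare one metric as reported by several data sources.

    readings: dict mapping a source name to the value it reported.
    rel_tol:  maximum allowed relative deviation from the median
              (hypothetical default of 1%).

    Returns (consensus, outliers), where consensus is the median value and
    outliers maps each deviating source to its relative deviation.
    """
    if len(readings) < 2:
        raise ValueError("need at least two sources to cross-check")

    consensus = median(readings.values())
    # Avoid dividing by zero when the consensus value itself is 0.
    scale = abs(consensus) if consensus != 0 else 1.0

    outliers = {}
    for source, value in readings.items():
        deviation = abs(value - consensus) / scale
        if deviation > rel_tol:
            outliers[source] = deviation
    return consensus, outliers


# Hypothetical readings of the same metric from three sources.
readings = {
    "subgraph": 1_000_000.0,
    "custom_etl": 1_000_050.0,
    "third_party": 1_250_000.0,
}
consensus, outliers = cross_check(readings)
for source, deviation in outliers.items():
    print(f"{source} deviates from consensus {consensus} by {deviation:.1%}")
```

The median is used here rather than the mean so that a single bad source cannot drag the consensus toward itself. With only two sources a disagreement can be detected but not attributed to either one, which is one reason a third source is useful.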
Errors in subgraph data are most likely due to errors in the collection logic and data parsing (and this applies to any data infra). For example, our pipelines detected anomalies in Aave data when integrating with the v3 Risk application. After researching the root cause, we uncovered a bug in the event-parsing logic. We have already shipped our proposed fix for that issue. You can check out the subgraph repo here. This highlights a significant benefit of open-source data collection frameworks such as subgraphs: community transparency. When something is off, any individual contributor can verify how the data is collected and processed. Additionally, more eyes usually produce better software.
@AndrewA, since releasing the v3 Risk Application, we’ve been getting a lot of inbound questions about data availability and integrity for v3. We’re happy to participate in any ideation or initiatives to make this data more accessible for the community. We think subgraphs are an excellent place to start, as they are the most widely adopted, but we are also open to any other ideas or proposals.