Avoiding the Garbage in Garbage Out (GIGO) trap

by Shubham Satyarth Feb 13, 2025

The sanctity of any data-based investment platform lies in the quality of its underlying data. If the data is not good enough, the entire thing can quickly turn into a garbage-in-garbage-out situation. We take utmost care to ensure top-quality data.

Ensuring data quality involves three broad steps:

Raw data is sourced from reliable and reputed data vendors
Internal data sanity and processing is top notch
Continuous improvement in data standards

We work with the best data vendors

Data quality starts at the source. We only source raw data from top-quality reputed vendors.

Our stock data (fundamentals, estimates, prices, corporate actions and so on) is sourced from FactSet. FactSet is rated as the number 1 data provider for company fundamentals. Further, using FactSet ensures IFRS standardization of all financial items, which circumvents (to an extent) potential reporting management. We discuss more about this here.

We use Value Research for Mutual Fund and ETF data. Again, Value Research is one of the top data providers for Mutual Funds in India.

Top-notch data sanity and processing

Even when you work with the best vendors, you will still have issues with data quality. At the same time, incorrect processing can lead to wrong calculations, and various biases (like look-ahead bias) can creep into your backtest. Our data processing engine has been designed to ensure quality and avoid biases.

Let’s take an example. The P/E ratio is one of the most commonly used metrics for analyzing a stock’s valuation. A lower P/E ratio is generally considered better. However, the P/E ratio becomes meaningless if the EPS (the E in P/E) is negative. Since a negative P/E is meaningless, P/E should be set to NA for all stocks where EPS is negative.

At the same time, investors can still use a variant of this metric on our system called earnings yield (E/P). In this case, a negative E/P is not meaningless and higher is always better. Similarly, we have yield metrics for all multiple-based valuation metrics.
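
The P/E and earnings-yield handling described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the platform's actual code: the function names `pe_ratio` and `earnings_yield` are hypothetical, `None` stands in for NA, and treating a zero EPS (or a non-positive price) as NA is an added assumption.

```python
from typing import Optional


def pe_ratio(price: float, eps: float) -> Optional[float]:
    """Return P/E, or None (NA) when EPS is negative.

    Assumption: a zero EPS is also treated as NA to avoid dividing by zero.
    """
    if eps <= 0:
        return None
    return price / eps


def earnings_yield(price: float, eps: float) -> Optional[float]:
    """Return E/P, which stays meaningful when EPS is negative.

    Assumption: a non-positive price is treated as NA.
    """
    if price <= 0:
        return None
    return eps / price


# A loss-making stock has no usable P/E but still has an earnings yield.
print(pe_ratio(100.0, -5.0))        # None
print(earnings_yield(100.0, -5.0))  # a negative yield
```

Because E/P is defined for negative earnings, stocks can be ranked on it directly (higher is better) without dropping loss-making companies from the universe.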

It is a trivial example but conveys the point – we are extremely careful in processing data. Let’s take another example.

If a company’s EPS goes from negative to positive in 3 years, the 3Y EPS CAGR will be negative if calculated in the conventional manner. But this is plain wrong. Therefore, we have a modified version of the CAGR calculation, which we discuss here. Handling biases in backtesting is discussed in detail in this article.
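
To see why the conventional formula misleads here, the sketch below applies the textbook CAGR, (end / start)^(1 / years) - 1, to an EPS that moves from negative to positive. It does not reproduce the modified calculation mentioned above, which is not described in this article. The names `naive_cagr` and `guarded_cagr` are hypothetical, and returning `None` (NA) for a non-positive starting value is only a placeholder assumption.

```python
import math
from typing import Optional


def naive_cagr(start: float, end: float, years: int) -> float:
    """Textbook CAGR using a real-valued root.

    For illustration only: assumes an odd number of years, so that a
    negative end/start ratio still has a real root.
    """
    ratio = end / start
    root = math.copysign(abs(ratio) ** (1.0 / years), ratio)
    return root - 1.0


def guarded_cagr(start: float, end: float, years: int) -> Optional[float]:
    """Return the textbook CAGR only where it is meaningful, else None (NA).

    Assumption: a non-positive start or a negative end is flagged as NA
    so that it can be handled by a separate, modified calculation.
    """
    if start <= 0 or end < 0:
        return None
    return (end / start) ** (1.0 / years) - 1.0


# EPS improves from -2 to +4 over 3 years, yet the naive CAGR is about -2.26.
print(naive_cagr(-2.0, 4.0, 3))
print(guarded_cagr(-2.0, 4.0, 3))  # None
```

The naive result is strongly negative even though earnings clearly improved, which is why such cases need special treatment rather than the conventional formula.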

Open to improvement and criticism

Despite all our efforts, we fully acknowledge that data issues might still pop up. We take our data and analysis very seriously and therefore welcome all kinds of suggestions for improvement. We actively encourage our users to report bugs and data issues, and fixing them becomes our top priority.
