Is Your Data Telling the Truth? How Data Analysts Validate Accuracy
- Otewa O. David
- Jun 2
- 2 min read
You’ve pulled the data. Built the dashboard. Hit "Run". But something feels off. The numbers don’t match what the business expected. So how do you know the data is correct? As data analysts, our job is not just to analyze data—it’s to trust it. And that starts with proper validation.
1. Check the Source
Not all data is created equal. Before diving into analysis, ask:
Where is this data coming from?
Was it collected consistently and reliably?

Trustworthy sources like CRM systems, verified APIs, or controlled spreadsheets should be your foundation.
2. Validate Schema & Data Types
Columns should be in expected formats: dates as dates, prices as numbers, and IDs as text (if alphanumeric). Mismatched schemas often point to broken pipelines or incorrect merges.
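One way to sketch this check in Pandas (the column names here are hypothetical, just for illustration): coerce each column to its expected type, and let failed conversions surface as NaT/NaN instead of silently passing through as strings.

```python
import pandas as pd

# Hypothetical raw export: everything arrives as strings.
df = pd.DataFrame({
    "order_date": ["2024-01-05", "2024-01-06"],
    "price": ["19.99", "24.50"],
    "user_id": ["A102", "B774"],
})

# Coerce each column to its expected type; errors="coerce" turns
# unparseable values into NaT/NaN instead of raising.
df["order_date"] = pd.to_datetime(df["order_date"], errors="coerce")
df["price"] = pd.to_numeric(df["price"], errors="coerce")
df["user_id"] = df["user_id"].astype("string")

# Any NaT/NaN introduced here flags rows that broke the schema.
print(df.dtypes)
```

Rows that come back as NaT or NaN after coercion are exactly the ones that violated the expected schema.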
3. Summarize the Stats
A quick summary reveals a lot. Use functions like describe() in Python or summary() in R to check:
Are there negative values where there shouldn't be?
Do maximums/minimums make sense?

A simple AVG(Salary) or COUNT(Orders) can highlight deeper issues.
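As a minimal sketch, here is what that looks like on a hypothetical salaries table in Pandas, where one deliberately bad row shows up immediately in the summary:

```python
import pandas as pd

# Hypothetical salaries table; -500 is a deliberately bad row.
salaries = pd.DataFrame({"salary": [52000, 61000, -500, 58000]})

# describe() reports count, mean, std, min, quartiles, and max.
stats = salaries["salary"].describe()
print(stats)

# Flag values that should never be negative.
bad_rows = salaries[salaries["salary"] < 0]
print(bad_rows)
```

A minimum of -500 in a salary column is the kind of impossible value a five-second describe() catches before it skews every average downstream.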

4. Watch for Missing or Duplicate Data
Missing values can signal incomplete processes, broken imports, or faulty logic. Duplicates—especially in primary keys like user IDs or transaction numbers—can lead to overcounting and misleading results.
Start with:
IS NULL checks in SQL
df.isnull().sum() in Python (Pandas)
Deduplication using drop_duplicates() in Pandas, or GROUP BY with HAVING COUNT(*) > 1 in SQL
If the same user appears five times in a supposedly unique list, that’s a red flag.
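The checks above can be sketched in a few lines of Pandas (the user table here is invented for illustration):

```python
import pandas as pd

# Hypothetical user table with a duplicated key and missing emails.
users = pd.DataFrame({
    "user_id": [1, 2, 2, 3, 3, 3],
    "email": ["a@x.com", None, "b@x.com", "c@x.com", "c@x.com", None],
})

# Missing values per column -- the Pandas equivalent of IS NULL checks.
print(users.isnull().sum())

# Keys that repeat: the GROUP BY ... HAVING COUNT(*) > 1 pattern.
counts = users.groupby("user_id").size()
dupes = counts[counts > 1]
print(dupes)

# Keep only the first occurrence of each key.
deduped = users.drop_duplicates(subset="user_id")
print(deduped)
```

Here user_ids 2 and 3 repeat, so any per-user aggregate run on the raw table would overcount them.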
5. Sanity Check with Benchmarks
Always compare your metrics with:
Historical trends
Previous dashboards
External benchmarks (e.g., from Google Analytics or QuickBooks)

If last month's revenue was ₹5M and this month's is ₹0, either your company went bankrupt or your pipeline broke.
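A minimal sketch of that sanity check: compare the current value against a historical baseline and flag large deviations. The 50% tolerance here is an arbitrary assumption; pick one that fits your metric's normal volatility.

```python
# Minimal sketch: flag a metric that drifts too far from its
# historical baseline. The 50% tolerance is an assumption.
def looks_anomalous(current, history, tolerance=0.5):
    """Return True if `current` deviates from the historical
    mean by more than `tolerance` (as a fraction)."""
    baseline = sum(history) / len(history)
    if baseline == 0:
        return current != 0
    return abs(current - baseline) / baseline > tolerance

monthly_revenue = [5_000_000, 4_800_000, 5_200_000]
print(looks_anomalous(0, monthly_revenue))           # zero revenue: anomalous
print(looks_anomalous(5_100_000, monthly_revenue))   # within normal range
```

A check like this belongs at the end of the pipeline, so a broken load fails loudly instead of quietly publishing ₹0.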
6. Apply Business Logic
Business rules are your best friends:
A refund can’t be more than the original price.
A user can't purchase before signing up.

If the logic breaks, the data might be dirty.
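Both rules above can be encoded as boolean masks over the data. This is a sketch with invented column names; each rule becomes one comparison, and any row matching a mask is a candidate for investigation.

```python
import pandas as pd

# Hypothetical orders with refunds and signup dates; the second
# row violates both rules on purpose.
orders = pd.DataFrame({
    "price": [100.0, 40.0],
    "refund": [20.0, 55.0],  # 55 > 40: refund exceeds price
    "signup_date": pd.to_datetime(["2024-01-01", "2024-03-10"]),
    "purchase_date": pd.to_datetime(["2024-02-01", "2024-03-01"]),
})

# Encode each business rule as a boolean mask.
refund_too_big = orders["refund"] > orders["price"]
bought_before_signup = orders["purchase_date"] < orders["signup_date"]

dirty = orders[refund_too_big | bought_before_signup]
print(dirty)
```

Keeping each rule as its own named mask makes the report readable: you can count violations per rule, not just per row.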
7. Manual Spot Checks
Sometimes, you just have to dig. Pull a few rows and trace them back to the source. Do they align with what you expect? Do they pass the "does this make sense?" test?
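In Pandas, a reproducible spot check is one line; fixing random_state means you and a reviewer pull the same rows.

```python
import pandas as pd

# Stand-in table; in practice this is your real dataset.
df = pd.DataFrame({"order_id": range(100), "amount": [10.0] * 100})

# Pull a small random sample to trace back to the source system.
# random_state makes the spot check reproducible.
sample = df.sample(n=5, random_state=42)
print(sample)
```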
8. Review Logs & ETL Reports
If your data flows through ETL tools (like Airflow or dbt), review the logs. Look for transformation errors, failed joins, or unexpected truncations.
9. Stay Current
Is the data fresh? Check for update timestamps. You might be analyzing last week's dump, thinking it's today's data.
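A quick freshness check, sketched with a hypothetical updated_at column and an assumed one-day staleness threshold:

```python
import pandas as pd

now = pd.Timestamp.now()

# Hypothetical table whose newest record is a week old.
events = pd.DataFrame({
    "updated_at": [now - pd.Timedelta(days=8), now - pd.Timedelta(days=7)],
})

latest = events["updated_at"].max()

# Flag the dataset as stale if the newest record is older than a day.
is_stale = (now - latest) > pd.Timedelta(days=1)
print(latest, is_stale)
```

The one-day threshold is an assumption; set it to match how often the pipeline is supposed to run.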
10. Ask the Business
At the end of the day, the business owns the data. Show them early drafts and ask:
“Does this align with your expectations?”
“Are there any known issues I should account for?”
Conclusion
Clean data isn't luck; it's discipline. As a data analyst, your job is to ensure the foundation is solid before you build. Because in analytics, bad data leads to bad decisions. And that's a mistake you can't afford.