
Is Your Data Telling the Truth? How Data Analysts Validate Accuracy

  • Writer: Otewa O. David
  • Jun 2
  • 2 min read

You’ve pulled the data. Built the dashboard. Hit "Run". But something feels off. The numbers don’t match what the business expected. So how do you know the data is correct? As data analysts, our job is not just to analyze data but to make sure it can be trusted. And that starts with proper validation.


1. Check the Source


Not all data is created equal. Before diving into analysis, ask:

  • Where is this data coming from?

  • Was it collected consistently and reliably?

Trustworthy sources like CRM systems, verified APIs, or controlled spreadsheets should be your foundation.


2. Validate Schema & Data Types


Columns should be in expected formats: dates as dates, prices as numbers, and IDs as text (if alphanumeric). Mismatched schemas often point to broken pipelines or incorrect merges.
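A schema check like this can be automated. The sketch below is a minimal illustration, not a full validation framework: the column names (order_id, order_date, price) and the EXPECTED_SCHEMA mapping are hypothetical, and the rows are assumed to be already loaded as plain Python dictionaries.

```python
from datetime import date

# Hypothetical expected schema: column name -> Python type.
EXPECTED_SCHEMA = {"order_id": str, "order_date": date, "price": float}


def find_schema_violations(rows, schema):
    """Return (row_index, column, problem) tuples for every mismatch."""
    violations = []
    for i, row in enumerate(rows):
        for column, expected_type in schema.items():
            if column not in row:
                violations.append((i, column, "missing column"))
            elif not isinstance(row[column], expected_type):
                actual = type(row[column]).__name__
                violations.append(
                    (i, column, f"expected {expected_type.__name__}, got {actual}")
                )
    return violations


# Made-up sample rows for illustration.
rows = [
    {"order_id": "A-1001", "order_date": date(2024, 6, 1), "price": 19.99},
    {"order_id": "A-1002", "order_date": "2024-06-02", "price": 5.0},      # date left as text
    {"order_id": 1003, "order_date": date(2024, 6, 3), "price": "12.50"},  # ID and price mistyped
]

violations = find_schema_violations(rows, EXPECTED_SCHEMA)
for violation in violations:
    print(violation)
```

Reporting every mismatch, rather than stopping at the first one, makes it easier to tell a single bad row from a broken pipeline.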


3. Summarize the Stats

A quick summary reveals a lot. Use functions like describe() in Python or summary() in R to check:

  • Are there negative values where there shouldn't be?

  • Do maximums/minimums make sense?

A simple AVG(Salary) or COUNT(Orders) can highlight deeper issues.
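As a rough sketch of what such a summary looks like, here is a version that uses only Python's standard library so it runs anywhere. The salary figures are made up, and summarize is a hypothetical helper; in Pandas, describe() reports similar figures in one call.

```python
import statistics


def summarize(values):
    """Return count, missing count, mean, min, and max for a numeric column.

    Assumes at least one non-missing value; statistics.mean raises on empty input.
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": statistics.mean(present),
        "min": min(present),
        "max": max(present),
    }


# Made-up salary column with one negative value, one gap, and one outlier.
salaries = [52000, 61000, -4500, 58000, None, 990000]

summary = summarize(salaries)
negatives = [v for v in salaries if v is not None and v < 0]

print(summary)
print("Negative salaries:", negatives)
```

Here the negative minimum and the very large maximum both stand out immediately, and the inflated mean hints that something upstream is wrong.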



Discover how analysts separate clean data from chaos.


4. Watch for Missing or Duplicate Data


Missing values can signal incomplete processes, broken imports, or faulty logic. Duplicates—especially in primary keys like user IDs or transaction numbers—can lead to overcounting and misleading results.

Start with:

  • IS NULL checks in SQL

  • df.isnull().sum() in Python (Pandas)

  • Duplicate detection with GROUP BY and HAVING COUNT(*) > 1 in SQL, then deduplication with SELECT DISTINCT in SQL or drop_duplicates() in Pandas

If the same user appears five times in a supposedly unique list, that’s a red flag.
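The SQL checks above can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the users table, its columns, and the sample rows are all invented for illustration.

```python
import sqlite3

# In-memory database with a hypothetical users table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (user_id TEXT, email TEXT)")
conn.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [
        ("u1", "a@example.com"),
        ("u2", None),             # missing email
        ("u3", "c@example.com"),
        ("u1", "a@example.com"),  # duplicate user_id
        ("u4", None),             # missing email
    ],
)

# IS NULL check: how many rows have no email?
missing_emails = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email IS NULL"
).fetchone()[0]

# Duplicate check: which user_ids appear more than once?
duplicate_ids = conn.execute(
    "SELECT user_id, COUNT(*) FROM users GROUP BY user_id HAVING COUNT(*) > 1"
).fetchall()

print("Rows with missing email:", missing_emails)
print("Duplicated user_ids:", duplicate_ids)
```

Counting duplicates before removing them is deliberate: the count tells you whether you are looking at a one-off glitch or a systematic join problem.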


5. Sanity Check with Benchmarks


Always compare your metrics with:

  • Historical trends

  • Previous dashboards

  • External benchmarks (e.g., from Google Analytics or QuickBooks)

If last month’s revenue was ₹5M and this month’s is ₹0, either your company went bankrupt or your pipeline broke.
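A benchmark comparison can be as simple as a percent-change check against the previous period. In the sketch below, the 50% threshold and the function names are assumptions chosen for illustration; the right tolerance depends on how volatile the metric normally is.

```python
def percent_change(previous, current):
    """Percent change from previous to current."""
    if previous == 0:
        raise ValueError("previous value is zero; percent change is undefined")
    return (current - previous) / previous * 100


def is_suspicious(previous, current, threshold_pct=50.0):
    """Flag a metric that moved more than threshold_pct in either direction."""
    return abs(percent_change(previous, current)) > threshold_pct


# The revenue example from the text: ₹5M last month, ₹0 this month.
last_month = 5_000_000
this_month = 0

change = percent_change(last_month, this_month)
flagged = is_suspicious(last_month, this_month)

print(f"Change: {change:.1f}%  suspicious: {flagged}")
```

A flag here does not prove the data is wrong. It only says the number deserves a closer look before it reaches a dashboard.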


6. Apply Business Logic


Business rules are your best friends:

  • A refund can’t be more than the original price.

  • A user can’t purchase before signing up.

If a record breaks one of these rules, the data might be dirty.
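The two rules above translate directly into code. This is a minimal sketch with invented order records; the field names (price, refund, signup_date, purchase_date) are assumptions about how such data might be laid out.

```python
from datetime import date


def check_business_rules(order):
    """Return a list of rule violations for one order (empty if clean)."""
    problems = []
    if order["refund"] > order["price"]:
        problems.append("refund exceeds original price")
    if order["purchase_date"] < order["signup_date"]:
        problems.append("purchase before signup")
    return problems


# Made-up orders: o1 is clean, o2 and o3 each break one rule.
orders = [
    {"order_id": "o1", "price": 100.0, "refund": 0.0,
     "signup_date": date(2024, 1, 5), "purchase_date": date(2024, 2, 1)},
    {"order_id": "o2", "price": 40.0, "refund": 55.0,
     "signup_date": date(2024, 1, 10), "purchase_date": date(2024, 3, 1)},
    {"order_id": "o3", "price": 25.0, "refund": 0.0,
     "signup_date": date(2024, 4, 1), "purchase_date": date(2024, 3, 20)},
]

violations_by_order = {}
for order in orders:
    problems = check_business_rules(order)
    if problems:
        violations_by_order[order["order_id"]] = problems

print(violations_by_order)
```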


7. Manual Spot Checks


Sometimes, you just have to dig. Pull a few rows and trace them to the source. Does it align with what you expect? Does it pass the "does this make sense?" test?
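One way to choose which rows to trace is a small random sample with a fixed seed, so a colleague can pull the same rows later. The helper name and the row IDs below are hypothetical.

```python
import random


def pick_spot_check_rows(row_ids, k=5, seed=0):
    """Pick up to k distinct row IDs; the same seed gives the same sample."""
    rng = random.Random(seed)
    return rng.sample(row_ids, min(k, len(row_ids)))


# Hypothetical row IDs 1..100.
row_ids = list(range(1, 101))
picked = pick_spot_check_rows(row_ids, k=5, seed=42)

print("Rows to trace back to the source:", picked)
```

Random selection helps avoid the habit of checking only the first few rows, which are often the cleanest.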


8. Review Logs & ETL Reports


If your data flows through pipeline tools (like Airflow for orchestration or dbt for transformations), review the run logs. Look for transformation errors, failed joins, or unexpected truncations.

9. Stay Current

Is the data fresh? Check for update timestamps. You might be analyzing last week's dump, thinking it's today's data.
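A freshness check can be a one-line comparison between the last update timestamp and the current time. In this sketch the 24-hour limit is an assumed default, and the timestamps are fixed, made-up values so the example is reproducible; in practice the current time would come from datetime.now(timezone.utc).

```python
from datetime import datetime, timedelta, timezone


def is_stale(last_updated, now, max_age=timedelta(hours=24)):
    """True if the data was last updated more than max_age before now."""
    return now - last_updated > max_age


# Fixed example timestamps (all in UTC).
now = datetime(2024, 6, 2, 9, 0, tzinfo=timezone.utc)
last_week_dump = datetime(2024, 5, 26, 9, 0, tzinfo=timezone.utc)
this_morning_load = datetime(2024, 6, 2, 6, 0, tzinfo=timezone.utc)

print("Last week's dump stale?", is_stale(last_week_dump, now))
print("This morning's load stale?", is_stale(this_morning_load, now))
```

Keeping every timestamp timezone-aware avoids a common trap where a load looks fresh or stale only because two systems disagree about the clock.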

10. Ask the Business

At the end of the day, the business owns the data. Show them early drafts and ask:

  • “Does this align with your expectations?”

  • “Are there any known issues I should account for?”

Conclusion

Clean data isn’t luck; it’s discipline. As a data analyst, your job is to ensure the foundation is solid before you build. Because in analytics, bad data leads to bad decisions. And that’s a mistake you can’t afford.




 
 
 
