Working with NaN in spreadsheets and data tools can make or break your analysis when blanks, errors, and missing values sneak into your dataset. This guide shares practical tips, examples, and a few hard-earned habits to help you detect, clean, and use NaN safely across Excel/Google Sheets, Python/pandas, and SQL.
What NaN Means and Why It Matters
NaN usually stands for Not a Number, but in day-to-day analytics it’s best understood as a marker for missing or undefined values. The tricky part is that different tools represent missingness differently: an empty cell in a spreadsheet, a NULL in SQL, and a NaN in pandas aren’t identical—even if they look similar in a chart or pivot table.
In practice, NaN can come from many sources: incomplete data entry, failed lookups, parsing errors, or legitimate “not applicable” cases (for example, a cancellation date for an active subscription). Treating all missing values as the same can hide important business meaning. I’ve seen teams accidentally impute zeros where “unknown” was intended, which quietly distorted KPIs for weeks.
A solid workflow begins with two decisions: (1) how you will detect missing values consistently across tools, and (2) how you will represent them so calculations behave predictably. Once those are clear, cleanup and modeling become much less painful.
How to Identify NaN (Missing Values) in Spreadsheets
Spreadsheets don’t literally store NaN the way Python does, but the concept shows up through blanks, error values, or formulas that return empty strings. In Excel and Google Sheets, missingness can also be “masked” because a cell may look blank while actually containing "" from a formula.
A good first pass is visual: filter for blanks, sort columns, and use conditional formatting to highlight empty or error-prone cells. But visual checks don’t scale well, so use formulas to classify missingness. In Excel, ISBLANK() only returns TRUE for truly empty cells—not for formulas returning "". In Sheets, similar caveats apply. That distinction matters when you later count missing values or feed the sheet into another tool.
One habit I recommend is adding a “data quality” helper column early in the workflow that labels each row as OK, Missing, Error, or Needs Review. It feels like extra work, but it pays off when you revisit the file a month later and can immediately see what was intentionally left blank versus what broke.
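The same helper-column idea translates directly to pandas. This is a minimal sketch with hypothetical column names and rules; the point is labeling rows early rather than the specific checks:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "customer_id": ["C1", None, "C3", "C4"],
    "amount": [100.0, 250.0, np.nan, -5.0],
})

def quality_label(row):
    # Missing key field -> Missing; impossible value -> Needs Review; else OK
    if pd.isna(row["customer_id"]) or pd.isna(row["amount"]):
        return "Missing"
    if row["amount"] < 0:
        return "Needs Review"
    return "OK"

df["quality"] = df.apply(quality_label, axis=1)
print(df["quality"].tolist())  # ['OK', 'Missing', 'Missing', 'Needs Review']
```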
pandas NaN Handling: isna, fillna, dropna, and Beyond
When people say “Working with NaN” in data tools, they often mean pandas. pandas uses NaN (and sometimes NaT for datetimes) as a standard missing marker. The key advantage is consistency: once values are NaN, pandas functions like isna() and fillna() behave in a predictable way across columns—mostly. The gotcha is dtype: integer columns may be coerced to floats to accommodate NaN unless you use nullable integer types like Int64.
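The dtype coercion is easy to see directly. A small sketch of the gotcha and the nullable-integer fix:

```python
import pandas as pd
import numpy as np

# A plain int column is coerced to float64 once NaN appears
s = pd.Series([1, 2, 3])
s.loc[1] = np.nan
print(s.dtype)  # float64

# The nullable Int64 dtype keeps integers and uses pd.NA for missing
s2 = pd.Series([1, 2, 3], dtype="Int64")
s2.loc[1] = pd.NA
print(s2.dtype)  # Int64
print(s2.isna().sum())  # 1
```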
The core tools are straightforward: detect with isna()/notna(), remove with dropna(), or replace with fillna(). But the best choice depends on context. Dropping rows is clean but can bias results if missingness is systematic (e.g., older records missing a field). Filling values is convenient but can introduce assumptions. I like to start by measuring the missing rate per column, then decide column by column.
Practical patterns I use (with examples)
- Count missing values per column: `df.isna().sum()`
- Filter rows where a key field is missing: `df[df["customer_id"].isna()]`
- Fill numeric missing values with a domain-safe choice: `df["discount"].fillna(0)`, only if 0 truly means none
- Fill with group-level statistics: `df["score"] = df["score"].fillna(df.groupby("segment")["score"].transform("median"))`
- Drop rows carefully: `df.dropna(subset=["order_id"])`, to keep rules narrow and explicit
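The patterns above can be combined into one short, runnable sketch. The column names and fill rules are hypothetical illustrations, not recommendations for your data:

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({
    "order_id": ["O1", "O2", None, "O4"],
    "segment": ["A", "A", "B", "B"],
    "discount": [0.1, np.nan, 0.2, np.nan],
    "score": [80.0, np.nan, 60.0, 70.0],
})

# Measure missingness before deciding anything
print(df.isna().sum())

# Fill discount with 0 only because "missing means no discount" holds here
df["discount"] = df["discount"].fillna(0)

# Fill score from the segment-level median
df["score"] = df["score"].fillna(
    df.groupby("segment")["score"].transform("median")
)

# Drop only rows missing the key identifier
df = df.dropna(subset=["order_id"])
print(len(df))  # 3
```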
One personal rule: I avoid global df.fillna(0) unless the dataset is purely additive measures where “missing implies zero” is explicitly true. Otherwise, you end up erasing uncertainty, and dashboards become overconfident.
Data Cleaning Best Practices Before Analysis
A lot of NaN pain comes from mixing concerns: you import data, clean it, transform it, and analyze it in one pass. A more reliable approach is to separate these steps and document decisions. Missing values deserve special attention because they’re both a data-quality signal and a modeling input.
Start by standardizing what counts as missing. In CSV exports, you may see "", NA, N/A, null, -, or even 0 used as a placeholder. Normalize these early so that downstream tools don’t treat them as legitimate categories. In pandas, read_csv() can help with na_values=[...], and in spreadsheets you can replace known placeholders before applying formulas or pivots.
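As a sketch of normalizing placeholders at read time with `read_csv`, assuming a small inline CSV (the placeholder strings here are just examples; list whatever your exporters actually produce):

```python
import io
import pandas as pd

raw = io.StringIO("id,region,amount\n1,N/A,10\n2,-,\n3,west,null\n")

# na_values adds custom placeholders on top of pandas' built-in NA strings
df = pd.read_csv(raw, na_values=["N/A", "-", "null"])
print(df.isna().sum().to_dict())  # {'id': 0, 'region': 2, 'amount': 2}
```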
Then decide whether missingness is acceptable, fixable, or meaningful. For example, missing latitude/longitude might be fixable via a geocoding step, while missing middle names are usually acceptable and can be left as missing. For cancellation dates, missing may be meaningful (still active), so you might create a derived boolean like is_cancelled and keep the original field missing intentionally.
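The cancellation-date case can be sketched in a few lines. The column names are hypothetical; the idea is to capture the business meaning in a derived flag while leaving the original field honestly missing:

```python
import pandas as pd

df = pd.DataFrame({
    "subscription_id": ["S1", "S2", "S3"],
    "cancelled_at": pd.to_datetime(["2024-03-01", None, "2024-06-15"]),
})

# Missing cancellation date is meaningful here: the subscription is still active
df["is_cancelled"] = df["cancelled_at"].notna()
print(df["is_cancelled"].tolist())  # [True, False, True]
```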
Finally, keep an audit trail. A simple “cleaning log” (even a markdown note or a sheet tab) stating what was filled, what was dropped, and why will save you in reviews and future reruns.
Excel and Google Sheets Formulas for Missing Data
Even if you do most analytics in code, spreadsheets remain a common staging area. Knowing a few missing-data patterns helps you avoid subtle mistakes. In Excel/Sheets, you often need to distinguish among: truly blank cells, empty strings from formulas, and error outputs like #N/A from lookups.
Use IFERROR() or IFNA() to manage lookup failures intentionally, but be careful: converting an error into "" can make it look “clean” while hiding an upstream mismatch. Sometimes it’s better to output a clear label like Missing or LookupFailed during debugging, then convert to blanks only when publishing.
Common patterns include:
– Replace blanks with a label for auditing, then revert later
– Use COUNTBLANK() for quick scans, but validate if your column contains "" values
– Build robust lookups where missing matches don’t silently blend into totals
One practical trick: create a helper formula that classifies each cell, such as Blank vs EmptyString vs Error vs Valid. It’s not glamorous, but it makes reconciliation far easier when totals don’t match between a spreadsheet and a database extract.
NaN vs NULL: Working Across SQL, CSV, and BI Tools
When you move data between systems, NaN-like concepts shift. SQL typically uses NULL, which follows its own three-valued logic: a comparison like = NULL never matches, because the result is unknown rather than TRUE; you must use IS NULL. Aggregations vary too: COUNT(column) ignores NULL, while COUNT(*) counts all rows. In BI tools, NULLs may be treated differently in visuals, filters, and calculated fields, especially when blending multiple sources.
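Both behaviors are easy to verify with an in-memory SQLite database (table and column names here are made up for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, None), (3, 5.0)],
)

# COUNT(*) counts all rows; COUNT(amount) skips NULLs
total, non_null = conn.execute(
    "SELECT COUNT(*), COUNT(amount) FROM orders"
).fetchone()
print(total, non_null)  # 3 2

# "= NULL" never matches; IS NULL is required
missing = conn.execute(
    "SELECT COUNT(*) FROM orders WHERE amount IS NULL"
).fetchone()[0]
print(missing)  # 1
```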
CSV is another common trouble spot. A CSV has no native type system; everything is text until interpreted. That means missing values can be imported as empty strings, literal “null”, or even the word “NaN” depending on the exporter. If you’ve ever had a column flip between numeric and text in Excel, you’ve experienced this problem firsthand.
To reduce friction:
– Decide on a canonical missing representation for exports (often empty for CSV, NULL in SQL, NaN in pandas)
– Add schema checks on import (types, allowed values, missing thresholds)
– Be explicit in BI calculations: define whether NULL should be treated as 0, excluded, or flagged
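A schema check on import can be very small. This is one possible sketch; the function name, required columns, and missing-rate threshold are all policy choices you would tune for your pipeline:

```python
import pandas as pd
import numpy as np

def check_schema(df, required_cols, max_missing_rate=0.2):
    """Raise if required columns are absent or exceed the missing-rate threshold."""
    missing_cols = [c for c in required_cols if c not in df.columns]
    if missing_cols:
        raise ValueError(f"missing columns: {missing_cols}")
    rates = df[required_cols].isna().mean()
    too_sparse = rates[rates > max_missing_rate]
    if not too_sparse.empty:
        raise ValueError(f"too many missing values: {too_sparse.to_dict()}")
    return True

df = pd.DataFrame({"id": [1, 2, 3, 4], "amount": [1.0, np.nan, 2.0, 3.0]})
print(check_schema(df, ["id", "amount"], max_missing_rate=0.5))  # True
```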
In my experience, most cross-tool inconsistencies come from assumptions. If you state the missing-value policy up front, your pipeline becomes easier to maintain and less reliant on tribal knowledge.
Conclusion: A Safe, Repeatable NaN Workflow
Working with NaN in spreadsheets and data tools is less about memorizing functions and more about making consistent, documented choices. Detect missing values early, normalize messy placeholders, and decide whether to drop, fill, or preserve missingness based on what the field actually means.
If you take only one action after reading this: add a small missingness audit to every dataset (counts, rates, and a few spot checks), and carry that audit across tools. It’s a simple habit that prevents silent errors—and it makes your analysis feel calmer, even when the data isn’t.
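In pandas, that audit can be a one-screen helper you reuse across datasets. A minimal sketch, with a hypothetical function name and example data:

```python
import pandas as pd
import numpy as np

def missingness_audit(df):
    """Per-column missing counts and rates, sorted worst-first."""
    audit = pd.DataFrame({
        "missing": df.isna().sum(),
        "rate": df.isna().mean().round(3),
    })
    return audit.sort_values("rate", ascending=False)

df = pd.DataFrame({
    "id": [1, 2, 3, 4],
    "email": ["a@x.com", None, None, "d@x.com"],
    "amount": [10.0, np.nan, 3.0, 4.0],
})
print(missingness_audit(df))
```

Carrying the same counts and rates alongside the data as it moves into SQL or a BI tool makes cross-tool reconciliation much easier.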
