Lecture Notes
Object Detection Evaluation:
A mAP Deep Dive
Hanbyul Joo
SNU Visual Computing Lab
Dept. of Computer Science and Engineering
Overview
PASCAL VOC Evaluation Pipeline
Five stages from raw detections to mAP. We will build each step from the ground up.
01 Compute IoU: overlap ratio between predicted and GT box
02 Assign TP / FP: IoU ≥ 0.5 → TP, else FP (duplicates too)
03 Build PR Curve: sort by confidence descending, accumulate P / R per row
04 Compute AP: 11-point interpolation or all-point AUC
05 mAP: mean AP across all 20 classes
Step 01
IoU — Intersection over Union
Drag the sliders to see how box position and size affect IoU. PASCAL VOC standard: IoU ≥ 0.5 → TP.
IoU = Intersection / Union
= (Pred ∩ GT area) / (Pred ∪ GT area)
[Interactive demo: sliders for Pred box X, Pred box Y, and Pred box size, with a live IoU readout]
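The IoU formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming axis-aligned boxes stored as (x1, y1, x2, y2) in continuous coordinates; the function name `iou` and the box format are assumptions, and some toolkits use a pixel-inclusive convention that adds 1 to widths and heights.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle (zero if the boxes are disjoint).
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    # Union = sum of both areas minus the part counted twice.
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


gt = (100, 100, 200, 200)    # hypothetical ground-truth box
pred = (150, 100, 250, 200)  # hypothetical prediction, shifted 50 px right
print(f"IoU = {iou(pred, gt):.2f}")  # 5000 / 15000, below the 0.5 threshold
```

Here the prediction covers half of the GT box, yet the IoU is only one third, because the union grows as the intersection shrinks.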
Step 02
TP · FP · FN — Assignment Rules
How each detection is classified. Adjust thresholds to watch TP / FP / FN shift in real time.
Default thresholds: confidence 0.50, IoU 0.50
TP: IoU ≥ thr and conf ≥ thr
FP: conf ≥ thr but IoU < thr
FN: GT exists but not detected
TN: not defined in detection
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × P × R / (P + R)
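As a small worked example of the three formulas above, the sketch below computes them from raw counts. The function name `precision_recall_f1` and the counts are illustrative assumptions; returning 0.0 for an empty denominator is one common convention, not the only one.

```python
def precision_recall_f1(tp, fp, fn):
    """Precision, Recall, and F1 from TP / FP / FN counts."""
    # Guard against empty denominators (no detections, or no GT boxes).
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Hypothetical counts: 8 detections kept, 6 correct, 10 GT boxes in total.
p, r, f1 = precision_recall_f1(tp=6, fp=2, fn=4)
print(f"P = {p:.2f}, R = {r:.2f}, F1 = {f1:.2f}")
```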
Key rules
• Multiple detections on the same GT → only the highest-confidence match is TP; the rest are FP
• Unmatched GTs count as FN
• No background class → TN is undefined in detection
• Without TN, Accuracy cannot be computed, which is why mAP is used instead
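The key rules above can be sketched as a greedy assignment for one image and one class. This is a simplified illustration, not a reference implementation: it assumes each detection already carries the index of its best-overlapping GT box and that IoU, and the names `assign_verdicts`, `gt_idx`, and `claimed` are hypothetical.

```python
def assign_verdicts(detections, num_gt, iou_thr=0.5):
    """detections: list of (confidence, gt_idx, iou); gt_idx is None if no GT overlaps.

    Returns the verdicts in descending-confidence order and the FN count.
    """
    claimed = set()  # GT boxes that already have a TP
    verdicts = []
    # Process the most confident detections first so they claim GTs first.
    for conf, gt_idx, overlap in sorted(detections, key=lambda d: d[0], reverse=True):
        if gt_idx is not None and overlap >= iou_thr and gt_idx not in claimed:
            claimed.add(gt_idx)
            verdicts.append((conf, "TP"))
        else:
            # Low IoU, no GT nearby, or a duplicate on an already-claimed GT.
            verdicts.append((conf, "FP"))
    fn = num_gt - len(claimed)  # GTs no detection matched
    return verdicts, fn


# Hypothetical detections for an image with 2 GT boxes.
dets = [(0.81, 0, 0.70), (0.97, 0, 0.85), (0.72, None, 0.0), (0.60, 1, 0.30)]
verdicts, fn = assign_verdicts(dets, num_gt=2)
print(verdicts, "FN =", fn)
```

In this example the 0.81 detection overlaps GT 0 well but loses to the 0.97 detection, so it is an FP, and GT 1 is never matched, so it counts as one FN.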
Each chip = one detection sample
img · conf=X · IoU=Y
Step 03
Precision–Recall Curve — Live Sample Walkthrough
[Interactive demo: confidence and IoU threshold sliders (default 0.50 each), with live AP, Precision, Recall, and TP / FP / FN readouts]
All detections, sorted by confidence ↓
Table columns: # · Image · Conf · IoU · Verdict · P · R
Samples above threshold are classified as:
TP: IoU ≥ thr and conf ≥ thr
FP: conf ≥ thr but IoU < thr
FN: GT not matched by any detection
Step 03 — How it works
Building the PR Curve step by step
We lower the confidence threshold one detection at a time. Each new detection adds one (Recall, Precision) point to the curve. Press Play or step through manually.
[Interactive demo: 20-step walkthrough showing the current detection, cumulative TP, cumulative FP, Precision, and Recall]
Table columns: # · Conf · V · P · R
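The row-by-row accumulation described above can be sketched as follows. The function name `pr_points` and the sample verdicts are illustrative assumptions; the sketch also assumes TP / FP verdicts have already been assigned and that there is at least one GT box.

```python
def pr_points(verdicts, num_gt):
    """verdicts: list of (confidence, is_tp). Returns one (recall, precision) per row."""
    points = []
    tp = fp = 0
    # Lowering the threshold one detection at a time = walking the sorted list.
    for conf, is_tp in sorted(verdicts, key=lambda v: v[0], reverse=True):
        if is_tp:
            tp += 1
        else:
            fp += 1
        points.append((tp / num_gt, tp / (tp + fp)))
    return points


# Hypothetical verdicts for a class with 4 GT boxes.
sample = [(0.97, True), (0.93, True), (0.80, False), (0.75, True), (0.60, False)]
for recall, precision in pr_points(sample, num_gt=4):
    print(f"R = {recall:.2f}  P = {precision:.2f}")
```

Recall never reaches 1.0 in this example because one GT box is never detected, so the curve stops at R = 0.75.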
Step 03 — Intuition
Why the PR Curve looks like stairs
Three key patterns explain the characteristic shape of every PR Curve — and what they tell you about the model.
① TP hit → step right (Recall ↑)
When a new detection is a TP, the cumulative TP count goes up. Recall = TP / GT_total, so Recall increases and the curve steps right. Precision also rises, or stays at 1.0 if there have been no FPs so far.
Recall = TP↑ / GT_total → moves right
② FP hit → step down (Precision ↓)
When a new detection is a FP, TP stays the same but TP+FP grows, so Precision = TP / (TP+FP) drops. Recall is unchanged, so the curve steps straight down.
Precision = TP / (TP + FP↑) → moves down
③ High-conf detections are added first
Because we sort by confidence descending, the model's most certain predictions come first — these tend to be correct, so the curve starts high-precision. It sags as we add lower-confidence (noisier) detections.
Sort: conf 0.97 → 0.93 → … → 0.06
Takeaway: A curve that stays top-right means the model finds many TPs early (high precision) and continues finding them (high recall). AP is the area under this curve — bigger = better.
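The pipeline overview names two ways to turn the curve into a single number: 11-point interpolation and all-point AUC. The sketch below implements both under stated assumptions: the input is a list of (recall, precision) rows in order of increasing recall, the function names are hypothetical, and the sample points are made up for illustration.

```python
def ap_all_point(points):
    """Area under the precision envelope of the PR curve."""
    recalls = [0.0] + [r for r, _ in points] + [1.0]
    precisions = [0.0] + [p for _, p in points] + [0.0]
    # Envelope: each precision becomes the max precision at any higher recall.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    # Sum rectangle areas wherever recall increases.
    return sum((recalls[i] - recalls[i - 1]) * precisions[i]
               for i in range(1, len(recalls)))


def ap_11_point(points):
    """Mean of the interpolated precision at recall = 0.0, 0.1, ..., 1.0."""
    total = 0.0
    for k in range(11):
        t = k / 10
        candidates = [p for r, p in points if r >= t]
        total += max(candidates) if candidates else 0.0
    return total / 11


# Hypothetical PR rows for a class with 4 GT boxes, one of them never detected.
pts = [(0.25, 1.0), (0.5, 1.0), (0.5, 2 / 3), (0.75, 0.75), (0.75, 0.6)]
print(f"all-point AP = {ap_all_point(pts):.4f}")
print(f"11-point AP  = {ap_11_point(pts):.4f}")
```

The two variants usually give close but not identical values, so it matters which one a reported AP uses.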
Step 05
PASCAL VOC — 20-class mAP
Compute AP for each class, then take the arithmetic mean. The values shown are sample YOLOv3-style per-class APs on the VOC 2007 test set.
mAP = (1/20) × Σ AP_c
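The mean above is a plain average over classes. A minimal sketch follows; the function name `mean_ap` is hypothetical, and the three AP values are made-up placeholders rather than results from any model.

```python
def mean_ap(per_class_ap):
    """Arithmetic mean of per-class AP values (dict: class name -> AP)."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Made-up AP values for three of the 20 VOC classes, for illustration only.
example_ap = {"aeroplane": 0.80, "bicycle": 0.70, "bird": 0.60}
best = max(example_ap, key=example_ap.get)
worst = min(example_ap, key=example_ap.get)
print(f"mAP = {mean_ap(example_ap):.2f}, best = {best}, worst = {worst}")
```

Because every class has equal weight, a rare class with poor AP pulls mAP down as much as a common one.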

mAP@0.5 : VOC standard
mAP@0.5:0.95 : COCO standard
[Interactive readouts: mAP (avg), best class, worst class, number of classes (20)]
VOC 20 classes
aeroplane · bicycle · bird · boat · bottle · bus · car · cat · chair · cow · diningtable · dog · horse · motorbike · person · pottedplant · sheep · sofa · train · tvmonitor
Summary
Key Concepts at a Glance
The full 5-step pipeline on one slide.
01
IoU
Ratio of intersection to union area between two boxes. PASCAL VOC uses 0.5 as the threshold to decide TP vs FP.
IoU = |A ∩ B| / |A ∪ B|
02
TP / FP / FN
IoU ≥ thr → TP. Second match on same GT → FP. Undetected GT → FN. TN is undefined — no background class.
Precision = TP/(TP+FP) · Recall = TP/(TP+FN)
03 → 04
PR Curve & AP
Sort by confidence descending, accumulate (P, R) per row to trace the PR Curve. Area under that curve = AP.
AP = ∫₀¹ P(r) dr
05
mAP
Arithmetic mean of per-class AP over all 20 classes. VOC uses mAP@0.5; COCO uses mAP@0.5:0.95 (10 IoU levels averaged).
mAP = (1/C) × Σ AP_c
FP Deep Dive
The 3 Types of False Positives
Any detection labelled Positive that does not count as TP falls into one of three categories.
Case 1 — IoU Miss
[Figure: GT and Pred boxes barely overlapping; IoU ≈ 0.08 → FP]
Box was predicted but barely overlaps GT. The location is simply wrong.
IoU < threshold → FP
Case 2 — Duplicate Detection
[Figure: Pred1 (conf=0.97) matches the GT → TP; Pred2 (conf=0.81) on the same GT → FP]
IoU is fine, but a higher-confidence prediction already claimed this GT. NMS is meant to suppress such duplicates.
1st match → TP, rest → FP
Case 3 — Hallucination
[Figure: GT (person); Pred (conf=0.72) drawn in an empty region → FP]
A box drawn in empty background with no GT nearby. A model hallucination.
No GT in region → FP
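The three cases can be told apart with two pieces of information per detection: its best IoU against any GT box, and whether that GT was already claimed. The sketch below is one possible way to encode this; the function name, the labels, and the choice to treat "no GT nearby" as an IoU of exactly 0 are assumptions made for illustration.

```python
def classify_detection(best_iou, gt_already_claimed, iou_thr=0.5):
    """Label a detection as TP or as one of the three FP types."""
    if best_iou >= iou_thr:
        # Good overlap: TP unless a more confident detection took this GT.
        return "duplicate" if gt_already_claimed else "TP"
    if best_iou > 0.0:
        # Touches a GT box, but not enough to count.
        return "iou_miss"
    # Overlaps no GT box at all (assumed here to mean IoU == 0).
    return "hallucination"


print(classify_detection(0.08, False))  # Case 1
print(classify_detection(0.85, True))   # Case 2
print(classify_detection(0.0, False))   # Case 3
```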
Dataset Comparison
PASCAL VOC vs MS COCO
Both are called "mAP" — but they measure different things. Understanding the gap is essential when comparing published results.
PASCAL VOC: mAP@0.5
Single IoU threshold → 1 AP value
Relatively lenient evaluation
MS COCO: mAP@0.5:0.95 (IoU from 0.5 to 0.95 in steps of 0.05)
10 IoU levels → 10 APs averaged
Box quality is strictly penalised
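To see why the two metrics diverge, the sketch below scores the same detections at a single IoU threshold and then averaged over the ten thresholds. It is a simplified illustration, not the official evaluation code of either benchmark: it assumes one class, one candidate detection per GT box (no duplicates), and all-point AP at every threshold. The function names and sample values are hypothetical.

```python
def average_precision(detections, num_gt, iou_thr):
    """detections: list of (confidence, iou_with_its_gt). All-point AP at one threshold."""
    tp = fp = 0
    recalls, precisions = [0.0], [0.0]
    for conf, overlap in sorted(detections, reverse=True):
        if overlap >= iou_thr:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_gt)
        precisions.append(tp / (tp + fp))
    recalls.append(1.0)
    precisions.append(0.0)
    # Precision envelope, then area under it.
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    return sum((recalls[i] - recalls[i - 1]) * precisions[i]
               for i in range(1, len(recalls)))


def coco_style_ap(detections, num_gt):
    """Mean AP over IoU thresholds 0.50, 0.55, ..., 0.95."""
    thresholds = [(50 + 5 * i) / 100 for i in range(10)]
    return sum(average_precision(detections, num_gt, t) for t in thresholds) / len(thresholds)


# Hypothetical detections: (confidence, IoU with the matched GT), 4 GT boxes.
dets = [(0.9, 0.92), (0.8, 0.58), (0.7, 0.77), (0.6, 0.30)]
print(f"AP@0.5      = {average_precision(dets, 4, 0.5):.3f}")
print(f"AP@0.5:0.95 = {coco_style_ap(dets, 4):.3f}")
```

With these made-up boxes the averaged score is roughly half the single-threshold score, because loosely fitting boxes stop counting as TP once the threshold rises.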
History
Object Detection Dataset Timeline
From VOC to SA-1B — the evolution of scale, class diversity, and annotation richness.
Modern Datasets
Classic vs Modern Dataset Comparison
Scale, diversity, and annotation style determine how well a model generalises. Bubble size = number of images. Hover for details.
Progress
Detection Model Performance Timeline
VOC mAP@0.5 and COCO mAP@0.5:0.95 — from HOG+SVM to modern transformers. Hover bars for details.