PASCAL VOC — mAP Deep Dive

Lecture Notes

Object Detection Evaluation: A mAP Deep Dive

Hanbyul Joo

SNU Visual Computing Lab

Dept. of Computer Science and Engineering


IoU

TP · FP · FN

Precision–Recall

Average Precision

mAP @ 20 classes

VOC vs COCO

Modern Datasets

Overview

PASCAL VOC Evaluation Pipeline

Five stages from raw detections to mAP. We will build each step from the ground up.

01  Compute IoU: overlap ratio between predicted and GT box

02  Assign TP / FP: IoU ≥ 0.5 → TP, else FP (duplicates are FP too)

03  Build PR Curve: sort by confidence descending, accumulate P / R per row

04  Compute AP: 11-pt interpolation or all-point AUC

05  mAP: mean AP across all 20 classes

Step 01

IoU — Intersection over Union

Drag the sliders to see how box position and size affect IoU. PASCAL VOC standard: IoU ≥ 0.5 → TP.

IoU = Intersection / Union
= (Pred ∩ GT area) / (Pred ∪ GT area)

[Interactive demo: sliders for Pred box X (130), Pred box Y (100), and Pred box size (120), with a live IoU readout.]
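The IoU formula above can be sketched in a few lines of Python. This is a minimal illustration, assuming axis-aligned boxes given as (x1, y1, x2, y2) with continuous coordinates; the function name `iou` and the box layout are choices made here, not something the notes prescribe. Pixel-inclusive implementations add 1 to each width and height, which this sketch does not do.

```python
def iou(box_a, box_b):
    """IoU of two axis-aligned boxes, each given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap rectangle, clamped at zero when the boxes are disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter

    # Degenerate (zero-area) inputs yield IoU 0 rather than a division error.
    return inter / union if union > 0 else 0.0


# Two 10x10 boxes shifted by 5 px horizontally: intersection 50, union 150.
print(iou((0, 0, 10, 10), (5, 0, 15, 10)))
```

With the VOC threshold of 0.5, the example pair (IoU of one third) would not count as a match.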

Step 02

TP · FP · FN — Assignment Rules

How each detection is classified. Adjust thresholds to watch TP / FP / FN shift in real time.

Confidence threshold: 0.50

IoU threshold: 0.50

TP: IoU ≥ thr & conf ≥ thr

FP: conf ≥ thr but IoU < thr

FN: GT exists but not detected

TN: not defined in detection

Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F1 = 2 × P × R / (P + R)
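As a quick check of the three formulas, here is a small sketch that computes them from raw counts. The function name and the choice to return 0.0 when a denominator is zero are assumptions made for this illustration; other conventions for the empty case exist.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, f1) from TP / FP / FN counts."""
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    precision = tp / (tp + fp) if (tp + fp) > 0 else 0.0
    recall = tp / (tp + fn) if (tp + fn) > 0 else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# 6 correct detections, 2 false alarms, 4 missed objects.
p, r, f1 = precision_recall_f1(tp=6, fp=2, fn=4)
print(p, r, f1)
```

Here Precision is 6/8 = 0.75 and Recall is 6/10 = 0.6; note that TN never appears in any of the three formulas.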


Key rules

• Multiple detections on same GT → only 1st is TP, rest are FP

• Unmatched GTs count as FN

• No background class → TN is undefined in detection

• Accuracy needs TN, so detection is scored with Precision, Recall, and mAP instead
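The key rules above can be sketched as a greedy matching loop for one image and one class. This is an illustration under stated assumptions, not a reference implementation: the names `assign_verdicts` and `iou`, the (x1, y1, x2, y2) box layout, and the (confidence, box) tuple format are all hypothetical. Each detection is compared with its best-overlapping GT; the first one to claim a GT becomes a TP and later ones become FP.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def assign_verdicts(detections, gt_boxes, iou_thr=0.5):
    """detections: list of (confidence, box). Returns (verdicts, fn_count)."""
    # Visit detections from most to least confident, so the "1st" match is the strongest one.
    order = sorted(range(len(detections)), key=lambda i: detections[i][0], reverse=True)
    claimed = [False] * len(gt_boxes)
    verdicts = [None] * len(detections)

    for i in order:
        box = detections[i][1]
        best_iou, best_j = 0.0, -1
        for j, gt in enumerate(gt_boxes):
            overlap = iou(box, gt)
            if overlap > best_iou:
                best_iou, best_j = overlap, j

        if best_j >= 0 and best_iou >= iou_thr and not claimed[best_j]:
            claimed[best_j] = True      # first detection on this GT: TP
            verdicts[i] = "TP"
        else:
            verdicts[i] = "FP"          # poor overlap, or GT already claimed

    fn_count = claimed.count(False)     # unmatched GTs count as FN
    return verdicts, fn_count


gts = [(0, 0, 10, 10), (20, 20, 30, 30)]
dets = [(0.9, (0, 0, 10, 10)), (0.8, (1, 1, 10, 10)), (0.7, (50, 50, 60, 60))]
print(assign_verdicts(dets, gts))  # (['TP', 'FP', 'FP'], 1)
```

The second detection overlaps the first GT well (IoU 0.81) but arrives after that GT was claimed, so it is a duplicate FP; the second GT is never matched and counts as one FN.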

Each chip = one detection sample
img · conf=X · IoU=Y

Step 03

Precision–Recall Curve — Live Sample Walkthrough

[Interactive demo: confidence threshold 0.50 and IoU threshold 0.50, with live readouts for AP, Precision, Recall, and TP / FP / FN.]

All detections — sorted by confidence ↓

#	Image	Conf	IoU	Verdict	P	R

Samples above threshold — classified as:

TP: IoU ≥ thr & conf ≥ thr

FP: conf ≥ thr but IoU < thr

FN: GT not matched by any detection

Step 03 — How it works

Building the PR Curve step by step

We lower the confidence threshold one detection at a time. Each new detection adds one (Recall, Precision) point to the curve. Press Play or step through manually.

[Interactive stepper: steps 0 to 20, showing the current detection, cumulative TP, cumulative FP, Precision, and Recall, with a running table of #, Conf, V (verdict), P, R.]
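The walkthrough in this section can be reproduced offline with a short sketch. It assumes each detection has already been given a TP or FP verdict and that the total number of GT boxes is known; the function name `pr_points` and the (confidence, is_tp) input format are choices made for this example.

```python
def pr_points(scored_verdicts, num_gt):
    """scored_verdicts: list of (confidence, is_tp). Returns [(recall, precision), ...]."""
    # Lowering the threshold one detection at a time is the same as
    # walking the list in descending confidence order.
    rows = sorted(scored_verdicts, key=lambda row: row[0], reverse=True)

    cum_tp = 0
    cum_fp = 0
    points = []
    for _conf, is_tp in rows:
        if is_tp:
            cum_tp += 1
        else:
            cum_fp += 1
        precision = cum_tp / (cum_tp + cum_fp)
        recall = cum_tp / num_gt if num_gt > 0 else 0.0
        points.append((recall, precision))
    return points


# Three detections, four GT boxes in total (toy numbers).
for recall, precision in pr_points([(0.9, True), (0.8, False), (0.7, True)], num_gt=4):
    print(f"R={recall:.2f}  P={precision:.2f}")
```

The three rows give (R, P) = (0.25, 1.00), (0.25, 0.50), (0.50, 0.67): the TP rows move the point right, and the FP row moves it straight down.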

Step 03 — Intuition

Why the PR Curve looks like stairs

Three key patterns explain the characteristic shape of every PR Curve — and what they tell you about the model.

① TP hit → step right (Recall ↑)

When a new detection is a TP, the cumulative TP count goes up. Recall = TP / GT_total, so Recall increases. The curve steps right.

Recall = TP↑ / GT_total → moves right

② FP hit → step down (Precision ↓)

When a new detection is a FP, TP stays the same but TP+FP grows. Precision = TP / (TP+FP) drops. The curve steps down.

Precision = TP / (TP + FP↑) → moves down

③ High-conf detections are added first

Because we sort by confidence descending, the model's most certain predictions come first — these tend to be correct, so the curve starts high-precision. It sags as we add lower-confidence (noisier) detections.

Sort: conf 0.97 → 0.93 → … → 0.06

Takeaway: A curve that stays top-right means the model finds many TPs early (high precision) and continues finding them (high recall). AP is the area under this curve — bigger = better.
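The two AP variants named in the pipeline overview (11-pt interpolation and all-point AUC) can be sketched as follows. Both functions take the per-row recall and precision lists in descending-confidence order; the function names are hypothetical, and the code is an illustration of the two techniques rather than the official evaluation script.

```python
def ap_11_point(recalls, precisions):
    """Average of the best precision at recall >= 0.0, 0.1, ..., 1.0."""
    total = 0.0
    for i in range(11):
        level = i / 10
        reachable = [p for r, p in zip(recalls, precisions) if r >= level]
        total += max(reachable) if reachable else 0.0
    return total / 11


def ap_all_point(recalls, precisions):
    """Area under the precision envelope, summed at every recall change."""
    # Sentinels so the curve spans recall 0 to 1.
    r = [0.0] + list(recalls) + [1.0]
    p = [0.0] + list(precisions) + [0.0]

    # Envelope: each precision becomes the maximum precision to its right.
    for i in range(len(p) - 2, -1, -1):
        p[i] = max(p[i], p[i + 1])

    # Rectangles of width (recall step) and height (envelope precision).
    return sum((r[i + 1] - r[i]) * p[i + 1] for i in range(len(r) - 1))


recalls = [0.25, 0.25, 0.5]
precisions = [1.0, 0.5, 2 / 3]
print(round(ap_11_point(recalls, precisions), 4))   # 0.4545
print(round(ap_all_point(recalls, precisions), 4))  # 0.4167
```

On this toy curve the 11-pt value is 5/11 and the all-point value is 5/12; the two variants generally differ slightly, so it matters which one a reported AP uses.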

Step 05

PASCAL VOC — 20-class mAP

Compute AP for each class, then take the arithmetic mean. The chart shows sample YOLOv3-style per-class AP values on the VOC 2007 test set.

mAP = (1/20) × Σ AP_c

mAP@0.5 : VOC standard
mAP@0.5:0.95 : COCO standard
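A minimal sketch of the averaging step, assuming the per-class AP values are already available in a dictionary. The three AP numbers below are made up purely for illustration and are not results from any model; the function name is also hypothetical.

```python
def mean_average_precision(ap_per_class):
    """Arithmetic mean of per-class AP values (dict: class name -> AP)."""
    if not ap_per_class:
        raise ValueError("need at least one class")
    return sum(ap_per_class.values()) / len(ap_per_class)


# Made-up AP values for three of the 20 VOC classes (illustration only).
toy_ap = {"cat": 0.9, "dog": 0.8, "chair": 0.4}

print(round(mean_average_precision(toy_ap), 3))
print("best:", max(toy_ap, key=toy_ap.get))
print("worst:", min(toy_ap, key=toy_ap.get))
```

Because every class carries the same weight, a rare class with a low AP pulls mAP down as much as a frequent one.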

[Chart readouts: mAP (avg), best class, worst class; # classes: 20]

VOC 20 classes

aeroplane · bicycle · bird · boat · bottle · bus · car · cat · chair · cow · diningtable · dog · horse · motorbike · person · pottedplant · sheep · sofa · train · tvmonitor

Summary

Key Concepts at a Glance

The full 5-step pipeline on one slide.

01

IoU

Ratio of intersection to union area between two boxes. PASCAL VOC uses 0.5 as the threshold to decide TP vs FP.

IoU = |A ∩ B| / |A ∪ B|

02

TP / FP / FN

IoU ≥ thr → TP. Second match on same GT → FP. Undetected GT → FN. TN is undefined — no background class.

Precision = TP/(TP+FP) Recall = TP/(TP+FN)

03 → 04

PR Curve & AP

Sort by confidence descending, accumulate (P, R) per row to trace the PR Curve. Area under that curve = AP.

AP = ∫₀¹ P(r) dr

05

mAP

Arithmetic mean of per-class AP over all 20 classes. VOC uses mAP@0.5; COCO uses mAP@0.5:0.95 (10 IoU levels averaged).

mAP = (1/C) × Σ AP_c

FP Deep Dive

The 3 Types of False Positives

Any detection labelled Positive that does not count as TP falls into one of three categories.

Case 1 — IoU Miss

Box was predicted but barely overlaps GT. The location is simply wrong.

IoU < threshold → FP

Case 2 — Duplicate Detection

IoU is fine, but another prediction already claimed this GT. Non-maximum suppression (NMS) is meant to remove such duplicates before evaluation.

1st match → TP, rest → FP

Case 3 — Hallucination

A box drawn in empty background with no GT nearby. A model hallucination.

No GT in region → FP
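The three cases can be told apart from two quantities: the detection's best IoU against any GT, and whether that GT was already claimed. The sketch below is one possible reading of the categories; in particular, treating "no GT nearby" as a best IoU at or below a `background_iou` cutoff (default 0.0) is an assumption made here, and the function name and return labels are hypothetical.

```python
def fp_category(best_iou, gt_already_claimed, iou_thr=0.5, background_iou=0.0):
    """Label a detection as one of the three FP cases, or 'not_fp'."""
    # Case 3: no overlap with any GT (assumed cutoff: best IoU <= background_iou).
    if best_iou <= background_iou:
        return "hallucination"
    # Case 1: some overlap, but below the matching threshold.
    if best_iou < iou_thr:
        return "iou_miss"
    # Case 2: good overlap, but an earlier detection already took this GT.
    if gt_already_claimed:
        return "duplicate"
    # Good overlap on an unclaimed GT: this one is a TP.
    return "not_fp"


print(fp_category(0.0, False))   # hallucination
print(fp_category(0.3, False))   # iou_miss
print(fp_category(0.8, True))    # duplicate
```

Splitting FPs this way helps with debugging: many IoU misses point to localisation problems, many duplicates to weak suppression, and many hallucinations to background confusion.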

Dataset Comparison

PASCAL VOC vs MS COCO

Both are called "mAP" — but they measure different things. Understanding the gap is essential when comparing published results.

PASCAL VOC: mAP@0.5
Single IoU threshold → 1 AP value
Relatively lenient evaluation

MS COCO: mAP@0.5:0.05:0.95
10 IoU levels → 10 APs averaged
Box quality is strictly penalised
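The difference between the two metrics comes down to how many IoU thresholds are averaged. The sketch below assumes a caller-supplied function that returns AP at a given IoU threshold; `ap_at_iou`, `coco_iou_thresholds`, and the toy AP curve are all hypothetical and only illustrate the averaging, not the full COCO protocol.

```python
def coco_iou_thresholds():
    """The 10 IoU levels 0.50, 0.55, ..., 0.95."""
    return [round(0.5 + 0.05 * i, 2) for i in range(10)]


def voc_style_ap(ap_at_iou):
    """Single threshold: AP at IoU 0.5."""
    return ap_at_iou(0.5)


def coco_style_ap(ap_at_iou):
    """Mean of AP over the 10 IoU levels."""
    thresholds = coco_iou_thresholds()
    return sum(ap_at_iou(t) for t in thresholds) / len(thresholds)


# Toy model whose AP falls linearly as the IoU requirement tightens (made up).
def toy_ap(iou_thr):
    return 1.0 - iou_thr


print(round(voc_style_ap(toy_ap), 3))   # 0.5
print(round(coco_style_ap(toy_ap), 3))  # 0.275
```

The same toy model scores 0.5 under the single-threshold metric and 0.275 under the averaged one, which is why the two kinds of "mAP" cannot be compared directly.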

History

Object Detection Dataset Timeline

From VOC to SA-1B — the evolution of scale, class diversity, and annotation richness.

Modern Datasets

Classic vs Modern Dataset Comparison

Scale, diversity, and annotation style determine how well a model generalises. Bubble size = number of images. Hover for details.

Progress

Detection Model Performance Timeline

VOC mAP@0.5 and COCO mAP@0.5:0.95 — from HOG+SVM to modern transformers. Hover bars for details.