Pipeline B: DESI DR1 Spectral Anomaly Detection

Methodology

How we built a spectral autoencoder, trained it on 47K labeled DESI spectra, deployed it on an NVIDIA H200 GPU, and processed 18.7 million spectra to discover 195,829 previously uncharacterized objects.

End-to-End Pipeline

```mermaid
graph TD
    A[DESI DR1 Archive<br/>18.7M spectra] --> B[Download coadd FITS<br/>27,488 healpix files]
    B --> C[Spectral Preprocessing<br/>Downsample 16x → 496 bins]
    C --> D[Per-spectrum normalization<br/>Median absolute scaling]
    D --> E[BigAE Forward Pass<br/>Batch size 8,192 on H200 GPU]
    E --> F{"Reconstruction Error<br/>Score > 5.0?"}
    F -- Yes --> G[Anomaly Catalog<br/>195,829 objects]
    F -- No --> H[Normal spectrum<br/>~17.5M discarded]
    G --> I[Band Decomposition<br/>rB + rR + rZ scores]
    I --> J[SIMBAD Cross-Match<br/>0/100 matches]
    J --> K[AI Classification<br/>6 categories]
    K --> L[Human Review<br/>via Anomaly Explorer]
    L --> M[Validated Catalog<br/>+ Follow-up List]
    T1[Training Data<br/>47K labeled DESI spectra] --> T2[Train BigAE<br/>4-layer encoder/decoder]
    T2 --> T3[Bias Hardening<br/>8/8 tests passed]
    T3 --> T4[Export Model<br/>best_model_47k.pt]
    T4 --> E
    style A fill:#1e40af,color:#fff
    style G fill:#166534,color:#fff
    style M fill:#166534,color:#fff
    style T4 fill:#7c3aed,color:#fff
    style F fill:#f59e0b,color:#000
```

Model Architecture: BigAE

BigAE is a fully-connected autoencoder that compresses a 496-dimensional spectral vector into a 128-dimensional latent representation, then reconstructs it. Objects whose reconstruction is poor (high error) are flagged as anomalies — the model literally doesn't know how to represent them.

Architecture

```mermaid
graph LR
    I[Input<br/>496 dims] --> E1[Linear 512<br/>+ BN + ReLU<br/>+ Dropout 0.15]
    E1 --> E2[Linear 256<br/>+ BN + ReLU<br/>+ Dropout 0.1]
    E2 --> E3[Linear 128<br/>+ ReLU]
    E3 --> L[Latent<br/>128 dims]
    L --> D1[Linear 128<br/>+ ReLU]
    D1 --> D2[Linear 256<br/>+ BN + ReLU<br/>+ Dropout 0.1]
    D2 --> D3[Linear 512<br/>+ BN + ReLU<br/>+ Dropout 0.15]
    D3 --> O[Output<br/>496 dims]
    style I fill:#3b82f6,color:#fff
    style L fill:#7c3aed,color:#fff
    style O fill:#22c55e,color:#fff
```
| Property | Value |
| --- | --- |
| Input dimension | 496 (3 arms × ~2,600 pixels, downsampled 16x) |
| Latent dimension | 128 |
| Total parameters | ~650K |
| Training samples | 47,000 labeled DESI spectra (stars, galaxies, QSOs) |
| Training loss | MSE (mean squared error) on normalized flux |
| Regularization | Dropout (0.15/0.1) + BatchNorm |
| Model file | `best_model_47k.pt` (3.5 MB) |
| HuggingFace | `bamfai/desi-spectral-anomaly-detector` |
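For concreteness, the architecture above can be sketched in PyTorch. This is a minimal reconstruction from the layer sizes in the diagram, not the actual training code; the exact layer ordering, initialization, and module names in `best_model_47k.pt` may differ.

```python
import torch
import torch.nn as nn


class BigAE(nn.Module):
    """Fully-connected autoencoder: 496 -> 128 latent -> 496 (a sketch)."""

    def __init__(self, in_dim: int = 496, latent_dim: int = 128):
        super().__init__()
        # Encoder: 496 -> 512 -> 256 -> 128, with BatchNorm + Dropout
        self.encoder = nn.Sequential(
            nn.Linear(in_dim, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(0.15),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(256, latent_dim), nn.ReLU(),
        )
        # Decoder mirrors the encoder: 128 -> 128 -> 256 -> 512 -> 496
        self.decoder = nn.Sequential(
            nn.Linear(latent_dim, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(256, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(0.15),
            nn.Linear(512, in_dim),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.decoder(self.encoder(x))
```

Training minimizes MSE between input and output, so anything unlike the 47K training spectra reconstructs poorly by construction.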

Spectral Preprocessing

DESI spectra cover 3600–9824 Å across three spectrograph arms (B, R, Z). Raw spectra have ~2,600 wavelength bins per arm (~7,800 total). We preprocess in three steps:

Step 1 — Downsampling

16× spectral binning

Each arm's flux array is averaged in groups of 16 adjacent pixels, reducing from ~2,600 bins per arm to ~163 bins. This preserves broad spectral features while reducing noise and computation. The three arms are then concatenated: 163 + 163 + 170 = 496 total bins.

Step 2 — NaN/Inf handling

Bad pixel replacement

NaN, +Inf, and -Inf values (from masked pixels, cosmic rays, or dead fibers) are replaced with zero. This is conservative — it makes bad regions look featureless rather than anomalous, reducing false positives from instrumental issues.

Step 3 — Normalization

Median absolute flux scaling

Each spectrum is divided by its median absolute flux value. This makes the model sensitive to spectral shape (where features are) rather than brightness (how luminous the object is). A faint galaxy and a bright galaxy with the same spectral shape get the same representation.
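The three steps above can be sketched with NumPy. This is an illustration, not the production code: the trim-to-a-multiple-of-16 behavior and exact arm lengths are assumptions inferred from the 163 + 163 + 170 = 496 bin count.

```python
import numpy as np


def preprocess(flux_b: np.ndarray, flux_r: np.ndarray, flux_z: np.ndarray,
               factor: int = 16) -> np.ndarray:
    """Downsample each arm 16x, zero bad values, median-abs normalize."""

    def bin_arm(flux: np.ndarray) -> np.ndarray:
        # Step 1: average groups of `factor` adjacent pixels
        # (trimming so the length divides evenly -- an assumption).
        n = (len(flux) // factor) * factor
        return flux[:n].reshape(-1, factor).mean(axis=1)

    spec = np.concatenate([bin_arm(f) for f in (flux_b, flux_r, flux_z)])

    # Step 2: NaN/Inf replacement -- any bin poisoned by a bad pixel
    # collapses to zero, i.e. featureless rather than anomalous.
    spec = np.nan_to_num(spec, nan=0.0, posinf=0.0, neginf=0.0)

    # Step 3: median absolute flux scaling (shape, not brightness).
    norm = np.median(np.abs(spec))
    return spec / norm if norm > 0 else spec
```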

Bias Controls & Quality Assurance

Astronomical ML pipelines can learn to detect instrumental artifacts, survey edge effects, or calibration epochs instead of genuine science. We implement multiple controls:

| Control | What it checks | Status |
| --- | --- | --- |
| Training class balance | Model sees equal proportions of stars, galaxies, QSOs | Passed |
| Wavelength invariance | Score doesn't depend on which spectral bin is at which position | Passed |
| Flux scale invariance | Bright and faint versions of the same spectrum get the same score | Passed |
| Noise level test | Adding realistic noise doesn't create false anomalies | Passed |
| Sky position independence | Score doesn't correlate with RA, Dec, or galactic latitude | Passed |
| Fiber number independence | Score doesn't correlate with DESI fiber assignment | Passed |
| Observation date independence | Score doesn't correlate with when the spectrum was taken | Passed |
| B-band dominance investigation | Why are 99% of top anomalies B-dominant? | In progress |
| Injection/recovery | Can the model detect known unusual spectra (BALs, etc.)? | Planned |
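The flux scale invariance control, for example, holds by construction: median-absolute normalization maps a bright copy and a faint copy of the same spectrum to the identical model input, so the model must assign them the same score. A minimal demonstration:

```python
import numpy as np


def normalize(spec: np.ndarray) -> np.ndarray:
    """Median absolute flux scaling, as in preprocessing Step 3."""
    return spec / np.median(np.abs(spec))


# A bright (7.5x) copy of the same spectrum normalizes to the
# identical 496-bin vector, so its anomaly score is unchanged.
rng = np.random.default_rng(42)
spec = rng.normal(loc=1.0, scale=0.3, size=496)
assert np.allclose(normalize(spec), normalize(7.5 * spec))
```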

Compute Infrastructure

GPU Inference on NVIDIA H200

The full DESI DR1 inference was run on a RunPod GPU pod with the following specs:

| Component | Specification |
| --- | --- |
| GPU | NVIDIA H200 (143 GB HBM3e) |
| CPU | 192 cores |
| RAM | 3 TB |
| Storage | 303 TB NVMe |
| Framework | PyTorch 2.8.0 + CUDA 12.8 |
| Batch size | 8,192 spectra per GPU batch |
| Processing rate | 896 spectra/second sustained |
| Total runtime | ~5.5 hours for 18.7M spectra |
| Bottleneck | Downloading FITS files from the DESI archive (not GPU compute) |

Processing Pipeline

```mermaid
graph TD
    A[DESI Archive<br/>NERSC] -->|HTTP download| B[H200 Pod<br/>RunPod]
    B --> C[CPU: Load FITS<br/>+ Preprocess]
    C --> D[GPU: BigAE<br/>Batch 8,192]
    D --> E["CPU: Score<br/>+ Filter > 5.0"]
    E --> F[Save anomalies<br/>+ Delete FITS]
    F --> G{More healpix<br/>files?}
    G -- Yes --> A
    G -- No --> H[Complete<br/>195,829 anomalies]
    H --> I[Upload to<br/>Convex + B2 + HF]
    style B fill:#7c3aed,color:#fff
    style D fill:#f59e0b,color:#000
    style H fill:#166534,color:#fff
```
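The per-file loop can be sketched as follows. This is an illustrative simplification: `load_fits` is a hypothetical stand-in for the download + preprocessing stage, and a plain per-spectrum MSE stands in for the per-band score defined under "Anomaly Scoring in Detail".

```python
import numpy as np
import torch

SCORE_THRESHOLD = 5.0  # anomalies are spectra scoring above this
BATCH = 8192           # spectra per GPU batch


def score_batch(model: torch.nn.Module, spectra: np.ndarray,
                device: str = "cuda") -> np.ndarray:
    """Reconstruction-error scores for one batch of preprocessed spectra."""
    x = torch.from_numpy(spectra).float().to(device)
    with torch.no_grad():
        recon = model(x)
    return ((recon - x) ** 2).mean(dim=1).cpu().numpy()


def process_healpix(model, load_fits, healpix_ids, device="cuda"):
    """Sketch of the loop above: load, score, keep only the anomalies.

    `load_fits` is assumed to return an (N, 496) float array per
    healpix file; in production it also deletes the FITS afterwards.
    """
    for hpx in healpix_ids:
        spectra = load_fits(hpx)                      # CPU: FITS + preprocess
        for i in range(0, len(spectra), BATCH):
            scores = score_batch(model, spectra[i:i + BATCH], device)
            keep = scores > SCORE_THRESHOLD           # filter > 5.0
            yield hpx, np.flatnonzero(keep) + i, scores[keep]
```

Because only the ~1% of spectra above threshold are kept, the output stays small enough to upload in one pass at the end.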

Anomaly Scoring in Detail

The anomaly score measures how poorly the autoencoder reconstructs a spectrum. It is computed as:

```
score = rB + rR + rZ

rX = mean(|flux_X − reconstruction_X|²) / median(|flux|²),   X ∈ {B, R, Z}
```

Each arm (B, R, Z) gets its own residual. The worst band column tells you which arm has the largest error — this indicates WHERE in the electromagnetic spectrum the anomaly occurs:

| Worst Band | Wavelength | What it might mean |
| --- | --- | --- |
| B (blue) | 3600–5800 Å | Unusual UV/blue features: high-ionization emission, Lyman-alpha at z ~ 3–4, unusual continuum slope |
| R (red) | 5760–7620 Å | Mid-optical anomaly: broad absorption lines (BALs), unusual H-alpha, Lyman-alpha at z ~ 4–5 |
| Z (near-IR) | 7520–9824 Å | Near-infrared anomaly: high-z emission shifted to the IR, dusty objects, unusual molecular bands |
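The band decomposition can be sketched directly from the formula above, assuming the 163 + 163 + 170 bin split from the preprocessing section (the exact arm boundaries used in the production code are an assumption here):

```python
import numpy as np

# Arm slices within the 496-bin vector: 163 B + 163 R + 170 Z bins.
ARMS = {"B": slice(0, 163), "R": slice(163, 326), "Z": slice(326, 496)}


def band_scores(flux: np.ndarray, recon: np.ndarray) -> dict:
    """Per-arm residuals rB/rR/rZ, total score, and the worst band."""
    denom = np.median(flux ** 2)  # median(|flux|^2)
    r = {arm: float(np.mean((flux[s] - recon[s]) ** 2) / denom)
         for arm, s in ARMS.items()}
    r["score"] = r["B"] + r["R"] + r["Z"]
    r["worst_band"] = max("BRZ", key=lambda a: r[a])
    return r
```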

Automated Classification

After scoring, anomalies are classified into categories based on their band residual patterns:

```mermaid
graph TD
    A["Anomaly<br/>score > 5.0"] --> B{"Single band<br/>> 85% of total<br/>AND score > 10?"}
    B -- Yes --> C[ARTIFACT_SUSPECT<br/>96 objects]
    B -- No --> D{Which band<br/>dominates?}
    D -->|"rB > 60%"| E[B_DOMINANT<br/>44,436 objects]
    D -->|"rR > 60%"| F[R_DOMINANT<br/>34 objects]
    D -->|"rZ > 60%"| G[Z_DOMINANT<br/>19 objects]
    D -->|"None > 60%"| H[MULTI_BAND<br/>151,244 objects]
    E --> I{"Score > 15?"}
    I -- Yes --> J[GENUINELY_NOVEL]
    I -- No --> K[UNUSUAL_AGN or<br/>UNUSUAL_GALAXY]
    H --> L{"Score > 15?"}
    L -- Yes --> M[GENUINELY_NOVEL]
    L -- No --> N[UNUSUAL_AGN or<br/>UNUSUAL_GALAXY]
    style C fill:#ef4444,color:#fff
    style J fill:#7c3aed,color:#fff
    style M fill:#7c3aed,color:#fff
```
| Classification | Count | What it means |
| --- | --- | --- |
| UNUSUAL_GALAXY | ~153K | Galaxy with atypical spectral features (unusual emission/absorption, unusual continuum) |
| UNUSUAL_AGN | ~2.6K | Active galactic nucleus with unusual properties (changing-look, unusual line ratios) |
| HIGH_Z_CANDIDATE | 23 | Possible high-redshift (z > 2) QSO that DESI's pipeline may have misclassified |
| GENUINELY_NOVEL | 6 | Multi-band anomaly with a very high score that doesn't fit any known category |
| ARTIFACT_SUSPECT | 96 | Extreme single-band residual; likely an instrumental artifact or bad calibration |
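The decision tree can be approximated with a small rule function. This is a sketch: the final AGN-vs-galaxy and high-z splits are not fully specified by the thresholds above, so those branches are collapsed into the band-dominance labels here.

```python
def classify(rB: float, rR: float, rZ: float) -> str:
    """Rule-based label following the decision tree above (simplified)."""
    score = rB + rR + rZ                       # total anomaly score
    assert score > 0, "expects a scored anomaly (positive residuals)"
    fracs = {"B": rB / score, "R": rR / score, "Z": rZ / score}
    top_band, top_frac = max(fracs.items(), key=lambda kv: kv[1])

    # One band carrying >85% of the residual at high score is
    # more likely a calibration or instrumental problem.
    if top_frac > 0.85 and score > 10:
        return "ARTIFACT_SUSPECT"
    # Very high total score in the remaining branches is flagged novel.
    if score > 15:
        return "GENUINELY_NOVEL"
    if top_frac > 0.60:
        return f"{top_band}_DOMINANT"
    return "MULTI_BAND"
```

Usage: `classify(4, 1, 1)` lands in the B-dominant branch, while an extreme single-band residual like `classify(18, 1, 1)` is flagged as an artifact suspect.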

Resources

Explore the Data

Browse all 195K anomalies with images, AI analysis, and review tools.

Anomaly Explorer

Model on HuggingFace

Download the trained BigAE model and run your own inference.

HuggingFace

Review Hub

Human review dashboards for verifying AI classifications (earlier 70-object sample).

Review Hub