Pipeline B: DESI DR1 Spectral Anomaly Detection
Methodology
How we built a spectral autoencoder, trained it on 47K DESI spectra, deployed it on an H200 GPU, and processed 18.7 million spectra to discover 195,829 previously uncharacterized objects.
End-to-End Pipeline
Inference path:
1. DESI DR1 input: 18.7M spectra
2. Download coadd FITS files: 27,488 healpix files
3. Spectral preprocessing: downsample 16× to 496 bins
4. Per-spectrum normalization: median absolute scaling
5. BigAE forward pass: batch size 8,192 on the H200 GPU
6. Reconstruction error score > 5.0?
   - No: normal spectrum (~17.5M discarded)
   - Yes: added to the anomaly catalog (195,829 objects)
7. Band decomposition: rB, rR, rZ scores
8. SIMBAD cross-match: 0/100 matches
9. AI classification: 6 categories
10. Human review via the Anomaly Explorer
11. Validated catalog + follow-up list

Training path (its exported model feeds step 5):
1. Training data: 47K labeled DESI spectra
2. Train BigAE: 4-layer encoder/decoder
3. Bias hardening: 8/8 tests passed
4. Export model: best_model_47k.pt
Model Architecture: BigAE
BigAE is a fully-connected autoencoder that compresses a 496-dimensional spectral vector into a 128-dimensional latent representation, then reconstructs it. Objects whose reconstruction is poor (high error) are flagged as anomalies — the model literally doesn't know how to represent them.
Architecture
- Encoder: Input (496 dims) → Linear 512 + BN + ReLU + Dropout 0.15 → Linear 256 + BN + ReLU + Dropout 0.1 → Linear 128 + ReLU → Latent (128 dims)
- Decoder: Latent (128 dims) → Linear 128 + ReLU → Linear 256 + BN + ReLU + Dropout 0.1 → Linear 512 + BN + ReLU + Dropout 0.15 → Output (496 dims)
| Property | Value |
|---|---|
| Input dimension | 496 (3 arms × ~2600 pixels, downsampled 16x) |
| Latent dimension | 128 |
| Total parameters | ~650K |
| Training samples | 47,000 labeled DESI spectra (stars, galaxies, QSOs) |
| Training loss | MSE (mean squared error) on normalized flux |
| Regularization | Dropout (0.15/0.1) + BatchNorm |
| Model file | best_model_47k.pt (3.5 MB) |
| Hugging Face | bamfai/desi-spectral-anomaly-detector |
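A minimal PyTorch sketch of the layer stack described above. Two details are assumptions, not taken from the released code: the decoder is taken to end in a plain `Linear(512, 496)` output layer, and the class layout is illustrative. Loading `best_model_47k.pt` into this sketch would only work if its parameter names and shapes happened to match, which is not guaranteed.

```python
import torch
from torch import nn


class BigAE(nn.Module):
    """Sketch of BigAE: 496 -> 128 -> 496 fully connected autoencoder.

    Layer sizes and dropout rates follow the architecture diagram; the final
    Linear(512, n_bins) output layer is an assumption.
    """

    def __init__(self, n_bins: int = 496, latent: int = 128) -> None:
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(n_bins, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(0.15),
            nn.Linear(512, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(256, latent), nn.ReLU(),
        )
        self.decoder = nn.Sequential(
            nn.Linear(latent, 128), nn.ReLU(),
            nn.Linear(128, 256), nn.BatchNorm1d(256), nn.ReLU(), nn.Dropout(0.1),
            nn.Linear(256, 512), nn.BatchNorm1d(512), nn.ReLU(), nn.Dropout(0.15),
            nn.Linear(512, n_bins),  # assumed linear output layer (no activation)
        )

    def forward(self, x: torch.Tensor):
        z = self.encoder(x)          # latent representation
        return self.decoder(z), z    # reconstruction and latent code


model = BigAE().eval()  # eval(): BatchNorm uses running stats, Dropout is disabled
with torch.no_grad():
    recon, z = model(torch.randn(4, 496))
n_params = sum(p.numel() for p in model.parameters())
print(recon.shape, z.shape, n_params)
```

Calling `.eval()` before inference matters: it makes BatchNorm use its running statistics, so a spectrum's reconstruction does not depend on which other spectra share its 8,192-sample batch. This sketch has 857,328 trainable parameters, more than the ~650K listed in the table, so the released model may differ from this reconstruction in some layer detail.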
Spectral Preprocessing
DESI spectra cover 3600–9824 Å across three spectrograph arms (B, R, Z). Raw spectra have ~7,800 wavelength bins in total, roughly 2,600 per arm. We preprocess in three steps:
16× spectral binning
Each arm's flux array is averaged in groups of 16 adjacent pixels, reducing each arm from ~2,300–2,900 pixels to ~145–180 bins. This preserves broad spectral features while reducing noise and computation. The three arms are then concatenated: 171 (B) + 145 (R) + 180 (Z) = 496 total bins.
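A NumPy sketch of the 16× binning step. It assumes per-arm lengths of 2,751 (B), 2,326 (R) and 2,881 (Z) pixels, the sizes of DESI coadd arrays on a 0.8 Å grid over 3600–5800, 5760–7620 and 7520–9824 Å, and it assumes leftover pixels that do not fill a complete group of 16 are dropped. `bin_arm` and `ARM_PIXELS` are illustrative names.

```python
import numpy as np


def bin_arm(flux: np.ndarray, factor: int = 16) -> np.ndarray:
    """Average the last axis of `flux` in groups of `factor` adjacent pixels.

    Trailing pixels that do not fill a complete group are dropped (assumption).
    Works for a single spectrum (1-D) or a stack of spectra (2-D).
    """
    n_bins = flux.shape[-1] // factor
    trimmed = flux[..., : n_bins * factor]
    return trimmed.reshape(*flux.shape[:-1], n_bins, factor).mean(axis=-1)


# Assumed native pixel counts per arm (DESI coadd B, R, Z on a 0.8 Å grid).
ARM_PIXELS = {"B": 2751, "R": 2326, "Z": 2881}

rng = np.random.default_rng(0)
binned = {arm: bin_arm(rng.normal(1.0, 0.1, n)) for arm, n in ARM_PIXELS.items()}
vector = np.concatenate([binned["B"], binned["R"], binned["Z"]])
print({arm: b.size for arm, b in binned.items()}, vector.size)
```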
Bad pixel replacement
NaN, +Inf, and -Inf values (from masked pixels, cosmic rays, or dead fibers) are replaced with zero. This is conservative — it makes bad regions look featureless rather than anomalous, reducing false positives from instrumental issues.
Median absolute flux scaling
Each spectrum is divided by its median absolute flux value. This makes the model sensitive to spectral shape (where features are) rather than overall flux level (how bright the object appears). A faint galaxy and a bright galaxy with the same spectral shape get the same representation.
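The bad-pixel and scaling steps can be sketched together in NumPy. `clean_and_normalize` is a hypothetical name, and the guard that leaves all-zero spectra unscaled is an assumption, since the text does not say how a zero median is handled. The last line checks the flux-scale invariance described above: multiplying a spectrum by a constant does not change its normalized vector.

```python
import numpy as np


def clean_and_normalize(flux: np.ndarray, eps: float = 1e-12) -> np.ndarray:
    """Replace NaN/+Inf/-Inf with 0, then divide each spectrum (row) by its median |flux|."""
    clean = np.nan_to_num(flux, nan=0.0, posinf=0.0, neginf=0.0)
    scale = np.median(np.abs(clean), axis=-1, keepdims=True)
    # Assumed guard: spectra with a zero median (e.g. dead fibers) are left unscaled.
    scale = np.where(scale > eps, scale, 1.0)
    return clean / scale


spec = np.array([[2.0, 4.0, np.nan, 8.0, np.inf]])
print(clean_and_normalize(spec))  # bad pixels -> 0, then divided by median |flux| = 2
print(np.allclose(clean_and_normalize(spec), clean_and_normalize(50.0 * spec)))
```

If this runs after 16× binning, a single NaN pixel turns its whole bin into NaN and therefore into zero; replacing bad pixels before binning would instead let that one pixel contribute zero to the bin average.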
Bias Controls & Quality Assurance
Astronomical ML pipelines can learn to detect instrumental artifacts, survey edge effects, or calibration epochs instead of genuine science. We implement multiple controls:
| Control | What it checks | Status |
|---|---|---|
| Training class balance | Model sees equal proportions of stars, galaxies, QSOs | Passed |
| Wavelength invariance | Score doesn't depend on which spectral bin is at which position | Passed |
| Flux scale invariance | Bright and faint versions of same spectrum get same score | Passed |
| Noise level test | Adding realistic noise doesn't create false anomalies | Passed |
| Sky position independence | Score doesn't correlate with RA, Dec, galactic latitude | Passed |
| Fiber number independence | Score doesn't correlate with DESI fiber assignment | Passed |
| Observation date independence | Score doesn't correlate with when the spectrum was taken | Passed |
| B-band dominance investigation | Why are 99% of top anomalies B-dominant? | In progress |
| Injection/recovery | Can the model detect known unusual spectra (BALs, etc.)? | Planned |
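The sky-position, fiber-number and observation-date rows all reduce to checking that the score is uncorrelated with a per-object covariate. One way to run such a check is a Spearman rank correlation, sketched below with NumPy on synthetic data. The tolerance `max_abs_rho = 0.05` and the function names are hypothetical; the table does not state which statistic or threshold was used.

```python
import numpy as np


def spearman_rho(x, y) -> float:
    """Spearman rank correlation without tie correction (Pearson r of the ranks)."""
    rx = np.argsort(np.argsort(x))
    ry = np.argsort(np.argsort(y))
    return float(np.corrcoef(rx, ry)[0, 1])


def independence_report(scores, covariates, max_abs_rho=0.05):
    """Return {name: (rho, passed)}; `max_abs_rho` is a hypothetical tolerance."""
    report = {}
    for name, values in covariates.items():
        rho = spearman_rho(scores, values)
        report[name] = (rho, abs(rho) <= max_abs_rho)
    return report


rng = np.random.default_rng(42)
n = 5000
scores = rng.lognormal(mean=1.0, sigma=0.5, size=n)  # synthetic anomaly scores
covariates = {
    "ra": rng.uniform(0.0, 360.0, n),
    "fiber": rng.integers(0, 5000, n),
    # Deliberately leaky covariate, to show what a failed check looks like.
    "leaky": scores + rng.normal(0.0, 0.5, n),
}
for name, (rho, passed) in independence_report(scores, covariates).items():
    print(f"{name:6s} rho={rho:+.3f} {'PASS' if passed else 'FAIL'}")
```

With millions of spectra even a tiny correlation becomes statistically significant, so a fixed effect-size tolerance is easier to interpret than a p-value.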
Compute Infrastructure
GPU Inference on NVIDIA H200
The full DESI DR1 inference was run on a RunPod GPU pod with the following specs:
| Component | Specification |
|---|---|
| GPU | NVIDIA H200, 141 GB HBM3e |
| CPU | 192 cores |
| RAM | 3 TB |
| Storage | 303 TB NVMe |
| Framework | PyTorch 2.8.0 + CUDA 12.8 |
| Batch size | 8,192 spectra per GPU batch |
| Processing rate | 896 spectra/second sustained |
| Total runtime | ~5.5 hours for 18.7M spectra |
| Bottleneck | Downloading FITS files from DESI archive (not GPU compute) |
Processing Pipeline
1. Download each healpix coadd FITS file over HTTP from the DESI archive (NERSC) to the H200 pod on RunPod.
2. CPU: load the FITS file and preprocess its spectra.
3. GPU: run the BigAE forward pass in batches of 8,192.
4. CPU: score each spectrum and keep those with score > 5.0.
5. Save the anomalies and delete the FITS file.
6. If more healpix files remain, return to step 1.
7. When all files are done (195,829 anomalies), upload the results to Convex, B2, and Hugging Face.
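The loop above can be sketched in Python as follows. This is a hedged sketch, not the pipeline's actual code: `run_inference`, `load_and_preprocess` and `score_batch` are hypothetical names, FITS reading and GPU scoring are passed in as callables, and anomalies are appended to a CSV file (an assumed output format) before each input file is deleted. The demo uses empty stand-in files and stub functions.

```python
import csv
import os
import tempfile
from typing import Callable, Iterable, List, Sequence, Tuple


def run_inference(
    paths: Iterable[str],
    load_and_preprocess: Callable[[str], Tuple[List[str], Sequence]],
    score_batch: Callable[[Sequence], Sequence[float]],
    out_csv: str,
    threshold: float = 5.0,
    batch_size: int = 8192,
) -> int:
    """Score each file's spectra in batches, append anomalies to `out_csv`,
    then delete the file. Returns the number of anomalies written."""
    n_written = 0
    with open(out_csv, "a", newline="") as fh:
        writer = csv.writer(fh)
        for path in paths:
            target_ids, spectra = load_and_preprocess(path)  # CPU: load + preprocess
            for start in range(0, len(spectra), batch_size):
                ids = target_ids[start:start + batch_size]
                scores = score_batch(spectra[start:start + batch_size])  # GPU in the real run
                rows = [(tid, float(s)) for tid, s in zip(ids, scores) if s > threshold]
                writer.writerows(rows)
                n_written += len(rows)
            fh.flush()       # flush this file's anomalies before deleting its input
            os.remove(path)  # keep local disk usage bounded
    return n_written


# Demo with hypothetical stand-in files and stub functions (no FITS, no GPU).
workdir = tempfile.mkdtemp()
fake = {"hp_0001.fits": (["a", "b", "c"], [1.0, 6.0, 9.0]), "hp_0002.fits": (["d"], [5.0])}
paths = []
for name in fake:
    path = os.path.join(workdir, name)
    open(path, "w").close()
    paths.append(path)
out_csv = os.path.join(workdir, "anomalies.csv")
n = run_inference(
    paths,
    load_and_preprocess=lambda p: fake[os.path.basename(p)],
    score_batch=lambda batch: list(batch),  # pretend the "spectra" are already scores
    out_csv=out_csv,
    batch_size=2,
)
print(n, [os.path.exists(p) for p in paths])
```

Writing each file's anomalies before deleting it means a crash mid-run loses at most the file currently being processed.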
Anomaly Scoring in Detail
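The anomaly score measures how poorly the autoencoder reconstructs a spectrum. It is the sum of the three per-arm residuals:

$$\text{score} = r_B + r_R + r_Z, \qquad r_X = \frac{\operatorname{mean}\left(\left|\text{flux}_X - \text{reconstruction}_X\right|^2\right)}{\operatorname{median}\left(\left|\text{flux}\right|^2\right)}$$

where X ∈ {B, R, Z}, the mean is taken over the bins of arm X, and the median in the denominator is taken over the whole spectrum.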
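A NumPy sketch of the per-arm residuals. It assumes the total score is rB + rR + rZ, since the pipeline's band-decomposition step lists these three terms. It uses hypothetical arm slices on a toy 6-bin spectrum; the real arm boundaries inside the 496-bin vector are not reproduced here.

```python
import numpy as np


def band_scores(flux, recon, arm_slices):
    """Return per-arm residuals r_X and their sum (assumed to be the total score)."""
    flux = np.asarray(flux, dtype=float)
    recon = np.asarray(recon, dtype=float)
    # Denominator: median squared flux over the whole spectrum (guarded against 0).
    denom = max(float(np.median(np.abs(flux) ** 2)), 1e-12)
    r = {
        arm: float(np.mean(np.abs(flux[s] - recon[s]) ** 2)) / denom
        for arm, s in arm_slices.items()
    }
    return r, sum(r.values())


# Toy 6-bin spectrum with 2 bins per arm (real spectra have 496 bins).
arms = {"B": slice(0, 2), "R": slice(2, 4), "Z": slice(4, 6)}
flux = np.ones(6)
recon = np.array([0.0, 1.0, 1.0, 1.0, 1.0, 1.5])
r, score = band_scores(flux, recon, arms)
worst = max(r, key=r.get)  # the arm with the largest residual
print(r, score, worst)
```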
Each arm (B, R, Z) gets its own residual. The worst-band column records which arm has the largest residual, which shows where in the spectrum the anomaly occurs:
| Worst Band | Wavelength | What it might mean |
|---|---|---|
| B (blue) | 3600–5800 Å | Unusual UV/blue features: high-ionization emission, Lyman-alpha at z~2–3.8, unusual continuum slope |
| R (red) | 5760–7620 Å | Mid-optical anomaly: broad absorption lines (BALs), unusual H-alpha, Lyman-alpha at z~3.7–5.3 |
| Z (near-IR) | 7520–9824 Å | Near-infrared anomaly: high-z emission shifted to IR, dusty objects, unusual molecular bands |
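The Lyman-alpha redshift ranges in this table follow from z = λ_obs / λ_rest − 1 applied to each arm's wavelength limits. A small Python check, using the Lyman-alpha rest wavelength of 1215.67 Å and the arm limits from the table:

```python
LYA_REST_A = 1215.67  # Lyman-alpha rest-frame wavelength in Å

# Arm coverage in Å, from the table above.
ARMS = {"B": (3600.0, 5800.0), "R": (5760.0, 7620.0), "Z": (7520.0, 9824.0)}


def redshift_window(rest_a: float, lo_a: float, hi_a: float):
    """Redshift range over which a line at rest_a lands in [lo_a, hi_a]: z = obs/rest - 1."""
    return lo_a / rest_a - 1.0, hi_a / rest_a - 1.0


windows = {arm: redshift_window(LYA_REST_A, lo, hi) for arm, (lo, hi) in ARMS.items()}
for arm, (z_lo, z_hi) in windows.items():
    print(f"{arm}: Lyman-alpha falls in this arm for {z_lo:.2f} < z < {z_hi:.2f}")
```

This prints roughly 1.96–3.77 for B, 3.74–5.27 for R and 5.19–7.08 for Z. Lyman-alpha therefore already enters the blue arm at z ≈ 2.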
Automated Classification
After scoring, anomalies are classified into categories based on their band residual patterns:
1. Start from every anomaly (score > 5.0).
2. If a single band contributes more than 85% of the total and the score exceeds 10, classify as ARTIFACT_SUSPECT (96 objects).
3. Otherwise, assign the dominant band:
   - rB > 60% of total: B_DOMINANT (44,436 objects)
   - rR > 60%: R_DOMINANT (34 objects)
   - rZ > 60%: Z_DOMINANT (19 objects)
   - no band above 60%: MULTI_BAND (151,244 objects)
4. For B_DOMINANT and MULTI_BAND objects, a score above 15 gives GENUINELY_NOVEL; otherwise the object is UNUSUAL_AGN or UNUSUAL_GALAXY.
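The decision tree above can be written as a short Python function. It is a sketch built on stated assumptions: it takes score = rB + rR + rZ and treats the percentages as fractions of that total. It also returns a hypothetical placeholder, `UNUSUAL_AGN_OR_GALAXY`, because the tree does not say how AGN and galaxies are told apart.

```python
from typing import Optional, Tuple

ANOMALY_THRESHOLD = 5.0


def classify(r_b: float, r_r: float, r_z: float) -> Optional[Tuple[str, Optional[str]]]:
    """Return (band_group, follow_up_label) following the decision tree.

    Assumes score = r_b + r_r + r_z. 'UNUSUAL_AGN_OR_GALAXY' is a hypothetical
    placeholder for the AGN/galaxy split, which the tree does not specify.
    """
    score = r_b + r_r + r_z
    if score <= ANOMALY_THRESHOLD:
        return None  # normal spectrum, discarded
    frac = {"B": r_b / score, "R": r_r / score, "Z": r_z / score}
    if max(frac.values()) > 0.85 and score > 10:
        return ("ARTIFACT_SUSPECT", None)
    # Fractions sum to 1, so at most one band can exceed 60%.
    dominant = [band for band, f in frac.items() if f > 0.60]
    group = f"{dominant[0]}_DOMINANT" if dominant else "MULTI_BAND"
    if group in ("B_DOMINANT", "MULTI_BAND"):
        label = "GENUINELY_NOVEL" if score > 15 else "UNUSUAL_AGN_OR_GALAXY"
        return (group, label)
    return (group, None)  # the tree stops at R_DOMINANT / Z_DOMINANT


print(classify(6.0, 6.0, 6.0))  # multi-band, score 18
```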
| Classification | Count | What it means |
|---|---|---|
| UNUSUAL_GALAXY | ~153K | Galaxy with atypical spectral features (unusual emission/absorption, unusual continuum) |
| UNUSUAL_AGN | ~2.6K | Active galactic nucleus with unusual properties (changing-look, unusual line ratios) |
| HIGH_Z_CANDIDATE | 23 | Possible high-redshift (z > 2) QSO that DESI's pipeline may have misclassified |
| GENUINELY_NOVEL | 6 | Multi-band anomaly with very high score — doesn't fit any known category |
| ARTIFACT_SUSPECT | 96 | Extreme single-band residual — likely instrumental artifact or bad calibration |
Resources
Explore the Data
Browse all 195K anomalies with images, AI analysis, and review tools.
Anomaly Explorer
Review Hub
Human review dashboards for verifying AI classifications (earlier 70-object sample).
Review Hub