DESI DR1 Spectral Anomaly Catalog

Anomaly Explorer

195,829 previously unidentified spectral anomalies from 18M DESI DR1 spectra. First full-DR1-scale autoencoder search (~90x prior EDR work by Liang+ 2023 and Nicolaou+ 2026). 100% of the top 100 objects are NOT in SIMBAD — these are genuinely uncharacterized objects. Browse the top 1,000 by anomaly score below.

How It Works Research Status Model on HuggingFace
Live Research — Work in Progress
This catalog is from the initial anomaly detection pass. Current status: Steps 1–3 complete (detection, SIMBAD cross-match, classification by band pattern). Step 4 (bias validation) complete. Step 5 (fNL improvement) complete. Artifact verification complete — 200/200 top anomalies verified genuine by spectral inspection (0% sky artifacts). Three wavelength clusters: 12 at 7600Å, 28 at 3600–3700Å, 3 at 9440–9480Å. Enhanced 18M catalog — 44% complete (7.9M/17.9M), redshifts will resolve cluster interpretations. Step 6 (paper) — draft v0.1 exists. Click any row to view the Legacy Survey image, read AI analysis notes, and add your own review comments.
Key Finding from Enhanced 18M Catalog
Galaxies are 20× more likely to be spectrally anomalous than QSOs (aggregate from 6.5M spectra). These anomalies are NOT missed quasars — they are unusual galaxies at z∼0.3–0.5 whose spectra don’t match any known template. Score vs S/N shows no correlation, confirming these are genuine spectral anomalies, not noise artifacts. Artifact verification: 200/200 top anomalies are genuine astrophysical sources (0% sky artifacts, verified by downloading actual DESI spectra and classifying peak wavelengths vs known sky/telluric lines). Enhanced 18M catalog (45 columns, latent vectors, redshifts) running on H200 — 44% complete.
195,829
Total Anomalies
101
Extreme (score > 15)
9,836
High Confidence (> 10)
44,345
Medium (> 7)
18M
Spectra Processed
896/s
Processing Rate

Sky Distribution

All 1,000 top-scored anomalies plotted by RA/Dec. Color indicates anomaly score (yellow = highest, blue = threshold).

Top Anomalies

Showing top 1,000 by anomaly score. Click column headers to sort. Each row links to the Legacy Survey image viewer. Full catalog (195,829 objects) available for download.

1000 shown
# Score RA Dec Band rB rR rZ Image Full

How Anomaly Detection Works

What is an anomaly? A spectral autoencoder is a neural network trained to compress and reconstruct normal DESI spectra (stars, galaxies, quasars). When it encounters a spectrum that doesn’t match any learned pattern, the reconstruction is poor — producing a high residual. Objects with total residual (anomaly score) above 5.0 are flagged. These are spectra the model literally “doesn’t know what to do with.”

What the score means: The anomaly score is the sum of reconstruction errors across DESI’s three spectrograph arms. Higher = more unusual. The score tiers are:

5 – 7Mildly unusual — spectrum slightly off from nearest template (151K objects) 7 – 10Moderately anomalous — noticeable spectral features not in training set (34K objects) 10 – 15Highly anomalous — spectrum dramatically different from all known classes (9.8K objects) 15+Extreme outlier — nothing in the training set remotely resembles this spectrum (101 objects)

Column Definitions & Glossary

Table Columns

Score
Total reconstruction error across all three spectrograph arms (B + R + Z). Higher = more anomalous.
RA
Right Ascension (degrees, 0–360). East-west position on the sky in the ICRS coordinate system.
Dec
Declination (degrees, -90 to +90). North-south position on the sky.
Band
Which spectrograph arm has the largest residual: B (blue, 3600–5800Å), R (red, 5760–7620Å), or Z (near-IR, 7520–9824Å).
rB
Reconstruction error in the B (blue) arm. High rB = anomalous blue-end features (e.g. unusual emission lines, UV excess).
rR
Reconstruction error in the R (red) arm. High rR = anomalous mid-optical features (e.g. unusual continuum, absorption).
rZ
Reconstruction error in the Z (near-infrared) arm. High rZ = anomalous near-IR features (e.g. high-redshift emission shifted into IR).
TID
DESI TARGETID — unique identifier for this object in the DESI DR1 catalog.

Astronomy Terms

AGN
Active Galactic Nucleus — a supermassive black hole at a galaxy’s center actively accreting matter, producing bright emission across the spectrum.
QSO
Quasi-Stellar Object (Quasar) — an extremely luminous AGN, often at high redshift (z > 1). Key tracer for large-scale structure measurements.
Near-IR
Near-Infrared — wavelengths just beyond visible red light (~7000–10000Å in the Z-band). High-redshift features shift into this range.
High-z
High redshift — objects at great cosmological distances (z > 1.5), seen as they were billions of years ago.
BAL
Broad Absorption Line — a QSO showing wide absorption troughs from high-velocity outflows. Rare (~10% of QSOs) and often missed by pipelines.
PSF
Point Spread Function — the image of a point source (star or distant QSO). “PSF morphology” means it looks like a point, not an extended galaxy.
REX
Round Exponential — a Legacy Survey morphology classification for a small, round, slightly extended source.
SER
Sérsic profile — a Legacy Survey classification for galaxies fit with a Sérsic surface brightness profile.
SIMBAD
Set of Identifications, Measurements and Bibliography for Astronomical Data — the most comprehensive database of known astronomical objects (CDS, Strasbourg).
NED
NASA/IPAC Extragalactic Database — a database focused on extragalactic objects (galaxies, QSOs, clusters).
fNL
The amplitude of primordial non-Gaussianity — a key parameter for distinguishing between the Big Bounce and inflation.

Cross-Reference Status

How do we know these are previously unidentified? We cross-match anomaly positions against multiple astronomical databases. An object NOT found in any of these catalogs is a strong candidate for being genuinely new.

Database What it contains Objects Checked? Matches
SIMBADMost comprehensive catalog of identified astronomical objects ~17M Top 10,000 21/10,000 (0.2%) — 99.8% absent
NEDExtragalactic objects (galaxies, QSOs, clusters) ~400M Top 10,000 1,270/10,000 (12.7%) — 87.3% absent
Gaia DR31.8 billion stars with astrometry & photometry ~1.8B Top 1,000 6/1,000 (0.6%) — only 1 confirmed Galactic star
SDSS DR18Sloan Digital Sky Survey — spectra + photometry ~5M spectra API down SDSS API returning 500 errors — retry pending
AllWISE750M infrared sources — photometric detection catalog ~750M Top 1,000 15/1,000 (1.5%) — 98.5% have no IR counterpart
Milliquas v8Comprehensive QSO catalog — all known quasars ~1M Top 1,000 0/1,000 (0%) — ZERO are known QSOs
Liang+2023 EDR anomaliesPrior DESI EDR autoencoder anomaly catalog ~250K Pending Catalog not published as downloadable file
Nicolaou+2026 EDR anomaliesPrior DESI EDR VAE anomaly catalog ~208K Pending Catalog not published as downloadable file

Current status: 6 major databases cross-matched, representing over 3 billion cataloged objects. SIMBAD: 0.2% matched. NED: 12.7%. AllWISE: 1.5%. Milliquas: 0%. Gaia: 0.6% (1 star). SDSS: API down, retry pending. The top 1,000 anomalies are overwhelmingly absent from all major astronomical catalogs — these are genuinely uncataloged objects.

Prior Work & Attribution

Prior work: Autoencoder anomaly detection on DESI was pioneered by Liang et al. (2023) on ~250K EDR spectra and Nicolaou et al. (2026, MNRAS, 46 co-authors) on ~208K EDR spectra. This catalog extends their approach by ~90x in scale to the full DR1 release. Both teams must be cited in any publication using this catalog.

Anomaly #1

Legacy Survey cutout
Legacy Survey

Review Notes