DESI DR1 Spectral Anomaly Catalog

Anomaly Explorer

The first autoencoder anomaly search at full DR1 scale: 22.5M DESI DR1 spectra scored, yielding 2,145 high-significance anomalies (~90× the size of prior EDR searches). Photo-z from unsupervised latent vectors reaches σNMAD = 0.028; UMAP reveals two populations. Key counts below; browse the top 1,000 by anomaly score in the table.

Paper 3 Status — in final review, Houston sign-off pending
Enhanced catalog COMPLETE: 22,504,897 unique spectra scored (deduplicated), 173 columns, 46 Parquet batches, 16 GB.
Tiered anomaly system: 2,145 silver (score>3, SNR>0.5), 120 gold (>5σ).
Key discoveries: 12 z>6 reionization-era QSOs with Gunn-Peterson troughs; photo-z from unsupervised latent vectors (σNMAD=0.028); spontaneous “redshift neuron” (lat_067); 2,575 red-anomaly cluster.
Multi-survey total: after per-survey native retrains, ecliptic/galactic masking, and 8-way positional dedup at 5″, the deduplicated catalog contains 378,280 unique anomalies from 37,292,042 sources across DESI, SDSS, LAMOST, eROSITA, Planck, ACT, Gaia, and NEOWISE. Published at HuggingFace.
Paper 3 draft compiled at pipelines/p3_anomaly_engine/paper3_draft.tex (4.6 MB PDF, 26 pp, 0 undef refs); see Papers page.
Click any row to view the Legacy Survey image, read AI analysis notes, and add your own review comments.
Key Findings from Complete 22.5M Catalog
Photo-z from latent vectors: σNMAD = 0.028 (R² = 0.79) with zero redshift supervision. The autoencoder spontaneously learned spectral features correlated with redshift.
Redshift neuron (lat_067): a single latent dimension that emergently encodes redshift, the strongest individual predictor of spectroscopic z.
UMAP clustering: two distinct populations, a 247K B-band noise cluster and a 2,575 red-anomaly cluster with genuinely unusual spectra.
Anomaly rate uniform: ~1% across the footprint (Spearman r = 0.03 with depth), indicating that anomalies are astrophysical, not depth artifacts.
z>6 QSOs: 12 with Gunn-Peterson troughs identified from the gold anomaly catalog.
Spectra Scored: 22.5M
Silver (SNR-filtered): 2,145
Gold (>5σ): 120
z>6 QSOs: 12
σNMAD Photo-z: 0.028
Uncataloged (no SIMBAD/NED): 1,127

Scientific Insights

Eight genuinely novel findings from the enhanced 22.5M DESI DR1 catalog — not just "we ran an autoencoder," but concrete scientific contributions including a direct improvement to the flagship fNL bounce prediction.

The "Redshift Neuron"

Latent dimension 067 has 6× the importance of any other dimension for predicting redshift. The autoencoder spontaneously learned to encode spectral shift without ever seeing redshift labels — emergent representation of a physical property.

Unsupervised Photo-z (σNMAD = 0.028)

A simple MLP on the 128-dim latent vectors predicts spectroscopic redshift with R² = 0.79 and 7.7% outlier fraction — competitive with purpose-built photo-z codes using broadband photometry.
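
For readers unfamiliar with the two photo-z metrics quoted above, the following is a minimal sketch of how σNMAD and an outlier fraction are conventionally computed from predicted and spectroscopic redshifts. The function names are hypothetical, and the outlier cut of 0.15 is an assumption (a common convention); the page does not state which threshold produced the 7.7% figure.

```python
import statistics

def sigma_nmad(z_pred, z_spec):
    """1.4826 * median(|dz - median(dz)|), with dz = (z_pred - z_spec) / (1 + z_spec)."""
    dz = [(p - s) / (1.0 + s) for p, s in zip(z_pred, z_spec)]
    med = statistics.median(dz)
    return 1.4826 * statistics.median([abs(d - med) for d in dz])

def outlier_fraction(z_pred, z_spec, threshold=0.15):
    """Fraction of objects with |dz| / (1 + z_spec) above `threshold` (assumed cut)."""
    flags = [abs(p - s) / (1.0 + s) > threshold for p, s in zip(z_pred, z_spec)]
    return sum(flags) / len(flags)

# Toy inputs for illustration only; not catalog data.
z_spec = [0.1, 0.5, 1.0, 2.0]
z_pred = [0.12, 0.48, 1.05, 3.0]
print(sigma_nmad(z_pred, z_spec), outlier_fraction(z_pred, z_spec))
```

The factor 1.4826 scales the median absolute deviation so that it matches the standard deviation for Gaussian errors, which makes σNMAD robust to the catastrophic outliers counted separately by the second function.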

"Correctly Classified but Spectrally Anomalous"

2,575 objects where DESI’s pipeline is confident (Δχ² = 963) yet the autoencoder flags unusual features — genuine spectral structure beyond standard templates.

16 NEOWISE IR-Variable Anomalies

16 anomalies are BOTH spectrally anomalous AND infrared-variable (NEOWISE 10yr). A z=5.65 QSO varies by 5.5 magnitudes in W2 — extreme AGN activity in the reionization era. These variable sources are prime candidates for multi-epoch follow-up.

1,127 Genuinely Uncataloged Objects

1,127 of 2,145 SNR-filtered anomalies (52.5%) are in NEITHER SIMBAD nor NED. Classified into 10 taxonomy families: 76 uncataloged AGN, 27 post-starburst galaxies, 363 blue compact galaxies. Known to DESI, unknown to the astronomical community. Concrete targets for follow-up.

Gold Anomalies Cluster in Latent Space

The 83 gold anomalies are 2.2× more clustered than random objects in the 128-dim latent space — confirming a coherent spectral population, not random noise.

Autoencoder as Survey Quality Probe

Anomaly score correlates with SNR (Spearman ρ = −0.89). The autoencoder unintentionally functions as an independent data quality metric — a new tool for spectroscopic survey validation.
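
As an illustration of the statistic quoted above, here is a small standard-library sketch of a Spearman rank correlation (Pearson correlation of tie-averaged ranks). The function names and toy values are hypothetical; this is not the pipeline's own code.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def spearman_rho(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5

# Toy example: score falls monotonically as SNR rises.
snr = [1.0, 2.0, 3.0, 4.0, 5.0]
score = [10.0, 8.0, 6.0, 4.0, 2.0]
print(spearman_rho(snr, score))
```

A strongly negative value, as reported here, means low-SNR spectra systematically receive higher anomaly scores, which is why the silver tier applies an SNR cut.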

+7.93% fNL Improvement via 5-Tracer Multi-Tracer

5-tracer anomaly-optimized Fisher forecast yields σ(fNL) = 11.71 vs 12.72 standard multi-tracer (+7.93% improvement). DESI alone contributes 6.1% improvement. Latent-space selection of anomalous objects as high-bias tracers directly strengthens the flagship bounce prediction: spectral anomalies are not just curiosities but observationally useful for testing bounce cosmology via the galaxy bispectrum.
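
The improvement figure is the fractional reduction in the forecast uncertainty. A short sketch of that arithmetic, using the two rounded σ(fNL) values quoted above (the function name is hypothetical):

```python
def fractional_improvement(sigma_baseline, sigma_new):
    """Relative reduction in forecast uncertainty: (baseline - new) / baseline."""
    return (sigma_baseline - sigma_new) / sigma_baseline

# Rounded values as quoted on this page.
improvement = fractional_improvement(12.72, 11.71)
print(f"{100 * improvement:.2f}%")
```

With the rounded inputs this gives roughly 7.9%; any difference in the last digit from the quoted +7.93% presumably reflects rounding of the σ(fNL) values, which is an assumption rather than something stated on the page.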

Sky Distribution

All 1,000 top-scored anomalies plotted by RA/Dec. Color indicates anomaly score (yellow = highest, blue = threshold).

Top Anomalies

Showing top 1,000 by anomaly score. Click column headers to sort. Each row links to the Legacy Survey image viewer. Full catalog (195,829 objects) available for download.

Columns: #, Score, RA, Dec, Band, rB, rR, rZ, Image, Full

How Anomaly Detection Works

What is an anomaly? A spectral autoencoder is a neural network trained to compress and reconstruct normal DESI spectra (stars, galaxies, quasars). When it encounters a spectrum that doesn’t match any learned pattern, the reconstruction is poor — producing a high residual. Objects with total residual (anomaly score) above 5.0 are flagged. These are spectra the model literally “doesn’t know what to do with.”

What the score means: The anomaly score is the sum of reconstruction errors across DESI’s three spectrograph arms. Higher = more unusual. The score tiers are:

5 – 7: Mildly unusual, spectrum slightly off from nearest template (151K objects)
7 – 10: Moderately anomalous, noticeable spectral features not in training set (34K objects)
10 – 15: Highly anomalous, spectrum dramatically different from all known classes (9.8K objects)
15+: Extreme outlier, nothing in the training set remotely resembles this spectrum (101 objects)
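
The scoring rules described above can be sketched as follows. The class and function names are hypothetical. Two details are assumptions: the tier edges at 7, 10, and 15 are treated as lower-inclusive (the page does not say which tier owns a boundary value), and the dominant arm is taken to be the arm with the largest per-arm residual.

```python
from dataclasses import dataclass

FLAG_THRESHOLD = 5.0  # scores above this value are flagged

@dataclass
class ArmResiduals:
    rB: float  # reconstruction error, blue arm
    rR: float  # reconstruction error, red arm
    rZ: float  # reconstruction error, near-IR arm

    @property
    def score(self):
        """Total anomaly score: sum of the three per-arm residuals."""
        return self.rB + self.rR + self.rZ

    @property
    def band(self):
        """Arm with the largest residual ('B', 'R', or 'Z')."""
        arms = (("B", self.rB), ("R", self.rR), ("Z", self.rZ))
        return max(arms, key=lambda arm: arm[1])[0]

def tier(score):
    """Map a total score to a tier label, or None if it is not flagged."""
    if score <= FLAG_THRESHOLD:
        return None
    if score >= 15.0:  # assumed lower-inclusive edges from here down
        return "Extreme outlier"
    if score >= 10.0:
        return "Highly anomalous"
    if score >= 7.0:
        return "Moderately anomalous"
    return "Mildly unusual"

# Toy residuals for illustration only.
obj = ArmResiduals(rB=1.0, rR=2.0, rZ=9.5)
print(obj.score, obj.band, tier(obj.score))
```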

Column Definitions & Glossary

Table Columns

Score
Total reconstruction error across all three spectrograph arms (B + R + Z). Higher = more anomalous.
RA
Right Ascension (degrees, 0–360). East-west position on the sky in the ICRS coordinate system.
Dec
Declination (degrees, -90 to +90). North-south position on the sky.
Band
Which spectrograph arm has the largest residual: B (blue, 3600–5800Å), R (red, 5760–7620Å), or Z (near-IR, 7520–9824Å).
rB
Reconstruction error in the B (blue) arm. High rB = anomalous blue-end features (e.g. unusual emission lines, UV excess).
rR
Reconstruction error in the R (red) arm. High rR = anomalous mid-optical features (e.g. unusual continuum, absorption).
rZ
Reconstruction error in the Z (near-infrared) arm. High rZ = anomalous near-IR features (e.g. high-redshift emission shifted into IR).
TID
DESI TARGETID — unique identifier for this object in the DESI DR1 catalog.

Astronomy Terms

AGN
Active Galactic Nucleus — a supermassive black hole at a galaxy’s center actively accreting matter, producing bright emission across the spectrum.
QSO
Quasi-Stellar Object (Quasar) — an extremely luminous AGN, often at high redshift (z > 1). Key tracer for large-scale structure measurements.
Near-IR
Near-Infrared — wavelengths just beyond visible red light (~7000–10000Å in the Z-band). High-redshift features shift into this range.
High-z
High redshift — objects at great cosmological distances (z > 1.5), seen as they were billions of years ago.
BAL
Broad Absorption Line — a QSO showing wide absorption troughs from high-velocity outflows. Rare (~10% of QSOs) and often missed by pipelines.
PSF
Point Spread Function — the image of a point source (star or distant QSO). “PSF morphology” means it looks like a point, not an extended galaxy.
REX
Round Exponential — a Legacy Survey morphology classification for a small, round, slightly extended source.
SER
Sérsic profile — a Legacy Survey classification for galaxies fit with a Sérsic surface brightness profile.
SIMBAD
Set of Identifications, Measurements and Bibliography for Astronomical Data — the most comprehensive database of known astronomical objects (CDS, Strasbourg).
NED
NASA/IPAC Extragalactic Database — a database focused on extragalactic objects (galaxies, QSOs, clusters).
fNL
The amplitude of primordial non-Gaussianity — a key parameter for distinguishing between the Big Bounce and inflation.

Cross-Reference Status

How do we know these are previously unidentified? We cross-match anomaly positions against multiple astronomical databases. An object NOT found in any of these catalogs is a strong candidate for being genuinely new.

Database | What it contains | Objects | Checked | Matches
SIMBAD | Most comprehensive catalog of identified astronomical objects | ~17M | Top 10,000 | 21/10,000 (0.2%); 99.8% absent
NED | Extragalactic objects (galaxies, QSOs, clusters) | ~400M | Top 10,000 | 1,270/10,000 (12.7%); 87.3% absent
Gaia DR3 | 1.8 billion stars with astrometry & photometry | ~1.8B | Top 1,000 | 6/1,000 (0.6%); only 1 confirmed Galactic star
SDSS DR18 | Sloan Digital Sky Survey, spectra + photometry | ~2.3M spectra | 77,905 anomalies | Native BigAE rescore complete (3.4% anomaly rate, domain-shift scores)
AllWISE | 750M infrared sources, photometric detection catalog | ~750M | Top 1,000 | 15/1,000 (1.5%); 98.5% have no IR counterpart
Milliquas v8 | Comprehensive QSO catalog, all known quasars | ~1M | Top 1,000 | 0/1,000 (0%); ZERO are known QSOs
Liang+2023 EDR anomalies | Prior DESI EDR autoencoder anomaly catalog | ~250K | Pending | Catalog not published as downloadable file
Nicolaou+2026 EDR anomalies | Prior DESI EDR VAE anomaly catalog | ~208K | Pending | Catalog not published as downloadable file

Current status: 6 major databases cross-matched, representing roughly 3 billion cataloged objects. SIMBAD: 0.2% matched. NED: 12.7%. AllWISE: 1.5%. Milliquas: 0%. Gaia: 0.6% (1 star). SDSS DR18: 77,905 anomalies (native BigAE rescore complete). 1,127 of 2,145 (52.5%) are in neither SIMBAD nor NED, i.e. genuinely uncataloged. They are classified into 10 taxonomy families including 76 AGN, 27 post-starburst, and 363 blue compact galaxies.
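
A positional cross-match of the kind described in this section can be sketched with a brute-force angular-separation test. The function names are hypothetical, and the 5″ match radius is an assumption borrowed from the dedup radius mentioned elsewhere on this page; the radius actually used for each database is not stated.

```python
import math

def angular_sep_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a)))) * 3600.0

def unmatched(anomalies, catalog, radius_arcsec=5.0):
    """Return the (ra, dec) anomalies with no catalog source within the radius.

    Brute force, O(N*M): fine for a sketch, too slow for billion-row catalogs.
    """
    return [
        (ra, dec) for ra, dec in anomalies
        if not any(angular_sep_arcsec(ra, dec, cra, cdec) <= radius_arcsec
                   for cra, cdec in catalog)
    ]

# Toy positions for illustration only.
anomalies = [(150.0, 2.0), (10.0, -5.0)]
catalog = [(150.0005, 2.0)]  # about 1.8 arcsec from the first anomaly
print(unmatched(anomalies, catalog))
```

At catalog scale a tree-based matcher, for example astropy's `SkyCoord.match_to_catalog_sky`, would be the more usual choice; the brute-force loop is shown only because it makes the matching criterion explicit.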

Prior Work & Attribution

Prior work: Autoencoder anomaly detection on DESI was pioneered by Liang et al. (2023) on ~250K EDR spectra and Nicolaou et al. (2026, MNRAS, 46 co-authors) on ~208K EDR spectra. This catalog extends their approach by ~90× in scale to the full DR1 release. Both teams must be cited in any publication using this catalog.
