I'm an amateur astronomer and data enthusiast who's been diving into the public datasets from missions like TESS and Kepler, trying my hand at identifying potential exoplanet candidates from light curve data. While I can follow the basic transit method tutorials, I'm hitting a wall when it comes to distinguishing genuine planetary signals from the noise of stellar variability or instrumental artifacts, especially for smaller, Earth-sized candidates. For others who have ventured into this citizen science or personal research space, what software tools and filtering techniques did you find most reliable for processing and analyzing the raw light curves? How do you stay updated on the latest confirmation methods and classification criteria, and are there any particular resources or communities focused on the nuanced, hands-on analysis of exoplanet data beyond just reading published papers?
Earth-sized signals are tricky. Here’s a practical workflow I’ve used: pull Kepler/TESS light curves from MAST via Lightkurve. For Kepler and TESS, start from the PDCSAP flux, which already has the pipeline's CBV-based systematics correction applied; for K2, EVEREST light curves are a good detrended starting point. Flatten any remaining stellar variability, then sigma-clip outliers, taking care not to clip the transits themselves. Run a transit search with Transit Least Squares (TLS), which is designed to pick up shallow candidates. Model promising signals with batman and sample the posterior with an MCMC sampler, or do the modeling and sampling together in exoplanet or PyTransit. For vetting, compute a false positive probability with vespa or TRICERATOPS and check centroid shifts with pixel-level analysis or difference imaging to rule out background eclipsing binaries. Finally, run injection-recovery tests to estimate detection completeness for Earth-sized signals in your data. If you need speed, reserve the full MCMC for the best candidates and keep the routine screening steps lightweight.
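To make the injection-recovery step concrete, here is a minimal sketch using only the Python standard library. It is not how TLS or any published pipeline does it: it injects a box-shaped dip into pure white noise and measures the signal at the known ephemeris instead of rerunning a blind search, so it gives an optimistic upper bound on completeness. The function names, the 10-day period, the 100 ppm noise level, and the SNR threshold of 7 are all illustrative assumptions.

```python
import math
import random
import statistics


def phase_offset(t, period, t0):
    """Time offset (days) from the nearest transit midpoint."""
    return (t - t0 + 0.5 * period) % period - 0.5 * period


def inject_box_transit(times, flux, period, t0, duration, depth):
    """Return a copy of flux with a box-shaped dip of the given depth injected."""
    return [
        f - depth if abs(phase_offset(t, period, t0)) < 0.5 * duration else f
        for t, f in zip(times, flux)
    ]


def box_snr(times, flux, period, t0, duration):
    """SNR of the mean in-transit dip, measured at a known ephemeris."""
    in_tr, out_tr = [], []
    for t, f in zip(times, flux):
        inside = abs(phase_offset(t, period, t0)) < 0.5 * duration
        (in_tr if inside else out_tr).append(f)
    depth = statistics.fmean(out_tr) - statistics.fmean(in_tr)
    sigma = statistics.stdev(out_tr)
    return depth / (sigma / math.sqrt(len(in_tr)))


def recovery_fraction(depth, noise, n_trials=20, snr_threshold=7.0, seed=0):
    """Fraction of injected signals whose measured SNR clears the threshold."""
    rng = random.Random(seed)
    cadence = 30.0 / (60.0 * 24.0)  # assumed 30-minute cadence, in days
    n_points = 3840                 # about 80 days of data at that cadence
    times = [i * cadence for i in range(n_points)]
    period, duration = 10.0, 0.2    # assumed period and transit duration (days)
    recovered = 0
    for _ in range(n_trials):
        t0 = rng.uniform(0.0, period)
        flux = [1.0 + rng.gauss(0.0, noise) for _ in times]  # white noise only
        injected = inject_box_transit(times, flux, period, t0, duration, depth)
        if box_snr(times, injected, period, t0, duration) >= snr_threshold:
            recovered += 1
    return recovered / n_trials


for depth_ppm in (1000, 84, 10):
    frac = recovery_fraction(depth_ppm * 1e-6, noise=100e-6)
    print(f"depth {depth_ppm:5d} ppm: recovered fraction {frac:.2f}")
```

With real data you would inject into the actual light curve before detrending and then rerun the full search, since detrending and correlated noise are usually what eat the shallow signals.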
Priors matter a lot here. Start with weakly informative priors (Gelman-style) and let the data update the posteriors. Because transit depth is a fraction between 0 and 1, a Beta prior on depth works, as does a Beta(a, b) prior on the probability that a signal is planetary rather than a binary. A Gaussian process noise model can absorb time-correlated noise such as stellar variability. Use hierarchical priors across similar stars or datasets to share strength. Prior predictive checks are essential: confirm that your priors produce plausible light curves before you fit real data. If you can, run a short elicitation with someone who has domain experience to translate their intuition into distribution parameters.
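As a rough illustration of a prior predictive check on transit depth, the sketch below draws depths from a Beta prior and summarizes what the prior implies before any data are involved. The Beta(1, 200) shape parameters are an arbitrary choice for illustration, not a recommendation, and the helper names are hypothetical. A full check would also push the draws through a transit model to produce simulated light curves.

```python
import random


def prior_predictive_depths(a, b, n_draws=5000, seed=1):
    """Draw transit depths (fractional flux) from a Beta(a, b) prior, sorted."""
    rng = random.Random(seed)
    return sorted(rng.betavariate(a, b) for _ in range(n_draws))


def quantile(sorted_values, q):
    """Simple empirical quantile from an already sorted list."""
    idx = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[idx]


def fraction_below(sorted_values, threshold):
    """Share of prior draws that fall below a given depth."""
    return sum(1 for v in sorted_values if v < threshold) / len(sorted_values)


depths = prior_predictive_depths(a=1.0, b=200.0)
for q in (0.05, 0.5, 0.95):
    print(f"{int(q * 100):2d}th percentile depth: {quantile(depths, q) * 1e6:8.0f} ppm")
print(f"share of draws shallower than 100 ppm: {fraction_below(depths, 1e-4):.3f}")
```

This particular prior puts only about 2% of its mass below 100 ppm, so it would quietly disfavor Earth-sized planets around Sun-like stars. That is exactly the sort of mismatch a prior predictive check is meant to expose.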
Recommended software stack for practical use: Lightkurve for data access and basic detrending; eleanor (TESS full-frame images) or EVEREST (K2) for light curves beyond the standard pipeline products; TLS for transit search; batman or exoplanet for transit modeling; PyTransit as an alternative; PyMC/NumPyro for Bayesian inference; celerite or george for scalable Gaussian processes; vespa or TRICERATOPS for false positive probabilities; the NASA Exoplanet Archive for community benchmarks; and a lightweight online inference layer (particle filters or streaming Bayesian updates) if you need near-real-time alerts. Whatever you pick, record exactly which data products you used and keep your models and package versions under version control so results are reproducible.
Validation and collaboration are key. Do prior predictive checks with a few representative stars to show that your priors imply sane light curves. Present calibration tests to people less familiar with Bayesian methods using simple plots, such as predicted versus observed events and credible interval coverage. Have domain scientists review your priors and interpretations. Run cross-validation by splitting the data into training and validation sets, and compare posterior predictive checks across pipelines to catch biases. Keep a small, transparent log of decisions for anyone you share results with.
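The credible interval coverage test mentioned above can be sketched in a few lines. Here the "pipeline" is a stand-in that returns the true depth plus Gaussian noise and reports a nominal 68% interval of plus or minus one sigma; in practice the intervals would come from your posterior samples for injected signals. All names and numbers are illustrative assumptions.

```python
import random


def interval_coverage(intervals, truths):
    """Fraction of (low, high) intervals that contain the matching true value."""
    hits = sum(1 for (low, high), truth in zip(intervals, truths) if low <= truth <= high)
    return hits / len(truths)


def simulate_calibrated_depths(n=2000, sigma=50.0, seed=2):
    """Stand-in for a pipeline: true depths (ppm) plus well-calibrated errors."""
    rng = random.Random(seed)
    truths = [rng.uniform(50.0, 500.0) for _ in range(n)]
    estimates = [t + rng.gauss(0.0, sigma) for t in truths]
    intervals = [(e - sigma, e + sigma) for e in estimates]  # nominal 68% interval
    return intervals, truths


intervals, truths = simulate_calibrated_depths()
coverage = interval_coverage(intervals, truths)
print(f"empirical coverage of nominal 68% intervals: {coverage:.2f}")
```

If the empirical coverage sits well below the nominal level, the reported uncertainties are too tight; well above, and they are too loose.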
Useful resources and communities worth following: the NASA Exoplanet Archive, including its methods documentation and light curve holdings; Exoplanet.eu; Zooniverse Planet Hunters and Exoplanet Explorers for citizen science data; Astronomy Stack Exchange and r/astronomy for practical Q&A; GitHub repositories and tutorials for Lightkurve, TLS, batman, exoplanet, PyMC/NumPyro, and GP tools; the vespa and TRICERATOPS documentation for false-positive assessment; and recent arXiv surveys and review articles on exoplanet validation methods.
Want a quick starter plan? If you share which telescope (Kepler/K2 or TESS), the cadence, and the typical SNR, I can map out a concrete 4-week setup: data access, an initial TLS pipeline, a prior elicitation session with a domain expert, and an initial validation checklist with a simple dashboard.
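In the meantime, a back-of-envelope estimate shows why cadence and noise level matter so much. The sketch below computes the geometric transit depth (Rp/Rs)^2 and a white-noise SNR that scales with the square root of the number of in-transit points. The 200 ppm per-point noise, 2-minute cadence, 3-hour duration, and two transits are hypothetical inputs, and real light curves have correlated noise that makes things worse.

```python
import math

R_EARTH_KM = 6371.0    # mean Earth radius
R_SUN_KM = 695700.0    # nominal solar radius


def transit_depth(planet_radius_km, star_radius_km):
    """Geometric transit depth, ignoring limb darkening and grazing geometry."""
    return (planet_radius_km / star_radius_km) ** 2


def expected_snr(depth, noise_per_point, cadence_min, duration_hr, n_transits):
    """White-noise SNR estimate: depth over noise, times sqrt(in-transit points)."""
    points_per_transit = duration_hr * 60.0 / cadence_min
    return depth / noise_per_point * math.sqrt(points_per_transit * n_transits)


depth = transit_depth(R_EARTH_KM, R_SUN_KM)
snr = expected_snr(depth, noise_per_point=200e-6, cadence_min=2.0,
                   duration_hr=3.0, n_transits=2)
print(f"Earth/Sun depth: {depth * 1e6:.0f} ppm, estimated SNR: {snr:.1f}")
```

Since the SNR grows only as the square root of the number of transits, observing four times as many transits merely doubles it, which is why long baselines or quiet, small host stars help so much for Earth-sized candidates.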