
I'm an amateur astronomer and data science hobbyist, and I've been diving into the publicly available datasets from missions like TESS and Kepler to try my hand at identifying potential exoplanet candidates through transit detection. While I can follow the basic tutorials on light curve analysis, I'm struggling with the more nuanced data cleaning steps needed to distinguish a true planetary transit from stellar variability or instrumental noise, especially for smaller, Earth-sized candidates. For others who have worked with this data, what specific preprocessing techniques or software pipelines did you find most effective for noise reduction, and how did you validate your candidate signals against false positives before getting too excited? I'd love to contribute meaningfully to citizen science projects.

You're venturing into a cool area with real-world impact. A pragmatic pipeline I've used starts with getting clean light curves from Kepler or TESS using Lightkurve, then moves through preprocessing, search, and validation steps.

For cleaning, lean on the PDCSAP flux (produced by the Kepler pipeline, and by SPOC for TESS), which already has many instrumental systematics removed, but keep a copy of the SAP flux so you can compare the two. Remove outliers conservatively: clip points more than about 3 sigma above the median, but think twice before clipping low points. A symmetric 3-sigma clip will delete real in-transit points whenever the transit is deeper than about 3 sigma of the point-to-point scatter. Also mask data gaps.

Detrending is the tricky part. Model long-term trends with a modest polynomial or a Gaussian Process (GP), keeping the model flexible enough to follow the star but not so flexible that it distorts transit shapes. A GP with a quasi-periodic kernel can capture stellar rotation signals, but it will happily absorb a transit too, so mask suspected in-transit points while fitting the trend (or keep the model's timescale much longer than a transit duration).

Then run a transit search with Transit Least Squares (TLS). It fits a limb-darkened transit template rather than a simple box, which makes it more sensitive to small planets than a traditional Box Least Squares (BLS) search. If TLS flags a candidate, require multiple individual events at a consistent depth and period.

For vetting, check that the transit signal remains coherent across multiple sectors, do a centroid analysis (look for shifts indicating a background eclipsing binary), compare odd and even transit depths (a significant difference suggests an eclipsing binary at twice the detected period), and inspect difference images if you have pixel-level data. A shallow transit is easy to mistake for noise; real confidence comes from consistency across sectors and instruments. Finally, quantify the false-positive probability with a tool like Vespa if you have the right context, or at least write up a transparent discussion of potential contaminants.

If you want, I can share a starter notebook using lightkurve + TLS to get you going.
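To make the "conservative outlier removal" step concrete, here is a minimal, dependency-free sketch of one-sided clipping. It only rejects points far above the median, so in-transit points are never thrown away. The function name `clip_upper_outliers`, the MAD-based scatter estimate, and the synthetic data are my own illustrative assumptions, not Lightkurve code. In Lightkurve itself I believe the analogous call is `lc.remove_outliers(sigma_upper=3, sigma_lower=float("inf"))`, but check the docs for your version.

```python
import statistics


def clip_upper_outliers(flux, sigma=3.0, max_iter=5):
    """Return a list of booleans, True for points to keep.

    Only points far ABOVE the median are rejected, so low (possibly in-transit)
    points always survive. The scatter is estimated robustly with the median
    absolute deviation (MAD), scaled by 1.4826 to approximate a Gaussian sigma.
    """
    keep = [True] * len(flux)
    for _ in range(max_iter):
        kept = [f for f, k in zip(flux, keep) if k]
        med = statistics.median(kept)
        scatter = 1.4826 * statistics.median(abs(f - med) for f in kept)
        if scatter == 0:
            break
        new_keep = [f <= med + sigma * scatter for f in flux]
        if new_keep == keep:  # converged: nothing new was rejected
            break
        keep = new_keep
    return keep


# Hypothetical synthetic example: a flat star with small deterministic wiggles,
# a 500 ppm transit at indices 100-105, and two cosmic-ray-like spikes.
flux = [1.0 + ((i * 37) % 17 - 8) * 1e-5 for i in range(200)]
for i in range(100, 106):
    flux[i] -= 5e-4
flux[30] += 5e-3
flux[150] += 5e-3

mask = clip_upper_outliers(flux)
print("spikes removed:", not mask[30] and not mask[150])
print("in-transit points kept:", all(mask[i] for i in range(100, 106)))
```

The one-sided cut is the design choice that matters here: the same threshold applied symmetrically would have discarded all six in-transit points in this example.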
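The masking point in the detrending step is easy to get wrong, so here is a small sketch under simplifying assumptions: a single global quadratic (the "modest polynomial" option, not a GP) fit only to out-of-transit points, compared with the same fit when nothing is masked. The helper names, the normal-equations solver, and the synthetic trend are hypothetical; on real data you'd fit piecewise or in a sliding window, use `numpy.polyfit`, or use a GP library. Lightkurve's `flatten()` also accepts a `mask` argument for excluding in-transit cadences, if I'm remembering its signature correctly.

```python
def fit_polynomial(x, y, degree):
    """Ordinary least-squares polynomial fit via the normal equations.

    Returns coefficients c such that y ~ c[0] + c[1]*x + ... + c[degree]*x**degree.
    Adequate for low degrees on centered x; prefer numpy.polyfit in real work.
    """
    n = degree + 1
    ata = [[sum(xi ** (j + k) for xi in x) for k in range(n)] for j in range(n)]
    aty = [sum(yi * xi ** j for xi, yi in zip(x, y)) for j in range(n)]
    # Gaussian elimination with partial pivoting, then back-substitution.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(ata[r][col]))
        ata[col], ata[pivot] = ata[pivot], ata[col]
        aty[col], aty[pivot] = aty[pivot], aty[col]
        for row in range(col + 1, n):
            factor = ata[row][col] / ata[col][col]
            for k in range(col, n):
                ata[row][k] -= factor * ata[col][k]
            aty[row] -= factor * aty[col]
    coeffs = [0.0] * n
    for row in reversed(range(n)):
        partial = sum(ata[row][k] * coeffs[k] for k in range(row + 1, n))
        coeffs[row] = (aty[row] - partial) / ata[row][row]
    return coeffs


def detrend(time, flux, degree=2, transit_mask=None):
    """Divide the flux by a polynomial trend fit only to unmasked points."""
    if transit_mask is None:
        transit_mask = [False] * len(time)
    t_mid = sum(time) / len(time)  # center time for better numerical conditioning
    x_fit = [t - t_mid for t, m in zip(time, transit_mask) if not m]
    y_fit = [f for f, m in zip(flux, transit_mask) if not m]
    coeffs = fit_polynomial(x_fit, y_fit, degree)
    trend = [sum(c * (t - t_mid) ** j for j, c in enumerate(coeffs)) for t in time]
    return [f / tr for f, tr in zip(flux, trend)]


# Hypothetical synthetic data: 10 days at 0.02-day cadence with a smooth
# quadratic drift and one 1000 ppm transit (7 cadences) near t = 5.
time = [i * 0.02 for i in range(500)]
trend = [1.0 + 0.002 * (t - 5) + 0.0004 * (t - 5) ** 2 for t in time]
in_transit = [4.93 <= t <= 5.07 for t in time]
flux = [tr * (1 - 1e-3) if m else tr for tr, m in zip(trend, in_transit)]

masked = detrend(time, flux, transit_mask=in_transit)
unmasked = detrend(time, flux)


def mean_depth(detrended):
    """Mean fractional depth of the in-transit points after detrending."""
    in_pts = [f for f, m in zip(detrended, in_transit) if m]
    return 1 - sum(in_pts) / len(in_pts)


print(f"depth with transit masked: {mean_depth(masked) * 1e6:.1f} ppm")
print(f"depth with nothing masked: {mean_depth(unmasked) * 1e6:.1f} ppm")
```

With the mask, the recovered depth matches the injected 1000 ppm; without it, the fitted trend sags into the dip and the depth comes out shallower. For Earth-sized candidates, where the whole signal is small, that kind of bias can be a meaningful fraction of the depth.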
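The search step is easiest to understand with a brute-force box search, which is the BLS idea that TLS refines by using a limb-darkened transit shape. This is explicitly not TLS: it's a minimal pure-Python sketch with a fixed trial duration, a crude depth × sqrt(N_in) score that assumes white noise, and a hypothetical synthetic light curve. For real work, use the `transitleastsquares` package (as far as I know, `transitleastsquares(t, y).power()` returns a results object with `period` and `SDE` attributes) or Lightkurve's BLS periodogram.

```python
import math
import random


def box_search(time, flux, periods, duration, n_bins=200):
    """Brute-force box search: return (period, depth, score) with the best score."""
    n_total = len(flux)
    mean_flux = sum(flux) / n_total
    best = (None, 0.0, -math.inf)
    for period in periods:
        # Fold and bin: accumulate deviations from the global mean per phase bin.
        sums = [0.0] * n_bins
        counts = [0] * n_bins
        for t, f in zip(time, flux):
            b = int((t % period) / period * n_bins) % n_bins
            sums[b] += f - mean_flux
            counts[b] += 1
        width = max(1, round(duration / period * n_bins))  # box width in bins
        for start in range(n_bins):
            s_in, n_in = 0.0, 0
            for k in range(width):
                j = (start + k) % n_bins
                s_in += sums[j]
                n_in += counts[j]
            n_out = n_total - n_in
            if n_in == 0 or n_out == 0:
                continue
            # Deviations sum to zero overall, so the out-of-box sum is -s_in.
            depth = -s_in / n_out - s_in / n_in
            score = depth * math.sqrt(n_in)  # crude S/N proxy, assumes white noise
            if score > best[2]:
                best = (period, depth, score)
    return best


# Hypothetical synthetic light curve: 20 days at 0.02-day cadence,
# 1000 ppm transits every 3.1 days, Gaussian noise of 300 ppm per point.
rng = random.Random(1)
true_period, t0, dur, depth = 3.1, 1.3, 0.12, 1e-3
time = [i * 0.02 for i in range(1000)]
flux = []
for t in time:
    phase = (t - t0) % true_period
    in_transit = phase < dur / 2 or phase > true_period - dur / 2
    flux.append(1.0 - (depth if in_transit else 0.0) + rng.gauss(0.0, 3e-4))

periods = [1.0 + 0.01 * k for k in range(501)]  # trial periods 1.00 to 6.00 days
period, found_depth, score = box_search(time, flux, periods, duration=dur)
print(f"best period {period:.2f} d, depth {found_depth * 1e6:.0f} ppm")
```

The score only keeps growing with N_in when the dips genuinely repeat at the trial period, which is the same intuition behind requiring several consistent events before trusting a candidate.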
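For the odd-even comparison in the vetting step, a sketch like the following works once you have a period, reference epoch, and duration from the search: split in-transit points by epoch parity and compare the two mean depths against their uncertainties. The significance formula assumes white noise estimated from the out-of-transit scatter (and ignores that both depths share the same baseline), and the `simulate` helper and its eclipsing-binary-like depths are made-up illustrations.

```python
import math
import random
import statistics


def odd_even_test(time, flux, period, t0, duration):
    """Compare mean transit depths of even- and odd-numbered events.

    Returns (depth_even, depth_odd, significance), where significance is the
    depth difference divided by its rough white-noise uncertainty.
    """
    groups = {0: [], 1: []}
    out = []
    for t, f in zip(time, flux):
        epoch = math.floor((t - t0) / period + 0.5)  # nearest transit number
        if abs(t - (t0 + epoch * period)) < duration / 2:
            groups[epoch % 2].append(f)
        else:
            out.append(f)
    baseline = statistics.fmean(out)
    sigma = statistics.stdev(out)  # per-point scatter from out-of-transit data
    depths, errors = [], []
    for parity in (0, 1):
        pts = groups[parity]
        depths.append(baseline - statistics.fmean(pts))
        errors.append(sigma * math.sqrt(1 / len(pts) + 1 / len(out)))
    significance = abs(depths[0] - depths[1]) / math.hypot(errors[0], errors[1])
    return depths[0], depths[1], significance


def simulate(depth_even, depth_odd, seed):
    """Hypothetical 40-day light curve; unequal depths mimic an eclipsing binary."""
    rng = random.Random(seed)
    period, t0, dur = 3.1, 1.3, 0.12
    time = [i * 0.02 for i in range(2000)]
    flux = []
    for t in time:
        epoch = math.floor((t - t0) / period + 0.5)
        in_transit = abs(t - (t0 + epoch * period)) < dur / 2
        depth = depth_even if epoch % 2 == 0 else depth_odd
        flux.append(1.0 - (depth if in_transit else 0.0) + rng.gauss(0.0, 3e-4))
    return time, flux


cases = {"planet-like": (1e-3, 1e-3), "EB-like": (1e-3, 1.6e-3)}
for label, (d_even, d_odd) in cases.items():
    time, flux = simulate(d_even, d_odd, seed=7)
    even, odd, sig = odd_even_test(time, flux, 3.1, 1.3, 0.12)
    print(f"{label}: even {even * 1e6:.0f} ppm, odd {odd * 1e6:.0f} ppm, {sig:.1f} sigma")
```

A large significance for the EB-like case and a small one for the equal-depth case is what you'd hope to see. On real data, correlated noise makes white-noise error bars too small, so treat this as a screening flag rather than a verdict.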
