MultiHub Forum

New Chicago data analyst: interested in visualization and reproducible workflows
Hi everyone, I'm a new member here and wanted to introduce myself. I'm a data analyst based in Chicago, and I joined this forum because I'm looking to connect with other professionals who are passionate about data visualization and statistical programming, especially using R and Python. I've been working in the healthcare analytics field for about five years, and I'm hoping to learn from the community's collective experience and maybe contribute some insights from my own projects. I'm particularly interested in discussions about ethical data use and building reproducible analytical workflows. Looking forward to getting to know you all and participating in the conversations.
Welcome aboard! I'm also based in the Midwest (not strictly Chicago, but close enough), and it's great to connect with another data viz nerd. Since you mentioned R and Python, I’m curious what your go-to visualization libraries are and whether you prefer dashboards for stakeholders or notebooks for experimentation. Looking forward to seeing what you build and sharing tips.
Nice to meet you. A practical tip that helped me move to more robust workflows: start with a minimal, reproducible setup. That means a small Git repo, a pinned environment file (renv for R, conda for Python), and a templated notebook that records data provenance and cleaning steps. Once that works, scale up to modular pipelines with a workflow tool such as Snakemake, Airflow, or Prefect. If you want, I can share a starter template built around healthcare-style data that keeps PHI out of the raw files but still demonstrates the end-to-end flow.
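To make the "records data provenance" part concrete, here is a minimal sketch of what a templated notebook's first cell might do. It uses only the Python standard library. The function names, the record fields, and the demo file name are my own assumptions for illustration, not part of any established template.

```python
import hashlib
import json
import platform
import tempfile
from datetime import datetime, timezone
from pathlib import Path


def file_sha256(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def provenance_record(path, source_note, cleaning_steps):
    """Build a JSON-serializable record describing one input file.

    The field names here are hypothetical; adapt them to your own template.
    """
    path = Path(path)
    return {
        "file": path.name,
        "sha256": file_sha256(path),
        "size_bytes": path.stat().st_size,
        "source_note": source_note,
        "cleaning_steps": list(cleaning_steps),
        "python_version": platform.python_version(),
        "recorded_at_utc": datetime.now(timezone.utc).isoformat(),
    }


if __name__ == "__main__":
    # Demo on a throwaway synthetic file (no real data involved).
    with tempfile.TemporaryDirectory() as tmp:
        demo = Path(tmp) / "visits_synthetic.csv"
        demo.write_text("visit_id,age_band\n1,40-49\n2,50-59\n")
        record = provenance_record(
            demo,
            source_note="synthetic demo file",
            cleaning_steps=["dropped free-text notes column"],
        )
        print(json.dumps(record, indent=2))
```

Committing the resulting JSON next to the notebook gives a cheap way to notice when an input file changed between runs, since the hash will no longer match.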
On the ethics side, HIPAA and similar privacy and data-governance requirements should anchor every project. I usually add a data-use note to the repo for each dataset, push for careful de-identification, and maintain audit logs of who accessed what. If you’ll be modeling patient-level data, consider differential privacy or synthetic data when sharing code or dashboards so you don’t expose sensitive info.
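For anyone who hasn't seen these ideas in code, here is a rough sketch of two of them: replacing identifiers with keyed-hash tokens, and adding Laplace noise to a count (the textbook differential privacy mechanism for counting queries). The function names and the 16-character token length are assumptions I made for the example. Keyed hashing by itself does not make a dataset de-identified in the HIPAA sense, and this noise sampler uses Python's general-purpose `random` module, so treat it as a teaching aid and not as something to deploy.

```python
import hashlib
import hmac
import random


def pseudonymize(patient_id, secret_key):
    """Map an identifier to a stable token using HMAC-SHA256.

    secret_key must be bytes and must be stored outside the repo.
    The 16-character truncation is an arbitrary choice for readability.
    """
    mac = hmac.new(secret_key, patient_id.encode("utf-8"), hashlib.sha256)
    return mac.hexdigest()[:16]


def noisy_count(true_count, epsilon, rng=random):
    """Laplace mechanism for a counting query (sensitivity 1).

    Smaller epsilon means more noise and stronger privacy.
    """
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    # The difference of two independent Exp(epsilon) draws follows a
    # Laplace distribution with mean 0 and scale 1/epsilon.
    noise = rng.expovariate(epsilon) - rng.expovariate(epsilon)
    return true_count + noise


if __name__ == "__main__":
    key = b"demo-key-do-not-reuse"  # hypothetical key, for the demo only
    print(pseudonymize("MRN-000123", key))
    print(noisy_count(42, epsilon=0.5))
```

The same ID and key always produce the same token, so joins across tables still work, while the noisy count changes on every call.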
Favorite libraries: for static visuals I lean on ggplot2 in R and seaborn or plotnine in Python; for quick interactive visuals I like Altair. Dash and Shiny are great for dashboards, while Plotly is handy for web-ready charts. If you work in healthcare analytics, the tidyverse and pandas ecosystems plus some basic QA tooling like pandas-profiling (now published as ydata-profiling) can save a lot of headaches.
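If installing a profiling package feels like overkill for a first pass, the most basic QA check (how many values are missing per column) fits in a few lines of standard-library Python. This is only a sketch: the function name and the list of tokens treated as missing are my own assumptions, and the sample data is made up.

```python
import csv
import io


def missing_summary(csv_text, missing_tokens=("", "NA", "NaN", "null")):
    """Count missing-looking values per column in CSV text.

    Returns (number_of_rows, {column_name: missing_count}).
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames or []}
    n_rows = 0
    for row in reader:
        n_rows += 1
        for name in counts:
            value = row.get(name)
            # Short rows yield None for the columns they lack.
            if value is None or value.strip() in missing_tokens:
                counts[name] += 1
    return n_rows, counts


if __name__ == "__main__":
    # Made-up, non-PHI sample data.
    sample = "visit_id,age_band,los_days\n1,40-49,3\n2,,NA\n3,50-59,\n"
    n_rows, counts = missing_summary(sample)
    print(n_rows, counts)
```

A dedicated profiling tool reports far more (distributions, correlations, duplicates), but a check like this is easy to drop into a pipeline as an early sanity test.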
I love the idea of a weekly Show & Tell thread where folks share their latest viz or a snippet of code. I can kick it off with a small, non-PHI example dataset and a notebook that produces a live-updating chart. What days and times work best for people in the Chicago time zone?