A cover letter is required for consideration for this position and should be attached as the first page of your resume. The cover letter should address your specific interest in the position and outline skills and experience that directly relate to this position.
The Comprehensive Mobile Precision Approach for Scalable Solutions in Mental Health Treatment (COMPASS) project is looking for an Application Programmer/Analyst Lead for our new NIMH project. COMPASS studies how machine learning and wearable and mobile technology can both reduce mental health symptoms and predict response to clinic-based treatments. The ideal candidate should be passionate about building data and software infrastructure to power precision mental health research that integrates mobile sensing, electronic health records, and genomics.
The candidate will create new databases and programs to assist in the collection and integration of complex data across multiple platforms, with a focus on research regulations. In addition, the candidate will provide technical leadership across statistical genetics pipelines and research web tools that serve investigators and collaborators. We are looking for someone with a demonstrated background in genetic biostatistics who is detail-oriented and will conceptualize, develop, and implement complex program designs to assist with study data collection, participant tracking, integration of electronic health records, and required sponsor funding reports.
This is a unique opportunity to shape a modern genetics data platform that directly supports precision mental health research; to work in a collaborative environment with biostatisticians, clinicians, and software engineers in a strong culture of reproducibility and open science; and to have meaningful opportunities to contribute to manuscripts and to serve as lead author on selected papers.
Our office is located at the University of Michigan's North Campus Research Complex. Due to COVID-19 and the remote nature of this project, most work is currently being conducted through telework. However, the programmer may need to come into the office for brief periods of time, observing the University of Michigan health and safety measures.
NOTE: A cover letter is required for this application. In your cover letter, please address how you meet the required qualifications and your interest in genetic data analysis, programming, and data capture. Applicants who do not meet the minimum requirements or do not submit a cover letter will not be considered.
Responsibilities include, but are not limited to:
Genetic Data Analysis:
- Lead the design, implementation, and operation of production-grade genetics data pipelines: genotype/sequencing QC, imputation, GWAS, PRS, and fine-mapping; automate with workflow managers; containerize and monitor for reliability and reproducibility.
- Design and maintain scalable data models and databases for genomic and phenotypic data; implement ETL from diverse sources, including large biobanks and EHR-derived datasets.
- Contribute to study design discussions, translate research questions into technical requirements, and communicate options, tradeoffs, and timelines to stakeholders.
- Uphold data governance and research compliance, including HIPAA and data use agreements; support secure data sharing with partner institutions.
- Track emerging methods and tools in statistical genetics and scientific computing; evaluate and deploy those that improve performance, cost, or scientific value.
Code and Documentation Development
- Develop and improve processes for extracting, transforming, and loading various complex data sources (reference data sets, EHR, etc.) into the study data pipeline.
- Build, maintain, and update databases, scripts, and programs that improve data cleaning processes, including automated scripts and templates (using Python, R, JavaScript, Bash, or equivalent).
- Develop data documentation (codebooks, technical appendices, etc.) that includes key information for processing and analysis following standard guidelines.
- Identify opportunities to improve the efficiency and quality of data processing and analysis and collaborate with team members to implement solutions.
- Implement monitoring solutions to ensure data integrity and to confirm that pipelines are operational.
- Investigate new technologies and share with the team.
Oversee Genetic Data Management Team
- Establish coding standards, documentation templates, and review practices; mentor developers and analysts; collaborate closely with biostatisticians, clinicians, and data managers.
- Work collaboratively with project managers, data analysts and Principal Investigators (PIs) to understand project goals, data requirements, and objectives; answer questions about the structure and nature of datasets.
- Monitor data team workload, assist with prioritization, and ensure that data priorities stay on schedule, including ensuring that the team can:
- Deliver analyzable datasets in multiple formats (e.g., CSV/TSV, Excel, VCF/BCF, PLINK2, BGEN, SAVVY) to staff and faculty.
- Perform data cleaning, quality checks, and merging procedures on data sets and prepare them for different statistical analyses or sponsor reports.
- Communicate proactively with project managers and PIs about data progress, timelines, and issues.
- Identify and resolve time-sensitive data issues.
- Implement data storage and retrieval solutions for efficient access by members of the research team.
Manuscript Development
- Support the writing team by preparing figures, methods, and reproducible analyses; coauthor manuscripts and lead selected method or application papers, including drafting, submission, and revisions.
- Develop data visualizations to effectively communicate data in reports, flowcharts, and dashboards.
- Support research design, determine and interpret research results, and independently write up results of analyses.
Qualifications
- PhD in Biostatistics or Bioinformatics.
- 7-10 years of experience in systems analysis/programming.
- Proficiency with relevant tools, including Python, C/C++, R, SLURM cluster usage, Git, Snakemake, and Jupyter Notebook.
- Demonstrated background in statistical genetics, including hands-on experience with GWAS; strong understanding of genotyping/sequencing QC, imputation, reference panels, and association testing.
- Proven experience building and operating production-grade genetics data pipelines at scale, including workflow management (for example, Snakemake, Nextflow, or similar), containerization, and HPC (SLURM) or cloud environments.
- Experience working with large-scale human genetic datasets, such as UK Biobank, NIH All of Us, or FinnGen.