Software Dvlpr 2
School of Medicine, Stanford, California, United States
Information Technology Services
Post Date: 1 day ago
Requisition #: 105251

The Stanford Center for Genomics and Personalized Medicine (SCGPM) has an exciting opportunity available for a motivated Bioinformatics Systems Architect/Team Leader to create innovative data infrastructures that will automate the process of turning big genomic data into biomedical insights. The ideal person for this position is a keen listener who can interpret biological questions, assess the value and relevance of different technologies and methods, and deliver actionable technical solutions.

Background:
The Department of Veterans Affairs (VA) has commissioned the sequencing of thousands of whole genomes from participants in the Million Veteran Program (MVP) [https://www.mvp.va.gov/]. This data is currently being delivered to the SCGPM's cloud computing environment and constitutes one of the largest repositories of whole-genome sequencing data in the world. The scale and richness of this data make it an incredible resource for biomedical research. Our goal is to turn this data lake into a data commons: a dynamic computing environment where researchers bring questions and get answers, all without having to go through the ordeal of manually collecting, cleaning, massaging, scrubbing, sorting, transforming, and filtering data.

Position:
In this position, you would be the lead architect and system implementer of Trellis, the cloud-based MVP data management system that we created. Trellis keeps track of the petabytes of sequence data contributed to the MVP by veterans. It also orchestrates the processing of that data into derivative files and records which programs were used to transform the data, maintaining a detailed record of data provenance.

To manage the enormous volumes of biomedical research data that the MVP generates, we built and run Trellis on the Google Cloud Platform. The Trellis architecture takes advantage of serverless cloud services such as Cloud Functions and Pub/Sub to build a workflow that responds to the arrival of new data by initiating pipeline processes automatically.

A production version of Trellis has already processed the whole-genome sequences of 150,000 veterans, and we plan to process at least as many more in the coming year. You would be in charge of keeping this production system running and optimized, and you would interface with the DevOps team that will maintain the system in a FedRAMP-secure environment.

Now that we have proven that we can process and manage biomedical data at scale, we want to make MVP data more easily accessible to VA-internal researchers and to the scientific community at large. Possible directions for this sharing include creating a visualization front end that lets researchers explore data graphically and providing a cohort selection mechanism so that subpopulations of veterans can be studied. You would continue the development of the Trellis system to integrate new data from the VA and to present Trellis data to the research community with tools and interfaces that are easy to use and powerful.
To help you achieve these goals, you would direct a small team of excellent, self-starting engineers in tasks like devising new pipelines for quality control and integrating demographic data with sequence data.

This project has many open-source components, and you would be encouraged to publish details from your systems architecture work or results from processing the genomic data. As an example of a publication from this group, see this reference describing the early design of the Trellis system:

Ross, P.B., Song, J., Tsao, P.S. et al. Trellis for efficient data and task management in the VA Million Veteran Program. Scientific Reports 11, 23229 (2021). https://doi.org/10.1038/s41598-021-02569-5

Our Team:
Our SCGPM bioinformatics team is a multi-disciplinary group of about a dozen scientists, engineers, and software developers with complementary backgrounds, each contributing their own expertise in managing and analyzing complex biomedical data [http://med.stanford.edu/gbsc/scgpm-team.html]. Projects supported by this team include the Stanford Genomics Sequencing Center, the VA Million Veteran Program, the NCI Human Tumor Atlas Network, the Human BioMolecular Atlas Program, and the Stanford Metabolic Health Center.

This position can be on-site, fully remote, or hybrid.

Bioinformatics System Architect Duties include: