Multi-cloud support for workflow execution engine - ON-114

Preferred Disciplines and Level: Software engineer (Master's, PhD or postdoctoral fellow)
Company: DNAstack
Project Length: 8-12 months (2 units)
Desired start date: ASAP
Location: Toronto, Ontario
No. of Positions: 1
Preferences: Language: English

About the Company: 

DNAstack is a Toronto-based software company developing a cutting-edge cloud-based platform for genomics analysis, interpretation and sharing. We are a small team of passionate software engineers, bioinformaticians, geneticists and entrepreneurs, helping to define the standards that will drive the field of genomics into the future.

We are looking for talented interns to join our team and assist in the design and development of the backend of our platform and of our open-source projects. We are agile and move quickly. You can expect to tackle tough problems and to design and implement features for a robust, secure and scalable cloud-based platform. You will also have the opportunity to be involved in the development of standards defining the future of genomics.

Project Description:

DNAstack provides a secure and scalable environment for the execution of bioinformatics workflows. To do so, we rely on open-source workflow execution engines, and it’s in our best interest to make these engines compatible with the latest standards and to allow them to run on multiple public clouds.

Toil is an open-source, pure-Python workflow engine with support for multiple execution environments, such as AWS or HPC. TES is an emerging standard for describing batch execution tasks. The Pipelines API is Google’s way to create, run, and monitor jobs that execute command-line tools in a Docker container on Google Compute Engine VMs. Neither TES nor the Pipelines API is currently supported by Toil. The goal of this project is to enable Toil on Google Cloud Platform (GCP) through a combination of TES and Google’s Pipelines API.
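To give candidates a feel for what "describing a batch execution task" can look like, here is a minimal, illustrative sketch in Python. It only builds and serializes a task description; it does not contact any service. The helper name build_tes_task and the example values are hypothetical. The field names (name, resources, cpu_cores, ram_gb, executors, image, command) are assumed from the published GA4GH TES schema and may differ from the draft of the standard this project would target.

```python
import json


def build_tes_task(name, image, command, cpu_cores=1, ram_gb=1.0):
    """Return a dict shaped like a TES task description.

    Hypothetical helper for illustration only; the field names are an
    assumption based on the published GA4GH TES schema.
    """
    return {
        "name": name,
        # Resources requested for the task as a whole.
        "resources": {"cpu_cores": cpu_cores, "ram_gb": ram_gb},
        # Each executor is one containerized command, run in order.
        "executors": [
            {"image": image, "command": list(command)},
        ],
    }


# Example values are made up; a Toil batch system would derive them
# from the job it needs to schedule.
task = build_tes_task(
    name="toil-job-0",
    image="python:3",
    command=["python", "-c", "print('hello')"],
)

# Serialize to JSON, the form in which a client would send the task.
payload = json.dumps(task, indent=2)
print(payload)
```

Part of the project is deciding how a Toil batch system would fill in, submit, and track descriptions like this one, and how a TES implementation on GCP would translate them into Pipelines API requests.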

Research Objectives:

This task will involve the following objectives:

  • Evaluate TES and the Pipelines API with respect to their suitability for use in a batch system for Toil
  • Work with Google’s Pipelines team, the Toil team and the core team at DNAstack to find the right approach and solution to the problem
  • Integrate a TES client into Toil
  • Implement TES on the Google Cloud Platform with Pipelines
  • Deliver a full working implementation of Toil on GCP

Methodology:

To be determined

Expertise and Skills Needed:

  • Extensive experience with Python
  • Experience with backend web development, design and implementation of RESTful web services
  • Hands-on experience with SQL and NoSQL databases and building systems on top of them
  • Strong understanding of professional software development and design practices
  • Familiarity with cloud computing and building distributed systems with microservice architecture
  • Motivation and ability to work independently and as part of an agile, global team
  • Knowledge of bioinformatics/genomics is an advantage, but not required

 

For more information or to apply to this applied research position, please:

  1. Check your eligibility and find more information about open projects.

  2. Complete this webform. You will be asked to upload your CV. Remember to indicate the title of the project(s) you are interested in and obtain your professor’s approval to proceed.

Program: