Middle Data Scientist
We are seeking a Data Insights Programmer to support the development of our patient matching service for oncology clinical trials.
Job Description
In this role, you will work closely with cross-functional teams to build upon our shared codebase that powers this work, ensuring the highest quality of patient matches for our life sciences partners.
The ideal candidate will be proficient in R programming and SQL, with experience in data extraction and transformation from Snowflake and other databases. They will ensure data integrity, collaborate with cross-functional teams, and align technical solutions with clinical trial objectives. Familiarity with Git and a proactive, detail-oriented approach are essential. Experience with clinical or oncology data is a plus.
Requirements:
- BS in a technical field (e.g., data science, computer science, engineering, mathematics, applied statistics, health economics, etc.);
- 1+ year of relevant experience in data programming;
- Experience working with real-world or clinical trial data, particularly oncology-specific datasets, is a plus;
- Strong knowledge of R, especially data cleaning with tidyverse packages such as dplyr;
- Mid-to-advanced proficiency in SQL, particularly with data extraction and window functions;
- Familiarity with relational databases (PostgreSQL, MySQL), with primary experience using Snowflake;
- Experience with Git for version control, with exposure to GitHub/GitLab for managing code repositories;
- Experience with Python, particularly pandas, is a plus;
- Understanding of statistical concepts for generating and interpreting data analyses;
- High sense of ownership for deliverables, ensuring they are timely and of high quality;
- Ability to work independently and to proactively consult others or ask questions when needed;
- Strong communication skills and a collaborative mindset when working in cross-functional teams.
Responsibilities:
- Write reusable and efficient R and SQL code to clean, transform, and analyze large datasets;
- Develop and maintain R code to identify patients eligible for oncology clinical trials;
- Extract data from Snowflake and other databases (PostgreSQL, MySQL) and load it into R for analysis;
- Implement business rules to manipulate data in alignment with project requirements;
- Ensure data integrity and security across the data lifecycle, adhering to data governance best practices;
- Collaborate closely with cross-functional teams to align technical solutions with business and clinical trial needs;
- Manage and share code using Git and platforms like GitHub/GitLab, maintaining a collaborative and reusable codebase;
- Collect and understand business requirements to create accurate datasets for oncology clinical trials.
What we offer:
- Monthly payment in USD;
- 100% remote opportunity;
- 10 business days of paid vacation per year (can be taken after 6 months in CT);
- Up to 10 national holidays (either US or country of residence);
- 5 personal days off (can be taken after 3 months in CT);
- Travel expenses covered when applicable;
- Referral program;
- Paid certification program;
- Personal development plan (PDP);
- Language platform.