Research Data Scientist

LEIDOS • Full-time • SAN DIEGO, CA

Description

Are you looking for a career that will make an impact? The Health and Civil Sector at Leidos currently has an opening for a Research Data Scientist to work within the Naval Health Research Center's (NHRC) Warfighter Performance Department in San Diego, CA. The NHRC Sleep, Tactical Efficiency, and Endurance Laboratory (STEEL) conducts research with a focus on cognitive endurance, physiology, sleep, fatigue, and military performance in laboratory and operational environments.

We are looking for a Research Data Scientist with a strong data science background and experience working with human physiological data (such as motion sensor, accelerometry, heart rate, HRV) and longitudinal human cohort data (including demographics, biomarkers, and sleep outcomes). This role requires a combination of data science, statistical analysis, and engineering skills to process, manage, and analyze large, complex datasets related to human health research.

Are you ready for unique and exciting work? This role offers the opportunity to work on impactful research projects related to warfighter performance and human health, using cutting-edge data science techniques to improve our understanding of human performance outcomes. You will be part of a dynamic, collaborative research team that values high standards in data management, statistical rigor, and code quality. If you're passionate about using your data science and engineering skills to unlock valuable insights from human health data, and you're excited to work in an interdisciplinary research environment, we encourage you to apply.

The ideal candidate will have expertise in programming languages like Python, R, and SQL; experience working with cloud-based data storage and distributed computing platforms (AWS, Azure, Google Cloud); and proficiency in tools like Jupyter, Pandas, Shiny, and DataBricks.
You will be responsible for developing data pipelines (including those that are automated and scheduled), performing complex data analyses that can address the stakeholders' decision-making needs, and working closely with cross-disciplinary research teams to drive insights from diverse datasets. Strong statistical background and programming proficiency are essential for this role.

Key Responsibilities:

Data Management & Processing: Manage and process large volumes of human health data, including physiological data (motion sensor, accelerometry, heart rate, HRV) and longitudinal cohort data (demographics, sleep outcomes, device use). Preprocess data from multiple sources (ingest, clean, transform) to ensure that they are suitable for analysis and model building. Work with heterogeneous data sources and formats (structured and unstructured data).

Data Pipeline Development: Design, develop, and maintain automated data pipelines that run on schedules or are triggered by events. Ensure that these pipelines are robust, reliable, and scalable, and deliver updated data to support ongoing research and analytical needs.

Database Construction & Management: Build databases to store and manage large datasets efficiently. Ensure data integrity, security, and accessibility for stakeholders.

Programming & Code Optimization: Write clean, maintainable, and efficient code in Python and R. Continuously improve the performance of existing scripts, pipelines, and systems, ensuring optimal functionality.

Dashboard Development: Collaborate with stakeholders to design, implement, and maintain interactive dashboards that provide up-to-date research insights. Ensure that dashboards automatically update as new data is processed through the pipeline.

Collaboration & Documentation: Work closely with research teams, data scientists, and data engineers to understand data needs and ensure solutions are aligned with project objectives.
Provide thorough documentation for all tools, pipelines, and processes to enable knowledge sharing and system maintenance.

Data Analysis & Statistical Modeling: Perform advanced statistical analysis on human physiological and cohort data. Apply a variety of analytical techniques, including regression modeling, time-series analysis, survival analysis, and multivariate techniques. Use machine learning algorithms for predictive modeling, anomaly detection, and trend analysis in human health data. Develop and implement algorithms for analyzing accelerometry, heart rate, HRV, and other physiological metrics.

Programming & Automation: Write clean, efficient, and scalable code in Python, R, and other relevant languages to automate data processing and analysis tasks. Utilize libraries and frameworks such as Pandas, NumPy, SciPy, and scikit-learn for data analysis and statistical modeling. Develop interactive data visualizations and dashboards using Shiny (R) and Python libraries (e.g., Dash, Plotly). Utilize Jupyter or similar tools (including JupyterLab, Notebooks, and other tools) for reproducible research, collaboration, and visualization.

Collaboration & Research Support: Work closely with researchers, clinicians, and data scientists to design and implement data analysis workflows for health-related research projects. Provide technical expertise and support for data analysis, model validation, and interpretation of results. Ensure that data analysis and results are documented and communicated effectively to non-technical stakeholders.

Continuous Improvement: Keep up to date with emerging tools, techniques, and methodologies in data science and research. Continuously improve processes and workflows for efficiency, scalability, and reliability. Actively contribute to best practices in coding, version control (e.g., Git), and reproducible research.

If this sounds like an environment where you can thrive, keep reading for the keys to success!
Required Qualifications:

Master's degree in Data Science, Biomedical Engineering, Statistics, Computer Science, or a related field.
Minimum of 4 years of prior relevant work experience.
Proficiency in Python (especially using Pandas, NumPy, scikit-learn) and R (with expertise in Shiny, ggplot2, and more generally the tidyverse) for data analysis, automation, and pipeline development.
Strong understanding of data structures, algorithms, and best practices for writing efficient and maintainable code.
Experience with data visualization tools such as Qlik Sense, Tableau, or Power BI.
Strong background in data processing, including cleaning, wrangling, and transforming raw data, especially from large and complex datasets.
Solid foundation in statistical analysis/machine learning (e.g., hypothesis testing, regression analysis, decision trees, random forests, support vector machines, survival analysis, time-series analysis, predictive modeling using a variety of algorithms).
Experience with machine learning and predictive modeling techniques applied to human health data (including physiological measurements and cohort data).
Ability to derive insights from human physiological and cohort data, such as motion sensor data, heart rate variability (HRV), VO2 max, and longitudinal health records.
Previous experience working with human physiological data (motion sensors, accelerometry, heart rate, HRV) and/or longitudinal human cohort data (demographics, biomarkers, sleep outcomes).
Experience managing and analyzing large datasets in a research environment, including working with raw, messy, or unstructured data.
Strong problem-solving abilities, with an ability to tackle complex data challenges in a research setting.
Excellent communication skills, with the ability to explain complex technical concepts to both technical and non-technical stakeholders.
Ability to work independently as well as part of an interdisciplinary research team.
Enthusiastic about continuous learning and intellectual growth; staying up to date with emerging tools, technologies, and methodologies in data science and human health research. We strongly encourage efforts to improve our research processes and the quality of knowledge products delivered to stakeholders.
Ability to manage multiple projects simultaneously, prioritize tasks, and work independently in a fast-paced environment.
Models Leidos values at all times.
Due to federal contract restrictions, candidate must be a U.S. citizen and able to obtain a Secret Security Clearance.

Preferred Qualifications:

Experience with Jupyter Notebooks, JupyterLab, or similar tools for interactive data analysis and visualization.
Experience with DataBricks for collaborative data analysis and large-scale data processing.
Proficiency in Spark and programming in a distributed computing paradigm is desirable, but not a requirement.
Familiarity with cloud platforms (AWS, Azure, or Google Cloud) and experience working with cloud-based infrastructure for data storage, processing, and analysis.
Experience with data visualization tools such as Qlik Sense, Tableau, Power BI, or custom dashboards is a plus.
Intrinsic interest in writing high-quality, clean, and efficient code.

Naval Health West - NHW

Original Posting Date: 2024-11-20

While subject to change based on business needs, Leidos reasonably anticipates that this job requisition will remain open for at least 3 days with an anticipated close date of no earlier than 3 days after the original posting date as listed above.

Pay Range: $81,250.00 - $146,875.00

The Leidos pay range for this job level is a general guideline only and not a guarantee of compensation or salary.
Additional factors considered in extending an offer include (but are not limited to) responsibilities of the job, education, experience, knowledge, skills, and abilities, as well as internal equity, alignment with market data, applicable bargaining agreement (if any), or other law.

REQNUMBER: R-00148359

All qualified applicants will receive consideration for employment without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, disability or veteran status. Leidos will consider qualified applicants with criminal histories for employment in accordance with relevant laws. Leidos is an equal opportunity employer/disability/vet.