Data Science and Biostatistics


Data science maximizes the value and knowledge contained in your data by identifying patterns, prioritizing experimental variables, and supporting predictive modeling.


Our Data Analysis Methods include:


  • Identifying patterns and outliers within numerical, text, or sequence-based data sets.
  • Applying AI/ML algorithms and experimental design.
  • Processing data (e.g., feature engineering, transformation, normalization, and imputation).
  • Developing and evaluating supervised and unsupervised machine learning models.
  • Ranking and prioritizing variables.
  • Applying deep learning and AI approaches.
  • Building reproducible data science workflows.
  • Performing statistical testing and power analysis.
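
As an illustration of the data processing step listed above, the following is a minimal Python sketch of median imputation followed by z-score normalization. It is not the team's actual workflow: the function names (impute_median, zscore) and the sample values are hypothetical, and only the Python standard library is used.

```python
from statistics import mean, median, pstdev


def impute_median(values):
    """Replace missing entries (None) with the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def zscore(values):
    """Rescale values to zero mean and unit (population) standard deviation."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical measurements with one missing value.
raw = [2.0, None, 4.0, 6.0]
imputed = impute_median(raw)
normalized = zscore(imputed)
```

In this sketch imputation runs before normalization so that the mean and standard deviation are computed on a complete column; other orderings and imputation strategies are equally plausible depending on the data.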

Data Science and Biostatistics Team


  • Gabe Rosenfeld, Ph.D. (Group coordinator)
  • Dan Veltri, Ph.D. (Group coordinator)
  • Jingwen Gu, M.S.
  • Mariam Namawejje, Ph.D.
  • Mina Peyton, Ph.D.

Selected Publications


  • Rosenfeld, G., Gabrielian, A., Wang, Q., Gu, J., Hurt, D., Long, A., & Rosenthal, A. (2021). Radiologist observations of computed tomography (CT) images predict treatment outcome in TB Portals, a real-world database of tuberculosis (TB) cases. PLOS ONE, 16(3), e0247906.
  • Luo, L., Yan, S., Lai, P., Veltri, D., Oler, A., Xirasagar, S., et al. (2021). PhenoTagger: a hybrid method for phenotype concept recognition using human phenotype ontology. Bioinformatics, 37(13), 1884–1890.
  • Rosenfeld, G., Angelova, A., Shin, C., Quinones, M., & Hurt, D. (2021). Current challenges in microbiome metadata collection. bioRxiv, 2021.05.05.442781.
  • Singh, K., Burkhardt, M., Nakuchima, S., Herrera, R., et al. (2020). Structure and function of a malaria transmission blocking vaccine targeting Pfs230 and Pfs230-Pfs48/45 proteins. Communications Biology, 3(395), 1–12.
  • Veltri, D., Kamath, U., & Shehu, A. (2018). Deep learning improves antimicrobial peptide recognition. Bioinformatics, 34(16), 2740–2747.

Related tools developed by BCBB

Triptase Calculator 

The Total Rise In Peripheral Tryptase After Systemic Event (TRIPTASE) Calculator is an R Shiny app (PI: Jonathan Lyons).


Data Sharing API 

The TB DEPOT Analytic API is designed to facilitate easy retrieval of information from TB DEPOT using programming languages readily available to data scientists and researchers.


tbportals.depot.api R package 

The tbportals.depot.api R package aims to provide a convenient R wrapper for the TB Portals Analytic API, which contains the tidy analytic data from the TB Portals DEPOT database.