Aug 1
Top Secret/SCI
Mid Level Career (5+ yrs experience)
IT - Data Science
Fort Belvoir, VA•Reston, VA
Red Gate Group is seeking a talented Sr. Data Scientist to join our team supporting the Defense Threat Reduction Agency (DTRA) in Reston, VA or Ft. Belvoir, VA. As a Sr. Data Scientist, you will bring hands-on experience in applied machine learning and NLP, including LLM implementation. This role involves building end-to-end ML workflows, from data prep to deployment, in a production environment handling massive streaming data via Kafka and data lakes. You’ll design models for document classification, extraction, summarization, and search, and own pipelines that process millions of documents weekly. Strong Python skills, a deep understanding of data exploration and visualization, and the ability to quickly grasp complex infrastructure are essential.
Qualifications
5+ years of experience in applied data science or ML roles, including Python, NLP, and LLM implementation
5+ years of experience with data exploration, data cleaning, data analysis, data visualization, or data mining
Experience with production-level systems, data lake environments, and streaming data, including Kafka
Experience implementing end-to-end ML workflows from data prep to deployment and evaluation
Ability to quickly learn infrastructure or systems concepts, including how pipelines interface with data lakes
Ability to design, implement, and iterate on ML models for document classification, extraction, summarization, and search
Ability to take ownership of data science workflows that interact with a production system streaming millions of documents per week
TS/SCI clearance
Bachelor's degree
Desired Qualifications:
Experience in collaborating with MLOps and infrastructure engineers to ensure robust model deployment, monitoring, and retraining pipelines
Experience supporting platform components such as document indexing or search, GPU workloads, and distributed storage, including Cloudera
Experience in the development of algorithms leveraging R, Python, SQL, or NoSQL
Experience with distributed data or computing tools, including MapReduce, Hadoop, Hive, EMR, Spark, Gurobi, or MySQL
Experience with visualization packages, including Plotly, Seaborn, or ggplot2