Data Scientist (Python) / Business Analyst at Onyx Government Services, LLC
Full Time


Start Date

Immediate

Expiry Date

06 May, 26

Salary

0.0

Posted On

06 Feb, 26

Experience

5 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

No

Skills

Python, SQL, Git, Machine Learning, Data Analysis, ETL, OCR, NLP, Document Processing, Data Labeling, Model Training, FastAPI, Agile Development, Business Intelligence, Data Profiling, Collaboration

Industry

Software Development

Description
Company Description

Onyx Government Services, LLC is a Service-Disabled Veteran-Owned Small Business (SDVOSB) headquartered in Fairfax, Virginia. We specialize in data management, integration, and analysis solutions that provide decision-ready information to Command and Control (C2) and Decision Support Systems. We have demonstrated expertise in Information Technology, database and COTS integration, and custom software development. Onyx pairs subject matter and functional experts with developers to provide high-quality, tailored solutions. In support of our various efforts, we have developed the Onyx Data Management Toolkit, a combination of Agile Development principles, COTS integration, and custom software, to deliver flexible, cost-effective solutions to a variety of Department of Defense, Intelligence Community, and Law Enforcement agencies.

Job Description

The AI Developer & Business Analyst will join our team as one of the Python developers on the Enterprise Analytics project to solve business intelligence challenges. The position requires deep technical expertise in Python, Git, SQL, BI tools, ETL, and data profiling, combined with hands-on experience building AI pipelines from document ingestion through labeling and model deployment. The ideal candidate will have an active BI or NACLC Public Trust security clearance and be a confident leader experienced in performing day-to-day tasks, a critical thinker, and a self-starter who takes ownership of assignments. Previous consulting experience using an agile delivery methodology is a plus. This is a six-month contract (1099) position with the potential to become permanent (W2).

Responsibilities:

- Design, build, and deploy end-to-end AI pipelines for content moderation, including production ML models for compliance classification and active learning workflows that enable human-in-the-loop label review and validation.
- Engage with operational business stakeholders to identify key initiatives and execute solutions to business problems using data analysis and advanced analytics.
- Extract text and tables from PDF documents using OCR (Tesseract, AWS Textract, Azure Form Recognizer).
- Build labeling pipelines that create training datasets from document corpora.
- Train custom ML models (NLP/OCR) for document classification, extraction, and entity recognition.
- Implement the end-to-end pipeline: PDF → labeling → training → inference → API serving.
- Version control all code, datasets, and models using Git (DVC for ML datasets preferred).
- Deploy models as REST APIs (FastAPI) for production document processing.
- Work collaboratively to align business requirements with data and analytical solutions while empowering the business to draw insights and analyze data.
- Identify areas of concern in as-is processes and business dependencies.
- Coordinate and facilitate requirements gathering, playing an active role in the discovery, analysis, value validation, design, and testing phases.
- Ideate with functional analysts and business process owners.

Requirements:

- Security clearance: must possess an active BI or NACLC Public Trust security clearance.
- Bachelor’s degree, with 7 years preferred.
- Strong analytical, problem-solving, planning, organizational, and project management skills.
- Experience working in a matrix structure and communicating with technical and non-technical people.
- Ability to develop and deliver presentations that simplify complex solutions and insights for a non-technical audience is a must.
- Excellent innovation, interpersonal, and communication skills.
- At least 5 years of work experience in an analytical role.
- Python 3+ with ML frameworks (PyTorch/TensorFlow, Transformers).
- Document processing: PyMuPDF, pdfplumber, OCR tools, layout parsing.
- Data labeling/annotation experience (LabelStudio, Prodigy).
- Model training lifecycle: preprocessing, augmentation, validation, fine-tuning.
- Git proficiency (branching, MLflow/DVC integration).
- SQL/Pandas for data handling.
- SAS and Tableau development experience a plus.
- Strong conceptual, analytical, and decision-making skills.
- Ability to work as part of a collaborative team, mentoring and training the client on the various tools and techniques to build a complete analytics solution.
- Must be comfortable working in a fast-paced, flexible environment and take the initiative to learn new tools quickly.

Desired:

- Extensive experience in Python development with AI/ML frameworks (PyTorch, scikit-learn, Transformers).
- Proven experience with document AI: PyMuPDF, pdfplumber, OCR processing.
- Strong Git proficiency and production ML deployment (Docker, FastAPI).
- ETL pipeline development and SQL database integration.
- Dashboard development: Streamlit, Gradio, or BI tools.

Responsibilities
The role involves designing, building, and deploying end-to-end AI pipelines for content moderation and engaging with business stakeholders to solve data-related challenges. The candidate will also be responsible for extracting data from documents, training ML models, and deploying them as APIs.