Senior Data Scientist at XOi Technologies

Nashville, TN 37203, USA

Full Time

Start Date

Immediate

Expiry Date

25 Jul, 25

Salary

0.0

Posted On

25 Apr, 25

Experience

2 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Tesseract, Python, Bedrock, Computer Science, Data Science, Gemini, NLP, Machine Learning, ML, Scikit-learn, Data Extraction, Classification, Fine-Tuning, Data Processing

Industry

Information Technology/IT

Description

XOi is revolutionizing the field service industry by expanding our on-demand equipment data enrichment and metadata retrieval capabilities. We are powering a world in which people and equipment are connected, decisions are transparent, and quality outcomes are predictable. Field service technicians across the nation use our suite of products daily to increase efficiency and performance on the job, streamline communication and information sharing, enhance their skills, and gain insights through innovative software and AI solutions.
We are looking for a Senior Data Scientist with experience in AI-driven document processing, OCR, and generative AI applications. This role requires hands-on expertise in designing, training, and deploying machine learning models, as well as operationalizing AI agents within production systems. You will work closely with LLM frameworks, orchestration tools, and AWS infrastructure to build scalable, intelligent AI agents that assist XOi users in executing a variety of workflows.

Technical Requirements

Master’s or PhD in Computer Science, Data Science, Machine Learning, or a related technical field.
2+ years of professional experience in data science or machine learning with a focus on LLMs, OCR, and NLP applications.
Experience applying NLP and computer vision techniques to large-scale, enterprise document processing tasks, including data extraction, classification, and workflow automation.
Proficiency in Python and ML libraries such as Scikit-learn, TensorFlow, or PyTorch.
Strong hands-on experience with OCR frameworks (Tesseract, AWS Textract, or similar).
Direct experience with LangChain, LangSmith, or other LLM orchestration tools, plus exposure to supplying data context to LLMs through MCP (Model Context Protocol).
Skilled in deploying ML models via AWS SageMaker, Bedrock, or other cloud-native services.
Solid grasp of LLM prompting techniques, vector databases, agent-driven AI architectures, and RAG pipelines.
Familiarity with MLOps principles, reproducibility practices, model governance, and CI/CD for AI systems.
Experience integrating generative AI solutions (Gemini, OpenAI, Anthropic, Mistral) into document workflows is highly desirable.
Knowledge of model fine-tuning, transfer learning, and multi-modal data processing preferred.

Responsibilities

Data Analysis & Pattern Discovery: Analyze large, complex, and multi-modal datasets to uncover insights that drive AI model development and business strategy.
Model Development: Design, develop, and optimize state-of-the-art LLM-based and NLP models using deep learning frameworks like PyTorch and TensorFlow.
OCR & NLP Solutions: Build and implement OCR and NLP solutions to extract, process, and analyze textual information from various document types in real-world enterprise settings.
Machine Learning Pipelines: Develop and manage end-to-end machine learning pipelines for data ingestion, preprocessing, model training, evaluation, and deployment, following CI/CD and MLOps best practices.
AI Agent Development: Implement AI agents using LangChain, LangSmith, MCP, and other tools to automate diagnosis, symptom-issue-resolution identification, summarization, and task optimization.
LLM Fine-Tuning & Deployment: Orchestrate and integrate large language models via AWS Bedrock, SageMaker, and other cloud-native services to provide intelligent assistance to users executing domain-specific workflows.
Experimentation & Optimization: Conduct experiments to fine-tune model parameters, evaluate different architectures, and optimize performance for specific use cases like document processing accuracy, entity recognition, and semantic understanding.
Prompt Engineering: Design prompt templates, implement prompt chaining for multi-step agents, and use protocols such as MCP in targeted retrieval-augmented generation (RAG) pipelines.
Third-Party & API Integration: Collaborate on integrating open-source models, third-party APIs, and proprietary tools into enterprise production systems.
Collaboration & Cross-Functional Work: Partner with software engineers, data engineers, and product managers to deploy AI models into scalable, reliable production applications.
Research & Continuous Learning: Stay at the forefront of advancements in LLMs, OCR, NLP, and AI, continuously evaluating new tools, frameworks, and techniques for adoption.
Mentorship: Provide technical leadership, guidance, and mentorship to junior data scientists and machine learning engineers within the team.