Environmental Data Engineer / Machine Learning Engineer at Grassroots Carbon

San Antonio, TX 78205, USA

Full Time

Start Date

Immediate

Expiry Date

04 Dec, 25

Salary

0.0

Posted On

04 Sep, 25

Experience

3 year(s) or above

Remote Job

Yes

Telecommute

Yes

Sponsor Visa

Skills

Soil Mapping, Infrastructure, Sustainability, Soil Science, Code, Scikit-learn, Data Processing, SQL, Datasets, NumPy, Environmental Science, Data Engineering, Containerization, Machine Learning, Pandas, Data Science, Python, Agriculture, Computer Science

Industry

Information Technology/IT

Description

Position: Environmental Data Engineer / Machine Learning Engineer
Position Type: Full-Time
Reports to: Jay Weeks, Director of Data & Soil Science

REQUIRED QUALIFICATIONS

Bachelor’s or Master’s degree in Computer Science, Data Science, Machine Learning, Environmental Science, or a related field.
3+ years of experience in data engineering, with a proven track record of productionizing prototype code in a startup or fast-paced environment.
Strong proficiency in programming languages such as Python (with libraries like Pandas, NumPy, Scikit-learn) and SQL.
Experience with ML frameworks (e.g., TensorFlow, PyTorch, XGBoost) and time series analysis (e.g., Prophet, LSTM networks, PINNs).
Hands-on experience with cloud platforms, containerization, and CI/CD pipelines.
Familiarity with geospatial data processing and environmental modeling concepts, particularly in soil science or agriculture.
Excellent problem-solving skills, with the ability to handle ambiguous requirements and deliver under tight deadlines.

PREFERRED SKILLS

Experience in digital soil mapping, carbon stock prediction models, advanced statistics, and/or Bayesian model calibration and inference.
Knowledge of big data technologies (e.g., Spark, Kafka) for handling large-scale time series datasets.
Background in DevOps practices and infrastructure as code (e.g., Terraform).
Passion for sustainability and environmental impact.

Responsibilities

ROLE OVERVIEW

As a Data Engineer / Machine Learning Engineer, you will play a pivotal role in bridging the gap between experimental prototypes and scalable, production-ready systems. You’ll spend approximately 50% of your time optimizing and deploying data pipelines to support long-term business needs, and the other 50% developing advanced machine learning models for environmental mapping and predicting changes in environmental metrics (e.g., soil organic carbon stocks) using time series data. This is a hands-on position in a fast-paced startup environment, where you’ll collaborate with cross-functional teams to deliver impactful, reliable solutions.

KEY RESPONSIBILITIES

Production Pipeline Development (50% of time):
Evaluate and refactor prototype code from R&D phases into efficient, maintainable production pipelines.
Design, implement, and maintain scalable data ingestion, processing, and ETL (Extract, Transform, Load) workflows using cloud-based infrastructure (e.g., AWS, GCP, or Azure).
Ensure pipelines are robust, fault-tolerant, and optimized for performance, security, and cost-efficiency.
Integrate monitoring, logging, and alerting systems to support ongoing operations and quick issue resolution.
Collaborate with software engineers, scientists, data scientists, and other stakeholders to align pipelines with business objectives, enabling long-term scalability and reliability.

Model Development (50% of time):
Build, train, and deploy machine learning models for environmental quantification (e.g., digital soil mapping, predicting soil organic carbon stock changes).
Work with time series data from various sources (e.g., satellite imagery, sensor data, historical records) to develop predictive models using techniques like time series forecasting, geospatial analysis, and deep learning.
Perform feature engineering, model evaluation, hyperparameter tuning, and validation to ensure accuracy and generalizability.
Integrate ML models into production environments, including API development for real-time predictions and batch processing.
Stay abreast of advancements in ML for geospatial and environmental applications, experimenting with new algorithms and tools to improve model performance.

General Duties:
Conduct code reviews, write documentation, and mentor junior team members on best practices in data engineering and ML.
Troubleshoot and debug issues in both data pipelines and ML systems.