Lead Data Intelligence Machine Learning Engineer
Job Overview
- Date Posted: April 16, 2026
- Location
- Expiration Date: June 18, 2026
Job Description
411_3484710
About the Role
We are looking for a specialized Lead Data Intelligence Machine Learning Engineer to design and implement in‑house tools that automate our data labelling pipelines. Your primary goal will be to reduce our reliance on manual annotation by leveraging techniques such as Active Learning, Weak Supervision, and Synthetic Data Generation. You will bridge the gap between raw data collection and model‑ready datasets, ensuring high‑quality labels at scale.
Key Responsibilities
- Architect Labelling Pipelines: Design and deploy end‑to‑end automated labelling systems using frameworks such as Snorkel, Cleanlab, or custom active learning loops.
- Develop Human‑in‑the‑Loop (HITL) Systems: Build interfaces and workflows where models pre‑label data and humans only intervene on high‑uncertainty samples.
- Quality Assurance & Denoising: Implement algorithmic checks to identify and correct mislabelled or noisy data within existing datasets.
- Tooling & Integration: Collaborate with software engineers to integrate labelling tools with our existing data lakes and ML training infrastructure.
- Model Optimization: Fine‑tune teacher models to generate high‑quality pseudo‑labels for student models.
- Set up and maintain robust data preparation infrastructure – optimizing for data quality, speed, and seamless integration with downstream MLOps pipelines.
- Perform data visualization and in‑depth analysis using advanced data and feature engineering techniques. You’ll help transform raw data into actionable insight, supporting both research and deployment.
- Work closely with Data Scientists, Software Engineers, and Product teams to ensure high data quality and usability across products and projects.
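As a rough illustration of the human-in-the-loop workflow described above, the Python sketch below accepts a model's pre-label when its top class probability is high enough and sends everything else to a human review queue. The function name `route_predictions`, the 0.9 threshold, and the sample probabilities are assumptions made for this example only; they do not describe any existing in-house tool.

```python
from typing import List, Sequence, Tuple


def route_predictions(
    probabilities: Sequence[Sequence[float]],
    confidence_threshold: float = 0.9,  # assumed value, tune per dataset
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Split samples into auto-accepted pre-labels and a human review queue.

    Returns (auto_labelled, needs_review), where auto_labelled holds
    (sample_index, predicted_class) pairs and needs_review holds sample indices.
    """
    auto_labelled: List[Tuple[int, int]] = []
    needs_review: List[int] = []
    for index, class_probs in enumerate(probabilities):
        # The pre-label is the class with the highest predicted probability.
        predicted_class = max(range(len(class_probs)), key=lambda c: class_probs[c])
        if class_probs[predicted_class] >= confidence_threshold:
            auto_labelled.append((index, predicted_class))
        else:
            # Low confidence: a human annotator should label this sample.
            needs_review.append(index)
    return auto_labelled, needs_review


# Hypothetical model outputs for three samples over three classes.
example_probs = [
    [0.97, 0.02, 0.01],
    [0.40, 0.35, 0.25],
    [0.05, 0.93, 0.02],
]
auto_labelled, needs_review = route_predictions(example_probs)
print(auto_labelled)  # samples whose pre-labels are accepted automatically
print(needs_review)   # samples queued for human annotation
```

In practice the threshold trades annotation cost against label noise, and other uncertainty measures such as entropy or margin could take the place of the top-class probability.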
Qualifications
- At least 8 years of professional experience in Machine Learning engineering, specifically focused on data‑centric AI or computer vision/NLP pipelines.
- Proficiency in Python and mastery of the ML stack (PyTorch or TensorFlow, NumPy, Pandas, Scikit‑learn).
- Experience with Weak Supervision (labelling functions) or Active Learning strategies (uncertainty sampling, diversity sampling).
- Data engineering experience with SQL and NoSQL databases, managing large‑scale unstructured data (images, text, or audio).
- Familiarity with cloud labelling services such as AWS SageMaker Ground Truth, GCP Vertex AI, or Azure ML.
- Experience with data version control tools such as DVC or similar.
- Hands‑on expertise building auto‑labelling solutions or working with large‑scale data annotation workflows.
- Advanced skills in Python (and/or other relevant languages) and experience with key ML/data science libraries (TensorFlow, PyTorch, scikit‑learn, pandas).
- Experience designing, deploying, and maintaining scalable data pipelines, including data cleansing, transformation, and storage (cloud, on‑prem, or hybrid).
- Strong background in feature engineering, data analysis, and data visualization – comfortable using tools like Jupyter, Tableau, or Power BI.
- Excellent communicator who documents solutions clearly and collaborates effortlessly across technical and non‑technical teams.
- Ability to balance speed and quality, stay curious about new developments, and deliver results in a fast‑moving environment.
- Bachelor’s or Master’s degree in Computer Science, Engineering, Mathematics, Data Science, or a related field.
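For candidates less familiar with the weak supervision terminology used above, the Python sketch below shows the basic idea of labelling functions: several noisy heuristics each vote on a label or abstain, and the votes are combined. The combination here is a plain majority vote, which is a deliberate simplification and not the learned label model that frameworks such as Snorkel provide. The keyword heuristics, the label constants, and the function names are all hypothetical.

```python
from collections import Counter
from typing import Callable, List

# Hypothetical label scheme for a binary sentiment task.
ABSTAIN, NEGATIVE, POSITIVE = -1, 0, 1

LabellingFunction = Callable[[str], int]


def lf_mentions_refund(text: str) -> int:
    """Heuristic: refund requests usually signal a negative review."""
    return NEGATIVE if "refund" in text.lower() else ABSTAIN


def lf_mentions_love(text: str) -> int:
    """Heuristic: the word 'love' usually signals a positive review."""
    return POSITIVE if "love" in text.lower() else ABSTAIN


def lf_ends_with_exclamation(text: str) -> int:
    """Heuristic: a trailing exclamation mark weakly suggests enthusiasm."""
    return POSITIVE if text.rstrip().endswith("!") else ABSTAIN


def majority_vote(text: str, labelling_functions: List[LabellingFunction]) -> int:
    """Combine labelling-function votes; abstain when no vote or on a tie."""
    votes = [lf(text) for lf in labelling_functions]
    votes = [vote for vote in votes if vote != ABSTAIN]
    if not votes:
        return ABSTAIN
    ranked = Counter(votes).most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return ABSTAIN  # conflicting evidence: leave for a human or a model
    return ranked[0][0]


LFS = [lf_mentions_refund, lf_mentions_love, lf_ends_with_exclamation]

for review in ["I love it!", "I want a refund", "Arrived on Tuesday"]:
    print(review, "->", majority_vote(review, LFS))
```

Samples on which the functions abstain or disagree are natural candidates for manual annotation or an active learning queue.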
Dyson is an equal opportunity employer. We know that great minds don’t think alike, and it takes all kinds of minds to make our technology so unique. We welcome applications from all backgrounds, and employment decisions are made without regard to race, colour, religion, national or ethnic origin, sex, sexual orientation, gender identity or expression, age, disability, protected veteran status, or any other dimension of diversity.
2026-04-16 07:14:49