
Career | Nov 2025 - Feb 2026

From Data Scientist to ML Engineer: When SQL Stops Being the Point

From notebooks to production systems

SQL appears in 85% of data scientist roles and 19% of ML engineer postings. The transition is fundamentally about moving from the analytical layer to the systems layer.

Volume ratio: 1.5x as many MLE openings as DS

Salary premium: $193k-$255k for MLE vs $167k-$236k for DS

LLM skills: 27% of MLE roles mention LLMs

SQL appears in 85% of data scientist job descriptions and 19% of ML engineer postings. That single gap tells you more about this transition than any overlap analysis. Data science lives in the analytical layer: querying data, testing hypotheses, building models that generate insight. ML engineering lives in the systems layer: deploying models into production, optimising inference latency, keeping things running reliably at scale. The work that defines each role barely overlaps.

The tools partially overlap. Seven of the fifteen most in-demand skills for ML engineers also appear in data scientist job descriptions, among them Python, PyTorch, TensorFlow, machine learning, AWS, and Spark. But the depth changes dramatically. PyTorch appears in 13% of data scientist roles and 44% of ML engineer roles. TensorFlow goes from 13% to 31%. AWS jumps from 13% to 26%. You may recognise the tools, but ML engineering expects a level of fluency that most data science positions don't require.

Then there are the skills data science barely touches. C++ shows up in 18% of ML engineer roles and just 2% for data scientists. Kubernetes sits at 14% versus 4%. LLMs appear in 27% of ML engineer postings versus 7% for data scientists. RAG (retrieval-augmented generation) in 13% versus under 1%. These are production infrastructure and emerging AI system skills that sit well outside the typical data science workflow.
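One way to read the figures quoted above is as percentage-point gaps between the two roles. The sketch below is only an illustration: it uses the shares stated in this article, and the names `SKILL_SHARES` and `rank_gaps` are made up for the example. RAG is left out because its data scientist share is given only as "under 1%".

```python
# Shares of postings mentioning each skill, in percent, as quoted in this article:
# (data scientist share, ML engineer share).
SKILL_SHARES = {
    "PyTorch": (13, 44),
    "TensorFlow": (13, 31),
    "AWS": (13, 26),
    "C++": (2, 18),
    "Kubernetes": (4, 14),
    "LLMs": (7, 27),
}


def rank_gaps(shares):
    """Return (skill, gap) pairs sorted by percentage-point gap, largest first."""
    gaps = {skill: mle - ds for skill, (ds, mle) in shares.items()}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


if __name__ == "__main__":
    for skill, gap in rank_gaps(SKILL_SHARES):
        print(f"{skill:<12} +{gap} points")
```

On these six skills, PyTorch has the widest gap at 31 points, followed by LLMs at 20 and TensorFlow at 18.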

Figure: Skills you already have vs. skills you need. "Overlap" skills are grouped above "bridging" skills. PyTorch and TensorFlow count as overlap because both roles use them, but the frequency gap is large.

The LLM factor

The single biggest new skill domain is large language models. LLMs appear in 27% of ML engineer postings, RAG in 13%, and prompt engineering in 8%. A year or two ago, ML engineering job descriptions were dominated by traditional machine learning and deep learning. Now LLM-related skills collectively appear across a substantial share of openings, reshaping what the role looks like in practice.

For a data scientist, this creates both a challenge and an opportunity. LLMs are new enough that most people are learning them in real time, regardless of their starting point. The ML engineer who's been deploying recommendation models for five years doesn't have a huge head start on LLM architecture over the data scientist who's been fine-tuning classifiers. The field is moving fast and practical experience with these tools counts for more than formal credentials.

What the gap is really about

Look at the bridging skills and a theme emerges: C++ (18% of ML engineer roles), Kubernetes (14%), Docker (11%), MLOps (11%), and CI/CD (6%) are all production systems skills. The gap between data science and ML engineering is fundamentally about taking models from notebooks into production: packaging them for deployment, scaling them to handle real traffic, monitoring their performance, and recovering when they fail.

Data scientists typically hand off a trained model. ML engineers own the full lifecycle: training infrastructure, model serving, latency optimisation, A/B testing in production (as opposed to offline experimentation), and the monitoring systems that catch model drift before users notice. The engineering side of the role is where most of the new learning happens.

Cloud infrastructure reinforces this. AWS appears in 26% of ML engineer roles, GCP in 19%, Azure in 13%. Data scientists see these cloud platforms at 13%, 7%, and 5% respectively. The expectation for ML engineers is also deeper: not just running notebooks in the cloud, but architecting the compute, storage, and serving layers that ML systems depend on.

What falls away

Just as revealing is what drops in importance. SQL goes from 85% of data scientist roles to 19% of ML engineer roles. R drops from 34% to 1%. Causal inference, one of the most distinctive data science skills at 24%, appears in less than 1% of ML engineering postings. Statistics drops from 16% to 2%. A/B testing falls from 14% to 4%. The visualisation tools (Tableau at 12%, Looker at 10%) are essentially absent from ML engineering job descriptions.

ML engineers still use SQL and understand statistics. But these skills have moved from the centre of the role to the periphery. The analytical layer of data science work (insight generation, experimentation, and stakeholder communication) is precisely what gets left behind in this transition, replaced by systems thinking, performance optimisation, and infrastructure management.

The seniority picture

ML engineering roles skew heavily toward the senior end. Staff- and principal-level roles account for 26% of ML engineer openings, compared to 17% for data scientists. Senior roles make up 47% of ML engineer postings.

Figure: Seniority distribution comparison. Horizontal stacked bars running from junior to director+. The difference at staff/principal level is the main takeaway.

The high share of staff and principal roles signals what companies value in ML engineers: deep technical expertise accumulated over years of production experience. If you're a mid-level or senior data scientist, the good news is that your modelling experience is a genuine asset. The ML engineering roles at senior and above expect someone who understands machine learning deeply and can bring that intuition to production systems. Your data science background provides the ML intuition that complements the engineering skills you'll build.

Volume, industry, and compensation

The market is hiring ML engineers at roughly 1.5x the volume of data scientists across the openings we track: 2,950 versus 1,940. The industry mix reflects where production ML matters most. Mobility companies (autonomous vehicles, robotics, logistics) account for 18% of ML engineer roles, making it the largest single sector. AI/ML companies follow at 15%, then consumer (12%) and fintech (12%). Compare this with data science, where fintech dominates at 22% and mobility sits at 7%. If you're drawn to the companies pushing the boundaries of applied ML, ML engineering is where those roles concentrate. Startup representation is also higher at 21% versus 14%, reflecting the infrastructure-heavy hiring needs of early-stage AI companies.

In the US market, ML engineer roles range from around $193k to $255k across the postings with published compensation. Data scientist roles sit at $167k to $236k. Both roles command strong compensation, but ML engineering carries a consistent premium: roughly $26k at the bottom of the band and $19k at the top.
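The premium can be checked directly from the two ranges quoted above. This is a small sketch of that arithmetic. It assumes the published figures can be treated as simple low and high endpoints, which the article does not spell out, and the names `MLE_RANGE`, `DS_RANGE`, and `premium` are illustrative.

```python
# Published US salary ranges quoted above, in thousands of USD: (low, high).
MLE_RANGE = (193, 255)
DS_RANGE = (167, 236)


def premium(mle, ds):
    """Return the MLE-minus-DS difference at the low end, midpoint, and high end."""
    low = mle[0] - ds[0]
    high = mle[1] - ds[1]
    mid = sum(mle) / 2 - sum(ds) / 2
    return {"low": low, "mid": mid, "high": high}


if __name__ == "__main__":
    for point, value in premium(MLE_RANGE, DS_RANGE).items():
        print(f"{point}: ${value}k")
```

On these endpoints the difference is $26k at the low end, $19k at the high end, and $22.5k at the midpoint.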

Figure: The salary comparison. Range bars from minimum to maximum with the midpoint marked, in USD, for DS and MLE.

Practical next steps

If you're making this move, the data points to a clear sequencing based on where the largest skill gaps sit.

Go deep on PyTorch. It's the largest single skill gap in the data, jumping from 13% of data scientist roles to 44% for ML engineers. Most data scientists have used PyTorch for training models, but ML engineering requires fluency: custom training loops, distributed training, model optimisation, and export for serving. If you've been using scikit-learn or high-level APIs, this is where the investment matters most.

Learn LLM infrastructure. LLMs, RAG, and the surrounding ecosystem represent the fastest-growing part of the ML engineering skill set. Build a project that involves fine-tuning a model, implementing retrieval-augmented generation, and deploying it behind an API. The hands-on experience with token management, latency budgets, evaluation pipelines, and failure modes is what separates candidates who understand LLMs conceptually from those who've built with them.
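To make the retrieval step of RAG concrete, here is a deliberately tiny sketch in plain Python. It is a toy under stated assumptions, not how a production system would do it: real pipelines typically use learned embeddings and a vector index, whereas this swaps in simple word-count cosine similarity so it runs with no dependencies. The `DOCS` knowledge base and all function names are hypothetical, and the final call to a language model is left out.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and split into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two term-count vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, documents, k=2):
    """Return the k documents most similar to the query, best first."""
    q = Counter(tokenize(query))
    scored = [(cosine(q, Counter(tokenize(d))), d) for d in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [d for _, d in scored[:k]]


def build_prompt(query, documents, k=2):
    """Assemble a prompt that places the retrieved context before the question."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical mini knowledge base, for illustration only.
DOCS = [
    "Kubernetes schedules containers across a cluster of machines.",
    "Model drift means the input data distribution has shifted since training.",
    "Docker packages an application and its dependencies into a container image.",
]

if __name__ == "__main__":
    print(build_prompt("What is model drift?", DOCS, k=1))
```

Even at this scale the questions that matter in production show up: how many documents to retrieve, how much of the prompt budget the context consumes, and what happens when nothing relevant is found.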

Pick up production tooling. Kubernetes (14% of ML engineer roles), Docker (11%), MLOps (11%), and CI/CD (6%) form the infrastructure layer that data scientists rarely touch. You don't need to become a DevOps specialist, but you need to be comfortable containerising models, deploying them to cloud infrastructure, and setting up monitoring for drift and performance. The goal is operational ownership of your models beyond just training them.
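As one example of what monitoring for drift can look like, the sketch below computes the Population Stability Index (PSI) between a reference sample of a feature and a live sample. PSI is just one of several drift measures, chosen here as an assumption rather than something the data prescribes. The function name, the equal-width binning, and the `eps` floor are illustrative choices, and both samples are assumed to be non-empty lists of numbers.

```python
import math


def psi(expected, actual, bins=10, eps=1e-6):
    """Population Stability Index between a reference sample and a live sample."""
    lo, hi = min(expected), max(expected)
    width = (hi - lo) / bins or 1.0  # avoid zero width when all values are equal

    def proportions(values):
        counts = [0] * bins
        for v in values:
            # Clamp so live values outside the reference range land in the edge bins.
            idx = min(max(int((v - lo) / width), 0), bins - 1)
            counts[idx] += 1
        # eps keeps empty bins from producing log(0) or division by zero.
        return [max(c / len(values), eps) for c in counts]

    e, a = proportions(expected), proportions(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))


if __name__ == "__main__":
    reference = [i / 100 for i in range(100)]      # training-time feature values
    shifted = [0.5 + i / 200 for i in range(100)]  # live values, shifted upward
    print("same data:", psi(reference, reference))
    print("shifted:  ", psi(reference, shifted))
```

A commonly cited rule of thumb treats a PSI below 0.1 as stable and above 0.25 as a significant shift, though the right thresholds depend on the feature and the model. In practice a check like this would run on a schedule and raise an alert rather than print.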

Consider C++ if your target is performance-critical ML. At 18% of ML engineer roles, C++ signals a segment of the market focused on low-latency inference, embedded systems, or core ML framework development. This is the steepest learning curve in the transition and not required for every ML engineering role, but it's worth noting for anyone targeting robotics, autonomous vehicles, or real-time systems.

You already have Python fluency, ML fundamentals, and experience working with real data at scale. The transition is about extending those foundations into production systems: deeper framework expertise, infrastructure skills, and the new LLM domain that's reshaping ML engineering as a discipline. The market is hiring ML engineers at roughly 1.5x the volume of data scientists, the salary premium is consistent, and the work sits closer to the products that companies are building their futures around.

Methodology: This analysis draws on approximately 4,900 data and ML job postings collected between November 2025 and February 2026 from company career pages. Postings span London, New York, San Francisco, Denver, and Singapore. Skills were extracted from full job descriptions using an LLM classifier. Salary data is limited to US markets where disclosure is more common. Recruitment agency listings and out-of-scope roles were excluded. For skills analysis, the data scientist sample is 430 postings and the ML engineer sample is 858 postings. Salary sample sizes: ML engineer n=306, data scientist n=163.
