GenAI Data Engineer

Our client, a leading global supplier of IT services, requires an experienced GenAI Data Engineer to be based at their client’s offices in London or Edinburgh.

This is a hybrid role: you can work remotely in the UK and attend the London or Edinburgh office two days per week.

This is a 6+ month temporary contract, starting ASAP.

Day rate: competitive market rate

Key Responsibilities:

  • Design and maintain scalable data pipelines using PySpark, Python, and distributed computing frameworks to support high‑volume data processing.
  • Architect and optimise AWS-based data and AI infrastructure, ensuring secure, performant, and cost‑efficient ingestion, transformation, and storage.
  • Develop, fine-tune, benchmark, and evaluate GenAI/LLM models, including custom training and inference optimisation.
  • Implement and maintain RAG pipelines, vector databases, and document-processing workflows for enterprise GenAI applications.
  • Build reusable frameworks for prompt management, evaluation, and GenAI operations.
  • Collaborate with cross-functional teams to integrate GenAI capabilities into production systems and ensure high-quality data, governance, and operational reliability.

Key Requirements:

  • Strong experience with PySpark, distributed data processing, and large-scale ETL/ELT pipelines.
  • Strong SQL expertise, including star/snowflake schema design, indexing strategies, writing optimised queries, and implementing CDC and SCD Type 1/2/3 patterns for reliable data warehousing.
  • Advanced proficiency in Python for data engineering, automation, and ML/GenAI integration.
  • Hands-on expertise with AWS services (S3, Glue, Lambda, EMR, Bedrock/custom model hosting).
  • Practical experience with GenAI/LLM model creation, fine-tuning, benchmarking, and evaluation.
  • Solid understanding of RAG architectures, embeddings, vector stores, and LLM evaluation methods.
  • Experience working with structured and unstructured datasets (documents, logs, text, images).
  • Familiarity with scalable data storage solutions (Delta Lake, Parquet, Redshift, DynamoDB).
  • Understanding of model optimisation techniques (quantisation, distillation, inference optimisation).
  • Strong capability to debug, tune, and optimise distributed systems and AI pipelines.

Due to the volume of applications received, unfortunately we cannot respond to everyone.

If you do not hear back from us within 7 days of sending your application, please assume that you have not been successful on this occasion.

Please do keep an eye on our website https://projectrecruit.com/jobs/ for future roles.
