GenAI Data Engineer

Temporary
City of London, London
Posted 1 day ago
Negotiable, GBP per annum

Our client, a leading global supplier of IT services, requires an experienced GenAI Data Engineer to be based at their client’s offices in London or Edinburgh.

This is a hybrid role: you can work remotely within the UK and attend the London or Edinburgh office two days per week.

This is a temporary contract of 6 months or more, starting as soon as possible.

Day rate: Competitive market rate

Key Responsibilities:

Design and maintain scalable data pipelines using PySpark, Python, and distributed computing frameworks to support high‑volume data processing.
Architect and optimise AWS-based data and AI infrastructure, ensuring secure, performant, and cost‑efficient ingestion, transformation, and storage.
Develop, fine-tune, benchmark, and evaluate GenAI/LLM models, including custom training and inference optimisation.
Implement and maintain RAG pipelines, vector databases, and document-processing workflows for enterprise GenAI applications.
Build reusable frameworks for prompt management, evaluation, and GenAI operations.
Collaborate with cross-functional teams to integrate GenAI capabilities into production systems and ensure high-quality data, governance, and operational reliability.

Key Requirements:

Strong experience with PySpark, distributed data processing, and large-scale ETL/ELT pipelines.
Strong SQL expertise, including star/snowflake schema design, indexing strategies, writing optimised queries, and implementing CDC and SCD Type 1/2/3 patterns for reliable data warehousing.
Advanced proficiency in Python for data engineering, automation, and ML/GenAI integration.
Hands-on expertise with AWS services (S3, Glue, Lambda, EMR, Bedrock / custom model hosting).
Practical experience with GenAI/LLM model creation, fine-tuning, benchmarking, and evaluation.
Solid understanding of RAG architectures, embeddings, vector stores, and LLM evaluation methods.
Experience working with structured and unstructured datasets (documents, logs, text, images).
Familiarity with scalable data storage solutions (Delta Lake, Parquet, Redshift, DynamoDB).
Understanding of model optimisation techniques (quantisation, distillation, inference optimisation).
Strong capability to debug, tune, and optimise distributed systems and AI pipelines.

Due to the high volume of applications, we regret that we are unable to respond to every applicant individually.

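If you have not heard from us within 7 days of submitting your application, please assume that it has not been successful on this occasion.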

Please keep an eye on our website, https://projectrecruit.com/jobs/, for future vacancies.