The Onyx Research Data Platform organization represents a major investment by GSK R&D and Digital & Tech, designed to deliver a step change in our ability to leverage data, knowledge, and prediction to find new medicines. We are a full-stack shop consisting of product and portfolio leadership, data engineering, infrastructure and DevOps, data / metadata / knowledge platforms, and AI/ML and analysis platforms, all geared toward:
- Building a next-generation data experience for GSK’s scientists, engineers, and decision-makers, increasing productivity, and reducing time spent on “data mechanics”
- Providing best-in-class AI/ML and data analysis environments to accelerate our predictive capabilities and attract top-tier talent
- Aggressively engineering our data at scale to unlock the value of our combined data assets and predictions in real time
Data Engineering is responsible for the design, delivery, support, and maintenance of industrialised, automated, end-to-end data services and pipelines. The team applies standardised data models and mappings to ensure data is accessible to end users through end-to-end user tools and APIs. They define and embed best practices, ensure compliance with Quality Management practices, and maintain alignment with automated data governance. They also acquire and process internal and external, structured and unstructured data in line with Product requirements.
A Senior NLP Data Engineer is a leading technical contributor who can consistently take a poorly defined business or technical problem, work it into a well-defined data problem / specification, and execute on it at a high level. They have a strong focus on metrics, both for the impact of their work and for its inner workings / operations. They are a model for the team on best practice for software development in general (and data engineering in particular), including code quality, documentation, DevOps practices, and testing, and they consistently mentor junior members of the team. They ensure the robustness of our services and serve as an escalation point in the operation of existing services, pipelines, and workflows.
Key Responsibilities:
- Designs, builds, and operates data tools, services, workflows, etc. that deliver high value as part of high-impact AI-driven products, leveraging modern data engineering tools (e.g., Spark, Kafka, Storm) and orchestration tools (e.g., Google Cloud Workflows, Airflow/Composer)
- Partners with the AI/ML and knowledge graph platform teams to build, test, and deploy NLP and GenAI pipelines, systems, and solutions
- Applies graph-based data modelling techniques for efficient data organization, integration, and retrieval, ensuring system flexibility and maintainability
- Produces well-engineered software, including appropriate automated test suites, technical documentation, and operational strategy
- Solves diverse problems and surfaces opportunities to reuse modular code and develop microservices that drive efficiencies
- Provides input into the roadmaps of upstream teams (e.g. Data Platforms, DataOps, DevOps) to help improve the overall program of work
- Applies platform abstractions consistently to maintain quality and consistency in logging and lineage
- Is fully versed in coding best practices and ways of working, participates in code reviews, and partners with others to improve the team's standards
- Adheres to the QMS framework and CI/CD best practices, and helps guide improvements to them that strengthen ways of working
- Provides leadership to team members to help others get the job done right
Why you?
Basic Qualifications:
We are looking for professionals with these required skills to achieve our goals:
- Bachelor's degree in Data Engineering, Computer Science, Software Engineering, or a related discipline
- 5+ years of data engineering experience in industry
- Knowledge of NLP and GenAI techniques, with experience processing unstructured data, using vector stores, and applying approximate retrieval
- Experience with building end-to-end systems based on machine learning or deep learning methods
- Experience overcoming high-volume, high-compute challenges
- Familiarity with orchestration tooling
- Cloud experience (e.g., AWS, Google Cloud, Azure)
- Experience with automated testing and test design
- Experience with DevOps-forward ways of working
- Deep knowledge of and experience using at least one common programming language (e.g., Python, Scala, Java)
- Deep experience with common big data tools (e.g., Spark, Kafka, Storm, …)
- Proven experience with machine learning algorithms and NLP frameworks such as PyTorch, TensorFlow, and spaCy
- Hands-on experience implementing CI/CD using Git and a common CI/CD stack (e.g., Jenkins, CircleCI, GitLab, Azure DevOps)
- Experience with agile software development environments using tools such as Jira and Confluence
- Experience with Infrastructure as Code and automation tools (e.g., Terraform)
Preferred Qualifications:
If you have the following characteristics, it would be a plus:
- Master's or PhD in Data Engineering, Computer Science, Software Engineering, or related discipline
- Good understanding of ontologies and semantic harmonization of data across sources
- Experience implementing Generative AI solutions is a huge plus
- Proven track record of working with knowledge graphs and graph databases, and a good general understanding of database concepts
- Proficiency in semantic web technologies (SPARQL, RDF, OWL) and harmonization of data
- Experience working with complex biomedical datasets, including genomics, proteomics, and high-throughput screening