
Data Engineer I

GSK
Cambridge, MA, USA
Skills: Data Engineer, Python Developer, Full Stack Python Developer

Purpose of Onyx

 

The Onyx Research Data Tech organization is GSK’s Research data ecosystem, which brings together, analyzes, and powers the exploration of data at scale. We partner with scientists across GSK to define and understand their challenges and to develop tailored solutions that meet their needs. The goal is to ensure scientists have the right data and insights when they need them, giving them a better starting point for medical discovery and accelerating it. Ultimately, this helps us get ahead of disease in more predictive and powerful ways.

Onyx is a full-stack shop consisting of product and portfolio leadership, data engineering, infrastructure and DevOps, data / metadata / knowledge platforms, and AI/ML and analysis platforms, all geared toward:

  • Building a next-generation, metadata- and automation-driven data experience for GSK’s scientists, engineers, and decision-makers, increasing productivity and reducing time spent on “data mechanics”
  • Providing best-in-class AI/ML and data analysis environments to accelerate our predictive capabilities and attract top-tier talent
  • Aggressively engineering our data at scale, as one unified asset, to unlock the value of our unique collection of data and predictions in real time

 

Data Engineering is responsible for the design, delivery, support, and maintenance of industrialized, automated, end-to-end data services and pipelines. They apply standardized data models and mappings to ensure data is accessible to end users through end-to-end user tools and APIs. They define and embed best practices, ensure compliance with Quality Management practices, and maintain alignment to automated data governance. They also acquire and process internal and external, structured and unstructured data in line with Product requirements.

 

A Data Engineer I is a technical contributor who can take a well-defined specification for a function, pipeline, service, or other component, together with a technical approach to building it, and deliver it to a high standard. They are aware of, and adhere to, best practices for software development in general (and data engineering in particular), including code quality, documentation, DevOps practices, and testing. They ensure the robustness of our services and serve as an escalation point in the operation of existing services, pipelines, and workflows.

 

A Data Engineer I should be aware of the most common tools in the data space (languages, libraries, etc.), such as Spark, Kafka, and Storm. They should constantly seek feedback and guidance to further develop their technical skills and expertise, and should take feedback well from all sources in the interest of that development.

Key responsibilities for the Data Engineer I include:

  • Builds modular code / libraries / services / etc. using modern data engineering tools (Python/Spark, Kafka, Storm, …) and orchestration tools (e.g., Google Cloud Workflows, Airflow / Cloud Composer); a minimal sketch of this pattern follows this list
  • Produces well-engineered software, including appropriate automated test suites and technical documentation
  • Ensures consistent application of platform abstractions for quality and consistency with respect to logging and lineage
  • Adheres to the QMS framework and CI/CD best practices
  • Provides L3 support for existing tools / pipelines / services
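
To make the first responsibility concrete, here is a minimal sketch of a modular, orchestrated pipeline in Airflow's TaskFlow style. It is an illustration only, assuming Apache Airflow 2.4+ and Python 3.9+; the DAG and task names (nightly_events_pipeline, extract, transform, load) are hypothetical stand-ins rather than GSK code.

    # Hypothetical sketch, assuming Apache Airflow 2.4+ (TaskFlow API) and
    # Python 3.9+; none of these names come from the posting.
    from datetime import datetime

    from airflow.decorators import dag, task

    @task
    def extract() -> list[dict]:
        # Stand-in for reading from a real source (API, bucket, Kafka topic, ...).
        return [{"id": 1, "value": 10}, {"id": 2, "value": 20}]

    @task
    def transform(rows: list[dict]) -> list[dict]:
        # Kept pure so it can be unit-tested without any Airflow machinery.
        return [{**row, "value": row["value"] * 2} for row in rows]

    @task
    def load(rows: list[dict]) -> None:
        # Stand-in for a write to a warehouse table or downstream service.
        print(f"loaded {len(rows)} rows")

    @dag(schedule="@daily", start_date=datetime(2024, 1, 1), catchup=False)
    def nightly_events_pipeline():
        # Airflow derives the ordering extract -> transform -> load from these calls.
        load(transform(extract()))

    nightly_events_pipeline()

Keeping each step small and pure at the edges is what makes the modularity and automated-test expectations in this list achievable in practice.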

 

Why you?

 

Basic Qualifications:

 

We are looking for professionals with these required skills to achieve our goals:

  • Bachelor's degree plus 2 years of data engineering experience
  • Cloud experience (e.g., AWS, Google Cloud, Azure, Kubernetes)
  • Experience in automated testing and design (a brief sketch follows this list)
  • Experience with DevOps-forward ways of working
  • Experience with at least one common programming language (e.g., Python, Scala, Java)
  • Experience with data modelling, database concepts, and SQL
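
As a rough illustration of the automated-testing expectation above, the sketch below unit-tests a small, pure data transform with pytest; the helper under test (add_row_total) is hypothetical and exists only for this example.

    # Hypothetical sketch, assuming pytest; add_row_total exists only here.
    import pytest

    def add_row_total(row: dict) -> dict:
        """Return a copy of the row with a 'total' field summing its numeric values."""
        total = sum(v for v in row.values() if isinstance(v, (int, float)))
        return {**row, "total": total}

    def test_sums_numeric_fields_only():
        assert add_row_total({"a": 1, "b": 2, "label": "x"})["total"] == 3

    def test_input_row_is_not_mutated():
        row = {"a": 1}
        add_row_total(row)
        assert "total" not in row

    @pytest.mark.parametrize("row, expected", [({}, 0), ({"a": 1.5, "b": 2.5}, 4.0)])
    def test_edge_cases(row, expected):
        assert add_row_total(row)["total"] == expected

Tests like these run in seconds and slot naturally into the CI/CD practices listed under the preferred qualifications.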

 

Preferred Qualifications:

 

If you have the following characteristics, it would be a plus:

  • Familiarity with orchestration tooling
  • Knowledge and use of toolchains for documentation, testing, and operations / observability
  • Hands-on experience implementing CI/CD with git and a common CI/CD stack (e.g., Jenkins, CircleCI, GitLab, Azure DevOps)
  • Exposure to common tools and techniques for data engineering (e.g., Spark, Kafka, Storm, …)
  • Knowledge of data modelling, database concepts, and SQL
