Senior Data Engineer
We are expanding our efforts to shorten the time to diagnosis for rare genetic diseases by applying our disease screening algorithms to de-identified patient records on a worldwide scale. Data will be captured from health records at individual hospital systems and from health record aggregators, and data sets will include structured, semi-structured, and unstructured data. The hire will be responsible for creating and optimizing our data pipeline architecture and for redesigning our screening algorithms to work with the new architecture. They must be senior enough to recommend and implement the best solutions for the project and technically skilled enough to wrangle data and code algorithms, including some natural language processing of unstructured text. The right candidate will be excited by the opportunity to design the company’s data architecture from the ground up and influence the direction of a rapidly growing tech startup.
Responsibilities for Senior Data Engineer
- Create and maintain optimal data pipeline architecture.
- Assemble large, complex data sets that meet functional / non-functional business requirements.
- Identify, design, and implement internal process improvements: automating manual processes, optimizing data delivery, re-designing infrastructure for greater scalability, etc.
- Build the infrastructure required for optimal extraction, transformation, and loading of data from a wide variety of data sources using SQL and AWS ‘big data’ technologies.
- Build analytics tools that utilize the data pipeline to provide actionable insights into customer acquisition, operational efficiency and other key business performance metrics.
- Work with stakeholders including the Executive, Product, Data and Design teams to assist with data-related technical issues and support their data infrastructure needs.
- Keep our data separated and secure across national boundaries through multiple data centers and AWS regions.
- Create data tools for analytics and data scientist team members that assist them in building and optimizing our product into an innovative industry leader.
- Work with data and analytics experts to strive for greater functionality in our data systems.
Required Qualifications / Experience:
- BS in Computer Science or a related technical field, or equivalent experience, with 5+ years of experience (3+ years with an MS/MIS degree).
- 3 years of experience with big data tools: Hadoop, Spark, Kafka, NiFi.
- 3 years of experience with object-oriented and/or functional scripting languages: Python (preferred) and/or Java.
- 3 years of experience managing data across relational SQL and NoSQL data stores such as MySQL, Postgres, Cassandra, HDFS, Redis, and Elasticsearch.
- 3 years of experience working in a Linux environment.
- 2 years of experience working with and designing REST APIs.
- Experience in designing/developing platform components like caching, messaging, event processing, automation, transformation and tooling frameworks.
- Experience transforming data in various formats, including JSON, XML, CSV, and compressed archives.
- Experience developing flexible ontologies to fit data from multiple sources and implementing the ontology in the form of database mappings / schemas.
- Strong interpersonal and communication skills necessary to work effectively with customers and other team members.
- Leadership or mentorship experience.
- Experience owning and leading the delivery of major features and components.
- Ability to take a ‘big picture’ perspective and successfully balance business goals against technical constraints.
Preferred Qualifications / Experience:
- Data engineering experience in the healthcare industry; experience extracting data from Cerner is a plus.
- Experience with Microservices architecture components, including Docker and Kubernetes.
- Experience developing microservices to fit data cleansing, transformation and enrichment needs.
- Experience with AWS cloud services: EC2, S3, EMR, RDS, Redshift, Athena and/or Glue.
- Experience with Jira, Confluence and extensive experience with Agile methodologies.
- Knowledge of security best practices.
- Experience developing flexible data ingest and enrichment pipelines that easily accommodate new and existing data sources.
- Experience with software configuration management tools such as Git/GitLab, Salt, Confluence, etc.
- Experience with continuous integration and deployment (CI/CD) pipelines and their enabling tools such as Jenkins, Nexus, etc.
- Detail-oriented and self-motivated, with the ability to learn and deploy new technology quickly.