Site Reliability Engineer - Observability

Emerald Cloud Lab

Software Engineering
Austin, TX, USA
Posted on Oct 22, 2024

The Emerald Cloud Laboratory (ECL) enables life scientists to move out of the lab and conduct research entirely from a computer. Stepping away from manual completion of experiments at the bench, scientists on the ECL leverage the remote, automated execution of all standard biology and chemistry experiments in Emerald’s industrial lab facilities, working within a software platform for all stages of research workflows, from experimental design to data analysis. Our system empowers scientists at Big Pharma companies, startups, and academic laboratories by allowing them to run wet lab experiments from anywhere in the world without ever setting foot in the lab.

The Team:

Site Reliability Engineering at ECL is responsible for the security, reliability, and capacity of the software and virtual machines used to develop and run both our application and our laboratory, as well as development and improvement of internal specialty applications and integration. You will be joining a tight-knit and interdisciplinary team. Our methodology relies heavily on automation, infrastructure-as-code, and continuous integration and deployment.

Our Responsibilities:

  • Design and develop processes and tools to automate and audit all aspects of development and production environments and databases for the ECL cloud application backend
  • Continuously improve our set of in-house Go and Python facilities for automating container builds and deployments, and our bespoke Wolfram Language-based automated unit testing environment
  • Develop applications related to laboratory systems
      • Automated provision and deployment of Wolfram Enterprise Private Cloud instances for integration with our customer-facing Command Center application
      • Development of domain-specific language infrastructure in support of ECL's Symbolic Lab Language
  • Coordinate with and advise other teams to plan and execute releases of application upgrades, new services, and migrations to new architectures or infrastructures, without degradation or interruption of service
  • Efficiently and dynamically prioritize ad hoc requests alongside roadmap initiatives
  • Coordinate with IT where premises and cloud infrastructure intersect
  • Evaluate and integrate open-source and commercial tools to serve the above purposes

Our Technology Stack:

  • Execution environment: Kubernetes on AWS EKS; AWS Lambda and Fargate
  • Languages: Python; Wolfram Language; Go; shell scripting
  • Database: AWS Aurora PostgreSQL
  • Other infrastructure: GitHub; DockerHub; Ubuntu, Debian, Alpine; Envoy+Contour; Terraform; AlertManager; PagerDuty; SendGrid; Auth0; Serverless
  • Observability infrastructure: Prometheus, Grafana, OpenTelemetry (OTel), Honeycomb, AWS CloudWatch
  • AWS services: EC2; EKS; RDS; ELB/ALB/NLB; IAM; S3; Certificate Manager; CloudWatch; Route 53; ElastiCache; SQS; VPC; premises-to-cloud VPN; security groups; CloudFront

Required Skills and Experience:

  • Coding in Python/Go: Proficient in developing and automating solutions to enhance infrastructure reliability and performance.
  • Observability Setup: Adept at implementing comprehensive observability solutions, including distributed tracing with OpenTelemetry (OTel), creating actionable dashboards using Grafana, and setting up effective monitoring and alerting systems. Experience setting up front-end observability is preferred.
  • SLI/SLO Metrics: Proven track record of setting up Service Level Indicators (SLIs), Service Level Objectives (SLOs), and other key performance metrics to ensure service reliability and performance.
  • DevOps Practices: Proficient in CI/CD tools, with experience in automating deployment pipelines and seamlessly deploying applications to Kubernetes from source control management (SCM).
  • Cloud Administration (AWS preferred): Skilled in cloud infrastructure management, with hands-on experience in AWS (EKS, Fargate, IAM, S3, VPC).
  • Cloud Networking & Security: Deep understanding of cloud networking and security concepts, including VPCs, VPNs, subnets, and security best practices.
  • Infrastructure Automation: Ability to automate infrastructure provisioning using Terraform.

Ideal Candidate:

  • Extensive experience building a comprehensive observability solution for an end-to-end distributed system (microservices deployed in Kubernetes)
  • Proven track record of setting up front-end observability solutions for web-based and desktop applications

About ECL: https://www.emeraldcloudlab.com


Optional but welcome: A link to your GitHub account or any projects you are proud of can be especially helpful. With project links, please include a short remark to help us get our bearings.

At Emerald Cloud Lab, we are committed to pioneering the future of scientific research by providing an innovative, cloud-based laboratory environment. We believe in the power of collaboration, diversity, and the continuous pursuit of knowledge to drive groundbreaking discoveries. If you are passionate about reshaping the landscape of scientific experimentation and eager to contribute to a culture of excellence and innovation, we invite you to join us.