Senior ML Infrastructure Engineer
Quince
Software Engineering, Other Engineering, Data Science
United States · Remote
Posted on Thursday, August 29, 2024
OUR STORY
Quince was started to challenge the existing idea that nice things should cost a lot. Our mission was simple: create items of equal or greater quality than the leading luxury brands and sell them at a much lower price.
OUR VALUES
EVERYONE SHOULD BE ABLE TO AFFORD NICE THINGS. Quality shouldn’t be a luxury. We’re proud of our mission to bring the world’s highest quality goods to people at affordable prices.
QUALITY IS MORE THAN MATERIALS. True quality is a combination of premium materials and high production standards.
WE FOCUS ON THE ESSENTIALS. From the perfect crewneck sweater to hotel-quality sheets, we're all about high-quality essentials that bring enjoyment to daily life.
WE’RE INNOVATING TO MAKE UNREAL PRICES A REALITY. Our uniquely developed factory-direct model lets us offer exceptionally high-quality goods for much lower prices than our competitors.
ALWAYS A BETTER DEAL. We believe in real price transparency, for both our customers and factory partners. This way, everyone gets a better deal.
FAIR FACTORIES. We are committed to working with factories that meet the global standards for workplace safety and wage fairness.
OUR TEAM AND SUCCESS
Quince is a retail and technology company co-founded by a team with extensive experience in retail, technology, and building early-stage companies. You’ll work with world-class talent from Stanford GSB, Wish.com, D.E. Shaw, Stitch Fix, Urban Outfitters, Wayfair, McKinsey, Nike, and more.
THE IDEAL CANDIDATE
We are seeking passionate individuals eager to revolutionize the way people purchase essential goods by leveraging cutting-edge ML and AI solutions. Our centralized data science team is dedicated to optimizing and automating decision-making processes while delivering valuable, actionable business insights. As an ML Infrastructure Engineer at Quince, you will play a critical role in shaping our ML development ecosystem. You will build and own the foundational ML development processes, operational pipelines, and production infrastructure necessary to support a scalable, efficient, and impactful ML practice. Your contributions will directly enhance our ability to drive meaningful business outcomes and innovation.
RESPONSIBILITIES
- Design, Build, and Maintain ML Pipelines: Develop and optimize end-to-end machine learning pipelines, including data ingestion, model training, validation, deployment, and monitoring.
- Implement Continuous Integration/Continuous Deployment (CI/CD) for ML Models: Establish robust CI/CD processes to automate the testing, deployment, and monitoring of machine learning models in production environments.
- Build and Own Production Infrastructure for Serving ML Models: Design, deploy, and maintain the production infrastructure necessary for real-time and batch serving of machine learning models, ensuring high availability, scalability, and reliability.
- Build and Own the Feature Store: Design, implement, and manage the feature store to ensure efficient and scalable storage, retrieval, and versioning of features used in machine learning models, enabling consistent and reusable feature engineering across teams.
- Collaborate with Data Scientists and Engineers: Work closely with data scientists, data engineers, and software engineers to ensure seamless integration of ML models into production systems, aligning models with business goals.
- Monitor and Optimize Model Performance: Implement monitoring solutions to track the performance of ML models in production, identifying and addressing any issues such as data drift, model degradation, or system bottlenecks.
- Ensure Scalability and Reliability: Design and implement scalable and reliable ML infrastructure, leveraging cloud platforms, containerization tools like Docker, and orchestration tools like Kubernetes.
- Automate Data and Model Management: Develop automated solutions for version control, model registry, and experiment tracking to manage the lifecycle of ML models efficiently.
- Optimize Resource Utilization: Manage and optimize the use of computational resources, such as GPUs and cloud instances, to balance performance with cost-effectiveness.
- Conduct Root Cause Analysis and Troubleshooting: Diagnose and resolve issues in ML pipelines, including debugging data, code, and model performance problems.
- Document Processes and Systems: Create and maintain comprehensive documentation of ML pipelines, deployment processes, and operational workflows to ensure knowledge sharing and continuity.
DESIRED SKILLS
- Bachelor's degree in computer science, engineering, or a related field.
- 5+ years of experience in ML Infrastructure or ML engineering.
- Hands-on expertise in building and maintaining ML pipelines, building and managing scalable ML production infrastructure, and working with AWS or another major cloud provider.
- Strong knowledge of CI/CD practices for ML models.
- Familiarity with DevOps principles and tools.
- Familiarity with TensorFlow, PyTorch, or similar frameworks.
- Proficiency in Python and Java (or Scala).
- Excellent communication skills.
- Move fast, be a team player, and be kind.