Bridge the gap between local machines and cloud notebooks
Data scientists use notebooks (locally and in cloud production environments) to conduct exploratory data analysis and implement machine learning models. To ensure quality, notebook code must be portable to traditional software environments and processes. This includes packaging, version control and testing for traceability, reproducibility and maintainability. The produced code and relevant metadata are often put into machine learning pipelines, so that the different steps of the execution can be managed. Additionally, software engineers and data engineers create internal libraries and new pipeline steps that data scientists want to use from notebooks. Multiple ideas exist for how best to solve this (going from notebooks to machine learning pipelines, and back again), but the community has yet to settle on a preferred solution.
You will build upon Cowait, an open source library for data pipelines, to improve the process of going from projects on a local machine to cloud notebooks (Jupyter or similar), and vice versa. You will explore, develop and evaluate different solutions on top of Cowait in an effort to solve these problems.
Solving these problems will help with version control, traceability and maintainability when notebooks like Jupyter are used in data pipelines. It is up to you to investigate and evaluate different approaches, in collaboration with our CTO. This is an excellent opportunity to work on challenging problems with experienced engineers while making an important contribution to open source.
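As a small illustration of the notebook-to-pipeline direction described above, the following sketch pulls the code cells out of a notebook file and joins them into a plain Python script that can be versioned and tested. It is only a starting point under stated assumptions: it relies on the standard Jupyter notebook JSON layout (nbformat 4) and the Python standard library, the function name notebook_to_script is made up for this example, and it does not use or represent the Cowait API.

```python
import json


def notebook_to_script(notebook_json: str) -> str:
    """Return the code cells of a Jupyter notebook (nbformat 4 JSON) as one script.

    Hypothetical helper for illustration only; not part of Cowait or Jupyter.
    """
    notebook = json.loads(notebook_json)
    chunks = []
    for index, cell in enumerate(notebook.get("cells", [])):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # nbformat allows "source" to be a single string or a list of lines
        if isinstance(source, list):
            source = "".join(source)
        if source.strip():
            # keep the cell index as a comment so code can be traced back
            chunks.append(f"# cell {index}\n{source.rstrip()}\n")
    return "\n".join(chunks)


# A tiny in-memory notebook: one markdown cell and one code cell.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Exploration"]},
        {
            "cell_type": "code",
            "metadata": {},
            "outputs": [],
            "execution_count": None,
            "source": ["x = 1\n", "y = x + 1"],
        },
    ],
})

script = notebook_to_script(example)
print(script)
```

The resulting script can be committed, packaged and run in a test suite like any other module. The harder parts of the problem, such as preserving metadata, handling notebook-specific magics and going back from pipeline code to notebooks, are deliberately left out of this sketch.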
Cowait is developed and maintained by Backtick Technologies. Cowait is an open source Python framework for creating containerized distributed applications with asynchronous Python. Using Cowait makes it easy to start building data-intensive pipelines that are robust, scalable and repeatable. By providing a Docker-based, cloud-agnostic approach (Kubernetes), Cowait runs on your local machine, at any cloud provider or on-prem.
- Find Cowait on GitHub: https://github.com/backtick-se/cowait
- Check out the documentation: http://docs.cowait.io/
- Visit Cowait: https://cowait.io/
Who are you?
You have a strong background in computer science or a similar field. You are familiar with Python, have a curious mind and are eager to learn. Ideally, you have a positive attitude towards open source. Working with and learning about the following technologies excites you (we do not expect you to have experience with all of these, but 3-4 is preferred):
- CI / CD
- Machine Learning
Who are we?
At Backtick, we’re a mix of software engineers, data engineers and data scientists with backgrounds from leading global tech companies. We’re a small consultancy helping our customers go from data to production machine learning systems. We are a leading company in MLOps, and open sourced Cowait in early 2020. We have an office in central Lund, and you are welcome to sit with us. If you are interested in learning more about state-of-the-art machine learning pipelines while making an important contribution to an open source project, please reach out to:
Oskar Handmark, email@example.com