User access control for modern data platforms

Lund / Full Time / Lund, Sweden

About Backtick

Backtick Technologies provides consultancy services in software engineering, data engineering, data science, artificial intelligence and related fields. Today, Backtick employs 12 engineers and has customers in both the enterprise and startup worlds. In 2023, we’re building a state-of-the-art data platform product, and we could use your help!

Project description

The world of big data is moving quickly. Recently, we have seen a shift away from proprietary data lake products offered by the major cloud providers towards a new paradigm built on open-source software, which avoids vendor lock-in and reduces overall costs. As part of this shift, large volumes of data are moving into open columnar formats stored in S3-compatible object storage systems. Open storage formats allow software from multiple vendors to work on the same underlying data, removing the need to duplicate it across systems. However, moving to data lakes based on object storage raises new security challenges, such as how to enforce access rules on a per-user, per-column basis.

Your task will be to investigate different kinds of access control systems, weigh their pros and cons, and implement a prototype. The main goal is to design a system for managing file-level permissions on a MinIO cluster and, ideally, column-level permissions in Apache Spark. If successful, your work will become a building block in Backtick’s upcoming data platform product.

You will work alongside Backtick engineers and have the opportunity to learn about the modern big data stack. You may work remotely or from our offices in central Lund, although we would prefer to see you in the office at least once a week.

Research Questions

Research questions will be worked out early on in collaboration with your supervisors at LTH, ensuring that the project meets academic standards while leaving as much time as possible for software development.

We suggest something along the lines of:

  • What are the most common ways to implement column-level user access control in data lakes?
  • What are their pros and cons in terms of:
      ◦ Performance (e.g. request overhead)
      ◦ Implementation complexity

Who are we?

At Backtick, we’re a mix of innovators, software engineers, data engineers and data scientists. We’re a small consultancy company helping our customers go from data to production-ready machine learning systems. We have an office in central Lund, and you are welcome to sit with us. Read more about us at backtick.se.

Who are you?

We are looking for two students with a background in Computer Science, Mathematics, Physics or a similar field. This project involves a fair amount of practical work and prototyping, so it’s a big advantage if you enjoy programming and feel comfortable solving problems on your own. Prior experience with Linux will also be useful.

You are excited to learn about the following technologies:

  • Kubernetes
  • Compute clusters, primarily Apache Spark
  • Storage clusters (MinIO, S3)

Start date & duration

Jan/Feb/Mar 2023, 30 HP (~1 semester)

Apply

Introduce yourself in a few lines to:

Johan Henriksson, CTO johan@backtick.se
