Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: support Azure cloud storage service principal authentication - expedite #5926

Open
wants to merge 76 commits into
base: develop
Choose a base branch
from

Conversation

machov
Copy link

@machov machov commented May 28, 2024

PR fulfills these requirements

  • Commit message(s) and PR title follows the format [fix|feat|ci|chore|doc]: TICKET-ID: Short description of change made ex. fix: DEV-XXXX: Removed inconsistent code usage causing intermittent errors
  • Tests for the changes have been added/updated (for bug fixes/features)
  • Docs have been added/updated (for bug fixes/features)
  • Best efforts were made to ensure docs/code are concise and coherent (checked for spelling/grammatical errors, commented out code, debug logs etc.)
  • Self-reviewed and ran all changes on a local instance (for bug fixes/features)

Change has impacts in these area(s)

  • Product design
  • Backend (Database)
  • Backend (API)
  • Frontend

Describe the reason for change

For the moment, there is only an authentication through account key to Azure. The problem is that account key give a lot of rights on the storage that may not be necessary in order to perform readonly operations.

What does this fix?

Now it is possible to have an azure integration that is based on a service principal.

What is the new behavior?

We have to create an app registration in Azure :
We give this registration specific rights depending on what we want to do in labelstudio :
In the UI, we create a new storage integration :

What is the current behavior?

There is no impact on the previous behavior, it's just a new one.

What libraries were added/updated?

Azure Identity

Does this change affect performance?

Get a delegation key is quite slow, that's why there is :

  • 24h cache for delegation key
  • 15mn cache for sas token
    There is a second minor change :
  • When a "thumbnail" tag is present in the datarow, it will be used in the grid view
  • When there is multiple layer in the image, the gridview will only render the first channel.
    (Today there is a buggy behavior).

Does this change affect security?

It should improve the security for people who just need specific rights on the container. For the moment, the "write" behavior have not been tested / developped on the container.

What alternative approaches were there?

We can use an enterprise application and SSO in order to be linked to a user right and act on behalf of him.

What feature flags were used to cover this change?

None

Does this PR introduce a breaking change?

(check only one)

  • Yes, and covered entirely by feature flag(s)
  • Yes, and covered partially by feature flag(s)
  • No
  • Not sure (briefly explain the situation below)

What level of testing was included in the change?

  • e2e
  • integration
  • unit

Which logical domain(s) does this change affect?

  • Authentication
  • UI

Copy link

netlify bot commented May 28, 2024

👷 Deploy request for label-studio-docs-new-theme pending review.

Visit the deploys page to approve it

Name Link
🔨 Latest commit 20465c9

Copy link

netlify bot commented May 28, 2024

👷 Deploy request for heartex-docs pending review.

Visit the deploys page to approve it

Name Link
🔨 Latest commit 20465c9

Copy link

sentry-io bot commented May 28, 2024

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: label_studio/io_storages/base_models.py

Function Unhandled Issue
info_set_in_progress ValueError: Storage status (in_progress) must be QUEUED to move it IN_PROGRESS io_storages.base...
Event Count: 1

Did you find this useful? React with a 👍 or 👎

@machov machov marked this pull request as ready for review May 28, 2024 05:07
@machov machov changed the title Man/friend feat: Feature/azure service principal May 28, 2024
@machov machov changed the title feat: Feature/azure service principal feat: Feature/azure service principal - expedite May 28, 2024
@makseq
Copy link
Member

makseq commented May 28, 2024

Is it a copy of #5765 ?

@machov machov closed this May 28, 2024
@machov
Copy link
Author

machov commented May 30, 2024

I also created a PR to @FrsECM branch FrsECM#1

@machov machov reopened this Jun 16, 2024
@machov
Copy link
Author

machov commented Jul 8, 2024

For the current PR, I am not able to successfully create the pre-signed urls, any suggestions?

- name: stage
  request:
    method: GET
    url: '{django_live_url}/api/projects/{project_pk}/next'
  response:
    json:
      data:
        image_url: !re_match "/tasks/\\d+/presign/\\?fileuri=YXp1cmUtYmxvYjovL3B5dGVzdC1henVyZS1pbWFnZXMvYWJj"
    status_code: 200

this test above fails, the endpoint returns "azure-spi://<path/to/file>" instead of "/tasks/\d+/presign/?fileuri=YX..."

I resolved this issue, all tests are passing now 🚀 🚀 🚀

This storage authentication supports all types of Label studio inputs: presigned urls, blobs, and jsons. Any chance someone can take a look at this @makseq ? 🗡️ 💯 😀

@machov
Copy link
Author

machov commented Jul 8, 2024

anyone that can help with this @vegai @deppp @sajarin ? this authentication would unblock projects in a lot of companies that only allow Service Principal to authenticate to Azure 👍

@machov
Copy link
Author

machov commented Jul 9, 2024

ready to merge 😃 - go to CI build
image

@machov
Copy link
Author

machov commented Jul 10, 2024

J’ai chiffré le mot passe @FrsECM

@machov
Copy link
Author

machov commented Jul 10, 2024

I added the encryption to avoid storing passwords and have even more security. :)

@FrsECM
Copy link
Contributor

FrsECM commented Jul 10, 2024

Great to see it !

Just for my understanding. I saw that you replace blob urls by something like :

´´´python
"azure_spi://container_name/blob_name"
´´´

It is the way you handle it as a standard in LStudio ?

@machov
Copy link
Author

machov commented Jul 10, 2024

this is the expected behavior @FrsECM , i learned by setting s3 sync. Azure, GCP, and Redis also work this way with support for uris in json like :///blob.

label-studio supports different sources: a) json or b) blob. When it is a) json, it uses this convention of a json like {‘text’: ‘some_text’, ‘image’: uri}, when the uri is something like s3://<bucket>/<blob> , azure-blob://<container>/<blob> or in our case azure-spi://<container>/<blob> then the backend resolves it to the actual blob url https:/<account_name>.blob.core.windows.net/container/path/to/blob/<sas_token>. This way it can load jsons with text predictions and annotations together with blobs like images. You can see azure-blob storage using account key also supports this functionality and we are mirroring their tests.

when it is b) blob , it will read all objects in the container as blobs using https:.microsoft.dfs.com/path/to/blob/<sas_token’

this is explained in more detail here https://labelstud.io/guide/storage

@FrsECM
Copy link
Contributor

FrsECM commented Jul 10, 2024 via email

@machov
Copy link
Author

machov commented Jul 10, 2024

There is no dfs involved.

Azure spi will create that https:// path

using this template AZURE_ACCOUNT_URL_TEMPLATE = Template('https://${account_name}.blob.core.windows.net')

@machov
Copy link
Author

machov commented Jul 10, 2024 via email

@FrsECM
Copy link
Contributor

FrsECM commented Jul 10, 2024 via email

@machov
Copy link
Author

machov commented Jul 14, 2024

I ended up removing the encryption settings from the PR, @FrsECM cause the UI is still having some trouble with it, this was my last attempt to add the encryption in this separate branch https://github.com/machov/label-studio/tree/dev/man/add_sp

@machov machov changed the title feat: Feature/azure service principal - expedite feat: support Azure cloud storage service principal authentication - expedite Jul 15, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants