Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate fields mount name and mount path in Dataset #3687

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

zhang-x-z
Copy link
Collaborator

Ⅰ. Describe what this PR does

This PR validate fields mount.Name and mount.Path when creating dataset.

Ⅱ. Does this pull request fix one issue?

NONE

Ⅲ. List the added test cases (unit test/integration test) if any, please explain if no tests are needed.

Start new Dataset Controller in local and apply different datasets, check the status of these datasets.

  • Test Case 1: Create valid dataset with single mount.
$ cat test-dataset-1.yaml 
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: spark
      path: /test
$ kubectl apply -f test-dataset-1.yaml 
dataset.data.fluid.io/demo created
$ kubectl get dataset demo
NAME   UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE      AGE
demo                                                                  NotBound   7s

Dataset status switch to NotBound successfully.

  • Test Case 2: Create valid dataset with multi mounts.
$ cat test-dataset-2.yaml
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: spark
    - mountPoint: https://mirrors.bit.edu.cn/apache/flink/
      name: flink
$ kubectl apply -f test-dataset-2.yaml 
dataset.data.fluid.io/demo created
$ kubectl get dataset demo
NAME   UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE      AGE
demo                                                                  NotBound   7s

Dataset status switch to NotBound successfully.

  • Test Case 3: Create invalid mount name in dataset with single mount.
$ cat test-dataset-3.yaml 
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: $(cat /proc/self/status | grep CapEff > /test.txt)
      path: /test
$ kubectl apply -f test-dataset-3.yaml 
dataset.data.fluid.io/demo created
$ kubectl get dataset demo
NAME   UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
demo                                                                          66s

Dataset status didn't switch to NotBound after a long time.
Logs in Dataset Controller:

ERROR	datasetctl.Dataset	dataset/dataset_controller.go:109	Failed to create dataset	{"dataset": "default/demo", "DatasetCreateError": "default/demo", "error": "spec.mounts.name: Invalid value: \"$(cat /proc/self/status | grep CapEff > /test.txt)\": a DNS-1035 label must consist of lower case alphanumeric characters or '-', start with an alphabetic character, and end with an alphanumeric character (e.g. 'my-name',  or 'abc-123', regex used for validation is '[a-z]([-a-z0-9]*[a-z0-9])?')"}
  • Test Case 4: Create invalid mount path in dataset with single mount.
$ cat test-dataset-4.yaml 
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: spark
      path: /$(cat /proc/self/status | grep CapEff > /test.txt)/test
$ kubectl apply -f test-dataset-4.yaml 
dataset.data.fluid.io/demo created
$ kubectl get dataset demo
NAME   UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
demo                                                                          92s

Dataset status didn't switch to NotBound after a long time.
Logs in Dataset Controller:

ERROR	datasetctl.Dataset	dataset/dataset_controller.go:109	Failed to create dataset	{"dataset": "default/demo", "DatasetCreateError": "default/demo", "error": "spec.mounts.path: Invalid value: \"/$(cat /proc/self/status | grep CapEff > /test.txt)/test\": every part of the path shuold follow the relaxed DNS (RFC 1123) rule which additionally allows upper case alphabetic character and character '_'"}
  • Test Case 5: Create invalid mount path in the second mount of the dataset.
$ cat test-dataset-5.yaml 
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: spark
    - mountPoint: https://mirrors.bit.edu.cn/apache/flink/
      name: flink
      path: /$(cat /proc/self/status | grep CapEff > /test.txt)/test2
$ kubectl apply -f test-dataset-5.yaml 
dataset.data.fluid.io/demo created
$ kubectl get dataset demo
NAME   UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE   AGE
demo                                                                          95s

Dataset status didn't switch to NotBound after a long time.
Logs in Dataset Controller:

ERROR	datasetctl.Dataset	dataset/dataset_controller.go:109	Failed to create dataset	{"dataset": "default/demo", "DatasetCreateError": "default/demo", "error": "spec.mounts.path: Invalid value: \"/$(cat /proc/self/status | grep CapEff > /test.txt)/test2\": every part of the path shuold follow the relaxed DNS (RFC 1123) rule which additionally allows upper case alphabetic character and character '_'"}
  • Test Case 6: Test disable validation.
$ cat test-dataset-6.yaml 
apiVersion: data.fluid.io/v1alpha1
kind: Dataset
metadata:
  name: demo
spec:
  mounts:
    - mountPoint: https://mirrors.bit.edu.cn/apache/spark/
      name: ${TEST}
      path: /test
$ kubectl apply -f test-dataset-6.yaml 
dataset.data.fluid.io/demo created
$ kubectl get dataset demo
NAME   UFS TOTAL SIZE   CACHED   CACHE CAPACITY   CACHED PERCENTAGE   PHASE      AGE
demo                                                                  NotBound   7s

Dataset status switch to NotBound successfully.

Ⅳ. Describe how to verify it

Ⅴ. Special notes for reviews

Copy link

fluid-e2e-bot bot commented Jan 11, 2024

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by:
Once this PR has been reviewed and has the lgtm label, please assign cheyang for approval by writing /assign @cheyang in a comment. For more information see:The Kubernetes Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@zhang-x-z zhang-x-z force-pushed the validate-dataset-mount branch from d4de617 to f8750d0 Compare January 11, 2024 09:03
Copy link

codecov bot commented Jan 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 64.49%. Comparing base (453b086) to head (e13a16a).
Report is 126 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #3687      +/-   ##
==========================================
+ Coverage   64.47%   64.49%   +0.02%     
==========================================
  Files         471      472       +1     
  Lines       28140    28179      +39     
==========================================
+ Hits        18142    18175      +33     
- Misses       7844     7848       +4     
- Partials     2154     2156       +2     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@zhang-x-z zhang-x-z requested a review from cheyang January 12, 2024 04:17
Copy link
Collaborator

@cheyang cheyang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use a separate validation method. and invoke it here. WDYT?

// 0.1 Validate the mount name and mount path
// Users can set the environment variable to 'false' to disable this validation
// Default is true
if utils.GetBoolValueFromEnv(common.EnvEnableMountValidation, true) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use a separate validation method. and invoke it here. WDYT?

// 0.1 Validate the mount name and mount path
// Users can set the environment variable to 'false' to disable this validation
// Default is true
if utils.GetBoolValueFromEnv(common.EnvEnableMountValidation, true) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use a separate validation method. and invoke it here. WDYT?

Copy link
Collaborator Author

@zhang-x-z zhang-x-z Jan 15, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done. Add some new unit tests. Re-run the end to end tests above and all of these test cases are passed.

@zhang-x-z zhang-x-z force-pushed the validate-dataset-mount branch 3 times, most recently from 2a88c73 to 6dd5dcb Compare January 16, 2024 06:04
@cheyang cheyang requested a review from TrafalgarZZZ January 16, 2024 12:21
@@ -1,5 +1,5 @@
/*
Copyright 2022 The Fluid Author.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need to change the copyright year. It should be the year of the creation time of the file.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

// Empty name or path is allowed
if len(mount.Name) != 0 {
// If users set the mount.Name, it should comply with the DNS1035 rule.
if errs := validation.IsDNS1035Label(mount.Name); len(errs) > 0 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest to use same validation method for both mount.Name and the parts of mount.Path. The main reason is that for AlluxioRuntime and JindoRuntime, the mount.Name will be used as the default mount path if mount.Path is not set. Using same validation method can keep such cases more consistent.

// 0.1 Validate the mount name and mount path
// Users can set the environment variable to 'false' to disable this validation
// Default is true
if !enableMountValidation {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we have to add a validation option here? WDYT @cheyang @zhang-x-z

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it's not essential. Instead, how about putting the validation logic into function ReconcileInternal?

if errs := validation.IsDNS1035Label(runtime.GetName()); len(runtime.GetName()) > 0 && len(errs) > 0 {

@zhang-x-z zhang-x-z force-pushed the validate-dataset-mount branch from 6dd5dcb to e13a16a Compare January 17, 2024 06:30
Copy link

Quality Gate Passed Quality Gate passed

Kudos, no new issues were introduced!

0 New issues
0 Security Hotspots
No data about Coverage
0.0% Duplication on New Code

See analysis details on SonarCloud

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants