Add ranger permission check (#5285)
lianneli authored Nov 29, 2024
1 parent 7828a6b commit 065f8ab
Showing 18 changed files with 2,924 additions and 10 deletions.
49 changes: 49 additions & 0 deletions docs/en/deployment/hadoop_java_sdk.md
@@ -741,6 +741,55 @@ JuiceFS can use local disk as a cache to accelerate data access, the following d

![parquet](../images/spark_sql_parquet.png)

## Permission control by Apache Ranger

JuiceFS currently supports path permission control by integrating with Apache Ranger's HDFS module.

### 1. Configuration

| Configuration | Default Value | Description |
|-----------------------------------|---------------|-------------|
| `juicefs.ranger-rest-url` | | URL of the Ranger Admin REST service. If it is not set, Ranger permission control is disabled. |
| `juicefs.ranger-service-name` | | The `service name` of the `HDFS` service configured in Ranger. Required when `juicefs.ranger-rest-url` is set. |
| `juicefs.ranger-cache-dir` | | Local directory for caching Ranger policies. By default, a `UUID`-named subdirectory is created under the directory given by the Java system property `java.io.tmpdir`, so that concurrent tasks do not interfere with each other. If a fixed directory is configured, multiple tasks share the cache and only one JuiceFS instance refreshes it, which reduces the load on `Ranger Admin`. |
| `juicefs.ranger-poll-interval-ms` | `30000` | Interval, in milliseconds, at which the policy cache is refreshed. The default is 30 s. |
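The following is a minimal sketch of a configuration that enables Ranger and shares one policy cache across tasks. It assumes the properties above are set in `core-site.xml`, as other JuiceFS Hadoop SDK properties are. The host `ranger-admin.example.com`, port `6080`, service name `juicefs_hdfs`, and cache path `/var/cache/juicefs-ranger` are hypothetical placeholders. Replace them with the values from your own Ranger deployment.

```xml
<!-- core-site.xml (fragment): Ranger permission control for JuiceFS -->
<configuration>
  <!-- Ranger Admin REST URL; leaving it unset disables the feature.
       Host and port are hypothetical placeholders. -->
  <property>
    <name>juicefs.ranger-rest-url</name>
    <value>http://ranger-admin.example.com:6080</value>
  </property>
  <!-- Name of the HDFS service defined in Ranger (hypothetical value). -->
  <property>
    <name>juicefs.ranger-service-name</name>
    <value>juicefs_hdfs</value>
  </property>
  <!-- Optional: a fixed directory so that multiple tasks share one policy cache
       (hypothetical path). -->
  <property>
    <name>juicefs.ranger-cache-dir</name>
    <value>/var/cache/juicefs-ranger</value>
  </property>
  <!-- Optional: refresh policies every 30 s, which is the documented default. -->
  <property>
    <name>juicefs.ranger-poll-interval-ms</name>
    <value>30000</value>
  </property>
</configuration>
```

With Spark, the same keys can usually be passed with the `spark.hadoop.` prefix (for example `spark.hadoop.juicefs.ranger-rest-url`), because Spark copies such properties into the Hadoop configuration.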

### 2. Dependencies

Given the complexity of authentication environments and the possibility of dependency conflicts, the Ranger-related JAR packages (such as `ranger-plugins-common-2.3.0.jar` and `ranger-plugins-audit-2.3.0.jar`) and their dependencies are not bundled into the JuiceFS SDK.

If you encounter a `ClassNotFound` error at runtime, add the missing JARs to the relevant directory (for example `$SPARK_HOME/jars`).

The following dependencies may need to be added:

```shell
ranger-plugins-common-2.3.0.jar
ranger-plugins-audit-2.3.0.jar
gethostname4j-1.0.0.jar
jackson-jaxrs-1.9.13.jar
jersey-client-1.19.jar
jersey-core-1.19.jar
jna-5.7.0.jar
```
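If your application is built with Maven, you can let the build resolve the two Ranger artifacts instead of copying them by hand. The fragment below is a minimal sketch of that approach. The coordinates are the ones this commit adds to `sdk/java/pom.xml`. Unlike that pom, the sketch declares no exclusions, so Maven will also resolve Ranger's transitive dependencies, which may bring back the conflicts mentioned above. Whether the resolved tree covers every JAR in the list is an assumption to verify, for example with `mvn dependency:tree`.

```xml
<!-- pom.xml (fragment): resolve the Ranger JARs for your own application.
     No exclusions are declared, so transitive dependencies are pulled in too. -->
<dependencies>
  <dependency>
    <groupId>org.apache.ranger</groupId>
    <artifactId>ranger-plugins-common</artifactId>
    <version>2.3.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.ranger</groupId>
    <artifactId>ranger-plugins-audit</artifactId>
    <version>2.3.0</version>
  </dependency>
</dependencies>
```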

### 3. Tips

#### 3.1 Ranger version

The integration has been tested with Ranger 2.3 and Ranger 2.4. Because it uses no Ranger feature other than `HDFS` module authorization, other versions should in theory also work.

#### 3.2 Ranger Audit

Currently only permission checking is supported; `Ranger Audit` is disabled.

#### 3.3 Other Ranger parameters

To keep the feature simple to use, only the most essential parameters for connecting to Ranger are currently exposed.

#### 3.4 Security tips

Because the project is fully open source, users cannot be prevented from bypassing permission control by changing parameters such as `juicefs.ranger-rest-url`. If stricter control is required, consider building the SDK yourself and protecting the relevant security parameters, for example by encrypting them.

## FAQ

### 1. `Class io.juicefs.JuiceFileSystem not found` exception
49 changes: 49 additions & 0 deletions docs/zh_cn/deployment/hadoop_java_sdk.md
@@ -866,6 +866,55 @@ JuiceFS can use local disk as a cache to accelerate data access; the following data is

![parquet](../images/spark_sql_parquet.png)

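## Permission control with Apache Ranger

JuiceFS currently supports path permission control by integrating with the `HDFS` module of Apache Ranger.

### 1. Configuration

| Configuration | Default Value | Description |
|-----------------------------------|---------------|-------------|
| `juicefs.ranger-rest-url` | | Ranger connection URL. If it is not set, this feature is not used. |
| `juicefs.ranger-service-name` | | The `service name` configured in Ranger. Required. |
| `juicefs.ranger-cache-dir` | | Cache path for Ranger policies. By default, a `UUID` path level is added under the Java system property `java.io.tmpdir` to keep multiple tasks from affecting each other. When a fixed directory is configured, multiple tasks share the cache and exactly one JuiceFS object is responsible for refreshing it, which reduces the load on `Ranger Admin`. |
| `juicefs.ranger-poll-interval-ms` | `30000` | Ranger cache refresh interval in milliseconds. The default is 30 s. |

### 2. Environment and dependencies

Given the complexity of authentication environments and the possibility of dependency conflicts, the Ranger authentication JAR packages (for example `ranger-plugins-common-2.3.0.jar` and `ranger-plugins-audit-2.3.0.jar`) and their dependencies are not packaged into the JuiceFS SDK.

If you encounter a `ClassNotFound` error during use, add the JARs separately to the relevant directory (for example `$SPARK_HOME/jars`).

Dependencies that may need to be added separately: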

```shell
ranger-plugins-common-2.3.0.jar
ranger-plugins-audit-2.3.0.jar
gethostname4j-1.0.0.jar
jackson-jaxrs-1.9.13.jar
jersey-client-1.19.jar
jersey-core-1.19.jar
jna-5.7.0.jar
```

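### 3. Usage tips

#### 3.1 Ranger version

The current code has been tested with Ranger 2.3 and Ranger 2.4. Because no feature other than `HDFS` module authorization is used, other versions should in theory also work.

#### 3.2 Ranger Audit

Only permission checking is currently supported; the `Ranger Audit` function is disabled.

#### 3.3 Other Ranger parameters

To make the feature easier to use, only the most essential parameters for connecting to Ranger are currently exposed.

#### 3.4 Security

Because the project code is fully open source, users cannot be prevented from disrupting security controls by replacing parameters such as `juicefs.ranger-rest-url`. If stricter control is needed, consider compiling the code yourself and protecting the relevant security parameters, for example by encrypting them.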

## FAQ

### 1. `Class io.juicefs.JuiceFileSystem not found` exception
27 changes: 27 additions & 0 deletions sdk/java/pom.xml
@@ -350,6 +350,33 @@
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.ranger</groupId>
            <artifactId>ranger-plugins-common</artifactId>
            <version>2.3.0</version>
            <exclusions>
                <exclusion>
                    <groupId>*</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.ranger</groupId>
            <artifactId>ranger-plugins-audit</artifactId>
            <version>2.3.0</version>
            <exclusions>
                <exclusion>
                    <groupId>*</groupId>
                    <artifactId>*</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.httpcomponents</groupId>
            <artifactId>httpclient</artifactId>
            <version>4.5.13</version>
        </dependency>
    </dependencies>

    <distributionManagement>