What happened:
When running gc with the "--delete" option, e.g.
"juicefs gc postgres://jfs_admin:'xxxx'@jfs_meta_url:5432/jfs --delete"
it falls into an infinite loop, as below.
2024/12/02 08:35:24.920074 juicefs[18479] <WARNING>: Get directory parent of inode 11496018: no such file or directory [quota.go:347]
2024/12/02 08:35:24.920132 juicefs[18479] <WARNING>: Get directory parent of inode 11496018: no such file or directory [quota.go:347]
2024/12/02 08:35:24.920480 juicefs[18479] <WARNING>: Get directory parent of inode 11496018: no such file or directory [quota.go:347]
2024/12/02 08:35:24.920999 juicefs[18479] <WARNING>: Get directory parent of inode 11496018: no such file or directory [quota.go:347]
What you expected to happen:
It should stop, or skip the non-existent inode.
How to reproduce it (as minimally and precisely as possible):
Anything else we need to know?
I am using the following mount options during most of my tests: juicefs mount -d -o allow_other --writeback --backup-meta 0 --buffer-size 2000 --cache-partial-only
jfs=# select * from jfs_node where inode=11496018;
inode | type | flags | mode | uid | gid | atime | mtime | ctime | atimensec | mtimensec | ctimensec | nlink | length | rdev | parent | access_acl_id | default_acl_id
-------+------+-------+------+-----+-----+-------+-------+-------+-----------+-----------+-----------+-------+--------+------+--------+---------------+----------------
(0 rows)
jfs=# select * from jfs_node where parent=11496018;
inode | type | flags | mode | uid | gid | atime | mtime | ctime | atimensec | mtimensec | ctimensec | nlink | length | rdev | parent | access_acl_id | default_acl_id
----------+------+-------+------+-----+-----+------------------+------------------+------------------+-----------+-----------+-----------+-------+--------+------+----------+---------------+----------------
25152661 | 2 | 0 | 493 | 0 | 0 | 1728496776274948 | 1728496776274948 | 1728496776274948 | 943 | 943 | 943 | 249 | 4096 | 0 | 11496018 | 0 | 0
(1 row)
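To check whether this is the only broken spot, a query like the following should list every node whose parent row is missing. This is just a sketch against the same jfs_node schema; the parent <> 0 filter is my assumption, meant to skip rows that legitimately carry no parent there (e.g. hard-linked files), so adjust it if your schema differs:
-- Sketch: find all nodes whose parent row no longer exists in jfs_node.
SELECT n.inode, n.parent
FROM jfs_node AS n
LEFT JOIN jfs_node AS p ON p.inode = n.parent
WHERE p.inode IS NULL
  AND n.parent <> 0;  -- assumption: 0 marks "no parent tracked here"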
[root@xxx jfs]# juicefs info -i 11496018
2024/12/02 08:40:42.871177 juicefs[18913] <FATAL>: info: no such file or directory [info.go:152]
[root@xxx jfs]# juicefs info -i 25152661
25152661 :
inode: 25152661
files: 0
dirs: 19
length: 1.75 MiB (1840395 Bytes)
size: 2.73 MiB (2863104 Bytes)
path: unknown
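The "path: unknown" above is consistent with the parent chain dead-ending at the missing inode. A recursive walk upward (a sketch, with the same assumption about jfs_node.parent as the dump query below) illustrates it:
WITH RECURSIVE up AS (
    SELECT inode, parent FROM jfs_node WHERE inode = 25152661
    UNION ALL
    SELECT n.inode, n.parent
    FROM jfs_node AS n
    JOIN up ON n.inode = up.parent
)
SELECT * FROM up;
-- Only one row comes back, ending at parent = 11496018; the recursion finds
-- no matching jfs_node row for it, so path resolution stops there.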
CREATE TABLE broken_records AS
WITH RECURSIVE c AS (
    SELECT 11496018::bigint AS inode, 0::bigint AS parent
    UNION ALL
    SELECT sa.inode, sa.parent
    FROM jfs_node AS sa
    JOIN c ON c.inode = sa.parent
)
SELECT * FROM c;
SELECT 1212423
I used the above SQL to dump the broken directory structure and saw it populate about 1.2M records.
From the timestamp (e.g. mtime = 1728496776274948, which is Oct 9 2024), it seems to belong to a directory created by "juicefs clone". E.g., as below, inode 25358790 is a directory with 100 files under it, and 25358800 is one of the files under 25358790; this size matches how I created the test directories, with 100 empty files under each. During my test, I created a directory "dir1" with about 20 million files in total: each layer has many subdirectories, and each subdirectory has 100 empty files directly under it. After I had such a "dir1", I used "juicefs clone" to clone dir1 to dir2. My best guess is that this broken inode is somehow related to the cloned directory.
I also tried "juicefs fsck --path / --repair --recursive", but it seems it cannot fix the issue.
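Since fsck does not repair it, the only workaround I can think of is manual cleanup in the metadata engine. The following is only a sketch, not a supported repair path: it assumes directory entries live in a jfs_edge table alongside jfs_node, it ignores the chunk/xattr/ACL rows that real file nodes would also carry, and it should only ever be tried against a backed-up copy of the metadata:
BEGIN;
-- Sketch only: drop the orphaned subtree captured in broken_records.
-- jfs_edge holding (parent, name, inode) entries is an assumption here.
DELETE FROM jfs_edge WHERE parent IN (SELECT inode FROM broken_records);
DELETE FROM jfs_node WHERE inode IN (SELECT inode FROM broken_records);
COMMIT;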
Further checking the distribution of the broken directories: most of them (11998) have 100 direct files/dirs under each, which pretty much matches how I generated "dir1" with about 20 million files.
jfs=# select count(*),child_num from (select count(*) as child_num ,parent from broken_records group by parent order by count(*)) t group by child_num;
count | child_num
-------+-----------
3 | 1
1 | 7
1 | 10
1 | 13
1 | 15
1 | 18
1 | 22
1 | 23
1 | 26
2 | 35
2 | 59
1 | 60
1 | 63
1 | 68
1 | 93
11998 | 100
2 | 603
5 | 604
1 | 605
2 | 606
1 | 607
1 | 665
1 | 672
1 | 673
1 | 675
1 | 679
2 | 1000
(27 rows)
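One more check that would support the clone hypothesis: grouping the broken nodes by day. This is a sketch; it assumes mtime is microseconds since the epoch, as the value above suggests, and it naturally skips the missing seed inode because of the join:
SELECT to_timestamp(n.mtime / 1000000)::date AS day, count(*)
FROM jfs_node AS n
JOIN broken_records AS b ON b.inode = n.inode
GROUP BY 1
ORDER BY 2 DESC;
-- If the rows cluster on 2024-10-09, that matches the "juicefs clone" run.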
Environment:
JuiceFS version (use juicefs --version) or Hadoop Java SDK version: juicefs version 1.2.1+2024-08-30.cd871d19
Cloud provider or hardware configuration running JuiceFS: on-prem hardware with a Ceph storage backend.
OS (e.g. cat /etc/os-release): CentOS Linux release 7.9.2009 (Core)
Kernel (e.g. uname -a): 5.4.206-200.el7.x86_64 #1 SMP Thu Jul 28 14:58:01 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
Object storage (cloud provider and region, or self maintained): Ceph, self hosted
Metadata engine info (version, cloud provider managed or self maintained): PostgreSQL 17.2
Network connectivity (JuiceFS to metadata engine, JuiceFS to object storage): all local network in the same datacenter.
Others: