Releases: databendlabs/openraft
Fix internal dependency. Nothing changed.
v0.7.5 Doc: update change log for 0.7.5
Improve membership management
Changed:
- Changed: 1bd22edc remove AddLearnerError::Exists, which is not actually used; by 张炎泼; 2022-09-30
- Changed: c6fe29d4 change-membership does not return error when replication lags; by 张炎泼; 2022-10-22
  If blocking is true, Raft::change_membership(..., blocking) will
  block until replication to the new nodes becomes up to date.
  But it won't return an error when proposing the change-membership log.
  - Change: remove two errors: LearnerIsLagging and LearnerNotFound.
  - Fix: #581
Fixed:
- Fixed: 2896b98e changing membership should not remove replication to all learners; by 张炎泼; 2022-09-30
  When changing membership, replications to the learners (non-voters) that
  are not added as voters should be kept.
  E.g.: with a cluster of voters {0} and learners {1, 2, 3}, changing
  membership to {0, 1, 2} should not remove replication to node 3.
  Only replications to removed members should be removed.
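The rule in the 2896b98e entry can be sketched as a set computation. This is a minimal illustration, not openraft's implementation: the function name replication_targets_after_change, its parameters, and the u64 node ids are assumptions made for this example.

```rust
use std::collections::BTreeSet;

/// Hypothetical helper (not an openraft API): computes the nodes a leader
/// keeps replicating to after a membership change.
fn replication_targets_after_change(
    leader: u64,
    old_voters: &BTreeSet<u64>,
    learners: &BTreeSet<u64>,
    new_voters: &BTreeSet<u64>,
) -> BTreeSet<u64> {
    // Start from every node that is currently replicated to.
    let mut targets: BTreeSet<u64> = old_voters.union(learners).copied().collect();

    // Drop only the voters that the new config removes.
    for removed in old_voters.difference(new_voters) {
        targets.remove(removed);
    }

    // New voters always need replication; learners that were not promoted stay.
    targets.extend(new_voters.iter().copied());

    // A leader does not replicate to itself.
    targets.remove(&leader);
    targets
}
```

With voters {0}, learners {1, 2, 3} and new voters {0, 1, 2}, node 3 stays in the returned set, which is the case the entry describes.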
Added:
- Added: 9a22bb03 add rocks-store as a RaftStorage implementation based on rocks-db; by 张炎泼; 2023-02-22
0.7.3
Changed:
- Changed: 25e94c36 InstallSnapshotResponse: replies the last applied log id; Do not install a smaller snapshot; by 张炎泼; 2022-09-22
  A snapshot may not be installed by a follower if it already has a higher
  last_applied log id locally.
  In such a case, it just ignores the snapshot and responds with its local
  last_applied log id.
  This way the applied state (i.e., last_applied) will never revert back.
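The behaviour in the 25e94c36 entry can be sketched as follows. This is a simplified model under stated assumptions: LogId, Follower and handle_install_snapshot are stand-ins defined for this example and do not mirror openraft's real types or signatures, and actually loading the snapshot data is elided.

```rust
/// Simplified stand-in for a log id: ordered by term first, then index.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct LogId {
    term: u64,
    index: u64,
}

/// Hypothetical follower state; only the field needed for this sketch.
struct Follower {
    last_applied: Option<LogId>,
}

impl Follower {
    /// Handles a snapshot whose last covered log id is `snapshot_last`.
    /// Returns the follower's `last_applied`, to be sent back in the response.
    fn handle_install_snapshot(&mut self, snapshot_last: LogId) -> Option<LogId> {
        // Install only if the local applied state is not already ahead.
        // (Loading the snapshot content is elided; only the position is tracked.)
        if self.last_applied <= Some(snapshot_last) {
            self.last_applied = Some(snapshot_last);
        }
        // Either way, reply with the local last_applied; it never moves backwards.
        self.last_applied
    }
}
```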
Fixed:
- Fixed: 21684bbd potential inconsistency when installing snapshot; by 张炎泼; 2022-09-22
  The conflicting logs that are before snapshot_meta.last_log_id should
  be deleted before installing a snapshot.
  Otherwise there is a chance that the snapshot is installed but conflicting
  logs are left in the store when a node crashes.
0.7.1:
v0.7.1
Added:
- Added: ea696474 add feature-flag: bt enables backtrace; by 张炎泼; 2022-03-12
  --features bt enables backtrace when generating errors.
  By default errors do not contain backtrace info.
  Thus openraft can be built on stable rust by default.
  To use on stable rust with backtrace, set RUSTC_BOOTSTRAP=1, e.g.:
  RUSTUP_TOOLCHAIN=stable RUSTC_BOOTSTRAP=1 make test
v0.7.0-alpha.3
Changed:
- Changed: f99ade30 API: move default impl methods in RaftStorage to StorageHelper; by 张炎泼; 2022-07-04
Fixed:
- Fixed: 44381b0c when handling append-entries, if prev_log_id is purged, it should not delete any logs.; by 张炎泼; 2022-08-14
  When handling append-entries, if the local log at prev_log_id.index is
  purged, a follower should not believe it is a conflict and should
  not delete all logs. Doing so would lose committed logs.
  To fix this issue, use last_applied instead of committed:
  last_applied always refers to a committed log id, while committed is not
  persisted and may be smaller than the actually applied log id when a
  follower is restarted.
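One plausible shape of the check described in the 44381b0c entry is sketched below. It is an illustration only: check_prev_log, PrevLogCheck and the map-based log are names invented for this example, and the real handler in openraft may be structured differently.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a log id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LogId {
    term: u64,
    index: u64,
}

#[derive(Debug, PartialEq, Eq)]
enum PrevLogCheck {
    /// prev_log_id is consistent with local state; nothing is deleted.
    Match,
    /// The local entry at that index is missing or differs.
    Conflict,
}

/// Hypothetical check run by a follower before appending entries.
/// `local_logs` maps a log index to the id stored at that index.
fn check_prev_log(
    local_logs: &BTreeMap<u64, LogId>,
    last_applied: Option<LogId>,
    prev_log_id: Option<LogId>,
) -> PrevLogCheck {
    // No previous log: the request starts from the very beginning.
    let prev = match prev_log_id {
        None => return PrevLogCheck::Match,
        Some(p) => p,
    };

    // Everything at or before last_applied is committed. It may already be
    // purged from the log, so a missing entry there is not a conflict.
    if let Some(applied) = last_applied {
        if prev.index <= applied.index {
            return PrevLogCheck::Match;
        }
    }

    // Beyond last_applied, the local entry must exist and be identical.
    match local_logs.get(&prev.index) {
        Some(local) if *local == prev => PrevLogCheck::Match,
        _ => PrevLogCheck::Conflict,
    }
}
```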
v0.7.0-alpha.2
Fixed:
- Fixed: 30058c03 #424 wrong range when searching for membership entries: [end-step, end).; by 张炎泼; 2022-07-03
  The iterating range searching for membership log entries should be
  [end-step, end), not [start, end).
  With this bug it will return duplicated membership entries.
  - Bug: #424
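The difference between the two ranges is easiest to see in a backward chunked scan. The sketch below is illustrative: backward_chunks is a hypothetical helper, not a function from openraft. Each step reads only [end-step, end) and then moves end down, so no index is visited twice; reading [start, end) on every step would revisit the same entries and return duplicates.

```rust
/// Hypothetical helper: splits [start, end) into chunks of at most `step`
/// indexes, walking from the newest entries towards the oldest.
fn backward_chunks(start: u64, end: u64, step: u64) -> Vec<(u64, u64)> {
    assert!(step > 0, "step must be positive");

    let mut ranges = Vec::new();
    let mut end = end;
    while end > start {
        // The chunk to read in this iteration is [end - step, end),
        // clamped so that it never goes below `start`.
        let chunk_start = end.saturating_sub(step).max(start);
        ranges.push((chunk_start, end));
        // The next iteration continues right below this chunk.
        end = chunk_start;
    }
    ranges
}
```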
v0.7.0-alpha.1
Fixed:
- Fixed: d836d85c if there may be more logs to replicate, continue to call send_append_entries in next loop, no need to wait heartbeat tick; by lichuang; 2022-01-04
- Fixed: 5a026674 defensive_no_dirty_log hangs tests; by YangKian; 2022-01-08
- Fixed: 8651625e save leader_id if a higher term is seen when handling append-entries RPC; by 张炎泼; 2022-01-10
  Problem:
  A follower saves hard state (term=msg.term, voted_for=None)
  when it sees msg.term > local.term while handling append-entries RPC.
  This is quite enough to be correct but not perfect. Correct because:
  - In one term, only an established leader will send append-entries;
  - Thus, there is a quorum that voted for this leader;
  - Thus, no matter what voted_for is saved, it is still correct. E.g.
    when handling append-entries, a follower node could save hard state
    (term=msg.term, voted_for=Some(ANY_VALUE)).
  The problem is that a follower already knows the legal leader for a term
  but still does not save it. This leads to an unstable cluster state: The
  test sometimes fails.
  Solution:
  A follower always saves hard state with the id of a known legal leader.
- Fixed: 1a781e1b when lack entry, the snapshot to build has to include at least all purged logs; by 张炎泼; 2022-01-18
- Fixed: a0a94af7 span.enter() in async loop causes memory leak; by 张炎泼; 2022-06-17
  It is explained in:
  https://onesignal.com/blog/solving-memory-leaks-in-rust/
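The solution in the 8651625e entry above (save the known leader instead of voted_for=None) can be sketched like this. HardState here is a simplified stand-in and hard_state_on_append_entries is a hypothetical function written for this example, not openraft code.

```rust
/// Simplified hard state: the current term and the node voted for in it.
#[derive(Debug, Clone, PartialEq, Eq)]
struct HardState {
    term: u64,
    voted_for: Option<u64>,
}

/// Hypothetical handler fragment: returns the hard state to persist when an
/// append-entries RPC from `leader_id` carries `msg_term`, or `None` if the
/// message does not carry a higher term.
fn hard_state_on_append_entries(
    local: &HardState,
    msg_term: u64,
    leader_id: u64,
) -> Option<HardState> {
    if msg_term > local.term {
        // Only an established leader sends append-entries in a term, so the
        // sender is the legal leader: record it rather than saving `None`.
        Some(HardState {
            term: msg_term,
            voted_for: Some(leader_id),
        })
    } else {
        None
    }
}
```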
Changed:
- Changed: c9c8d898 trait RaftStore: remove get_membership_config(), add last_membership_in_log() and get_membership() with default impl; by drdr xp; 2022-01-04
  Goal: minimize the work for users to implement a correct raft application.
  Now RaftStorage provides default implementations for get_membership()
  and last_membership_in_log().
  These two methods can be implemented on top of the other basic methods
  that users implement.
  - fix: #59
- Changed: abda0d10 rename RaftStorage methods do_log_compaction: build_snapshot, delete_logs_from: delete_log; by 张炎泼; 2022-01-15
- Changed: a52a9300 RaftStorage::get_log_state() returns last purge log id; by 张炎泼; 2022-01-16
  - Change: get_log_state() returns the last_purged_log_id instead of the
    first_log_id.
    Because there are some cases in which the log is empty:
    when a snapshot is installed that covers all logs,
    or when max_applied_log_to_keep is 0.
    Returning None is not clear about whether there are no logs at all or
    all logs are deleted.
    In such cases, raft still needs to maintain log continuity
    when replicating. Thus the last log id that once existed is important.
    Previously this was done by checking the last_applied_log_id, which is
    dirty and buggy.
    Now an implementation of RaftStorage has to maintain the
    last_purged_log_id in its store.
  - Change: Remove first_id_in_log(), last_log_id(), first_known_log_id(),
    because concepts are changed.
  - Change: Split delete_logs() into two methods for clarity:
    delete_conflict_logs_since() for deleting conflict logs when the
    replication receiving end finds a conflict log.
    purge_logs_upto() for cleaning applied logs.
  - Change: Rename finalize_snapshot_installation() to install_snapshot().
- Changed: 7424c968 remove unused error MembershipError::Incompatible; by 张炎泼; 2022-01-17
- Changed: beeae721 add ChangeMembershipError sub error for reuse; by 张炎泼; 2022-01-17
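To illustrate why the a52a9300 entry makes a store track last_purged_log_id, here is a minimal in-memory sketch. The method names purge_logs_upto, delete_conflict_logs_since and get_log_state come from the entry; everything else (MemLogStore, the simplified LogId and LogState, the synchronous signatures) is an assumption made for this example and does not match the real RaftStorage trait.

```rust
use std::collections::BTreeMap;

/// Simplified stand-in for a log id.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct LogId {
    term: u64,
    index: u64,
}

/// Simplified log state returned by the store.
#[derive(Debug, PartialEq, Eq)]
struct LogState {
    last_purged_log_id: Option<LogId>,
    last_log_id: Option<LogId>,
}

/// Hypothetical in-memory log store keyed by log index.
#[derive(Default)]
struct MemLogStore {
    logs: BTreeMap<u64, LogId>,
    last_purged_log_id: Option<LogId>,
}

impl MemLogStore {
    fn append(&mut self, id: LogId) {
        self.logs.insert(id.index, id);
    }

    /// Removes applied logs up to and including `upto`, remembering the purge point.
    fn purge_logs_upto(&mut self, upto: LogId) {
        self.last_purged_log_id = Some(upto);
        self.logs.retain(|index, _| *index > upto.index);
    }

    /// Removes conflicting logs from `since` (inclusive) to the end of the log.
    fn delete_conflict_logs_since(&mut self, since: LogId) {
        self.logs.retain(|index, _| *index < since.index);
    }

    /// Even with an empty log, the purge point tells raft where the log
    /// continues, so "no logs yet" and "all logs purged" stay distinguishable.
    fn get_log_state(&self) -> LogState {
        let last_in_log = self.logs.values().next_back().copied();
        LogState {
            last_purged_log_id: self.last_purged_log_id,
            last_log_id: last_in_log.or(self.last_purged_log_id),
        }
    }
}
```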
Fix: span.enter() in async loop causes memory leak
Fixed:
- Fixed: 4cd2a12b span.enter() in async loop causes memory leak; by 张炎泼; 2022-06-17
v0.6.4
v0.6.3
v0.6.2
Fixed:
- Fixed: 4d58a51e a non-voter not in joint config should not block replication; by drdr xp; 2021-08-31
- Fixed: eed681d5 race condition of concurrent snapshot-install and apply.; by drdr xp; 2021-09-01
  Problem:
  Concurrent snapshot-install and apply mess up last_applied.
  finalize_snapshot_installation runs in the RaftCore thread.
  apply_to_state_machine runs in a separate tokio task (thread).
  Thus there is a chance of last_applied being reset to a previous value:
  - apply_to_state_machine is called and finished in a thread.
  - finalize_snapshot_installation is called in the RaftCore thread and
    finished with last_applied updated.
  - The RaftCore thread finishes waiting for apply_to_state_machine, and
    updates last_applied to a previous value.

    RaftCore: -.    install-snapshot,         .-> replicate_to_sm_handle.next(),
               |    update last_applied=5     |   update last_applied=2
               |                              |
               v                              |
    task:      apply 2------------------------'
    --------------------------------------------------------------------> time

  Solution:
  Rule: All changes to state machine must be serialized.
  A temporary simple solution for now is to call all methods that modify
  the state machine in the RaftCore thread.
  But this way it blocks the RaftCore thread.
  A better way is to move all tasks that modify the state machine to a
  standalone thread, and send update requests back to RaftCore to update
  its fields such as last_applied.
- Fixed: a48a3282 handle-vote should compare last_log_id in dictionary order, not in vector order; by drdr xp; 2021-09-09
  A log {term:2, index:1} is definitely greater than log {term:1, index:2}
  in raft spec.
  Comparing log id in the way of term1 >= term2 && index1 >= index2
  blocks election:
  no one can become a leader.
- Fixed: 228077a6 a restarted follower should not wait too long to elect. Otherwise the entire cluster hangs; by drdr xp; 2021-11-19
- Fixed: 6c0ccaf3 consider joint config when starting up and committing.; by drdr xp; 2021-12-24
  - Change: MembershipConfig support more than 2 configs
  - Makes fields in MembershipConfig private.
    Provides methods to manipulate membership.
  - Fix: commit without replication only when membership contains only one
    node. Previously it just checked the first config, which results in
    data loss if the cluster is in a joint config.
  - Fix: when starting up, count all nodes, not only the nodes in the
    first config, to decide if it is a single node cluster.
- Fixed: b390356f first_known_log_id() should return the min one in log or in state machine; by drdr xp; 2021-12-28
- Fixed: cd5a570d clippy warning; by lichuang; 2022-01-02
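The two comparisons from the a48a3282 entry above can be contrasted in a few lines. This is a sketch with a simplified LogId defined for the example; vector_ge and candidate_is_up_to_date are hypothetical names. Deriving PartialOrd and Ord on a struct compares fields in declaration order, which gives the dictionary (lexicographic) order on (term, index).

```rust
/// Simplified log id. Field order matters: the derived ordering compares
/// `term` first and falls back to `index` only when the terms are equal.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
struct LogId {
    term: u64,
    index: u64,
}

/// The buggy "vector order": both components must be greater or equal.
fn vector_ge(a: LogId, b: LogId) -> bool {
    a.term >= b.term && a.index >= b.index
}

/// Hypothetical vote check using dictionary order: the candidate's last log
/// must be at least as up to date as the voter's.
fn candidate_is_up_to_date(candidate_last: LogId, local_last: LogId) -> bool {
    candidate_last >= local_last
}
```

For a = {term:2, index:1} and b = {term:1, index:2}, vector order accepts neither a over b nor b over a, so two such nodes would never vote for each other; dictionary order ranks a above b.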
Changed:
- Changed: deda6d76 remove PurgedMarker. keep logs clean; by drdr xp; 2021-09-09
  Changing log (add a PurgedMarker (original SnapshotPointer)) makes it
  difficult to impl install-snapshot for a RaftStore without a lock
  protecting both logs and state machine.
  Adding a PurgedMarker and installing the snapshot has to be atomic in
  storage layer. But usually logs and state machine are separate stores.
  e.g., logs are stored on a fast flash disk and the state machine is stored
  somewhere else.
  To get rid of the big lock, PurgedMarker is removed and installing a
  snapshot does not need to keep consistent with logs any more.
- Changed: 734eec69 VoteRequest: use last_log_id:LogId to replace last_log_term and last_log_index; by drdr xp; 2021-09-09
- Changed: 74b16524 introduce StorageError. RaftStorage gets rid of anyhow::Error; by drdr xp; 2021-09-13
  StorageError is an enum of DefensiveError and StorageIOError.
  An error a RaftStorage impl returns could be a defensive check error
  or an actual io operation error.
  Why:
  anyhow::Error is not enough to support the flow control in RaftCore.
  It is typeless, thus RaftCore can not decide what to do next
  depending on the returned error.
  Inside raft, anyhow::Error should never be used, although it could be used
  as source() of some other error types.
- Changed: 46bb3b1c RaftStorage::finalize_snapshot_installation is no more responsible to delete logs included in snapshot; by drdr xp; 2021-09-13
  A RaftStorage should be as simple and intuitive as possible.
  One should be able to correctly impl a RaftStorage without reading the
  guide but just by guessing what a trait method should do.
  RaftCore is able to do the job of deleting logs that are included in
  the state machine, RaftStorage should just do what is asked.
- Changed: 2cd23a37 use structopt to impl config default values; by drdr xp; 2021-09-14
- Changed: ac4bf4bd InitialState: rename last_applied_log to last_applied; by drdr xp; 2021-09-14
- Changed: 74283fda RaftStorage::do_log_compaction() do not need to delete logs any more raft-core will delete them.; by drdr xp; 2021-09-14
- Changed: 112252b5 RaftStorage add 2 API: last_id_in_log() and last_applied_state(), remove get_last_log_id(); by drdr xp; 2021-09-15
- Changed: 7f347934 simplify membership change; by drdr xp; 2021-09-16
  - Change: if leadership is lost, the cluster is left with the joint
    config.
    A caller that does not receive a response to the change-membership
    request should always re-send it to ensure the membership config is
    applied.
  - Change: remove joint-uniform logic from RaftCore, which brings a lot of
    complexity to raft impl. This logic is now done in Raft (which is a
    shell to control RaftCore).
  - Change: RaftCore.membership is changed to ActiveMembership, which
    includes a log id and a membership config.
    This change lets raft check if a membership is
    committed by comparing the log index and its committed index.
  - Change: when adding an existent non-voter, it returns an Ok value
    instead of an Err.
  - Change: add arg blocking to add_non_voter and change_membership.
    A blocking change_membership still waits for the two config change
    logs to commit.
    blocking only indicates whether to wait for replication to non-voter to
    be up to date.
  - Change: remove non_voters. Merge it into nodes.
    Now both voters and non-voters share the same replication handle.
  - Change: remove field ReplicationState.is_ready_to_join, it
    can be just calculated when needed.
  - Change: remove is_stepping_down, membership.contains() is quite
    enough.
  - Change: remove consensus_state.
- Changed: df684131 bsearch to find matching log between leader and follower; by drdr xp; 2021-12-17
  - Refactor: simplify algo to find matching log between leader and follower.
    It adopts a binary-search like algo:
    The leader tracks the max matched log id (self.matched) and the least
    unmatched log id (self.max_possible_matched_index).
    The follower just responds if the prev_log_id in
    AppendEntriesRequest matches the log at prev_log_id.index in its
    store.
    Remove the case-by-case algo.
  - Change: RaftStorage adds 2 new API: try_get_log_entries(),
    first_id_in_log() and first_known_log_id().
    These are not stable, may be removed soon.
  - Fix: the timeout for Wait() should be a total timeout. Otherwise a
    Wait() never quits.
  - Fix: when sending an append-entries request, if a log is not found, it
    should retry loading, but not enter snapshot state.
    Because a log may be deleted by RaftCore just after Replication read
    prev_log_id from the store.
  - Refactor: The two replication loops: line-rate loop and snapshot loop
    should not change the ReplicationState, but instead return an
    error.
    Otherwise it has to check the state everywhere.
  - Refactor: simplify receiving RaftCore messages: split
    drain_raft_rx() into process_raft_event() and
    try_drain_raft_rx().
  - Featur...
-