-
-
Notifications
You must be signed in to change notification settings - Fork 188
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve logging of DB message insertion #1893
Improve logging of DB message insertion #1893
Conversation
We had some discussions about this via PM and in the Profanity MUC. I'll try to add some more context.
The "file log", what we usually call text chat log files, is just a dump of everything we receive/send. There is no other logic there. It was our legacy system before we had the SQL database. We considered removing it but then decided that some people will like to have those chat logs for easy grepping of conversations.
So the assumption I made via PM was that we either have something not right with NULL checking and then will silently not insert the message to the DB. Or that our
I'm not exactly sure what you mean here. Right now MAM doesn't insert messages twice.
I'm surprised they just insert. It is a long time ago but I believe that I either checked Gajim/Dino code when I implemented this or talked with one of the devs about it. Now about why we had this check in the first place:
If we have the same ID several times in the DB how do we know which one to correct? How do other clients do this? Just support these features if the ID is only present ONCE and otherwise not offer the feature for this message? Instead of aborting to insert duplicate IDs just give them a dummy ID and thus also not allow correction (and other features) on those? One thing that we should do for sure is: if we check, then we shouldn't just check the ID but also the sender. What we should do, and the questions we need to answer:
About this PR: |
8a7b609
to
15bd03d
Compare
/uniquestanzas
command to improve reliability of DB message history
Now noticable behavioural change (letting non-unique stanzas) comes with non-default setting, so it can be a question of preference. But duplicate stanzas are logged for everyone, so we can have more diagnostic data. |
I talked with someone about this. With things like Last Message Correction we just use the newest/latest message with that ID. (And maybe try to find out what other clients do here). Would you mind changing the PR to this behaviour? |
BTW keep in mind to check |
It might lead to flood in the console in case if, for example, server sets all stanza-ids to "0". We can just warn, I think that it would be a preferable behaviour.
That can be done
Surely, but you didn't mention whether setting is still needed. I think I can change default and keep it, just in case if someone has a problem with duplicate messages. |
That would be a critical server-side bug, breaking history/device sync and whatnot, comparable with, dunno, dropping stanzas. I think a client should complain as loud as possible in that case. |
Not needed. |
@jubalh can we discuss some changes in a chat? |
I'm on vacation and only check GitHub sporadically. |
The problem is about replacement. As it is right now quite complicated SQL query: auto_sqlite gchar* query = sqlite3_mprintf("SELECT * FROM (SELECT COALESCE(B.`message`, A.`message`) AS message, A.`timestamp`, A.`from_jid`, A.`type`, A.`encryption` from `ChatLogs` AS A LEFT JOIN `ChatLogs` AS B ON A.`stanza_id` = B.`replace_id` WHERE A.`replace_id` = '' AND ((A.`from_jid` = '%q' AND A.`to_jid` = '%q') OR (A.`from_jid` = '%q' AND A.`to_jid` = '%q')) AND A.`timestamp` < '%q' AND (%Q IS NULL OR A.`timestamp` > %Q) ORDER BY A.`timestamp` %s LIMIT %d) ORDER BY `timestamp` %s;", contact_barejid, myjid->barejid, myjid->barejid, contact_barejid, end_date_fmt, start_time, start_time, sort1, MESSAGES_TO_RETRIEVE, sort2); Even right now it has bugs: if you replace the same message twice or more, it will replace it visually, but after loading history, it will show only the first correction. What we want to do is barely within a scope of an SQL query (at least not in a way it's implemented right now), as we need to select all the replacement and apply them only in the first case. I see a few ways to implement it, but none of them are to my liking. As an alternative solution, we can use "replaces_id", which would be linked to the DB incremental ID or limit replacement capabilities to only the most recent messages. All the proposed changes would either add complexity to the query and make it slower, either require further development, changes in DB etc. My proposal is to add diagnostic part for now, possibly with removing the setting and after further studying the problem, we can understand its underlying causes and fix directly them, rather than relying on major changes to potentially fix a problem. |
Added logging, uniqueness checking logic remains the same. For details see profanity-im#1893 discussion
15bd03d
to
18ae147
Compare
Great find! I propose to solve this in another PR since, as I understand it, it's not related to DB but to presentation. Please improve commit message a bit. We are not refactoring anything as far as I can see :) I would also mention why we don't want those duplicate IDs (in the commit message) and maybe even a short commit above the check in the code (something like I didn't check the SQL code deeply (yet) but code looks generally fine I think. |
@jubalh , please, contact me ASAP, I have important findings. |
/uniquestanzas
command to improve reliability of DB message history
As mentioned I'm on vacation. I now specifically went to a place where I can access my chat client.. And I see that you haven't written anything to me and even didn't answer to my last messages there.. |
Sorry, missed the notifications. Anyways, I answered and we discussed the issue and came to a conclusion. |
Add logging for cases when the message is not being inserted in the DB due to its ID. Make diagnostic of unreliable message history much easier, lay the groundwork for a fully reliable message history. Further changes might include insertion of messages with non-unique stanza-id, but as it's a big change with plentiful issues, decision was made to further investigate potential causes of history unreliability. For details see profanity-im#1893 discussion
18ae147
to
403f924
Compare
Originally I believed we will change the behaviour of the db logging in this PR. But I guess just adding some logging is ok as well. |
When we received a message correction via `XEP-0308: Last Message Correction` we accepted the change without checking the sender making it possible for anybody to replace the message if the ID was known. This change has been proposed by @jubalh profanity-im#1893 (comment)
When we received a message correction via `XEP-0308: Last Message Correction` we accepted the change without checking the sender making it possible for anybody to replace the message if the ID was known. This change has been proposed by @jubalh profanity-im#1893 (comment)
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp` column as well. Performance boost is noticeable. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp` column as well. Performance boost is noticeable. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp` column as well. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Write messages with duplicate stanza IDs in the DB and allow LMC for them. Index `timestamp` column as well. Further details are available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Write messages with duplicate stanza IDs in the DB and allow LMC for them. Index `timestamp` column as well. Further details are available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. The structure despite being unintuitive allows better maintainability, **performance** and data integrity. The new way is to replace original message, while keeping the original in LMC messages, this way we can preserve full edit history and avoid wasteful JOINs on every scroll up. Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Write messages with duplicate stanza IDs in the DB and allow LMC for them. Index `timestamp` column as well. Further details are available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Please, backup your DB before performing any testing with changes in code. Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Now message LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from the any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
**Please, backup your DB before performing any testing.** Introduce new DB structure and DB migration mechanism. Index `timestamp`, `to_jid`, `from_jid` columns to improve performance. Add trigger for `replaced_by_db_id` calculation by DB on message insert. Now LMC messages are interconnected with original messages, this way we have fast access to last (hence correct) applicable edits, as well as reference to the original message from any edit (in case of chained edits). Change the way LMC messages are being displayed. Now we check if we can replace a message from current buffer. If we don't have a message in the buffer, it might've been lost, but we can still display it as a new message. Further information available here: profanity-im#1893 profanity-im#1899 profanity-im#1902
Addressing unreliable message history reports where some messages were written in the file log but not in the DB. This issue was not consistently reproducible, but investigations revealed potential causes.
WHERE
clause in the insert query, preventing message insertion.To resolve this, I propose:
WHERE
clause from the insert query, allowing messages with duplicate IDs or stanza IDs to be inserted.I also am planning to offer an optional setting in the future to handle duplicate IDs or stanza IDs differently, to resolve potential duplicate messages issues which might happen theoretically because of MAM. But the exact shape of the option depends on many things. Anyway, I see duplicate messages as a lesser evil comparing to unreliable history.
This is a prototype solution and will require feedback and refinement.
This change aims to improve message history reliability, especially for users who rely on logs for critical information.
The solution is based on the way that other clients handle such cases (by ignoring duplication of stanza-id).
https://dev.gajim.org/gajim/gajim/-/blob/master/gajim/common/modules/message.py?ref_type=heads#L224
I ask @jubalh to provide more context from MUC (since I lost MUC history).