
Duplicates

Because this database is created by importing data from an anime database JSON file, and that source data can change over time, duplicate entries can appear. For example, an anime's title can change slightly (a dot at the end of the title vs. no dot), which can cause it to be picked up as a new anime the next time the anime database JSON file is imported, creating a duplicate entry.
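The minimal Python sketch below shows how a trivial rename can turn into a duplicate. It assumes, hypothetically, that the import matches existing rows by exact title; that is not necessarily how this project's importer or its check-anime-duplicates command works. normalize_title and group_possible_duplicates are hypothetical helpers, and the IDs and titles are made up.

```python
import re
import unicodedata
from collections import defaultdict


def normalize_title(title: str) -> str:
    """Hypothetical helper: reduce a title to a loose comparison key.

    Applies Unicode NFKC normalization, case-folds, collapses whitespace
    and strips trailing punctuation such as a final dot.
    """
    key = unicodedata.normalize("NFKC", title).casefold()
    key = re.sub(r"\s+", " ", key).strip()
    return key.rstrip(".!?~ ")


def group_possible_duplicates(rows):
    """Hypothetical helper: group (anime_id, title) rows by normalized title.

    Only groups with more than one ID are returned. Each group is merely a
    candidate that still needs manual verification.
    """
    groups = defaultdict(list)
    for anime_id, title in rows:
        groups[normalize_title(title)].append(anime_id)
    return {key: ids for key, ids in groups.items() if len(ids) > 1}


# Made-up rows: the same anime before and after a trailing-dot rename.
rows = [(1, "Some Anime."), (2, "Some Anime"), (3, "Other Show")]

print(rows[0][1] == rows[1][1])  # False: an exact match sees two different anime
print(group_possible_duplicates(rows))  # {'some anime': [1, 2]}
```

Grouping on a normalized key like this only surfaces candidates. Each group still needs the manual verification described below.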

How to clean duplicates

Since the data changes over time, a duplicate entry generally means the old entry has been removed or changed in the source, and the newer one is the only one left in the anime database JSON file. Because the older entry no longer exists in the anime database JSON file, that generally means we should delete the older entry.

Cleaning the database of duplicate entries is deliberately a manual process, because each entry has to be verified as a true duplicate before anything is changed. Once we are sure it is a duplicate, we can run a command to replace the old anime with the new anime: php artisan app:merge-anime-duplicate oldAnimeId newAnimeId, for example php artisan app:merge-anime-duplicate 29164 40755. The command displays the details for both IDs and asks you to confirm that you wish to merge them into the new ID.

Alternatively, php artisan app:delete-anime animeId deletes an anime that doesn't have any reviews and doesn't belong to any list, forcing it to be re-imported from scratch. This may never end up being necessary, but it is still nice to have that option.
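The old/new decision above can be sketched in Python. The sketch assumes, hypothetically, that you already have both suspected rows and the set of titles in the current anime database JSON file. AnimeRow, build_merge_command and the sample data are illustrative and not part of the project. The function only builds the command string and does not run it.

```python
from typing import Collection, NamedTuple


class AnimeRow(NamedTuple):
    """Hypothetical shape of an anime row from the SQL database."""
    anime_id: int
    title: str


def build_merge_command(first: AnimeRow, second: AnimeRow, json_titles: Collection[str]) -> str:
    """Suggest the merge command for two suspected duplicates.

    The entry whose title still appears in the current anime database JSON
    file is treated as the new one, and the other as the old one. If both
    titles or neither title appears, the pair is left for manual checking.
    """
    first_in_json = first.title in json_titles
    second_in_json = second.title in json_titles
    if first_in_json == second_in_json:
        raise ValueError("Ambiguous pair: compare both entries manually")
    old, new = (second, first) if first_in_json else (first, second)
    return f"php artisan app:merge-anime-duplicate {old.anime_id} {new.anime_id}"


json_titles = {"Some Anime"}  # made-up titles from the current JSON file
print(build_merge_command(AnimeRow(1, "Some Anime."), AnimeRow(2, "Some Anime"), json_titles))
# php artisan app:merge-anime-duplicate 1 2
```

A title match alone may not settle every case, because the picture URL can matter too. When the function raises, fall back to manual checking. Running the printed command still shows both entries and asks for confirmation before merging.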

How to find duplicates

We have a command to find possible duplicate anime entries: php artisan app:check-anime-duplicates. It generates 3 CSV files you can look through to find possible duplicate anime entries. Keep in mind that the check-anime-duplicates command will not always find every duplicate. Sometimes you will only come across them by chance while using the regular anime search on the site.

A higher ID in the SQL database usually means a newer anime, but not always. For example, I have seen a case where the entry with the higher SQL database ID had status ONGOING while its lower-ID duplicate had status FINISHED, yet the higher-ID ONGOING entry had a higher quality picture. So the SQL database ID is not a reliable way to tell which of two duplicates is the latest one, and sometimes detailed manual checking has to be done. When in doubt, the best approach seems to be to manually check the title and picture URL in the anime database JSON file and treat the matching entry as the "latest" one. The others are then considered duplicates and merged into it.

When running the recommended import command, the anime descriptions and images should always be downloaded for both existing and new anime, so it is generally fine to merge old duplicate anime entries into a new anime entry.

Finally, duplicate anime are usually caused by the import process adding a new anime entry rather than updating an existing one, but the anime database JSON file can also contain duplicate entries itself. This is quite rare, but it is still worth searching the anime database JSON file for keywords from a title to check whether it has multiple entries for the same anime.
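The following minimal Python sketch performs the keyword search. It prints each matching entry's title, status and picture URL so they can be compared against the database. It assumes a hypothetical JSON layout with a top-level "data" array whose entries have "title", "status" and "picture" keys. Adjust these to the real file. The script name and path in the usage comment are placeholders as well.

```python
import json
import sys


def load_entries(path):
    """Load entries from the anime database JSON file.

    Assumes a hypothetical top-level {"data": [...]} layout. Adjust the key
    to match the real file.
    """
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)["data"]


def search_titles(entries, keywords):
    """Return entries whose title contains every keyword, case-insensitively."""
    wanted = [keyword.casefold() for keyword in keywords]
    return [
        entry
        for entry in entries
        if all(keyword in entry.get("title", "").casefold() for keyword in wanted)
    ]


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python search_titles.py path/to/database.json some anime
    for entry in search_titles(load_entries(sys.argv[1]), sys.argv[2:]):
        # Print the fields used to decide which entry is the "latest".
        print(entry.get("title"), entry.get("status"), entry.get("picture"), sep=" | ")
```

More than one match for the same anime suggests that the anime database JSON file itself contains duplicates.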