Watermark detection model #2

Open
robvanvolt opened this issue Sep 15, 2022 · 2 comments
@robvanvolt

robvanvolt commented Sep 15, 2022

Hey there!

It's Robert from LAION! Congratulations on this really interesting dataset release!

I was just wondering if it was possible for you to release details on your internal watermark detection model, or even the model itself: "watermark_score | float | The watermark probability of the image by our internal model"

We released our model here and would appreciate potential improvements for our model or incorporations of interesting techniques from your side!:)

Best,

Robert

@beomheepark
Contributor

beomheepark commented Sep 16, 2022

Hello Robert!

Thanks for your interest in the COYO dataset. I'm Brook, a member of the COYO team.

First, we needed a model that could detect watermarks not only in the COYO dataset but also in images generated by our image generation model (to be released soon). We looked for a public model with good performance, but found hardly any; as far as I know, the one you released was the only one. However, after evaluating it internally, we decided that even this model was not suitable for us, because, as you know, the criteria for what counts as a "watermark" are very vague. We therefore trained a new model, and the details of the training are summarized below.

Three types of datasets were used for training and evaluation: 1) the public watermark dataset you released, 2) a dataset collected from our image generation model, and 3) a dataset composed of watermarked images from stock image sites (e.g. Shutterstock) and non-watermarked images from OpenImages. For the classification model, we selected and trained a RegNetY-16GF pre-trained on ImageNet-21K, since it offered a suitable trade-off between accuracy and speed. Evaluation was performed on each dataset separately, and performance improved by about 10% on every dataset compared to your published model. (We think the gap is simply due to the increase in data and model size.)
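
For readers who want to reproduce this kind of comparison, here is a minimal sketch of scoring a model on each dataset separately. It is not the COYO team's evaluation code: the metric (plain accuracy), the 0.5 threshold, the dataset names, and the sample scores are all assumptions made up for illustration.

```python
from collections import defaultdict


def accuracy_per_dataset(records, threshold=0.5):
    """Compute accuracy separately for each dataset.

    records: iterable of (dataset_name, watermark_score, has_watermark),
    where watermark_score is a model probability in [0, 1].
    The threshold of 0.5 is an assumption, not a value from the thread.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for dataset, score, has_watermark in records:
        predicted = score >= threshold
        correct[dataset] += int(predicted == has_watermark)
        total[dataset] += 1
    return {name: correct[name] / total[name] for name in total}


# Hypothetical scores for the three kinds of evaluation sets described above.
records = [
    ("public", 0.91, True),
    ("public", 0.12, False),
    ("generated", 0.40, True),
    ("generated", 0.05, False),
    ("stock_vs_openimages", 0.88, True),
    ("stock_vs_openimages", 0.73, False),
]

results = accuracy_per_dataset(records)
for name, acc in results.items():
    print(f"{name}: {acc:.2f}")
```

Running the same function on the scores of two different models gives a per-dataset comparison rather than a single pooled number, so a model that does well on only one kind of image cannot hide behind the average.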

It would be difficult to release the dataset used for training because of licensing issues. However, I think the model and the evaluation code can be released, and I would like to discuss this with the team. I am very impressed by everything LAION has released. Thanks again for your hard work and interest. If you have any additional questions or comments, please feel free to reply.

Best,
Brook

@mwbyeon mwbyeon added the question Further information is requested label Sep 16, 2022
@robvanvolt
Author

Thank you for the detailed answer! It would be really amazing if you could discuss this with your team and, if the outcome is positive, release the code and architecture of your watermark detection model!

Looking forward to hearing from you again!

Best,

Robert
