Invalid copyright not detected #3659

Open
pombredanne opened this issue Feb 19, 2024 · 8 comments · May be fixed by #3718
Comments

@pombredanne
Member

The following notice is rare and is not detected, because [C] is not a valid copyright "sign":

[C] The Regents of the University of Michigan and Merit Network, Inc. 1992, 1993, 1994, 1995 All Rights Reserved

We have a few other cases in https://github.com/search?q="Copyright+[C]"&type=code

The only sane resolution I can think of is to normalize these warts during text preparation:

  • replace [C] The Regents of the University with (C) The Regents of the University
  • replace Copyright [c] with Copyright (c), regardless of letter case.

On its own, [C] is not (and cannot be) a valid copyright sign: treating it as one would trigger a badzillion false positives, as seen in https://github.com/search?q="[C]"&type=code (actually only millions, not badzillions).
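A minimal Python sketch of the two proposed replacements, assuming they run on each line during text preparation, before copyright detection. The function name normalize_bracketed_c and the exact regular expressions are hypothetical; they are not taken from cluecode/copyrights.py. The original letter case of the C is kept.

```python
import re

# Hypothetical pre-detection normalization (not the actual scancode code).
# Only two narrow contexts are rewritten, so a bare "[C]" elsewhere
# (array indexes, option labels, ...) is left alone.

# "[C] The Regents of the University" -> "(C) The Regents of the University"
REGENTS_BRACKET_C = re.compile(
    r"\[(c)\](?=\s+the\s+regents\s+of\s+the\s+university)",
    re.IGNORECASE,
)

# "Copyright [c]" -> "Copyright (c)", whatever the letter case
COPYRIGHT_BRACKET_C = re.compile(
    r"\b(copyright)(\s*)\[(c)\]",
    re.IGNORECASE,
)


def normalize_bracketed_c(line: str) -> str:
    """Rewrite the two known "[C]" warts to "(C)", keeping the original case."""
    line = REGENTS_BRACKET_C.sub(r"(\1)", line)
    line = COPYRIGHT_BRACKET_C.sub(r"\1\2(\3)", line)
    return line


if __name__ == "__main__":
    samples = [
        "[C] The Regents of the University of Michigan and Merit Network, Inc.",
        "Copyright [c] 1995 Example Corp.",
        "COPYRIGHT [C] 1995 Example Corp.",
        "value = table[C] + 1",  # not a copyright: must stay unchanged
    ]
    for sample in samples:
        print(normalize_bracketed_c(sample))
```

Tying each pattern to its surrounding words (a following "The Regents of the University", or a preceding "Copyright") is what keeps the generic [C] from being treated as a copyright sign.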

@vaibhavyadav-dev

I'll take this one, @pombredanne.

@vaibhavyadav-dev

@pombredanne If I understand correctly, I just need to make the changes you suggested. Or does this require anything else? If so, please tell me; I'm a beginner and want to make good contributions.

@pombredanne
Member Author

@CaptainTron You could start by crafting unit tests that fail for now.
Then check this: https://github.com/nexB/scancode-toolkit/blob/79aae3481833de80913383b2aa21fc8cdfb9813a/src/cluecode/copyrights.py#L3987

@vaibhavyadav-dev

@pombredanne Can you elaborate a bit more? I don't yet understand which unit tests to look for. I've gone through that line and the docs, but I'm still confused.

swastkk linked a pull request on Apr 2, 2024 that will close this issue
@arshad-muhammad

@pombredanne Since this issue is still open, I'm working on it.

@pombredanne
Member Author

pombredanne commented Oct 3, 2024

@arshad-muhammad Thanks... You may want to start NOT from the develop/ branch but rather from this other branch that has many new improvements:

@arshad-muhammad

Thank you for the guidance, @pombredanne. I'll switch to the misc-copyrights2 branch and review the improvements there. I'll then implement the normalization of [C] to (C) and the similar cases discussed above, and keep you updated on my progress.

arshad-muhammad added a commit to arshad-muhammad/scancode-toolkit that referenced this issue Oct 4, 2024
@arshad-muhammad

@pombredanne I opened my first PR; please let me know whether it's correct or where I went wrong.

arshad-muhammad added a commit to arshad-muhammad/scancode-toolkit that referenced this issue Oct 5, 2024
…ormalization to copyrights.py and unit tests passed