Remove Duplicates #1

Open
qjhart opened this issue Jul 26, 2017 · 1 comment

qjhart commented Jul 26, 2017

There are three duplicate codes in the current country_codes.csv file. These need to be removed. I think they come in as separate country names in the Wikidata search.

@qjhart qjhart self-assigned this Jul 26, 2017

qjhart commented Aug 6, 2021

The duplicated codes are:
FK, NC, NL, YUCS

diff <(cut -d, -f 1 country_codes.csv | sort) <(cut -d, -f 1 country_codes.csv | sort -u)
84d83
< FK
177d175
< NC
183d180
< NL
266d262
< YUCS
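
The same list can probably be produced more directly with `uniq -d`, which prints only the values that repeat. This is a minimal sketch, not something taken from the repository: `dup_codes` is a hypothetical helper name, and it assumes the first column never contains a quoted comma.

```shell
# Hypothetical helper: print each value in column 1 that occurs more than once.
# Assumes column 1 holds plain codes with no quoted commas.
dup_codes() {
  cut -d, -f1 "$1" | sort | uniq -d
}
```

Running `dup_codes country_codes.csv` should then list each duplicated code once, without the `diff` line markers.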

One is an exact duplicate of another row:

diff <(sort country_codes.csv) <(sort -u country_codes.csv)
177d176
< NC,New Caledonia,NCL,1853,,"<?xml version=""1.0"" encoding=""UTF-8""?><svg xmlns=""http://www.w3.org/2000/svg"" width=""900"" height=""600""><rect width=""900"" height=""600"" fill=""#ED2939""/><rect width=""600"" height=""600"" fill=""#fff""/><rect width=""300"" height=""600"" fill=""#002395""/></svg>"
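
An exact duplicate like this one could be dropped without reordering the file. The sketch below is one possible approach, assuming every record sits on a single line (a quoted field containing a newline would break it); `dedupe_rows` is a hypothetical name.

```shell
# Hypothetical helper: drop rows that exactly repeat an earlier row,
# keeping the first occurrence and the original row order.
# Assumes each CSV record occupies exactly one line.
dedupe_rows() {
  awk '!seen[$0]++' "$1"
}
```

Unlike `sort -u`, this keeps the header row in place. It only handles the NC case; rows that share a code but differ in other columns are left untouched.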

One other (FK) shares the same country name as well:

diff <(cut -d, -f 1,2 country_codes.csv | sort) <(cut -d, -f 1,2 country_codes.csv | sort -u)
84d83
< FK,Falkland Islands
176d174
< NC,New Caledonia

And two others (NL and YUCS) share the same three-letter code:

diff <(cut -d, -f 1,3 country_codes.csv | sort) <(cut -d, -f 1,3 country_codes.csv | sort -u)
177d176
< NC,NCL
183d181
< NL,NLD
266d263
< YUCS,YUG
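
To decide which of the non-identical rows to keep, it may help to print the full rows for every repeated code side by side. This is a sketch under the same assumption that column 1 has no quoted commas; `rows_for_dup_codes` is a hypothetical name.

```shell
# Hypothetical helper: print every full row whose column-1 code occurs
# more than once. The file is read twice: the first pass counts codes,
# the second pass prints the rows whose code was counted more than once.
rows_for_dup_codes() {
  awk -F, 'NR == FNR { count[$1]++; next } count[$1] > 1' "$1" "$1"
}
```

Piping the result through `sort` groups the rows for each code together, which makes the differing columns easier to compare.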
