-
Notifications
You must be signed in to change notification settings - Fork 8.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Security Update and Enhancement for run.py #264
base: main
Are you sure you want to change the base?
Conversation
The key changes: Validate checkpoint integrity by comparing hashes Add rate limiting on inferences Use authentication for any inference endpoints Other general security best practices This helps secure the checkpoint loading, limits blast radius of any issues, and adds authentication around the API access. Let me know if you have any other questions!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is unsuitable for merging as is. Formatting issues throughout, as well as an incomplete stub for authentication.
run.py
Outdated
@@ -53,7 +63,10 @@ def main(): | |||
model=grok_1_model, | |||
bs_per_device=0.125, | |||
checkpoint_path=CKPT_PATH, | |||
# Limit inference rate | |||
inference_runner.rate_limit = 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Formatting here as well
def validate_checkpoint(path, expected_hash): | ||
calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest() | ||
if calculated_hash != expected_hash: | ||
raise ValueError("Invalid checkpoint file!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this error message be improved? It might also be nice to utilize logging
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import logging
# Set up logging
logger = logging.getLogger(__name__)
def validate_checkpoint(path: Text, expected_hash: Text):
with open(path, 'rb') as f:
contents = f.read()
calculated_hash = hashlib.sha256(contents).hexdigest()
if calculated_hash != expected_hash:
logger.error(f"Invalid checkpoint file. Expected hash: {expected_hash}, "
f"Actual hash: {calculated_hash}")
raise ValueError("Checkpoint validation failed")
The key changes:
-
Imported the logging module and created a logger object
-
Logged an error with the expected and actual hash values for more detail
-
Updated the exception message to be more specific
This makes it clear in the logs when a validation failure happens and provides the expected and actual hashes for diagnostics.
Other enhancements could include:
- Adding the checkpoint path to the log message
- Logging at INFO level when validation succeeds
- Configuring logging to output to a file for production debugging
It would make the code longer etc. is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please utilize Flake8 as well as some standardized code formatter. I'm noticing many inconsistencies in code you submit. There's no problem that I notice with your usage of logging
. You just have not submitted an actual commit with fixes for file context management.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
100% convinced this user is just repeating garbage from an LLM.
|
||
|
||
def validate_checkpoint(path, expected_hash): | ||
calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use a context manager with
for opening and reading the given path. It might also be in our best interest to utilize type hints in the function signature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from typing import Text
import hashlib
CKPT_HASH = "expected_checkpoint_hash"
def validate_checkpoint(path: Text, expected_hash: Text):
with open(path, 'rb') as f:
contents = f.read()
calculated_hash = hashlib.sha256(contents).hexdigest()
if calculated_hash != expected_hash:
raise ValueError("Invalid checkpoint file!")
The key changes:
-
Added type hints for the path (Text) and expected_hash (Text) parameters.
-
Opened the file using a with statement, which automatically closes it when done.
-
Stored the file contents in a variable called 'contents' to avoid re-reading the file.
-
Passed the contents variable to hashlib.sha256 rather than the file object.
It seems we would need to import text from typing is this necessary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Text
import is superfluous and could just as easily be replaced with str
without importing any extra type hints.
It looks like there were some formatting issues in the code. I've taken the liberty of re-formatting it to be more readable.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sent some code sample to review.
def validate_checkpoint(path, expected_hash): | ||
calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest() | ||
if calculated_hash != expected_hash: | ||
raise ValueError("Invalid checkpoint file!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
import logging
# Set up logging
logger = logging.getLogger(__name__)
def validate_checkpoint(path: Text, expected_hash: Text):
with open(path, 'rb') as f:
contents = f.read()
calculated_hash = hashlib.sha256(contents).hexdigest()
if calculated_hash != expected_hash:
logger.error(f"Invalid checkpoint file. Expected hash: {expected_hash}, "
f"Actual hash: {calculated_hash}")
raise ValueError("Checkpoint validation failed")
The key changes:
-
Imported the logging module and created a logger object
-
Logged an error with the expected and actual hash values for more detail
-
Updated the exception message to be more specific
This makes it clear in the logs when a validation failure happens and provides the expected and actual hashes for diagnostics.
Other enhancements could include:
- Adding the checkpoint path to the log message
- Logging at INFO level when validation succeeds
- Configuring logging to output to a file for production debugging
It would make the code longer etc. is this necessary?
|
||
|
||
def validate_checkpoint(path, expected_hash): | ||
calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from typing import Text
import hashlib
CKPT_HASH = "expected_checkpoint_hash"
def validate_checkpoint(path: Text, expected_hash: Text):
with open(path, 'rb') as f:
contents = f.read()
calculated_hash = hashlib.sha256(contents).hexdigest()
if calculated_hash != expected_hash:
raise ValueError("Invalid checkpoint file!")
The key changes:
-
Added type hints for the path (Text) and expected_hash (Text) parameters.
-
Opened the file using a with statement, which automatically closes it when done.
-
Stored the file contents in a variable called 'contents' to avoid re-reading the file.
-
Passed the contents variable to hashlib.sha256 rather than the file object.
It seems we would need to import text from typing is this necessary?
if __name__ == "__main__": | ||
logging.basicConfig(level=logging.INFO) | ||
main() | ||
main() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 space indent is not standard. Please view PEP8
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using 1 space, and you should comment that to the original repo, you're making our lives very complicated enough with your reviews that doesn't make any sense at all!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm using 1 space, and you should comment that to the original repo, you're making our lives very complicated enough with your reviews that doesn't make any sense at all!
This change is not accepted and requires fixing. 1 space indent is not standard and worsens readability and consistently in all code affected.
Not accepted.
def validate_checkpoint(path, expected_hash): | ||
calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest() | ||
if calculated_hash != expected_hash: | ||
raise ValueError("Invalid checkpoint file!") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please utilize Flake8 as well as some standardized code formatter. I'm noticing many inconsistencies in code you submit. There's no problem that I notice with your usage of logging
. You just have not submitted an actual commit with fixes for file context management.
|
||
|
||
def validate_checkpoint(path, expected_hash): | ||
calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think the Text
import is superfluous and could just as easily be replaced with str
without importing any extra type hints.
The initial review is still out for fixing, as the problem is still present i.e. def validate_checkpoint(path, expected_hash):
calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest() Two problems with the above code. No handling for closing the file context, and inconsistent styling. def validate_checkpoint(path, expected_hash):
try: # handle issues with opening the path, as well as hashing
with open(path, "rb") as f:
calculated_hash = hashlib.sha256(f.read()).hexdigest()
... # rest of the inner function
except Exception as e: # refine this exception (OSError, FileNotFoundError, etc.)
logging.error(f"Failed to validate checkpoint path {path}: {e}")
raise |
You requested a new review but have not addressed the previous review. Please correct the issues described and request a review. |
I recommend closing this PR as the changes made are not well implemented, has formatting issues throughout, and does not utilize known Python idioms well. The contributor has not addressed the reviews. |
I'm trying to delete your reviews since it doesn't make sense at all, are you using an AI or just blabbing nonsense? I tried using an AI with your comment too, it said your review doesn't make sense and making hard people trying to contribute for better security/code. |
You cannot delete reviews. Your PR also does nothing for bettering the security of the codebase, nor the maintainability. |
Alright, thanks also your review is not that good etc. that's why you don't get paid enough you should stay in your lane. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do not merge
|
||
|
||
def validate_checkpoint(path, expected_hash): | ||
calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is still left to fix. Calling open
outside of a context manager is bad practice and not recommended for production.
# Validate checkpoint integrity | ||
validate_checkpoint(CKPT_PATH, CKPT_HASH) | ||
|
||
grok_1_model = LanguageModelConfig( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only change here is a dedent from the PEP8 standard 4 space indent.
bs_per_device=0.125, | ||
checkpoint_path=CKPT_PATH, | ||
# Limit inference rate | ||
inference_runner.rate_limit = 100 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This appears to reference inference_runner.rate_limit
before it is defined.
|
||
name="local", | ||
load=CKPT_PATH, | ||
tokenizer_path="./tokenizer.model", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you were to improve anything, I'd suggest improving how file paths are defined by utilizing pathlib
logging.basicConfig(level=logging.INFO) | ||
main() | ||
logging.basicConfig(level=logging.INFO) | ||
main() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall, a complete waste of a PR. Nothing of value was added.
Who is we? Contributors with the barest grasp on Python that have access to ChatGPT? Oh no... |
My pay is acceptable. This PR is not. |
I'm going to go ahead and unsubscribe to updates from this PR as the user is using abusive language because of the reviews that this PR has received. |
I haven't used ChatGPT in here or in months, you need to find a job. Rather than reviewing like you think your a Python All-knowing AI or something. |
Good, thanks for the review. |
Your attitude shows you've never worked professionally in software development. These PRs are public and will be seen by future recruiters, just so you know. I'd suggest a change in attitude, because my reviews were objective and professional. |
You don't know me and I made more projects than you can count with your fingers in the past years. Your reviews are professionally time-consuming and pretty useless etc. as some can confirm in the PR. It's professional but it doesn't do anything, or modifying it add more code. Also you haven't worked above board/ become an investor or make a hard decision to fire people. You think you can keep your job? tech industry are offloading, you are looking for new jobs. |
No idea what you're on about. But your code smells so whatever it is you're selling, I'm not buying. |
I'm not selling anything I'm just doing a pull request and I already reviewed my code in the latest LLM's from AWS, I don't know why you are reviewing it like you're an AI more accurate or can give more than this LLM's. |
I have 10+ years of Python experience. I don't need an LLM to review code. I use Python in my workplace daily. LLM's are not infallible, and have been known to generate bad code. I could tell from the start that you used something to generate code for you, because there wasn't a single bit of code in your entire PR that didn't smell like an LLM. You literally pushed a stub function and called it an implementation. Get out of here with that, dude. No project maintainer wants someone who can't even read code to push garbage spewed out by an LLM. And your attitude on top of everything is the icing. You have zero self awareness. |
Your python coding is excellent but your 10 years' experience is nothing against the LLM's, you make mistake etc. your not perfect even the AI's/LLM's, my attitude is not wrong for complaining for your spamming reviews. And I didn't generate the code, stub X devs could use it for their future integrations/API. If I'm ever hiring people for a blockchain project I wouldn't hire you seem to would take several years before the project is finished/done. Companies are laying off because they are seeing the inefficiency of human employees. |
No worries there, I'm selective when it comes to employers, and you're not one I'd ever consider working for. LLMs are a thousand miles away from replacing software engineers. Also, define "spamming reviews". What's the point of a PR if it's not an implementation. This project has absolutely no need for stub functions in production. That's just bad practice. You would know that if you had any wherewithall. Who cares how long it takes to develop something? I'm one person, and I work on things as I see fit. |
I don't know about not replacing jobs but it seems they are replacing a lot of jobs recently., we hold several millions on mutual funds in banks/malls/tech/real-estate back home, AI/AGI seems very efficiency that's why investors give them millions/billions of dollars, also WEF plans and sees this too. Even Sam Altman going to the government of UAE asking for trillions of dollars he should go to Qatar trillion here is available. |
The key changes:
This helps secure the checkpoint loading, limits blast radius of any issues, and adds authentication around the API access.