Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Security Update and Enhancement for run.py #264

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

MiChaelinzo
Copy link

@MiChaelinzo MiChaelinzo commented Mar 21, 2024

The key changes:

  • 1.) Validate checkpoint integrity by comparing hashes
  • 2.) Add rate limiting on inferences
  • 3.) Use authentication for any inference endpoints
  • 4.) Other general security best practices
    This helps secure the checkpoint loading, limits blast radius of any issues, and adds authentication around the API access.

The key changes:

Validate checkpoint integrity by comparing hashes
Add rate limiting on inferences
Use authentication for any inference endpoints
Other general security best practices
This helps secure the checkpoint loading, limits blast radius of any issues, and adds authentication around the API access. Let me know if you have any other questions!
@MiChaelinzo MiChaelinzo changed the title Update run.py Security Update and Enhancement for run.py Mar 21, 2024
Copy link

@Aareon Aareon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unsuitable for merging as is. Formatting issues throughout, as well as an incomplete stub for authentication.

run.py Show resolved Hide resolved
run.py Outdated
@@ -53,7 +63,10 @@ def main():
model=grok_1_model,
bs_per_device=0.125,
checkpoint_path=CKPT_PATH,
# Limit inference rate
inference_runner.rate_limit = 100
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Formatting here as well

run.py Show resolved Hide resolved
def validate_checkpoint(path, expected_hash):
calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()
if calculated_hash != expected_hash:
raise ValueError("Invalid checkpoint file!")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could this error message be improved? It might also be nice to utilize logging

Copy link
Author

@MiChaelinzo MiChaelinzo Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import logging

# Set up logging
logger = logging.getLogger(__name__)

def validate_checkpoint(path: Text, expected_hash: Text):

  with open(path, 'rb') as f:
    contents = f.read()

  calculated_hash = hashlib.sha256(contents).hexdigest()

  if calculated_hash != expected_hash:
    logger.error(f"Invalid checkpoint file. Expected hash: {expected_hash}, " 
                 f"Actual hash: {calculated_hash}")
    raise ValueError("Checkpoint validation failed")

The key changes:

  • Imported the logging module and created a logger object

  • Logged an error with the expected and actual hash values for more detail

  • Updated the exception message to be more specific

This makes it clear in the logs when a validation failure happens and provides the expected and actual hashes for diagnostics.

Other enhancements could include:

  • Adding the checkpoint path to the log message
  • Logging at INFO level when validation succeeds
  • Configuring logging to output to a file for production debugging

It would make the code longer etc. is this necessary?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please utilize Flake8 as well as some standardized code formatter. I'm noticing many inconsistencies in code you submit. There's no problem that I notice with your usage of logging. You just have not submitted an actual commit with fixes for file context management.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

100% convinced this user is just repeating garbage from an LLM.



def validate_checkpoint(path, expected_hash):
calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use a context manager with for opening and reading the given path. It might also be in our best interest to utilize type hints in the function signature.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from typing import Text

import hashlib

CKPT_HASH = "expected_checkpoint_hash"

def validate_checkpoint(path: Text, expected_hash: Text):
  with open(path, 'rb') as f:
    contents = f.read()

  calculated_hash = hashlib.sha256(contents).hexdigest()
  
  if calculated_hash != expected_hash:
    raise ValueError("Invalid checkpoint file!")

The key changes:

  • Added type hints for the path (Text) and expected_hash (Text) parameters.

  • Opened the file using a with statement, which automatically closes it when done.

  • Stored the file contents in a variable called 'contents' to avoid re-reading the file.

  • Passed the contents variable to hashlib.sha256 rather than the file object.

It seems we would need to import text from typing is this necessary?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Text import is superfluous and could just as easily be replaced with str without importing any extra type hints.

It looks like there were some formatting issues in the code. I've taken the liberty of re-formatting it to be more readable.
Copy link
Author

@MiChaelinzo MiChaelinzo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sent some code sample to review.

def validate_checkpoint(path, expected_hash):
calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()
if calculated_hash != expected_hash:
raise ValueError("Invalid checkpoint file!")
Copy link
Author

@MiChaelinzo MiChaelinzo Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

import logging

# Set up logging
logger = logging.getLogger(__name__)

def validate_checkpoint(path: Text, expected_hash: Text):

  with open(path, 'rb') as f:
    contents = f.read()

  calculated_hash = hashlib.sha256(contents).hexdigest()

  if calculated_hash != expected_hash:
    logger.error(f"Invalid checkpoint file. Expected hash: {expected_hash}, " 
                 f"Actual hash: {calculated_hash}")
    raise ValueError("Checkpoint validation failed")

The key changes:

  • Imported the logging module and created a logger object

  • Logged an error with the expected and actual hash values for more detail

  • Updated the exception message to be more specific

This makes it clear in the logs when a validation failure happens and provides the expected and actual hashes for diagnostics.

Other enhancements could include:

  • Adding the checkpoint path to the log message
  • Logging at INFO level when validation succeeds
  • Configuring logging to output to a file for production debugging

It would make the code longer etc. is this necessary?



def validate_checkpoint(path, expected_hash):
calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()
Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from typing import Text

import hashlib

CKPT_HASH = "expected_checkpoint_hash"

def validate_checkpoint(path: Text, expected_hash: Text):
  with open(path, 'rb') as f:
    contents = f.read()

  calculated_hash = hashlib.sha256(contents).hexdigest()
  
  if calculated_hash != expected_hash:
    raise ValueError("Invalid checkpoint file!")

The key changes:

  • Added type hints for the path (Text) and expected_hash (Text) parameters.

  • Opened the file using a with statement, which automatically closes it when done.

  • Stored the file contents in a variable called 'contents' to avoid re-reading the file.

  • Passed the contents variable to hashlib.sha256 rather than the file object.

It seems we would need to import text from typing is this necessary?

if __name__ == "__main__":
logging.basicConfig(level=logging.INFO)
main()
main()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 space indent is not standard. Please view PEP8

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using 1 space, and you should comment that to the original repo, you're making our lives very complicated enough with your reviews that doesn't make any sense at all!

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm using 1 space, and you should comment that to the original repo, you're making our lives very complicated enough with your reviews that doesn't make any sense at all!

This change is not accepted and requires fixing. 1 space indent is not standard and worsens readability and consistently in all code affected.

Not accepted.

def validate_checkpoint(path, expected_hash):
calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()
if calculated_hash != expected_hash:
raise ValueError("Invalid checkpoint file!")
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please utilize Flake8 as well as some standardized code formatter. I'm noticing many inconsistencies in code you submit. There's no problem that I notice with your usage of logging. You just have not submitted an actual commit with fixes for file context management.



def validate_checkpoint(path, expected_hash):
calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the Text import is superfluous and could just as easily be replaced with str without importing any extra type hints.

@Aareon
Copy link

Aareon commented Mar 27, 2024

The initial review is still out for fixing, as the problem is still present i.e.

def validate_checkpoint(path, expected_hash):
  calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()

Two problems with the above code. No handling for closing the file context, and inconsistent styling.

def validate_checkpoint(path, expected_hash):
    try:  # handle issues with opening the path, as well as hashing
        with open(path, "rb") as f:
            calculated_hash = hashlib.sha256(f.read()).hexdigest()
            ...  # rest of the inner function
    except Exception as e:  # refine this exception (OSError, FileNotFoundError, etc.)
        logging.error(f"Failed to validate checkpoint path {path}: {e}")
        raise

@MiChaelinzo MiChaelinzo requested a review from Aareon April 3, 2024 15:38
@Aareon
Copy link

Aareon commented Apr 5, 2024

You requested a new review but have not addressed the previous review. Please correct the issues described and request a review.

@Aareon
Copy link

Aareon commented Apr 5, 2024

I recommend closing this PR as the changes made are not well implemented, has formatting issues throughout, and does not utilize known Python idioms well. The contributor has not addressed the reviews.

@MiChaelinzo
Copy link
Author

I recommend closing this PR as the changes made are not well implemented, has formatting issues throughout, and does not utilize known Python idioms well. The contributor has not addressed the reviews.

I'm trying to delete your reviews since it doesn't make sense at all, are you using an AI or just blabbing nonsense? I tried using an AI with your comment too, it said your review doesn't make sense and making hard people trying to contribute for better security/code.

@Aareon
Copy link

Aareon commented Apr 6, 2024

I recommend closing this PR as the changes made are not well implemented, has formatting issues throughout, and does not utilize known Python idioms well. The contributor has not addressed the reviews.

I'm trying to delete your reviews since it doesn't make sense at all, are you using an AI or just blabbing nonsense? I tried using an AI with your comment too, it said your review doesn't make sense and making hard people trying to contribute for better security/code.

You cannot delete reviews. Your PR also does nothing for bettering the security of the codebase, nor the maintainability.
Bad PR is bad. Close and try again.

@MiChaelinzo
Copy link
Author

I recommend closing this PR as the changes made are not well implemented, has formatting issues throughout, and does not utilize known Python idioms well. The contributor has not addressed the reviews.

I'm trying to delete your reviews since it doesn't make sense at all, are you using an AI or just blabbing nonsense? I tried using an AI with your comment too, it said your review doesn't make sense and making hard people trying to contribute for better security/code.

You cannot delete reviews. Your PR also does nothing for bettering the security of the codebase, nor the maintainability. Bad PR is bad. Close and try again.

Alright, thanks also your review is not that good etc. that's why you don't get paid enough you should stay in your lane.

Copy link

@Aareon Aareon left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do not merge



def validate_checkpoint(path, expected_hash):
calculated_hash = hashlib.sha256(open(path, 'rb').read()).hexdigest()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still left to fix. Calling open outside of a context manager is bad practice and not recommended for production.

# Validate checkpoint integrity
validate_checkpoint(CKPT_PATH, CKPT_HASH)

grok_1_model = LanguageModelConfig(
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only change here is a dedent from the PEP8 standard 4 space indent.

bs_per_device=0.125,
checkpoint_path=CKPT_PATH,
# Limit inference rate
inference_runner.rate_limit = 100
Copy link

@Aareon Aareon Apr 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This appears to reference inference_runner.rate_limit before it is defined.


name="local",
load=CKPT_PATH,
tokenizer_path="./tokenizer.model",
Copy link

@Aareon Aareon Apr 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you were to improve anything, I'd suggest improving how file paths are defined by utilizing pathlib

run.py Show resolved Hide resolved
logging.basicConfig(level=logging.INFO)
main()
logging.basicConfig(level=logging.INFO)
main()
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall, a complete waste of a PR. Nothing of value was added.

@Aareon
Copy link

Aareon commented Apr 6, 2024

@Aareon Looing at your reviews at other PR's we can smell your comments/reviews miles away you need to take a long week bath. Also, you'll be out of jobs and internet anyway...

Who is we? Contributors with the barest grasp on Python that have access to ChatGPT? Oh no...

@Aareon
Copy link

Aareon commented Apr 6, 2024

I recommend closing this PR as the changes made are not well implemented, has formatting issues throughout, and does not utilize known Python idioms well. The contributor has not addressed the reviews.

I'm trying to delete your reviews since it doesn't make sense at all, are you using an AI or just blabbing nonsense? I tried using an AI with your comment too, it said your review doesn't make sense and making hard people trying to contribute for better security/code.

You cannot delete reviews. Your PR also does nothing for bettering the security of the codebase, nor the maintainability. Bad PR is bad. Close and try again.

Alright, thanks also your review is not that good etc. that's why you don't get paid enough you should stay in your lane.

My pay is acceptable. This PR is not.

@Aareon
Copy link

Aareon commented Apr 6, 2024

I'm going to go ahead and unsubscribe to updates from this PR as the user is using abusive language because of the reviews that this PR has received.

@MiChaelinzo
Copy link
Author

MiChaelinzo commented Apr 6, 2024

@Aareon Looing at your reviews at other PR's we can smell your comments/reviews miles away you need to take a long week bath. Also, you'll be out of jobs and internet anyway...

Who is we? Contributors with the barest grasp on Python that have access to ChatGPT? Oh no...

I haven't used ChatGPT in here or in months, you need to find a job. Rather than reviewing like you think your a Python All-knowing AI or something.

@MiChaelinzo
Copy link
Author

I'm going to go ahead and unsubscribe to updates from this PR as the user is using abusive language because of the reviews that this PR has received.

Good, thanks for the review.

@Aareon
Copy link

Aareon commented Apr 6, 2024

@Aareon Looing at your reviews at other PR's we can smell your comments/reviews miles away you need to take a long week bath. Also, you'll be out of jobs and internet anyway...

Who is we? Contributors with the barest grasp on Python that have access to ChatGPT? Oh no...

I haven't used ChatGPT in here or in months, you need to find a job. Rather than reviewing like you think your a Python All-knowing God or something.

Your attitude shows you've never worked professionally in software development. These PRs are public and will be seen by future recruiters, just so you know. I'd suggest a change in attitude, because my reviews were objective and professional.

@MiChaelinzo
Copy link
Author

@Aareon Looing at your reviews at other PR's we can smell your comments/reviews miles away you need to take a long week bath. Also, you'll be out of jobs and internet anyway...

Who is we? Contributors with the barest grasp on Python that have access to ChatGPT? Oh no...

I haven't used ChatGPT in here or in months, you need to find a job. Rather than reviewing like you think your a Python All-knowing God or something.

Your attitude shows you've never worked professionally in software development. These PRs are public and will be seen by future recruiters, just so you know. I'd suggest a change in attitude, because my reviews were objective and professional.

You don't know me and I made more projects than you can count with your fingers in the past years. Your reviews are professionally time-consuming and pretty useless etc. as some can confirm in the PR. It's professional but it doesn't do anything, or modifying it add more code. Also you haven't worked above board/ become an investor or make a hard decision to fire people. You think you can keep your job? tech industry are offloading, you are looking for new jobs.

@Aareon
Copy link

Aareon commented Apr 7, 2024

@Aareon Looing at your reviews at other PR's we can smell your comments/reviews miles away you need to take a long week bath. Also, you'll be out of jobs and internet anyway...

Who is we? Contributors with the barest grasp on Python that have access to ChatGPT? Oh no...

I haven't used ChatGPT in here or in months, you need to find a job. Rather than reviewing like you think your a Python All-knowing God or something.

Your attitude shows you've never worked professionally in software development. These PRs are public and will be seen by future recruiters, just so you know. I'd suggest a change in attitude, because my reviews were objective and professional.

You don't know me and I made more projects than you can count with your fingers in the past years. Your reviews are professionally time-consuming and pretty useless etc. as some can confirm in the PR. It's professional but it doesn't do anything, or modifying it add more code. Also you haven't worked above board/ become an investor or make a hard decision to fire people. You think you can keep your job? tech industry are offloading, you are looking for new jobs.

No idea what you're on about. But your code smells so whatever it is you're selling, I'm not buying.

@MiChaelinzo
Copy link
Author

@Aareon Looing at your reviews at other PR's we can smell your comments/reviews miles away you need to take a long week bath. Also, you'll be out of jobs and internet anyway...

Who is we? Contributors with the barest grasp on Python that have access to ChatGPT? Oh no...

I haven't used ChatGPT in here or in months, you need to find a job. Rather than reviewing like you think your a Python All-knowing God or something.

Your attitude shows you've never worked professionally in software development. These PRs are public and will be seen by future recruiters, just so you know. I'd suggest a change in attitude, because my reviews were objective and professional.

You don't know me and I made more projects than you can count with your fingers in the past years. Your reviews are professionally time-consuming and pretty useless etc. as some can confirm in the PR. It's professional but it doesn't do anything, or modifying it add more code. Also you haven't worked above board/ become an investor or make a hard decision to fire people. You think you can keep your job? tech industry are offloading, you are looking for new jobs.

No idea what you're on about. But your code smells so whatever it is you're selling, I'm not buying.

I'm not selling anything I'm just doing a pull request and I already reviewed my code in the latest LLM's from AWS, I don't know why you are reviewing it like you're an AI more accurate or can give more than this LLM's.

@Aareon
Copy link

Aareon commented Apr 7, 2024

I have 10+ years of Python experience. I don't need an LLM to review code. I use Python in my workplace daily. LLM's are not infallible, and have been known to generate bad code. I could tell from the start that you used something to generate code for you, because there wasn't a single bit of code in your entire PR that didn't smell like an LLM. You literally pushed a stub function and called it an implementation. Get out of here with that, dude. No project maintainer wants someone who can't even read code to push garbage spewed out by an LLM. And your attitude on top of everything is the icing. You have zero self awareness.

@MiChaelinzo
Copy link
Author

I have 10+ years of Python experience. I don't need an LLM to review code. I use Python in my workplace daily. LLM's are not infallible, and have been known to generate bad code. I could tell from the start that you used something to generate code for you, because there wasn't a single bit of code in your entire PR that didn't smell like an LLM. You literally pushed a stub function and called it an implementation. Get out of here with that, dude. No project maintainer wants someone who can't even read code to push garbage spewed out by an LLM. And your attitude on top of everything is the icing. You have zero self awareness.

Your python coding is excellent but your 10 years' experience is nothing against the LLM's, you make mistake etc. your not perfect even the AI's/LLM's, my attitude is not wrong for complaining for your spamming reviews. And I didn't generate the code, stub X devs could use it for their future integrations/API. If I'm ever hiring people for a blockchain project I wouldn't hire you seem to would take several years before the project is finished/done. Companies are laying off because they are seeing the inefficiency of human employees.

@Aareon
Copy link

Aareon commented Apr 7, 2024

I have 10+ years of Python experience. I don't need an LLM to review code. I use Python in my workplace daily. LLM's are not infallible, and have been known to generate bad code. I could tell from the start that you used something to generate code for you, because there wasn't a single bit of code in your entire PR that didn't smell like an LLM. You literally pushed a stub function and called it an implementation. Get out of here with that, dude. No project maintainer wants someone who can't even read code to push garbage spewed out by an LLM. And your attitude on top of everything is the icing. You have zero self awareness.

Your python coding is excellent but your 10 years' experience is nothing against the LLM's, you make mistake etc. your not perfect even the AI's/LLM's, my attitude is not wrong for complaining for your spamming reviews. And I didn't generate the code, stub X devs could use it for their future integrations/API. If I'm ever hiring people for a blockchain project I wouldn't hire you seem to would take several years before the project is finished/done. Companies are laying off because they are seeing the inefficiency of human employees.

No worries there, I'm selective when it comes to employers, and you're not one I'd ever consider working for. LLMs are a thousand miles away from replacing software engineers. Also, define "spamming reviews". What's the point of a PR if it's not an implementation. This project has absolutely no need for stub functions in production. That's just bad practice. You would know that if you had any wherewithall. Who cares how long it takes to develop something? I'm one person, and I work on things as I see fit.

@MiChaelinzo
Copy link
Author

I have 10+ years of Python experience. I don't need an LLM to review code. I use Python in my workplace daily. LLM's are not infallible, and have been known to generate bad code. I could tell from the start that you used something to generate code for you, because there wasn't a single bit of code in your entire PR that didn't smell like an LLM. You literally pushed a stub function and called it an implementation. Get out of here with that, dude. No project maintainer wants someone who can't even read code to push garbage spewed out by an LLM. And your attitude on top of everything is the icing. You have zero self awareness.

Your python coding is excellent but your 10 years' experience is nothing against the LLM's, you make mistake etc. your not perfect even the AI's/LLM's, my attitude is not wrong for complaining for your spamming reviews. And I didn't generate the code, stub X devs could use it for their future integrations/API. If I'm ever hiring people for a blockchain project I wouldn't hire you seem to would take several years before the project is finished/done. Companies are laying off because they are seeing the inefficiency of human employees.

No worries there, I'm selective when it comes to employers, and you're not one I'd ever consider working for. LLMs are a thousand miles away from replacing software engineers. Also, define "spamming reviews". What's the point of a PR if it's not an implementation. This project has absolutely no need for stub functions in production. That's just bad practice. You would know that if you had any wherewithall. Who cares how long it takes to develop something? I'm one person, and I work on things as I see fit.

I don't know about not replacing jobs but it seems they are replacing a lot of jobs recently., we hold several millions on mutual funds in banks/malls/tech/real-estate back home, AI/AGI seems very efficiency that's why investors give them millions/billions of dollars, also WEF plans and sees this too. Even Sam Altman going to the government of UAE asking for trillions of dollars he should go to Qatar trillion here is available.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants