Don't store request specific data into the Layer object #113
I am just looking from my phone, so this is not an in-depth code review yet. I made a comment, and we should probably figure out whether match being two different shapes causes V8 bailouts and, if so, what to do. We can land this in 2.0 since folks currently reach into these layer parameters.
I just want to clarify one thing: putting a secret in the URL is never acceptable. Every proxy layer, access log system, and tracing implementation will store those in plain text and often ship them off the instance to log aggregation systems. I am not opposed to the general idea of not keeping around information about completed requests, but if the purpose of this request is some "security" thing, like it was tagged with when originally opened, then I think we need to be clear that nothing in the URL is considered secure or secret.
Putting a secret in a URL should be avoided when possible, but there are a lot of use cases where you don't have a choice. Any password reset or magic login link sent by email, for example, will have a token in the URL, and there is no other way to do it. Even if there were no legitimate use cases for putting secrets in a URL, mitigating user mistakes (developers are users of the lib) is always a good thing. I agree that this is a very low security concern: anyone able to read a token from the express service would obviously have other means to do so at their disposal. FYI, I was looking for a memory leak in an application when I found this behaviour in express, as some new strings / objects are stored on each request (replacing the previous ones, so this is not a memory leak; it was just flagged by the memory profiler).
This is an application design choice, and if you are going to make that choice then you need to be aware of the potential consequences. In this case, the consequences go far beyond Express keeping a reference around in memory. For the sake of this PR, any security discussion points are straw man arguments. If an attacker has an RCE or access to the process memory, then they have more than your individual user's password reset link as well. This is just not a security concern for the express project at all, even in a "protect users from themselves" sort of way. I am not saying we should not clean this up when the request is finished. Sorry for being so pedantic; we just don't want someone who doesn't know what is going on to see this and be concerned for no reason (or worse, file a CVE we need to fight).
I should add that all of the above comments still need to be addressed before we can consider moving this forward. Please clean up all of the extra changes and slim this down to just the change to clean up the state.
@wesleytodd, do you mean this comment?
I don't really see a way to clean the state without breaking these use cases. An obvious way would be to just reset the state ( I'm not familiar with the express codebase, so there might be something I'm missing here.
EDIT: I didn't notice you had pushed some of the things which were requested to be changed. With this slimmed down part I see maybe those swaps were because you set
No I meant all the unrelated stuff like swapping
Yeah, I agree we might not be able to find a place to clear those out that is not a breaking change. Either way, I and other contributors are not going to spend a bunch of time helping find one while the PR cannot be merged because of all the unrelated changes. Once this is slimmed down to just the changes you need to achieve the goal, we can discuss options for landing it.
return {
  path: path,
  params: {'0': decode_param(path)}
}
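For readers following the thread, here is a minimal sketch of the pattern this hunk moves toward: the match result is handed back to the caller rather than written onto a long-lived object. The names `decodeParam` and `matchAll` are hypothetical and only loosely modeled on the hunk; this is not the PR's actual code.

```javascript
// Hypothetical helper, loosely modeled on the hunk above; not the PR's code.
function decodeParam(value) {
  try {
    return decodeURIComponent(value);
  } catch (err) {
    return value; // sketch only: keep the raw value if decoding fails
  }
}

// Returns the match result to the caller instead of storing it on a
// shared object, so nothing about the request is retained afterwards.
function matchAll(path) {
  return {
    path: path,
    params: { '0': decodeParam(path) }
  };
}

const first = matchAll('/a%20b');
const second = matchAll('/c');
```

Each call allocates a fresh result object, which is the garbage-on-a-hot-path cost the review comments below are concerned about.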
This code has been around in this state for a very long time. My worry is that this new way creates garbage to collect (the new object added) on a hot path, which might be the reason it was done this way in the first place. Maybe @dougwilson will remember because this predates me by quite a bit. Without more context I would like to see us do some sort of performance pass on this since I don't think the feature addition would be worth it if we impacted performance in a negative way.
A big part was preventing bail out due to the same variable being different shapes (i.e. here we now have something that can be a bool or an object, and v8 will struggle to make a bytecode and may be falling back to slow interpreter mode). Ideally if we want to keep things in hot paths fast they should definitely keep the same shapes for the byte code generator.
I don't remember off hand all the details for why it was mutating properties, but it was very likely to reduce generating a bunch of little objects only to trash them immediately. That can be addressed if an object pool is used for them, however.
I haven't run through a perf yet as I am away, but I suspect using null instead of false would fix any bail outs since those would both be objects
Yeah that is a good point as well. I think ideally we have some real world benchmarks to base these decisions off of, but I think in theory both the de-opt from different shapes and increased GC from these little objects will be perf regressions. It would be nice to comment these with links to the benchmarks at some point so that the context is kept in the code for future contributors.
So far what I have done is run the perf with the test suite; the deopts that show up there are usually pretty low-hanging fruit.
But ya, if it needs a shaped object, we can make a no-match object at the top level and return that reference for the no-match case.
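A minimal sketch of that suggestion, assuming a simplified prefix matcher. `NO_MATCH` and `matchLayer` are hypothetical names, not code from the PR, and whether this actually avoids deopts in V8 would still need to be measured.

```javascript
// Hypothetical names throughout; this is not the code from the PR.
// One shared object, allocated once, returned for every failed match.
const NO_MATCH = { matched: false, path: '', params: null };

function matchLayer(prefix, path) {
  if (!path.startsWith(prefix)) {
    return NO_MATCH; // same reference every time, no new allocation
  }
  // Same keys in the same order as NO_MATCH.
  return {
    matched: true,
    path: prefix,
    params: { '0': path.slice(prefix.length) }
  };
}

const miss1 = matchLayer('/users/', '/posts/1');
const miss2 = matchLayer('/users/', '/about');
const hit = matchLayer('/users/', '/users/42');
```

Callers would have to treat the shared object as read-only, since any mutation would be visible to every later miss. Successful matches still allocate, so this only addresses the no-match path.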
I made the change to use null instead of false for the no match scenario (I added a new commit to help reading the new changes, but I can squash them if you prefer).
I am not sure I see how that addresses the worry here. The change you made does mean you could return some of those boolean checks back to their original form I think, which is overall good to reduce the change for change's sake, but I think this comment thread here is about the method returning a boolean and not producing unnecessary garbage in the process.
I am not suggesting removing the last changes, but I think ideally we find a way to not create these objects yet still clean up the state after the layers are finished being handled.
I re-checked the PR, and all the changes I made are only related to the removal of the state.
@Congelli501 yeah sorry I had not fully re-reviewed the code before making that comment. I edited as soon as I noticed that. Sorry if that was confusing.
Issue
Currently, the path & params of the latest request are stored in the last matched layer.
The "Layer" object should be immutable: a request should not modify its attributes.
As it stands, the parameters are kept in memory after the end of a request.
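The behaviour described above can be illustrated with a small sketch. `StatefulLayer` is a hypothetical stand-in written for this illustration, not Express's actual Layer implementation, and the `/reset/` route is an assumed example.

```javascript
// Hypothetical stand-in for a router layer; not Express's actual Layer class.
class StatefulLayer {
  constructor(prefix) {
    this.prefix = prefix;
    this.path = undefined;
    this.params = undefined;
  }

  // Records the match on the layer itself, so the data outlives the request.
  match(path) {
    if (!path.startsWith(this.prefix)) {
      return false;
    }
    this.path = path;
    this.params = {
      token: decodeURIComponent(path.slice(this.prefix.length))
    };
    return true;
  }
}

const layer = new StatefulLayer('/reset/');
layer.match('/reset/abc123'); // request 1 is handled and finishes

// Nothing has cleared the layer, so the value is still reachable.
const leftover = layer.params.token;
```

In this sketch the stored value is only replaced when a later request matches the same layer, which is consistent with the description that the previous value is overwritten rather than accumulated.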
Fix
This PR simply removes the path & params parameters of the Layer object and returns them as the result of the match function.
Test case
A test case is included to check that the params attribute is no longer available in the Layer.
History
An initial PR was made on the express project: expressjs/express#5426
Example
For example, this code will show the latest value of the parameter, as the value is stored until a new request is made:
Result: