Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clarify lifecycle of JobExecution/StepExecution and user data instances (e.g. "pass by reference" vs. "pass by value") #41

Open
follis opened this issue Jul 13, 2020 · 2 comments

Comments

@follis
Copy link
Contributor

follis commented Jul 13, 2020

Originally opened as bug 4834 by ScottKurz

--------------Original Comment History----------------------------

Comment from = ScottKurz on 2013-03-24 19:02:24 +0000

If I do this for a running job:

JobExecution exec = jo.getJobExecution(execId);

Do I see, in this 'exec' object, the updates as the job executes (e.g. BatchStatus getting set)?

Or am I only getting a snapshot, and might I need to get a new one to see the ongoing execution?

I.e. does the container have a reference (PBR) back to this JobExecution or is it taking a snapshot of the "value" (PBV)?

Since the spec has no remotable use cases... you might think it should maintain a reference.

I don't think the TCK requires PBR behavior.. I think we ask for a new object each time we want to check the status whether we need to or not.

Since there are no setters on JobExecution, JobInstance, it seems PBR would have no undesirable side effects...

I think it should be spec'd and not left to the implementation, since it is a very basic aspect of the JobOperator programming model.


Comment from = mminella on 2013-03-24 19:53:29 +0000

The JobExecution should be viewed as a value object with no guarantees of it's ties to a running job. jo.getJobExecution(execId) may return a previously run JobExecution as an example. Also, with regards to the "no remoteable use cases", while the spec does not address remote executions, it does not preclude them either (there is nothing to prevent a user from developing a RemoteJobOperator implementation of the JobOperator interface for example).

I don't think we can say within the spec that the JobExecution returned by the JobOperator is the same instance being acted upon by an executing job.


Comment from = ScottKurz on 2013-03-26 02:22:56 +0000

(In reply to comment #1)
Michael,

I agree with your angle on this one, which then leads me to the question of should the spec say that is is by value, e.g. that

JobExecution exec = jo.getJobExecution(execId);
JobExecution exec2 = jo.getJobExecution(execId);

return different instances?

Since there's no setters maybe it doesn't matter.

Te RI uses "by-reference" for a currently-executed job and "by-value" for a terminated job.

The TCK assumes "by-value", e.g. in the polling routine we get a new JobExecution each time we poll rather than rechecking status on the existing. But it doesn't require "by-value" either (obviously since the RI doesn't always use it).

I guess it could be left as an implementation detail. The only code I can conjure up where it matters woul be:

// JobOperator "monitor"

while (..running..) {
JobExecution exec = jo.getJobExecution(execId);
logMonitorEntry(exec)
}

// now go process logged executions


If I have PBR then instead of a list of "snapshots" I'm going to get a bunch of copies of the execution in its final state.

Specifying PBV "breaks" the RI though...however, we could adjust still I believe. Maybe it would be a clearer API to just specify "by value".


Comment from = cvignola on 2013-08-28 15:07:46 +0000

Leaving it open that a spec clarification might be desirable.


Comment from = BrentDouglas on 2013-10-28 07:51:31 +0000

I'm not sure if this is settled but can I put in a vote for the spec and TCK to be changed to allow JobExecution, StepExecution && JobInstance to be value classes.

The way I see it these values are tied to the the implementation of the job repository and requiring these classes to provide an up to date view of the repository will add unnecessary performance penalties to some implementations.


Comment from = ScottKurz on 2013-11-04 22:26:56 +0000

This hasn't been settled, so thanks Brent too for your comments.

I had another thought in this area: that the spec might say that StepExecution#getPersistentUserData() is returned by value.

The way the RI is currently coded, it's theoretically possible for an operator to introduce side effects to the persistent user data that then affect say a Decider operating on the StepExecution.

We might also consider waiting to see if it turns out to be a real-world issue, since it might not be too likely someone would do that.


Comment from = cf126330 on 2013-11-04 22:59:19 +0000

(In reply to ScottKurz from comment #5)

This hasn't been settled, so thanks Brent too for your comments.

I had another thought in this area: that the spec might say that
StepExecution#getPersistentUserData() is returned by value.

The way the RI is currently coded, it's theoretically possible for an
operator to introduce side effects to the persistent user data that then
affect say a Decider operating on the StepExecution.

We might also consider waiting to see if it turns out to be a real-world
issue, since it might not be too likely someone would do that.

By exposing only java.io.Serializable to the client, the expectation is the client should not modify it.

I think it's more likely application itself (which knows the actual type of the user data) will use getPersistentUserData() to pass data between differnet parts, and the apps may expect the returned object is by reference. StepContext.getPersistentUserData() and StepExecution.getPersistentUserData should be consistent in this regard.


Comment from = ScottKurz on 2013-11-04 23:41:58 +0000

(In reply to cf126330 from comment #6)

StepContext.getPersistentUserData() and
StepExecution.getPersistentUserData should be consistent in this regard.

Cheng, I agree with everything up until this last sentence, but wasn't sure where you were trying to end up.

You seemed to think that StepContext returns the persistent data by-reference and StepExecution returns it by-value, but then what did you mean in saying they should be "consistent"?


Comment from = cf126330 on 2013-11-05 02:39:40 +0000

(In reply to ScottKurz from comment #7)

(In reply to cf126330 from comment #6)

StepContext.getPersistentUserData() and
StepExecution.getPersistentUserData should be consistent in this regard.

Cheng, I agree with everything up until this last sentence, but wasn't sure
where you were trying to end up.

You seemed to think that StepContext returns the persistent data
by-reference and StepExecution returns it by-value, but then what did you
mean in saying they should be "consistent"?

Application can access persistent user data via StepContext, and a job client can access persistent user data via StepExecution. So whether we settle on by-reference or by-value, the same rule should apply to both StepContext.getPersistentUserData() and StepExecution.getPersistentUserData(). I think by-reference is a better fit. If an application is really concerned with the risk of persistent user data being modified by a job client, it can choose immutable or unmodifiable types for persistent data.


Comment from = mminella on 2013-11-05 03:39:59 +0000

I strongly think this should be by value (per my previous statements). One thing to keep in mind is that the StepExecution and JobExecution may not retrieved within the same JVM as the job actually executing. I could have a Job running in JVM 1 and use the JobOperator to retrieve the current state of the job from the job repository in JVM 2. How would this work if it was pass by reference and the expectation was that modifications to the data by the client in JVM 2 was to impact JVM 1? Allowing the client to modify the StepExecution/JobExecution in a way that the runtime is expected to react is a dangerous path IMHO.


Comment from = ScottKurz on 2014-02-13 13:12:32 +0000

Not committing to address in future, just listing as candidates and pointing out they're excluded from current Maintenance Rel.

@rmannibucau
Copy link

Hi,

It is a bit hard to read with current formatting but I think TCK enforce it to be by value - and anyway it does not work in all cases but local JVM one.
It is also concistent with context states which are serialized to freeze them - no more sure but I think it is tested too?

Hope it makes sense.
Romain

@scottkurz
Copy link
Contributor

I don't think the TCK helps us much here. It does not enforce PBV ... though it also doesn't assume PBR.

So far we've brought up:

  1. The lifecycle of the JobExecution/StepExecution instances
  2. The lifecycle of the user data instances (obtained via context or execution).

A complete answer should also address the Step metrics (hard to imagine answering the two questions above wouldn't settle this though).

@scottkurz scottkurz changed the title JobExecution handed to JobOperator client "by reference" or "by value"? Clarify lifecycle of JobExecution/StepExecution and user data instances (e.g. "pass by reference" vs. "pass by value"). Jul 13, 2020
@scottkurz scottkurz changed the title Clarify lifecycle of JobExecution/StepExecution and user data instances (e.g. "pass by reference" vs. "pass by value"). Clarify lifecycle of JobExecution/StepExecution and user data instances (e.g. "pass by reference" vs. "pass by value") Jul 13, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants