Skip to content
This repository has been archived by the owner on Jul 16, 2020. It is now read-only.

Weekly Meeting 2016 10 27

Kristen Carlson Accardi edited this page Nov 3, 2016 · 2 revisions

Agenda

##Minutes

#ciao-project: weekly_meeting

Meeting started by kristenc at 16:02:20 UTC. The full logs are available at ciao-project/2016/ciao-project.2016-10-27-16.02.log.html .

Meeting summary

Meeting ended at 17:00:49 UTC.

Action Items

  • mark and obed to check image service roadmap and make sure features are in github and complete.

Action Items, by person

  • UNASSIGNED
    • mark and obed to check image service roadmap and make sure features are in github and complete.

People Present (lines said)

  • kristenc (107)
  • markusry (106)
  • tcpepper (30)
  • tcpepper1 (10)
  • rbradford (7)
  • obedmr (5)
  • jvillalo (3)
  • ciaomtgbot (3)
  • mrkz (1)
  • albertom (1)
  • mcastelino (1)

Generated by MeetBot_ 0.1.4

.. _MeetBot: http://wiki.debian.org/MeetBot

###Full IRC Log

16:02:20 <kristenc> #startmeeting weekly_meeting
16:02:20 <ciaomtgbot> Meeting started Thu Oct 27 16:02:20 2016 UTC.  The chair is kristenc. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:02:20 <ciaomtgbot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:02:20 <ciaomtgbot> The meeting name has been set to 'weekly_meeting'
16:02:28 <kristenc> #topic Role Call
16:02:34 <markusry> o/
16:02:38 <rbradford> o/
16:02:57 <tcpepper> o/
16:03:50 <obedmr> o/
16:04:17 <albertom> o/
16:04:42 <kristenc> #topic Opens
16:04:54 <kristenc> anyone?
16:05:35 <kristenc> I think this is going to be a short meeting.
16:05:42 <kristenc> #topic Bug Triage
16:05:47 <mrkz> o/
16:05:49 <kristenc> #link https://github.com/01org/ciao/issues?utf8=0.000000E+002    16:06:29 <kristenc> while we filed many issues this week, luckily we've already triaged most of them.
16:07:02 <kristenc> it looks like #745 is our first not triaged bug.
16:07:17 <kristenc> #link https://github.com/01org/ciao/issues/745
16:07:22 <jvillalo> 0/
16:08:10 <kristenc> in my opinion this bug should be a p2. our first concern isn't quotas IMO.
16:08:36 <rbradford> sgtm
16:08:42 <kristenc> it is however, and incompletion in our storage support which I know we really wanted to wrap up.
16:09:13 <kristenc> lacking other opinions - we will mark it p2 and not include in the current sprint.
16:09:46 <kristenc> #link https://github.com/01org/ciao/issues/746
16:09:59 <markusry> I just have one remaining question about this bug
16:10:04 <markusry> or not bug as the case may be
16:10:14 <kristenc> ok.
16:10:18 <markusry> Is the same behaviour seen on openstack?
16:10:31 <kristenc> which behavior explicitly?
16:10:42 <markusry> In that are there times in the creation lifecycle where you cannot delete an instance?
16:11:40 <kristenc> ah - well, I'm not an expert, but if you look at the api spec, it says this:
16:11:54 <kristenc> Preconditions
16:11:54 <kristenc> The server must exist.
16:11:54 <kristenc> Anyone can delete a server when the status of the server is not locked and when the policy allows.
16:11:54 <kristenc> If the server is locked, you must have administrator privileges to delete the server.
16:11:54 <kristenc> Asynchronous postconditions
16:11:55 <kristenc> With correct permissions, you can see the server status as DELETED through API calls.
16:11:57 <kristenc> The port attached to the server is deleted.
16:11:59 <kristenc> The server does not appear in the list servers response.
16:12:01 <kristenc> The server managed by OpenStack Compute is deleted on the compute node.
16:12:29 <kristenc> so we don't have a "locked" status.
16:12:49 <kristenc> and I don't know what "policy allows" means exactly.
16:13:15 <kristenc> and I don't know what "The server must exist" implies.
16:13:28 <kristenc> so my question is - why do we care what their behavior is?
16:13:29 <markusry> Okay, that could cover our case.
16:13:45 <tcpepper> within the ambiguity, in an asynchronous flow, and a complex set of actors who asynchronously send messages every which way...
16:13:54 <tcpepper> users must be accustomed to being told something didn't work
16:13:58 <markusry> Well I thought it would be a justification for not changing our currrent behaviour
16:14:01 <tcpepper> that seems to be a common thing in openstack
16:14:03 <tcpepper> so you retry
16:14:04 <markusry> if openstack disallowed this as well
16:14:28 <tcpepper> (you the human)
16:14:34 <markusry> One last question.
16:14:37 <kristenc> markusry, I am very reluctant to change the behavior because of the complexity I imagine it will introduce.
16:14:51 <markusry> Okay.
16:15:15 <markusry> Scratch that question then
16:15:40 <markusry> What priority do we give to wrong error code bugs?
16:15:42 <kristenc> markusry, I did send a pull request to send a 403 rather than a 500 when we can't delete due to not being scheduled yet.
16:16:29 <kristenc> #link https://github.com/01org/ciao/pull/754
16:17:16 <tcpepper> that seems reasonable to me
16:17:32 <kristenc> I would give the bug itself a p3, I fixed it because it was easy :)
16:18:02 <markusry> Okay. Fine for me too.
16:18:31 <kristenc> #link https://github.com/01org/ciao/issues/749
16:19:01 <kristenc> that one might be tricky as we don't actually currently know in the bat tests which workloads are cattle and which aren't.
16:19:10 <kristenc> so it might take some thought.
16:19:26 <kristenc> but I think its a high priority to get our bulk workload start test back in.
16:19:53 <kristenc> and I don't know if I'd restrict it to only cattle, but maybe 2 tests with different expectations on start time.
16:20:25 <kristenc> I personally would make it a P1 and include it in our sprint.
16:20:35 <tcpepper> if nothing else, we know which ones they are and could hard code those uuid's right?
16:21:01 <markusry> What exactly do we want to avoid?  Boot from volume?
16:21:17 <kristenc> yes - it would require more maintanance overhead to do this though as we will need to keep the hard coded list up to date.
16:21:24 <tcpepper> sure
16:21:40 <tcpepper> but that's what we do anyway...these csv files will eventually go away
16:21:42 <kristenc> markusry, I believe tim thought maybe we shouldn't bulk start pets.
16:21:57 <tcpepper> if pets are slow
16:22:07 <tcpepper> if pets are fast, I'm cool starting them in bulk
16:22:10 <kristenc> tcpepper, we only update the csv files though - we'd be adding more places to keep them updated.
16:22:12 <markusry> And they're slow because they're boot from volume?
16:22:38 <tcpepper> but from a usage scenario perspective, cattle being fast in bulk is a higher thing to insure imho than pets fast in bulk
16:22:44 <kristenc> markusry, I think it's the creation of the volume that is currently slow. presumably as we fix this performance problem we care less.
16:22:54 <tcpepper> esp. if we have trade offs that make a book from volume pet slower
16:23:06 <markusry> But we may fix that problem in the Sprint
16:23:22 <kristenc> markusry, very true and I hope we do.
16:23:25 <markusry> And if we do can we just reanble the tests?
16:23:28 <kristenc> yes.
16:23:32 <tcpepper> kristenc: I was thinking once the csv files go away, the bat would be the one place we keep updated.  but yes for sure, we have duplicated workload descriptions today and its annoying.
16:23:44 <markusry> Note one other thing about the BAT tests
16:23:46 <kristenc> shall we mark this as p1 and then revisit to see if we should just close it ?
16:23:54 <tcpepper> markusry: yes.  if it's fast we'll just reenable as is / revert this patch
16:23:56 <markusry> A lot of the tests start a random volume
16:24:02 <markusry> So you might get a pet with the other tests as well
16:24:06 <tcpepper> I'm good with P1
16:24:09 <markusry> But you'd only need to launch 1 and not 10
16:24:21 <markusry> tcpepper:  Okay.  Sounds good.
16:24:25 <kristenc> I'm going to mark it as sprint4 as well.
16:24:33 <tcpepper> sounds good!
16:24:39 <kristenc> we'll reserve the right to modify it though.
16:25:11 <kristenc> #link https://github.com/01org/ciao/issues/751
16:25:24 <markusry> P1 I think
16:25:27 <kristenc> this one seems bad.
16:25:32 <markusry> Yep.
16:25:35 <tcpepper> yep
16:25:36 <markusry> An easy fix but it's bad
16:25:48 <markusry> I'll fix it as soon as I'm done with my current patch
16:25:52 <markusry> for the BAT
16:26:00 <kristenc> ok, it's a p1.
16:26:10 <markusry> And also part of Sprint 4
16:26:14 <kristenc> yes.
16:26:18 <kristenc> #link https://github.com/01org/ciao/issues/752
16:27:00 <markusry> This one isn't great either.
16:27:23 <markusry> Short term the easiest thing to do is to revert the patch and rewrite the verify.sh scriot
16:27:37 <markusry> Which I was planning to do anyway, i.e., move it into BAT
16:27:46 <tcpepper> we also need to document our storage objects lifetimes
16:28:02 <markusry> But there might also be a bug in our image service with this patch reverted
16:28:15 <markusry> Should the image service delete meta data for deleted images?
16:28:33 <tcpepper> as with the other issue around delete and the async nature of stuff, we get oddly overlapping lifetimes for objects across entities
16:28:40 * kristenc is looking at the api
16:28:54 <markusry> What I can see from glance, it seems to keep meta data for deleted images
16:29:01 <markusry> http://developer.openstack.org/api-ref/image/v2/index.html?expanded=create-an-image-detail,delete-an-image-detail#images
16:29:06 <markusry> Take a look at image status
16:29:14 <rbradford> seem a little odd to me
16:29:27 <mcastelino> markusry, I joined late.. what is the issue with the setup.sh script?
16:29:38 <markusry> No issue with setup.sh
16:29:43 <markusry> It's with verify.sh
16:30:01 <markusry> To get it to work with the image service, I modified an image service API
16:30:15 <markusry> But in doing so I might have opened up a security hole.
16:30:20 <kristenc> markusry, openstack glance api allows sending the uuid.
16:30:27 <kristenc> if we want to be compatible, we need to as well.
16:30:27 <markusry> Yes, I know.
16:30:32 <markusry> That's why I added it.
16:30:40 <markusry> But I think it also maintains meta data
16:30:42 <kristenc> so - the feature needs to stay.
16:30:43 <markusry> which we don;t
16:30:53 <markusry> So the attack mentioned in the bug wouldn't work
16:31:19 <markusry> maintains meta data for deleted instances I mean
16:31:31 <markusry> Okay, so the bug then becomes fix delete
16:31:55 <kristenc> maintain meta data forever for deleted instances?
16:32:01 <markusry> The uuid is optional in the openstack APIs
16:32:15 <markusry> So I think we could get away without supporting it
16:33:25 <kristenc> so is the bug real or not? does it need editing?
16:33:28 <markusry> Maybe.  I guess we should figure out what glance really does here?
16:33:41 <markusry> I think the bug is real?
16:33:47 <markusry> But there are two ways to fix it.
16:33:50 <markusry> Revert the patch
16:34:08 <markusry> Change our implementation of delete to not delete meta data for deleted images.
16:34:29 <tcpepper> we need to spec our desired behavior more thoroughly
16:34:44 <kristenc> if there is a real security problem, then taking the short term action of reverting the patch seems reasonable.
16:35:13 <kristenc> then maybe filing a separate issue on what you see as issues with the delete operation.
16:35:15 <tcpepper> I think we can delete the meta data, but would need to do it carefully.  if the ciao-image service is the one place where info exists, at worst that 2nd comer trying to use after delete would fail to launch.
16:35:21 <markusry> We'll just need to rewrite verify.sh which I was sort of planning to do anyway
16:35:38 <rbradford> all workloads right now have hardcoded image uuids
16:35:50 <kristenc> ok - so let's mark this as p1 - and I'm going to assign it to markusry to get it all done including verify.sh
16:36:13 <markusry> Sounds good.
16:36:20 <rbradford> how will we update the workloads to reflect the image uuids?
16:36:21 <markusry> It will be nice to have verify.sh in BAT as well
16:36:51 <markusry> But also I agree with tcpepper.
16:37:08 <rbradford> because when we move from files to images as volumes the uuids will be different on each upload
16:37:19 <markusry> We need to document the image service better
16:37:25 <kristenc> remember as well that image upload is supposed to be an admin operation.
16:37:44 <markusry> And check the glance APIs to make sure we're not missing any other important features.
16:38:16 <markusry> kristenc: Is that currently the case on our clusters?
16:38:28 <markusry> I've only tried on ciao-cli where of course it is not
16:38:36 <markusry> But then we don't have keystone
16:38:38 <tcpepper> there's other bugs for that...workloads are global currently and public
16:38:46 <kristenc> markusry, right, you are using fake keystone.
16:38:52 <markusry> Yes
16:38:55 <tcpepper> all aspects of the workload def'n need decomposed into public and tenant specific
16:39:10 <markusry> I was just wondering if upload is admin on our current clusters
16:39:11 <markusry> ?
16:39:12 <kristenc> markusry, when an api is called without a {tenant_id} var it should be admin only. let me double check the identity layer.
16:40:29 <kristenc> markusry, yes - without tenant var, we will make sure we're an admin.
16:40:44 <kristenc> so this call should only work when you are admin when using a real keystone.
16:41:10 <kristenc> markusry, we don't use upload right now in any of the bat tests - so I doubt it's been tested on a real cluster.
16:41:20 <kristenc> a good one to add :).
16:41:28 <rbradford> (image service isn't in ansible yet.)
16:41:46 <kristenc> release cluster also doesn't have image service enabled.
16:41:47 <tcpepper> that makes this a more theoretical issue then?  ie: admin can shoot self in foot today.  non-admin can't do much.
16:41:53 <tcpepper> still important to make right
16:41:53 <markusry> I tested in on singlevm and it works
16:42:10 <markusry> But I'm confused
16:42:12 <kristenc> singlevm has fake keystone which gives permission to everyone.
16:42:18 <markusry> Is image add admin only?
16:42:32 <markusry> I don't see a tenant id in that API?
16:43:12 <tcpepper> looking at other clouds (beyond openstack), I see some where images are read-only and admin added only.
16:43:16 <markusry> It's /v2/images
16:43:28 <tcpepper> but we do want to support tenant specific uploads of tenant specific read-only images.
16:43:56 * tcpepper doesn't have his giant print out of interesting glance and nova api's in front of him
16:44:17 <kristenc> markusry, the way our code is written, it is.
16:44:33 <markusry> Okay, so then this bug maybe isn't as serious as I thought
16:44:38 <obedmr> keystone verifies  that user is member of service role
16:45:09 <markusry> Or at least doesn't make our security worse than it already is
16:45:18 <markusry> Any user who can add can also upload
16:45:33 <obedmr> yep
16:45:45 <kristenc> I'm not sure if image add should be an admin only thing. but I suppose that's another issue.
16:45:46 <markusry> So he can just wipe the existing image referred to by a workload any time he likes
16:45:58 <markusry> Right.
16:46:05 <markusry> We should fix both.
16:46:25 <kristenc> we need to have public/private image support to fix this.
16:46:28 <markusry> But fixing this issue won't buy us any more security in the short term
16:46:31 <kristenc> we currently only have public images.
16:46:34 <kristenc> which means admin only.
16:46:36 <kristenc> so it makes sense.
16:46:46 <obedmr> right
16:47:22 <kristenc> markusry, do you want to close the issue then?
16:47:37 <markusry> Well, if we make image add a non admin task
16:47:42 <markusry> the bug will become valid again
16:47:50 <markusry> I think it is probably still valid
16:47:56 <markusry> It's just that we need to fix both
16:48:05 <kristenc> ok - let's lower the priority then.
16:48:13 <markusry> Maybe the best option would be to enter a new bug for making image add non-admin
16:48:43 <kristenc> markusry, we need to make sure the image service features we need are in github. supporting public/private images needs to be on the list.
16:49:04 <kristenc> part of this work is going to be allowing members of a tenant to add images that they own.
16:49:04 <markusry> Do we currently have a ticket for this?
16:49:10 <kristenc> not sure - let me check.
16:49:15 <markusry> Okay, I'll work offline with obed on this.
16:49:20 <markusry> We should move on
16:49:28 <obedmr> sure
16:50:02 <kristenc> markusry, no.
16:50:08 <kristenc> not sure it's in our roadmap doc either.
16:50:14 <kristenc> moving on.
16:50:25 <markusry> Okay, give obed and I an action to figure it out
16:51:03 <kristenc> #action mark and obed to check image service roadmap and make sure features are in github and complete.
16:51:13 <kristenc> #link https://github.com/01org/ciao/issues/753
16:51:32 <markusry> I just entered this
16:51:36 <markusry> It looks a bit weird
16:51:54 <markusry> I though the "Subnet " prefix was being inserted by ciao-cli
16:51:59 <markusry> But it's coming from controller
16:52:18 <tcpepper1> what's swagger say is the intended behavior?
16:52:18 <kristenc> let me take a look at the code.
16:52:34 <markusry> Ah, it might be an API thing.
16:53:40 <kristenc> markusry, this is a ciao api.
16:53:46 <kristenc> so we can define it how we want.
16:53:47 <markusry> Okay so we can change it
16:54:02 <kristenc> markusry, we need to check with jvillalo if we are going to change anything.
16:54:03 <markusry> It doesn't seem to mention anything in the docs about the Subnet prefix
16:54:10 <markusry> good point
16:54:24 <kristenc> markusry, it's not obvious to me yet where this code gets added, but if it needs to be changed I'm sure it's easy.
16:54:35 <kristenc> so it's just a matter of making sure nobody cares.
16:54:44 <tcpepper1> can you drop a link to the swagger docs for the api in the issue?
16:54:51 <markusry> datastore.go:1332
16:55:12 <tcpepper1> and what do folks think about how we'd change this...just do it, or do it with api backwards compat?
16:55:50 <tcpepper1> my preference is just do it, given we're still early from a real usage perspective.  but at some point we have to settle down.
16:56:09 <markusry> Right.  Let's check with jvillalo
16:56:29 <kristenc> yeah - let's not pretend we have a zillion users and make our lives more difficult than we need to.
16:56:33 <kristenc> we have 2 clients.
16:56:36 <jvillalo> I guess it's ok, we can just rise an issue that is going to change, and we can do, it wouldn't take too much time
16:56:38 <jvillalo> :)
16:56:58 <kristenc> if we break someone it will actually be exciting, because it will make them show up and announce themselves :)
16:57:00 <markusry> right a ciao-cli update would be required to maintain the same output
16:57:33 <markusry> Anyway, in terms of priority, it's pretty low for now.
16:57:35 <tcpepper1> also the way I read the API doc, it doesn't look like it's implemented according to the doc?  or markusry did you only copy paste the subnet part?
16:57:40 <markusry> Unless I need this information for BAT tests
16:57:56 <kristenc> markusry, you set the priority - it's an easy fix.
16:57:59 <markusry> I didn't copy and paste anything
16:58:04 <tcpepper1> in the issue
16:58:05 <tcpepper1> ?
16:58:08 <kristenc> I did.
16:58:12 <kristenc> because you asked for it.
16:58:13 <markusry> I was just trying to choose a format that would explain the issue
16:58:20 <tcpepper1> ooooh
16:58:31 <markusry> If you're talking about the []string{""}
16:58:42 <tcpepper1> yes that bit.  does it also return the "instance" id, tenant id, etc too?
16:58:46 <markusry> Yes
16:58:59 <tcpepper1> if it didn't...it's that much easier to change anything/everything b/c it wasn't to doc. but nevermind :)
16:59:08 <kristenc> I have a hard stop in one minute.
16:59:17 <kristenc> let's assign a priority and wrap this up.
16:59:21 <kristenc> p3?
16:59:32 <markusry> But the swagger is missing Geography
16:59:50 <markusry> Sounds good.
17:00:05 <kristenc> markusry, I am not sure if swagger ever recovered from the api rewrite I did.
17:00:22 <kristenc> leoswaldo, was that your project? are you handing that off to someone?
17:00:43 <kristenc> well - I have to go.
Clone this wiki locally