This repository has been archived by the owner on Jul 16, 2020. It is now read-only.

Weekly Meeting 2016 11 03

Kristen Carlson Accardi edited this page Nov 3, 2016 · 2 revisions

Agenda

## Minutes

#ciao-project: weekly_meeting

Meeting started by kristenc at 16:00:18 UTC. The full logs are available at ciao-project/2016/ciao-project.2016-11-03-16.00.log.html .

Meeting summary

Meeting ended at 16:56:20 UTC.

Action Items

  • kristen and manohar to work on test plan for external ips
  • kristen to add github issues for bat tests for external ips

Action Items, by person

  • UNASSIGNED
    • kristen and manohar to work on test plan for external ips
    • kristen to add github issues for bat tests for external ips

People Present (lines said)

  • kristenc (70)
  • markusry (23)
  • rbradford (22)
  • tcpepper1 (22)
  • mcastelino (14)
  • obedmr- (4)
  • jvillalo_mobl (4)
  • ciaomtgbot (3)
  • sameo (1)

Generated by MeetBot 0.1.4 (http://wiki.debian.org/MeetBot)

### Full IRC Log

16:00:18 <kristenc> #startmeeting weekly_meeting
16:00:18 <ciaomtgbot> Meeting started Thu Nov  3 16:00:18 2016 UTC.  The chair is kristenc. Information about MeetBot at http://wiki.debian.org/MeetBot.
16:00:18 <ciaomtgbot> Useful Commands: #action #agreed #help #info #idea #link #topic.
16:00:18 <ciaomtgbot> The meeting name has been set to 'weekly_meeting'
16:00:29 <kristenc> #topic rollCall
16:00:37 <kristenc> o/
16:01:05 * kristenc notes that the minutes from last week were forgotten
16:01:15 <kristenc> I will update them this week.
16:01:50 <markusry> o/
16:02:36 <tcpepper1> o/
16:03:41 <kristenc> #topic Opens
16:03:43 <kristenc> anyone?
16:03:47 <sameo> o/
16:04:00 <rbradford> o/
16:04:01 <jvillalo_mobl> o/
16:04:11 <kristenc> I have one - what is the status of rbradford 's look into our image service redesign.
16:04:46 <mcastelino> o/
16:05:16 <kristenc> rbradford, ping?
16:06:10 <kristenc> ok - well, maybe that can wait.
16:06:13 <kristenc> let's move on then.
16:06:16 <rbradford> i have some patches up, for what I think the most difficult part is - handling the "ephemeral" storage requirements. next up i'm going to move the images to be backed by volumes
16:06:23 <kristenc> ah great.
16:06:28 * rbradford (was typing slowly)
16:06:33 <kristenc> rbradford, so did you decide to go ahead with it?
16:07:08 * kristenc adjusts expectations for rbradford typing skills.
16:07:37 <rbradford> kristenc, i think it's easy enough to have both methods without over complexity. with the removal of the legacy image booting as a later step.
16:08:21 <rbradford> kristenc, the next patch this week will check if the image uuid is also a block id, and if so, use that rbd image as the basis of a clone.
16:09:13 <mcastelino> kristenc, I had one open... I would like to restructure the public IP code on a new branch... I want to check in the CNCI and libsnnet changes as a united tested PR. And then we can add the scheduler and controller in phases. Right now we have too many things go on in that branch
16:09:33 <kristenc> rbradford, ok, thanks for the update.
16:09:36 <rbradford> so we wouldn't go image uuid -> volume uuid. instead image uuid == volume uuid == rbd name iff the image is backed by volume
16:09:37 <rbradford> (removing a layer of indirection from the previous design)
16:09:37 <rbradford> the final part would be the image upload part.
16:09:51 <tcpepper1> discussing with rbradford and watching his patches this looks viable to me
16:10:32 <kristenc> tcpepper1, will it be possible to get any performance comparisons to make sure it is indeed faster?
16:11:16 <rbradford> i think we only commit with the image upload code and dropping of the old behaviour for simplicity i was going to mandate that all workloads for VMs have to have a storage resource associated with them.
16:11:45 <tcpepper1> kristenc: in theory
16:11:53 <rbradford> kristenc, we're not doing this for speed are we? we're doing this to simplify the architecture and make it possible to e.g. do migrations.
16:12:06 <tcpepper1> kristenc: in practice we don't have a sufficiently architected or implemented ceph system for performance analysis
16:12:18 <kristenc> rbradford, i thought it was driven by a complaint about how long the upload took.
16:12:18 <rbradford> kristenc, booting from rbd is always going to be slower than booting from the local SSD.
16:12:22 <tcpepper1> and yes to the simplified arch, over perf
16:13:02 <rbradford> kristenc, i don't think that's the primary motivation.
16:13:24 <kristenc> ok.
16:14:24 <kristenc> mcastelino, your external IP branch is on your own fork, right?
16:15:01 <kristenc> mcastelino, makes sense to break it up - my experience with having a branch with a huge number of changes on it is negative all the way.
16:15:04 <mcastelino> kristenc, yes... I will deprecate that and create a new one.. both you and tcpepper1 may have to add more PRs on top of that
16:15:10 <mcastelino> for controller and sch
16:15:46 <kristenc> mcastelino, fine with me - I can send you pull requests if that's how you want to do it.
16:16:28 <kristenc> anything else, or shall we move on to triage?
16:16:30 <markusry> mcastelino: Do we have a ticket for external IP BAT tests?
16:16:46 <kristenc> good question.
16:16:51 <mcastelino> markusry, no.. it does not even have unit tests yet
16:17:18 <mcastelino> will be a bit tricky... we may only be able to test it in single VM
16:17:19 <markusry> Do we want to add the BAT tests for external IP as part of Sprint 4 as we are doing for storage and image
16:17:29 <markusry> Ah, okay
16:17:40 <kristenc> markusry, yes - there will be cli changes so we should make bat tests for them.
16:17:40 <mcastelino> so we need tests for it in BAT for sure
16:17:47 <markusry> So not something we'd want to run on the release cluster
16:18:09 <mcastelino> markusry, we should be able to run on release also with a non routable public IP
16:18:11 <kristenc> markusry, I have to figure out how we test external IPs
16:18:22 <markusry> Okay, great.
16:18:27 <markusry> Do we need to enter a ticket now?
16:18:35 <kristenc> yes.
16:19:03 <kristenc> we need one to make it possible to run in single vm mode as well.
16:19:27 <markusry> Okay
16:19:49 <kristenc> I'm not sure how to do that - you need fake external ips you can add to your pool.
16:20:14 <markusry> Can we do it with dnsmasq?
16:20:35 <markusry> Well, anyway that can wait I guess
16:20:37 <tcpepper1> I see the same couple options as for a hardware cluster:
16:21:10 <tcpepper1> 1) we define a (private/fake/test) external network...192.168.72.0/24 or something and allocate "external" IPs to the pool from it
16:21:32 <tcpepper1> 2) we have an agent that does a series of dhcp requests to whatever is the "external" network beyond the VM or real cluster
16:21:56 <tcpepper1> 3) we have an operator pre-request some static IP's from whatever is the "external" network beyond the VM or real cluster
16:22:06 <tcpepper1> 3's awkward for automation
16:22:15 <tcpepper1> 1 is quite faked, but maybe sufficient
16:22:31 <tcpepper1> 2 means we need to build such an agent, and assumes there's an external dhcp server
16:22:55 <kristenc> tcpepper1, and it in practice will limit the number of ips we can add fairly severely
16:23:24 <mcastelino> tcpepper1, 2 is not really an option.. you may not want the local DHCP based IP..
16:23:55 <mcastelino> for test purposes we can just create a non routable IP and point the route to the CNCI
16:24:24 <mcastelino> in real world it will come from the pool.. and the route setup for anyone to get to that public IP is something the operator needs to worry about
16:24:35 <mcastelino> either with explicit route management or route adv
16:25:16 <tcpepper1> ok so then we do #1 for test, and #3 for actual usage?
16:25:43 <kristenc> mcastelino, we should work on a test plan for external ips together at some point in the next 2 weeks or so.
16:25:50 <mcastelino> yes
16:25:59 <tcpepper1> +1 for that
16:26:14 <mcastelino> I will add some tests for it in unit tests and single VM and then we can extend it to cluster
16:26:55 <kristenc> mcastelino, in the next 2 weeks I'll have the cli changes ready and we'll be able to make a detailed plan.
16:28:25 <kristenc> #action kristen and manohar to work on test plan for external ips
16:28:52 <kristenc> #action kristen to add github issues for bat tests for external ips
16:29:30 <kristenc> triage time?
16:29:47 <kristenc> #topic Bug Triage
16:30:14 <kristenc> oh good, it's short.
16:30:20 <kristenc> #link https://github.com/01org/ciao/issues
16:31:24 <kristenc> jvillalo_mobl, I had a question about #753 - can we fix this, or are we waiting for you to give us the go ahead?
16:32:40 <jvillalo_mobl> kristenc let me give it a check
16:32:40 <jvillalo_mobl> oh that one, yes please do
16:32:45 <kristenc> #757 has no priority - I assume since rbradford is working on this currently, it's a P1.
16:32:57 <kristenc> jvillalo_mobl, thanks.
16:32:57 <markusry> Do you strip out the Subnet prefix in the web ui?
16:33:18 <jvillalo_mobl> However we do release on wednesdays so on next wednesday it should be available on ui's master branch
16:33:30 <rbradford> kristenc, yeh, and we can sprint it
16:34:40 <kristenc> rbradford, for #758, is that a p2 then? and are you including it in this sprint?
16:35:23 <rbradford> kristenc, i wondered if we wanted to bikeshed on it, but if nobody objects to the change in policy. then yes, p2 and sprint.
16:36:05 <tcpepper1> rbradford: feels like a natural likely follow on consequence to the current bikeshed around simplification in storage
16:36:12 <rbradford> yup,
16:36:14 <tcpepper1> #758 that is
16:36:35 <kristenc> rbradford, I think eventually we'll have to move back to allowing local storage, but I think it'd be better to have the default mode not be local as you suggested.
16:37:10 <rbradford> kristenc, aws lets you have local storage for !boot fs
16:37:19 <kristenc> btw - I object to the term bikeshedding when it comes to this. let's bikeshed on whether we should call it bikeshedding :).
16:37:31 <tcpepper1> blue!
16:37:38 <rbradford> what about, architectural realignment?
16:37:40 <kristenc> ok, p2, sprint 4.
16:37:58 <tcpepper1> since nobody objects...blue it is!
16:38:47 <kristenc> rbradford, #759 - were you able to confirm this doesn't actually work, or was this just a question about the code?
16:39:05 <rbradford> kristenc, i didn't have the opportunity to test it.
16:39:53 <kristenc> rbradford, ok - so we don't know if this is a bug or not, it's a question, right?
16:40:03 <rbradford> kristenc, but that's what i feel this bug has morphed into
16:40:03 <rbradford> + possibly comment refinement
16:40:12 <tcpepper1> yes
16:40:15 <kristenc> p3 then?
16:41:19 <kristenc> ok - that's all there is. shall we check for any p1 bugs then.
16:41:53 <kristenc> #topic bug scrub
16:41:57 <kristenc> #link https://github.com/01org/ciao/issues?q=is%3Aopen+is%3Aissue+label%3Abug+label%3AP1
16:42:40 <kristenc> are these actually P1 bugs?
16:42:52 <kristenc> none of them are being worked on right now.
16:42:52 <markusry> I think maybe we can close 649
16:43:03 <markusry> As we decided to switch to keystone
16:43:14 <markusry> And I think there's a separate bug for that
16:43:22 <kristenc> markusry, ok - what is the other bug?
16:43:30 <markusry> I'm looking ...
16:43:48 <markusry> https://github.com/01org/ciao/issues/614
16:43:51 <markusry> It's a P3
16:44:04 <kristenc> ok - will close as duplicate of 614.
16:44:10 <markusry> Yep.
16:45:37 <markusry> Is 643 still a problem now your storage changes are merged?
16:45:49 <kristenc> markusry, I think #643 is resolved.
16:46:12 <kristenc> that was part of not updating the status of an attachment when an instance was deleted.
16:46:40 <kristenc> I will close it.
16:46:52 <kristenc> we can reopen if it's not addressed - but I remember fixing this.
16:47:09 <markusry> I can check this quickly
16:48:22 <kristenc> for #626 - I'm not sure if this is still a problem or not. Let me verify really quickly.
16:48:34 <markusry> Yep it's fixed
16:48:48 <tcpepper1> markusry: which?
16:48:50 <kristenc> markusry, you mean 643.
16:48:50 <markusry> The volume becomes available after the instance is deleted
16:49:03 <markusry> Yes, 643 is fixed
16:49:16 <rbradford> #626 works too
16:49:23 <rbradford> as i kinda rely on that :-)
16:49:31 <kristenc> cool - we are closing lots of bugs today.
16:49:37 <kristenc> and all p1s :).
16:50:24 <kristenc> we are only left with 681
16:50:30 <kristenc> is this really a P1?
16:51:07 <tcpepper1> I thought we decided last week it was
16:51:11 <obedmr-> mmm, I don't think so
16:51:33 <obedmr-> I'd be working with this once the generic db layer is complete
16:52:34 <kristenc> so I made sure obedmr- was assigned to the bug.
16:52:47 <obedmr-> sure
16:52:56 <kristenc> I think it's obviously not his top priority - so I'd vote for making it a p2.
16:53:06 <obedmr-> agree
16:54:04 <kristenc> ok - that's all the P1 bugs!
16:54:53 <kristenc> I think we might be done early unless anyone has anything else they want to discuss.
16:55:35 <tcpepper1> I think 681 might get some addressing as we look at commonizing volume and image, but we'll see.  if something happened there, it's not on a P1 timeframe.  later refactoring.
16:56:15 <kristenc> ok - we are done then.