CNF-15663: Full DU profile example #313
base: main
@irinamihai: This pull request references CNF-15663 which is a valid jira issue.
Warning: The referenced jira issue has an invalid target version for the target branch this PR targets: expected the story to target the "4.18.0" version, but no target version was set.
[APPROVALNOTIFIER] This PR is NOT APPROVED.
/hold cleanup
/unhold
  type: string
cluster-log-fwd-outputs:
  type: string
cluster-log-fwd-pipelines:
This should be a default
+1 -- or going a bit further I think this is static content that likely doesn't need to be part of the defaults either.
It has been agreed that this will be locked in the new version of the ClusterLogForwarder, currently WIP under OCPBUGS-44518, so it will be removed from this ClusterTemplate.
policyTemplateParameters:
  description: policyTemplateSchema defines the available parameters for cluster configuration
  properties:
    cluster-log-fwd-filters:
This should be a default
By default, do you mean have them directly in the Policy Generator and not expose them in the policyTemplateParameters?
No, these would be set in the default configmap vs being passed in by the client
As proposed above I think we should narrow this down to just the additional labels. One question I have is whether the user/orchestrator would add one or more labels which are cluster specific (like a higher level cluster identifier, etc)? In that case would the labels (or at least one label) need to be part of this schema?
The filters are also going to be partially locked in in the ClusterLogForwarder source-cr under OCPBUGS-44518. Yes, these labels will be set in the ClusterInstance defaults ConfigMap, but we also need a way for them to reach the ConfigMap used by the ACM PGs, so they need to also be included in the policyTemplate defaults ConfigMap.
  type: string
cluster-log-fwd-pipelines:
  type: string
sriov-fec-bbDevConfig:
Default
  type: string
sriov-fec-bbDevConfig:
  type: string
sriov-fec-pciAddress:
default
  type: string
sriov-fec-pciAddress:
  type: string
sriov-fec-pfDriver:
default
# (e.g., "40m" for 40 minutes)
clusterConfigurationTimeout: "40m"
policytemplate-defaults: |
  cluster-log-fwd-filters: '[{"name":"test-labels", "type": "openshiftLabels", "openshiftLabels": {"label1": "test1", "label2": "test2"}}]'
Not really filters, additional metadata labels
Do we want to narrow the templating down to just the additional labels, i.e. the user configures only the value for openshiftLabels?
  type: string
hugepages-count:
  type: string
machine-config-storage-source-1:
Remove, see comment above
  type: string
machine-config-storage-source-1:
  type: string
machine-config-storage-source-2:
Remove
  type: string
hugepages-count:
  type: string
machine-config-storage-source-1:
delete
  type: string
machine-config-storage-source-1:
  type: string
machine-config-storage-source-2:
delete
Generally speaking, if this is something that will need to be kept in-sync with the cnf-features-deploy repo, perhaps it's worth engineering a way to automatically synchronize them or generate one from the other.
clusterConfigurationTimeout: "40m"
policytemplate-defaults: |
  cluster-log-fwd-filters: '[{"name":"test-labels", "type": "openshiftLabels", "openshiftLabels": {"label1": "test1", "label2": "test2"}}]'
  cluster-log-fwd-outputs: '[{"type":"kafka","name":"kafka-open", "kafka": {"url":"tcp://10.46.55.190:9092/test"}}]'
Should we allow customization of all of this? Or just the Kafka url?
url only
additionalKernelArgs:
  - rcupdate.rcu_normal_after_boot=0
  - vfio_pci.enable_sriov=1
  - vfio_pci.disable_idle_d3=1
  - efi=runtime
We shouldn't override this section, but rely on the source-crs original value
+1
This is also missing the module_blacklist=irdma
machineConfigPoolSelector:
  pools.operator.machineconfiguration.openshift.io/master: ""
nodeSelector:
  node-role.kubernetes.io/master: ''
Suggested change, current lines:
machineConfigPoolSelector:
  pools.operator.machineconfiguration.openshift.io/master: ""
nodeSelector:
  node-role.kubernetes.io/master: ''
Suggested replacement:
machineConfigPoolSelector:
  $patch: replace
  pools.operator.machineconfiguration.openshift.io/master: ""
nodeSelector:
  $patch: replace
  node-role.kubernetes.io/master: ''
And then we don't need the SetSelector CR variant any more. (Repeat for `*-SetSelector.yaml` elsewhere in this file!)
phc2sysOpts: -a -r -n 24
ptp4lOpts: -2 -s --summary_interval -4
We shouldn't override these; the source-crs has the right values.
- path: source-crs/MachineConfigGeneric.yaml
  complianceType: mustonlyhave # This is to update array entry as opposed to appending a new entry.
  patches:
    - metadata:
        name: 02-master-workload-partitioning
      spec:
        config:
          storage:
            files:
              - contents:
                  # crio cpuset config goes below. This value needs to be updated and matched with PerformanceProfile. Check the link for more info on the content.
                  source: '{{hub fromConfigMap "" (printf "%s-pg" .ManagedClusterName) "machine-config-storage-source-1" hub}}'
                mode: 420
                overwrite: true
                path: /etc/crio/crio.conf.d/01-workload-partitioning
                user:
                  name: root
              - contents:
                  # openshift cpuset config goes below. This value needs to be updated and matched with crio cpuset (array entry above this). Check the link for more info on the content.
                  source: '{{hub fromConfigMap "" (printf "%s-pg" .ManagedClusterName) "machine-config-storage-source-2" hub}}'
                mode: 420
                overwrite: true
                path: /etc/kubernetes/openshift-workload-pinning
                user:
                  name: root
I don't think any of this is needed any more since the SiteConfig added `cpuPartitioningMode: AllNodes` in 4.14.
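For readers unfamiliar with the setting mentioned above, a minimal sketch of where it would sit in a SiteConfig follows. Only `cpuPartitioningMode: AllNodes` comes from the comment; the cluster name is a hypothetical placeholder and all other required fields are omitted, so this is a fragment rather than a complete manifest.

```yaml
# Fragment only, not a complete SiteConfig.
spec:
  clusters:
    - clusterName: example-sno        # hypothetical name, for illustration
      # Enables workload partitioning at install time, which is why the
      # MachineConfig patch above is argued to be unnecessary.
      cpuPartitioningMode: AllNodes
```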
complianceType: musthave
patches:
  - spec:
      configDaemonNodeSelector:
Wouldn't this be in the source CR?
The selector can't be, because depending on whether you're deploying SNO or MNO the source CR may need `master` or `worker`.
In this model, the cluster template would only be used for SNO; an MNO would have a different one.
For SNO we should be able to use either master or worker here since the node has both labels. If we use worker is it valid for all topologies?
summary=Configuration changes profile inherited from performance created tuned
include=openshift-node-performance-openshift-node-performance-profile
[bootloader]
cmdline_crash=-tsc=nowatchdog
Delete
Do we need to override the default profile? 🤔 The ZTP git example is just using the default profile values.
No need to override the source cr here
include=openshift-node-performance-openshift-node-performance-profile
[bootloader]
cmdline_crash=-tsc=nowatchdog
cmdline_crash1=tsc=reliable
Delete
cmdline_crash=-tsc=nowatchdog
cmdline_crash1=tsc=reliable
[sysctl]
kernel.timer_migration=1
Delete
cmdline_crash1=tsc=reliable
[sysctl]
kernel.timer_migration=1
kernel.sysrq=1
Delete
[service]
service.stalld=start,enable
service.chronyd=stop,disable
# MACHINE CONFIG
Delete

For details about setting up the Git repo, please refer to the the Gitops setup [README.md](./samples/git-setup/README.md).

**Note:** Make sure all the value used in hub templates in the PGs are exposed in the corresponding ClusterTemplate, under `spec.templateParameterSchema.policyTemplateParameters` and are present either in the `spec.templates.policyTemplateDefaults` ConfigMap or are specified through the ProvisioningRequest (`spec.templateParameters.policyTemplateParameters`).
nit: values
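As a rough illustration of the rule in the note, the three fragments below show one parameter declared in the schema, given a default, and optionally overridden per cluster. The field paths are copied from the note as written, so intermediate schema keys may be missing. The ConfigMap name and the parameter value are hypothetical, and the `policytemplate-defaults` data key is taken from the sample ConfigMap in this PR.

```yaml
# ClusterTemplate fragment: the parameter is declared in the schema.
spec:
  templateParameterSchema:
    policyTemplateParameters:
      properties:
        sriov-fec-pfDriver:
          type: string
  templates:
    policyTemplateDefaults: example-pg-defaults   # hypothetical ConfigMap name
---
# Defaults ConfigMap fragment: supplies a value when the client sends none.
data:
  policytemplate-defaults: |
    sriov-fec-pfDriver: "vfio-pci"                # hypothetical value
---
# ProvisioningRequest fragment: optional per-cluster override.
spec:
  templateParameters:
    policyTemplateParameters:
      sriov-fec-pfDriver: "vfio-pci"              # hypothetical value
```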
@@ -2,7 +2,8 @@ generators:
- sno-ran-du/sno-ran-du-pg-v4-Y-Z-v1.yaml
# This ACM PG is needed when the previous one has to be updated.
- sno-ran-du/sno-ran-du-pg-v4-Y-Z-v2.yaml

- sno-ran-du/sno-ran-du-pg-v4-Y-Z-v3.yaml
- sno-ran-du/sno-ran-du-pg-v4-Y-Z-v4-full-DU.yaml
Since this isn't really a progression from v3, but a whole new PG, would it be better to create new sno-ran-full-du directories for the cluster templates and policy templates and start them at v1?
agree
- data:
    config.yaml: |
      alertmanagerMain:
        enabled: false
      telemeterClient:
        enabled: false
By default, observability is not enabled. I think we should just use the default values from source-cr.
we should enable observability.
@@ -0,0 +1,376 @@
apiVersion: policy.open-cluster-management.io/v1
Is this DU profile based on OCP 4.17? I wonder if adding a comment to mention that might be helpful.
agree
  name: redhat-operators
spec:
  displayName: redhat-operators
  image: registry.redhat.io/redhat/redhat-operator-index:v4.Y
Do we want to use a disconnected registry as the example?
yes
ManagedCluster:
  test-annotation: test
Is this intentional?
  - name: clustertemplate-sample.v1.0.0-extramanifests
nodes:
  - role: master
    bootMode: UEFI
Reference is now UEFISecureBoot.
networkType: OVNKubernetes | ||
sshPublicKey: ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDTca4Qyu5AYBmZbSl74cNTKuNINJ7d+ceBRzKUrhHcQpMbl8UnAYhjh/ffTyVCsgwzm1RjTAm6/tPj9euEa+YX4U78Sx+ioLHmjDvACYsti4DekIR+opFwfIw+JTDXoyVv06lOPaTOa/vtgpe+gDEL364j47f3p9H/tGhsLmpjeG3DVAhbqSh3s0IHpd4OzF/r6g6mbPyHadvedkBZp/qeUX054Gc2QqJeg/s/eddPlQDJbmL8yRVkZu+SsFTOEOAtrdA3czeaEaA8s+aWP9PN3X539Ddw3qahyOSCXpCE2eJXPh8DJCBWVEcFFYgmIFVvCQ+o9cjEmIYg6drGGvRV | ||
installConfigOverrides: '{"capabilities": {"baselineCapabilitySet": "None", "additionalEnabledCapabilities": ["NodeTuning", "OperatorLifecycleManager", "Ingress"]}}' | ||
ignitionConfigOverride: '{"ignition": {"version": "3.2.0"}, "storage": {"files": [{"overwrite": true, "path": "/etc/containers/policy.json", "contents": {"source":"data:text/plain;base64,ewogICAgImRlZmF1bHQiOiBbCiAgICAgICAgewogICAgICAgICAgICAidHlwZSI6ICJpbnNlY3VyZUFjY2VwdEFueXRoaW5nIgogICAgICAgIH0KICAgIF0sCiAgICAidHJhbnNwb3J0cyI6CiAgICAgICAgewogICAgICAgICAgICAiZG9ja2VyLWRhZW1vbiI6CiAgICAgICAgICAgICAgICB7CiAgICAgICAgICAgICAgICAgICAgIiI6IFt7InR5cGUiOiJpbnNlY3VyZUFjY2VwdEFueXRoaW5nIn1dCiAgICAgICAgICAgICAgICB9CiAgICAgICAgfQp9Cgo="}}]}}' |
For IBU there is a need for a separate partition for /var/lib/containers. Should this example set that up as well?
yes, good catch
name: bond99
state: up
type: bond
Is there an intent to include ports eth0 and eth1 in this bond, or other ports in it?
outputs: '{{hub fromConfigMap "" (printf "%s-pg" .ManagedClusterName) "cluster-log-fwd-outputs" | toLiteral hub}}'
pipelines: '{{hub fromConfigMap "" (printf "%s-pg" .ManagedClusterName) "cluster-log-fwd-pipelines" | toLiteral hub}}'
filters: '{{hub fromConfigMap "" (printf "%s-pg" .ManagedClusterName) "cluster-log-fwd-filters" | toLiteral hub}}'
As noted above we should use more fixed content in the source CR (i.e. be more opinionated) and allow the user to override the URL and labels.
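One possible shape for that narrower templating is sketched below, assuming the fixed structure lives in the patch (or source CR) and only the Kafka URL and the label map are read from the per-cluster ConfigMap. The ConfigMap keys `cluster-log-fwd-kafka-url` and `cluster-log-fwd-labels` are hypothetical names chosen for illustration; the output and filter names are taken from the sample defaults in this PR.

```yaml
# Sketch only: the two ConfigMap keys below are hypothetical, not from the PR.
spec:
  outputs:
    - type: kafka
      name: kafka-open
      kafka:
        # Only the URL is user-configurable.
        url: '{{hub fromConfigMap "" (printf "%s-pg" .ManagedClusterName) "cluster-log-fwd-kafka-url" hub}}'
  filters:
    - name: test-labels
      type: openshiftLabels
      # Only the label map is user-configurable.
      openshiftLabels: '{{hub fromConfigMap "" (printf "%s-pg" .ManagedClusterName) "cluster-log-fwd-labels" | toLiteral hub}}'
```

With this layout the pipelines section would stay fixed and would no longer need a template parameter at all.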
serviceAccount:
  name: collector
Already in source CR
kernel.panic_on_rcu_stall=1
kernel.hung_task_panic=1
These are not in the reference. Is their addition intentional?
remove
kernel.hung_task_panic=1
[scheduler]
group.ice-ptp=0:f:10:*:ice-ptp.*
group.ice-gnss=0:f:10:*:ice-gnss.*
Missing:
group.ice-dplls=0:f:10:*:ice-dplls.*
    name: root
- name: v4-sriov-config-policy
  manifests:
    # SRIOV
For all of these we should not repeat any of the content already in the source CR.
orderPolicies: true
policies:
  # REDUCE FOOTPRINT
  - name: v4-footprint-policy
Is there a reason this is in its own policy? This creates more policies than are necessary. Consider combining with the next policy as "baseline config"
Actually, I think this can be included in the `v4-config-policy`.
    path: /etc/kubernetes/openshift-workload-pinning
    user:
      name: root
- name: v4-sriov-config-policy
Could sriov configs be included in the `v4-config-policy`? I don't see the reason why they couldn't be 🤔.
    enabled: false
  ipv4:
    enabled: false
  name: bond99
Can we remove the bond? Bonding is going to be very rare for this use case.
PR needs rebase.
@irinamihai: The following test failed, say
No description provided.