PioneerAI in Federal Frontier Kubernetes Platform (FKP)

How to enable PioneerAI on Frontier-managed workload clusters

What is PioneerAI?

PioneerAI is an artificial-intelligence tool packaged with FKP to help clients diagnose and troubleshoot issues in their Kubernetes clusters. It uses OpenAI to scan your Frontier clusters for problems, analyze what it finds, and recommend repairs. Scans can also surface security issues in your Kubernetes clusters. PioneerAI is compatible with all Frontier clusters, regardless of CPU architecture, operating system image, Kubernetes distribution, or infrastructure provider.

Troubleshooting Clusters

When pods or services fail, you can use PioneerAI to diagnose the unhealthy Frontier cluster through either the Frontier CLI or Frontier Outpost.

Scan Your Cluster

Both tools make it easy for users and administrators to scan their clusters and receive a list of every issue found. The following is an example of such a list:

- [Pod] openebs/openebs-ndm-operator-5b984f4966-b8cdd: back-off 5m0s restarting failed container=node-disk-operator pod=openebs-ndm-operator-5b984f4966-b8cdd_openebs(4ae1644d-d263-441b-a97d-63a61d793fd4)
- [Pod] openebs/pvc-c3af4d91-5d41-4ad5-ad3d-f2db1b0a82ae-jiva-rep-0: 0/4 nodes are available: 1 node(s) had untolerated taint {key1: value1}, 3 node(s) had volume node affinity conflict. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling..
- [Service] capa-system/capa-metrics-service: Service has no endpoints, expected label cluster.x-k8s.io/provider=infrastructure-aws
- [Service] frontier/frontier-cluster-api: Service has not ready endpoints, pods: [Pod/frontier-cluster-api-6458c5fcf9-72zfw], expected 1
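If you save a scan result to a file and want to triage it programmatically, a small parser helps. The sketch below is not part of PioneerAI; it assumes every finding follows the `- [Kind] namespace/name: message` shape seen in the example list, and `parse_findings` and `group_by_kind` are hypothetical helper names. Lines that do not match are skipped rather than guessed at.

```python
import re
from collections import defaultdict
from typing import NamedTuple

# Assumed line shape, inferred only from the example list above:
#   "- [Kind] namespace/name: message"
SCAN_LINE = re.compile(
    r"^-?\s*\[(?P<kind>\w+)\]\s+(?P<namespace>[^/\s]+)/(?P<name>[^:\s]+):\s*(?P<message>.*)$"
)


class Finding(NamedTuple):
    kind: str
    namespace: str
    name: str
    message: str


def parse_findings(text: str) -> list[Finding]:
    """Parse scan output into findings; non-matching lines are ignored."""
    findings = []
    for line in text.splitlines():
        match = SCAN_LINE.match(line.strip())
        if match:
            findings.append(Finding(**match.groupdict()))
    return findings


def group_by_kind(findings: list[Finding]) -> dict[str, list[str]]:
    """Map each resource kind (Pod, Service, ...) to its affected namespace/name pairs."""
    groups: dict[str, list[str]] = defaultdict(list)
    for finding in findings:
        groups[finding.kind].append(f"{finding.namespace}/{finding.name}")
    return dict(groups)
```

Because the name group stops at the first colon, colons inside the message (such as `{key1: value1}`) stay in `message` instead of breaking the split.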

Repair Recommendations

For each of these issues, PioneerAI can recommend how to repair your cluster. Take [Pod] openebs/pvc-c3af4d91-5d41-4ad5-ad3d-f2db1b0a82ae-jiva-rep-0 and [Service] frontier/frontier-cluster-api as examples:

[Pod] openebs/pvc-c3af4d91-5d41-4ad5-ad3d-f2db1b0a82ae-jiva-rep-0:

Error: 0/4 nodes are available: 1 node(s) had untolerated taint {key1: value1}, 3 node(s) had volume node affinity conflict. preemption: 0/4 nodes are available: 4 Preemption is not helpful for scheduling..

Solution: 
1. Check the taint on the node with the untolerated taint.
2. Remove the taint on the node or adjust the tolerations in the pod specification.
3. Resolve the volume node affinity conflict by ensuring the pods are scheduled on nodes with compatible volumes.
4. If preemption is not helpful, consider increasing the number of available nodes or adjusting the scheduling constraints.
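Steps 1 and 2 hinge on whether the pod's tolerations match the node's taints. The sketch below mirrors the matching rules in the Kubernetes taints-and-tolerations documentation: `Equal` (the default operator) needs key and value to match, `Exists` needs only the key, an empty key with `Exists` tolerates every taint, and an empty effect matches any effect. It assumes taints and tolerations are plain dicts shaped like the `spec.taints` and `spec.tolerations` fields from `kubectl get ... -o json`; `toleration_matches` and `untolerated_taints` are hypothetical helpers, and the volume node affinity part of the error is not covered.

```python
def toleration_matches(toleration: dict, taint: dict) -> bool:
    """Return True if a single toleration tolerates a single taint."""
    # An empty (or missing) toleration effect matches every taint effect.
    effect = toleration.get("effect", "")
    if effect and effect != taint.get("effect"):
        return False

    operator = toleration.get("operator", "Equal")  # Equal is the Kubernetes default
    key = toleration.get("key", "")
    if operator == "Exists":
        # Exists ignores the value; an empty key tolerates every key.
        return key == "" or key == taint.get("key")

    # Equal: both key and value must match.
    return key == taint.get("key") and toleration.get("value", "") == taint.get("value", "")


def untolerated_taints(node_taints: list[dict], pod_tolerations: list[dict]) -> list[dict]:
    """Return the hard taints (NoSchedule/NoExecute) that no toleration matches.

    PreferNoSchedule taints are skipped because they only discourage scheduling.
    """
    return [
        taint
        for taint in node_taints
        if taint.get("effect") in ("NoSchedule", "NoExecute")
        and not any(toleration_matches(t, taint) for t in pod_tolerations)
    ]
```

For the example error, a node tainted `{"key": "key1", "value": "value1", "effect": "NoSchedule"}` blocks a pod with no tolerations; adding a toleration with the same key, value, and effect (operator `Equal`) returns an empty list.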

[Service] frontier/frontier-cluster-api:

Error: The service does not have any ready endpoints and the expected number of pods is 1.

Solution: 
1. Check the status of the pod "frontier-cluster-api-6458c5fcf9-72zfw".
2. Ensure that the pod is running and ready.
3. If the pod is not running, investigate and resolve any issues that may be preventing it from running.
4. Once the pod is running and ready, the service should have a ready endpoint.
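The two Service messages in the scan list ("has no endpoints" and "has not ready endpoints") correspond to different states of the Service's core/v1 Endpoints object: addresses listed only under `notReadyAddresses`, or no subsets at all. The sketch below reads that object as a dict (for example, parsed from `kubectl get endpoints <service> -n <namespace> -o json`) and reports which case applies. `endpoint_pods` and `diagnose_service` are hypothetical helpers, not PioneerAI functions.

```python
def _pod_name(address: dict) -> str:
    """Prefer the backing pod's name; fall back to the endpoint IP."""
    return address.get("targetRef", {}).get("name", address.get("ip", ""))


def endpoint_pods(endpoints: dict) -> tuple[list[str], list[str]]:
    """Split an Endpoints object into (ready, not_ready) pod names."""
    ready, not_ready = [], []
    for subset in endpoints.get("subsets") or []:
        ready.extend(_pod_name(a) for a in subset.get("addresses") or [])
        not_ready.extend(_pod_name(a) for a in subset.get("notReadyAddresses") or [])
    return ready, not_ready


def diagnose_service(endpoints: dict) -> str:
    """Summarize which of the two scan messages an Endpoints object matches."""
    ready, not_ready = endpoint_pods(endpoints)
    if ready:
        return "ok"
    if not_ready:
        # Pods exist and match the selector but are failing readiness: inspect them.
        return "no ready endpoints; check pods: " + ", ".join(not_ready)
    # No addresses at all: usually the selector matches no running pods.
    return "no endpoints; check that the Service selector matches pod labels"
```

In the `frontier/frontier-cluster-api` case, the pod appears under `notReadyAddresses`, so the next step is `kubectl describe pod` on that pod, as the recommendation suggests.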

Pods, services, and more…

PioneerAI can troubleshoot more than pods and services in your Frontier clusters. It can also diagnose issues in the following resource types:

  • Nodes
  • Deployments
  • Ingresses
  • PersistentVolumeClaims
  • ReplicaSets
  • StatefulSets
  • ValidatingWebhookConfigurations
  • MutatingWebhookConfigurations
  • HorizontalPodAutoscalers
  • NetworkPolicies

Security Issues

PioneerAI integrates with Trivy, a widely used open-source security scanner, to keep you up to date on security reports from your Frontier clusters. These reports cover the following scan types:

  • Vulnerability Scans
  • ConfigAudit Scans
  • Exposed Secret Scans
  • RBAC Scans

Vulnerability Report Example:

openebs/openebs-localpv-provisioner-686945bb7c: critical Vulnerability found ID: CVE-2022-37434 (learn more at: https://avd.aquasec.com/nvd/cve-2022-37434)

ConfigAudit Report Example:

ingress-nginx/ingress-nginx-admission-create: Config issue with severity "MEDIUM" found: container create of job ingress-nginx-admission-create in ingress-nginx namespace should specify a seccomp profile
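To focus on the most urgent findings, you can filter report lines by severity. The sketch below assumes the two line formats shown in the examples above (it also tolerates backslash-escaped quotes around the severity) and uses Trivy's severity levels in ascending order. `classify` and `at_or_above` are hypothetical helpers; adjust the patterns if your report lines differ.

```python
import re
from typing import Optional

# Trivy severity levels, lowest to highest.
SEVERITY_ORDER = ["UNKNOWN", "LOW", "MEDIUM", "HIGH", "CRITICAL"]

# Assumed formats, inferred only from the two example lines.
VULN = re.compile(r"^(?P<resource>\S+): (?P<severity>\w+) Vulnerability found ID: (?P<id>\S+)")
CONFIG = re.compile(r'^(?P<resource>\S+): Config issue with severity \\?"(?P<severity>\w+)\\?" found: ')


def classify(line: str) -> Optional[tuple[str, str]]:
    """Return (resource, SEVERITY) for a recognized report line, else None."""
    for pattern in (VULN, CONFIG):
        match = pattern.match(line.strip())
        if match:
            return match["resource"], match["severity"].upper()
    return None


def at_or_above(lines: list[str], minimum: str = "HIGH") -> list[tuple[str, str]]:
    """Keep only findings whose severity is at least `minimum`."""
    threshold = SEVERITY_ORDER.index(minimum)
    results = []
    for line in lines:
        parsed = classify(line)
        if parsed and parsed[1] in SEVERITY_ORDER and SEVERITY_ORDER.index(parsed[1]) >= threshold:
            results.append(parsed)
    return results
```

With the default threshold of `HIGH`, the CVE-2022-37434 finding is kept and the `MEDIUM` seccomp finding is dropped; pass `minimum="MEDIUM"` to keep both.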