Skip to content

Latest commit

 

History

History
143 lines (100 loc) · 5.78 KB

File metadata and controls

143 lines (100 loc) · 5.78 KB

Pelorus Demo

Mean Time to Restore and Change Failure Rate

  • Mean Time to Restore utilizes the failure exporter to measure bug resolution and bug resolution time stamp.
  • Change Failure Rate utilizes the failure exporter to measure bug creation and creation time stamp.

Todolist application

Additional Reference

Assumptions

start

Goals

  • Create and resolve bugs in Github issues that exercise Pelorus metrics
  • View the changes to the Mean Time to Restore and Change Failure Rate metrics.

Pelorus configuration

  1. Make a copy of the values.yaml file in the pelorus repo (raw link for curl-ing) and save it to /var/tmp/values.yaml
cp charts/pelorus/values.yaml /var/tmp/
  1. Update the /var/tmp/values.yaml to match the following text. You may already have the deploytime and committime exporters configured from a previous example.

Note:

exporters:
  instances:
  - app_name: deploytime-exporter
    exporter_type: deploytime
    source_ref: master
    extraEnv:
    - name: NAMESPACES
      value: mongo-persistent
  - app_name: committime-exporter
    exporter_type: committime
    source_ref: master
  - app_name: failure-exporter
    exporter_type: failure
    source_ref: master
    extraEnv:
     - name: PROVIDER
       value: github
     - name: TOKEN
       value: ghp_J<snip>
     - name: PROJECTS
       value: <your_org>/mig-demo-apps
  1. Documentation regarding values.yaml can be found on our readthedocs page. Apply the updated values for Pelorus by executing:
helm upgrade pelorus charts/pelorus --namespace pelorus --values /var/tmp/values.yaml

Note: pause to allow the failure exporter pod to build and deploy.

  1. Check the output from the failure exporter. No bugs should be found at this time.
curl $(oc get route -n pelorus failure-exporter -o=template='http://{{.spec.host | printf "%s\n"}}')

Github Issues

Configuration: Enable the appropriate tags

Pelorus will utilize two tags to determine if a Github issue is associated with the todolist-mongo application. We'll need the default bug tag. Additionally, Pelorus requires that all issues associated with a particular application be labelled with the app.kubernetes.io/name=<app_name> label. This works the same way as the deployment configuration.

github_start

Create a Github issues

Now we will create an issue in Github and set the appropriate labels. Pelorus will register an issue as a deployment failure only if it is labeled as a bug and labeled with the application name app.kubernetes.io/name=todolist

  1. Create a Github issue and label it appropriately to register a failure.

github_issue1

  1. Now refresh the Grafana dashboard and you should see the Change Failure Rate go up.

change_failure_rate_1

  1. Let's now create a non critical bug. A bug that does not indicate a deployment failure in your todolist application. A the bug label however do not add the application label

issue_untagged

  1. Ensure that issue #2 is not impacting our failure rate metric by curl-ing the output of the failure exporter.
curl $(oc get route -n pelorus failure-exporter -o=template='http://{{.spec.host | printf "%s\n"}}')
  • Issue #1 should be found in the output of the curl. Issue #2 will not be registered as a deployment failure because the issue is not tagged with app.kubernetes.io/name=todolist

Note: Notice the message failure_creation_timestamp. This indicates the time the issue was created.

failure_creation_timestamp{app="todolist",issue_number="1"} 1.654704543e+09
  1. Now lets resolve issue #1 and see how that impacts our Failure Rate and the Mean Time to Restore
  • close issue #1

issue_close

  • Check the output from the failure exporter again and we should see a failure_resolution_timestamp, which indicates when a bug was closed.
curl $(oc get route -n pelorus failure-exporter -o=template='http://{{.spec.host | printf "%s\n"}}')


failure_creation_timestamp{app="todolist",issue_number="1"} 1.654704543e+09
failure_resolution_timestamp{app="todolist",issue_number="1"} 1.654705784e+09
  • Now we should also data in the Mean Time to Restore metric in Grafana

mean_time_to_restore