This repository has been archived by the owner on May 3, 2022. It is now read-only.

Releases: bookingcom/shipper

v0.10.0-alpha.0

17 Nov 13:16

Changelog since v0.8

You might notice that there is no version 0.9 of Shipper. This is
because in version 0.9, we tried to split Shipper into two components
(shipper-mgmt and shipper-app) which would run in management and
application clusters respectively. However, that version was behaving
erratically in a way that was hard to predict and debug. After
spending months trying to patch all the holes, we decided to forgo the
separation for now and move on with the development of other features.

Please note that this means that Shipper is still one component,
running only in the management cluster.

Breaking Changes

  • Shipper now uses different names for its service accounts, roles,
    rolebindings and clusterrolebindings. Refer to the Migrating to 0.10
    section below for more information on how to migrate to the new
    version safely.

Improvements

  • Shipperctl admin clusters apply is split into multiple commands,
    so that each operation can be done separately. For example, this
    allows operators to only set up the application clusters, without
    touching the management cluster (#358)
  • Shipper now rejects all modifications to the environment field of
    all releases. This fixes an issue where users would modify this
    field and cause unsupported behavior (#357)
  • Shipper now exposes metrics on the health of the webhook. For now,
    these include the time at which the webhook's SSL certificate expires
    and a heartbeat emitted every second (#366); see the sketch after
    this list for roughly what these metrics look like
  • Shipperctl now creates and modifies the webhook with the failure
    policy set to Fail (#366). This means that the webhook becomes a
    critical piece of the user experience, and we suggest you monitor
    the Shipper webhook's health using the metrics mentioned above.
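
For reference, here is a minimal sketch of what metrics of this shape can
look like when exposed with Prometheus' client_golang. The metric names
match the ones mentioned in the migration notes below; the metric types,
help strings and update cadence are assumptions made for the sketch, not
Shipper's actual implementation.

```
package main

import (
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

// Metric names come from the migration notes below; metric types and help
// strings are illustrative assumptions.
var (
	certExpiry = promauto.NewGauge(prometheus.GaugeOpts{
		Name: "shipper_webhook_health_expire_time_epoch",
		Help: "Unix timestamp at which the webhook's serving certificate expires.",
	})
	heartbeat = promauto.NewCounter(prometheus.CounterOpts{
		Name: "shipper_webhook_health_heartbeat",
		Help: "Heartbeat ticked every second while the webhook is healthy.",
	})
)

func main() {
	// Record the certificate expiry once (a placeholder value 90 days out).
	certExpiry.Set(float64(time.Now().Add(90 * 24 * time.Hour).Unix()))

	// Tick the heartbeat every second so a stalled webhook is easy to alert on.
	go func() {
		for range time.Tick(time.Second) {
			heartbeat.Inc()
		}
	}()

	http.Handle("/metrics", promhttp.Handler())
	http.ListenAndServe(":2112", nil)
}
```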

Migrating to 0.10

  • Run shipperctl clusters setup management and shipperctl clusters join to create the
    relevant CRDs, service accounts and RBAC objects
  • Make sure your context is set to the management cluster, then apply
    the Shipper 0.10 deployment object with kubectl apply -f https://github.com/bookingcom/shipper/releases/download/v0.10.0/shipper.deployment.v0.10.0.yaml
    and the shipper-state-metrics deployment with kubectl apply -f https://github.com/bookingcom/shipper/releases/download/v0.10.0/shipper-state-metrics.deployment.v0.10.0.yaml
  • Start monitoring the health of the webhook. You can use the
    shipper_webhook_health_expire_time_epoch and
    shipper_webhook_health_heartbeat Prometheus metrics.

Reverting to 0.8

  • Remove the Shipper deployments on the management cluster
  • Run shipperctl 0.8 to revert the service accounts and cluster role
    objects to the state that Shipper 0.8 expects them to be in
  • Create the Shipper deployment on the management cluster with the
    relevant image tag, v0.8.2

v0.8.2: Fixed inconsistency in historical release strategy state conditions

24 Mar 11:14
This commit fixes the release status strategy condition inconsistency
reported in #299. The issue boils down to the way the strategy executor
handles historical releases: it only runs against the traffic and
capacity ensurers, and it only updates the release strategy state if
that state looks incomplete. In the happy-path branch, an update never
happens, causing strategy conditions to stall in a mistakenly broken
state.

This commit changes the strategy executor behavior slightly. In
particular, all releases except the contender are forced to look ahead
only. The prior implementation was not strict about this, and all
releases kept reporting their incumbent's state along with their own.
The contender is now the only release in the chain that looks behind at
its incumbent. This makes the buildup of strategy state and conditions
more deterministic: non-contender releases drop any information about
the incumbent's state from their conditions and keep only information
about the owning release's state. The motivation behind this move is to
reduce the number of oscillations in the release chain by eliminating
simultaneous look-ahead and look-behind actions. In fact, there is no
use for a non-contender release's incumbent state: the only essential
transition happens between the incumbent and the contender.
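
As a toy illustration of the new reporting rule, here is a minimal sketch
with made-up types and names; it models the rule described above and is
not Shipper's actual strategy executor.

```
package main

import "fmt"

// Toy model: every release reports its own state, and only the contender
// additionally looks behind at its incumbent.
type Release struct {
	Name  string
	Ready bool
}

func strategyState(rel, incumbent *Release, isContender bool) map[string]bool {
	state := map[string]bool{rel.Name + "/ready": rel.Ready}
	if isContender && incumbent != nil {
		state[incumbent.Name+"/ready"] = incumbent.Ready
	}
	return state
}

func main() {
	historical := &Release{Name: "app-1", Ready: true}
	contender := &Release{Name: "app-2", Ready: false}

	fmt.Println(strategyState(historical, nil, false))      // own state only
	fmt.Println(strategyState(contender, historical, true)) // own + incumbent state
}
```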

Apart from this change, the commit includes a series of bug fixes. It
introduces a minor change to the Shipper v1alpha1
`ReleaseStrategyCondition` type definition: the `Reason`, `Message`
and `Step` attributes have dropped the `omitempty` flag. This fixes a
problem where conditions remained in an inconsistent state: the status
indicated a healthy state but the reason and message attributes were
still present. This behavior is explained by the combination of
`omitempty` and Kubernetes strategic merge patch usage. When an
attribute is set to its zero value
(https://dave.cheney.net/2013/01/19/what-is-the-zero-value-and-why-is-it-useful),
`json.Marshal` omits it, causing the Kubernetes strategic merge
strategy to merge old non-empty attributes of a structure (such as
`Reason` and `Step`) with updated ones (such as `Status`). Explicitly
dropping `omitempty` forces the JSON encoder to emit empty values in
patches and fixes this inconsistency.
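
The mechanics are easy to reproduce with a few lines of Go. The structs
below are simplified stand-ins for the condition type, not the actual
Shipper definitions:

```
package main

import (
	"encoding/json"
	"fmt"
)

// Simplified stand-ins to show the omitempty effect described above.
type conditionWithOmitempty struct {
	Status  string `json:"status"`
	Reason  string `json:"reason,omitempty"`
	Message string `json:"message,omitempty"`
}

type conditionWithoutOmitempty struct {
	Status  string `json:"status"`
	Reason  string `json:"reason"`
	Message string `json:"message"`
}

func main() {
	// A healthy condition has empty Reason and Message.
	a, _ := json.Marshal(conditionWithOmitempty{Status: "True"})
	b, _ := json.Marshal(conditionWithoutOmitempty{Status: "True"})

	// With omitempty the patch carries only the status, so a strategic
	// merge keeps the stale reason/message from the previous object.
	fmt.Println(string(a)) // {"status":"True"}

	// Without omitempty the empty strings are encoded and overwrite the
	// stale values during the merge.
	fmt.Println(string(b)) // {"status":"True","reason":"","message":""}
}
```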

Signed-off-by: Oleg Sidorov <[email protected]>

v0.8.1: Removed SecretChecksumAnnotation from cluster secret annotations

12 Mar 09:01
This commit removes the SecretChecksumAnnotation annotation. In the
previous implementation this annotation was a must-have item: it
ensured that the cluster was provisioned correctly and notified the
clusterclientstore of secret changes.

In this commit we remove the annotation and re-compute the cluster
secret checksum every time syncSecret/1 is invoked. This approach
ensures that no preliminary configuration is needed to get a cluster
object into an operating state. Every time a cluster secret changes,
the cache invalidates the stored cluster object and re-creates it.
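
As a rough illustration of the recompute-on-sync idea, here is a minimal
sketch that hashes a secret's key/value payload. The hashing scheme
(sorted keys fed into SHA-256) is an assumption made for the sketch, not
Shipper's actual implementation.

```
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
)

// secretChecksum derives a deterministic checksum from a secret's data map.
func secretChecksum(data map[string][]byte) string {
	keys := make([]string, 0, len(data))
	for k := range data {
		keys = append(keys, k)
	}
	sort.Strings(keys) // stable order so the checksum is deterministic

	h := sha256.New()
	for _, k := range keys {
		h.Write([]byte(k))
		h.Write(data[k])
	}
	return fmt.Sprintf("%x", h.Sum(nil))
}

func main() {
	secret := map[string][]byte{
		"tls.crt": []byte("<certificate data>"),
		"tls.key": []byte("<key data>"),
	}
	// On every sync the checksum is derived from the secret's current
	// contents; if it differs from the cached one, the cached cluster
	// object is invalidated and re-created.
	fmt.Println(secretChecksum(secret))
}
```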

Signed-off-by: Oleg Sidorov <[email protected]>

v0.8.0-beta.1: capacity controller: better summarization of conditions

04 Mar 12:08
Identical to the v0.8.0 release notes below.

v0.8.0-alpha.6: capacity controller: better summarization of conditions

24 Feb 10:50
Identical to the v0.8.0 release notes below.

v0.8.0: capacity controller: better summarization of conditions

11 Mar 12:22
Before this change, the capacity controller would only put a list of
unready clusters in the CapacityTarget's Ready condition when setting
it to False. This required users to go digging into each cluster
condition, and most likely they would only be directed to SadPods,
where they could finally get some useful information.

Now that information is summarized in a very brief format, in the hope
that users will have to do less jumping around when investigating why
their CapacityTarget is not progressing.

For instance, if the CapacityTarget is stuck because one container
can't pull its image, we'll now have the following in the
CapacityTarget's .status.conditions:

```
[
  {
    "lastTransitionTime": "2020-02-12T13:16:44Z",
    "status": "True",
    "type": "Operational"
  },
  {
    "lastTransitionTime": "2020-02-12T13:16:44Z",
    "message": "docker-desktop: PodsNotReady 3/3: 3x\"test-nginx\" containers with [ImagePullBackOff]",
    "reason": "ClustersNotReady",
    "status": "False",
    "type": "Ready"
  }
]
```

As a bonus, this is also shown on a `kubectl get ct`:

```
% kubectl get ct  snowflake-db84be2b-0
NAME                   OPERATIONAL   READY   REASON                                                                                AGE
snowflake-db84be2b-0   True          False   docker-desktop: PodsNotReady 3/3: 3x"test-nginx" containers with [ImagePullBackOff]   8d
```

v0.8.0-alpha.5

20 Feb 15:47
Fixed historical release awakening due to targetStep resolve inconsis…

v0.8.0-alpha.4: Bugfixes in release controller strategy executor

20 Feb 10:02
This commit addresses multiple issues identified in the strategy
executor, among which:
  * Wrong recipients for patch updates: due to an error in the code,
  some patches were applied to the wrong generation of release and
  target objects.
  * Target object spec checkers used to return an incomplete spec if
  only some of the clusters were misbehaving, which risked
  de-scheduling the workload on healthy clusters.

Signed-off-by: Oleg Sidorov <[email protected]>

v0.8.0-alpha.3: Release controller: patches are applied on the owning release only

13 Feb 14:21
Since the introduction of the Patch interface, the release controller
(and the strategy executor in particular) has used the `Patch.Alters()`
method to distinguish altering patches from no-ops. It turned out there
was an inconsistency between the recipient and the validation objects.
In essence, we were checking whether a patch alters the predecessor
release object, whereas on a positive check it was sent to alter the
successor release. This patch ensures that all patches are validated
against the same generation of releases.
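
A toy sketch of the fixed invariant, with stand-in types; apart from the
`Alters` name, nothing here is Shipper's actual Patch interface.

```
package main

import "fmt"

type Release struct {
	Name string
	Step int
}

type Patch struct{ NewStep int }

// Alters reports whether applying the patch to rel would change it.
func (p Patch) Alters(rel *Release) bool { return rel.Step != p.NewStep }

func main() {
	incumbent := &Release{Name: "app-1", Step: 2}
	contender := &Release{Name: "app-2", Step: 1}
	patch := Patch{NewStep: 2}

	// The bug: the patch was validated against one release (the
	// predecessor) but applied to another (the successor), so the no-op
	// check answered the wrong question.
	fmt.Println("alters incumbent:", patch.Alters(incumbent)) // false: patch dropped
	fmt.Println("alters contender:", patch.Alters(contender)) // true: patch needed

	// The fix: validate and apply against the same release generation.
	if patch.Alters(contender) {
		contender.Step = patch.NewStep
	}
	fmt.Println("contender step after fix:", contender.Step)
}
```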

Signed-off-by: Oleg Sidorov <[email protected]>

v0.8.0-alpha.2: Fixed out-of-range index dereferencing error in release controller

13 Feb 09:56
This commit fixes a problem where a release's strategy was resolved
twice: once for the actual execution and a second time for reporting.
These resolutions were happening in distinct places: the first in the
strategy executor, the second in the release controller. As a result,
the release controller's resolution was causing a panic, as it did not
take into account the updated strategy resolution logic, in which an
incumbent is supposed to look ahead and use its successor's strategy;
the index validity check only happened in the strategy executor, which
was calculating the desired strategy correctly. This commit also moves
things around: the strategy executor is now initialized with a specific
strategy and a pointer to the target step, and Execute() takes the
sequence of executable releases as arguments.

Signed-off-by: Oleg Sidorov <[email protected]>