Retries
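Retries are an example of how Envoy Gateway (EG) extends the Kubernetes Gateway API using Policy Attachments.
List the gateway-related API resources to see both the standard Gateway API kinds and the EG extensions:
    kubectl api-resources | grep gateway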
Here is a slightly sanitized copy of the captured output:
    gateway.envoyproxy.io/v1alpha1      Backend
    gateway.envoyproxy.io/v1alpha1      BackendTrafficPolicy
    gateway.envoyproxy.io/v1alpha1      ClientTrafficPolicy
    gateway.envoyproxy.io/v1alpha1      EnvoyExtensionPolicy
    gateway.envoyproxy.io/v1alpha1      EnvoyPatchPolicy
    gateway.envoyproxy.io/v1alpha1      EnvoyProxy
    gateway.envoyproxy.io/v1alpha1      SecurityPolicy
    gateway.networking.k8s.io/v1alpha2  BackendLBPolicy
    gateway.networking.k8s.io/v1alpha3  BackendTLSPolicy
    gateway.networking.k8s.io/v1        GatewayClass
    gateway.networking.k8s.io/v1        Gateway
    gateway.networking.k8s.io/v1        GRPCRoute
    gateway.networking.k8s.io/v1        HTTPRoute
    gateway.networking.k8s.io/v1beta1   ReferenceGrant
    gateway.networking.k8s.io/v1alpha2  TCPRoute
    gateway.networking.k8s.io/v1alpha2  TLSRoute
    gateway.networking.k8s.io/v1alpha2  UDPRoute
The retry rule is defined in a BackendTrafficPolicy that targets the httpbin HTTPRoute:
    ---
    apiVersion: gateway.envoyproxy.io/v1alpha1
    kind: BackendTrafficPolicy
    metadata:
      name: httpbin-traffic-policy
    spec:
      targetRefs:
      - group: gateway.networking.k8s.io
        kind: HTTPRoute
        name: httpbin
      retry:
        numRetries: 5
        perRetry:
          backOff:
            baseInterval: 100ms
            maxInterval: 10s
          timeout: 250ms
        retryOn:
          httpStatusCodes:
          - 500
          triggers:
          - connect-failure
          - retriable-status-codes
Apply the policy:
    kubectl apply -f retries/httpbin-policy.yaml
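To get a feel for what the backOff settings mean, recall that Envoy documents its retry back-off as fully jittered and exponential: with base interval B, the delay before retry N is drawn at random from the range [0, (2^N - 1) * B), limited by the maximum interval. The sketch below prints the upper bound of that range for each retry, using the values from the policy. The function name is made up for illustration, and the cap is applied as a simple minimum, which is an approximation of what the proxy does internally.

```shell
#!/bin/sh
# Hypothetical helper: upper bound (in ms) of the jittered back-off window
# for each retry, computed as min((2^N - 1) * base, max).
backoff_upper_bounds() (
  base_ms=$1
  max_ms=$2
  retries=$3
  n=1
  while [ "$n" -le "$retries" ]; do
    bound=$(( ((1 << n) - 1) * base_ms ))
    if [ "$bound" -gt "$max_ms" ]; then
      bound=$max_ms
    fi
    echo "retry $n: up to ${bound}ms"
    n=$((n + 1))
  done
)

# Values from the policy: baseInterval 100ms, maxInterval 10s, numRetries 5
backoff_upper_bounds 100 10000 5
```

With these values the bounds are 100, 300, 700, 1500 and 3100 ms, so the 10s maximum never comes into play. Because the actual delays are random within each window, the total time a failing request takes will vary from one call to the next.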
Review the proxy's "stats"
Specifically, the envoy_cluster_upstream_rq_retry metric:
    watch 'egctl x stats envoy-proxy -n envoy-gateway-system \
      -l gateway.envoyproxy.io/owning-gateway-name=eg \
      -l gateway.envoyproxy.io/owning-gateway-namespace=default \
      | grep "envoy_cluster_upstream_rq_retry{envoy_cluster_name=\"httproute/default/httpbin/rule/0\"}"'
In another terminal, call a failing endpoint:
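The command itself is missing from this copy of the page. Judging by the access log shown further down (a HEAD request sent by curl for /status/500 on httpbin.esuez.org, arriving on the HTTPS listener), it was presumably something along the following lines. The helper name and the GATEWAY_HOST variable are assumptions made for this sketch; substitute your own hostname.

```shell
# Hostname taken from the access log in this section; override for your own gateway.
GATEWAY_HOST="${GATEWAY_HOST:-httpbin.esuez.org}"

# Hypothetical helper (reconstructed, not from the original page): send a HEAD
# request to httpbin's /status/500 endpoint, which answers with HTTP 500.
call_failing_endpoint() {
  curl --head "https://${GATEWAY_HOST}/status/500"
}
```

Run call_failing_endpoint in the second terminal while the watch command is active. Each call should raise the retry counter, by up to the configured numRetries of 5.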
Another convenient way to get at the stats exposed by the Envoy proxy is through the Envoy admin interface:
    egctl x dashboard envoy-proxy -n envoy-gateway-system \
      -l gateway.envoyproxy.io/owning-gateway-name=eg \
      -l gateway.envoyproxy.io/owning-gateway-namespace=default
Click on the stats endpoint and look for metrics with "retry" in their name.
Verify: Tail the gateway logs
    kubectl logs --tail 1 -n envoy-gateway-system \
      -l gateway.envoyproxy.io/owning-gateway-name=eg \
      -l gateway.envoyproxy.io/owning-gateway-namespace=default | jq
Below is a copy of the prettified JSON log line:
    {
      "start_time": "2024-05-07T21:28:06.447Z",
      "method": "HEAD",
      "x-envoy-origin-path": "/status/500",
      "protocol": "HTTP/2",
      "response_code": "500",
      "response_flags": "URX",
      "response_code_details": "via_upstream",
      "connection_termination_details": "-",
      "upstream_transport_failure_reason": "-",
      "bytes_received": "0",
      "bytes_sent": "0",
      "duration": "1837",
      "x-envoy-upstream-service-time": "-",
      "x-forwarded-for": "136.49.247.103",
      "user-agent": "curl/8.7.1",
      "x-request-id": "f8d9ee84-0f3b-4bc8-a8b7-a023226704b9",
      ":authority": "httpbin.esuez.org",
      "upstream_host": "10.48.2.12:8080",
      "upstream_cluster": "httproute/default/httpbin/rule/0",
      "upstream_local_address": "10.48.0.13:36898",
      "downstream_local_address": "10.48.0.13:10443",
      "downstream_remote_address": "136.49.247.103:52999",
      "requested_server_name": "httpbin.esuez.org",
      "route_name": "httproute/default/httpbin/rule/0/match/0/httpbin_esuez_org"
    }
Note the Envoy response flag is URX (UpstreamRetryLimitExceeded): the proxy used up its retries and passed the upstream's 500 response back to the client.
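When the log holds more than one line, it can help to filter for the requests that ran out of retries. One possible jq filter is sketched here, assuming the JSON access log format shown above. The function name is illustrative, and the filter expects every input line to be valid JSON.

```shell
# Hypothetical helper: read JSON access log lines on stdin and print a
# compact summary of those flagged URX (upstream retry limit exceeded).
retries_exhausted() {
  jq -c 'select(.response_flags == "URX")
         | {path: ."x-envoy-origin-path", response_code, duration}'
}

# Example with a trimmed-down copy of the log line above:
echo '{"x-envoy-origin-path":"/status/500","response_code":"500","response_flags":"URX","duration":"1837"}' \
  | retries_exhausted
```

To use it against the gateway, pipe the kubectl logs command from this section into retries_exhausted instead of plain jq, with a larger --tail value.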
Verify: Review the proxy configuration
    egctl config envoy-proxy route -n envoy-gateway-system \
      -l gateway.envoyproxy.io/owning-gateway-name=eg \
      -l gateway.envoyproxy.io/owning-gateway-namespace=default \
      -o yaml | bat -l yaml
Confirm that the routing configuration has been updated with the retry rule.
Here is a sanitized copy of the captured output:
    envoy-gateway-system:
      envoy-default-eg-e41e7b31-c7657fcf5-gsgvs:
        dynamicRouteConfigs:
        ...
        - routeConfig:
            name: default/eg/https
            virtualHosts:
            - domains:
              - httpbin.esuez.org
              name: default/eg/https/httpbin_esuez_org
              routes:
              - match:
                  prefix: /
                name: httproute/default/httpbin/rule/0/match/0/httpbin_esuez_org
                route:
                  cluster: httproute/default/httpbin/rule/0
                  retryPolicy:
                    hostSelectionRetryMaxAttempts: "5"
                    numRetries: 5
                    perTryTimeout: 0.250s
                    retriableStatusCodes:
                    - 500
                    retryBackOff:
                      baseInterval: 0.100s
                      maxInterval: 10s
                    retryHostPredicate:
                    - name: envoy.retry_host_predicates.previous_hosts
                    retryOn: connect-failure,retriable-status-codes
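The full route dump can be long. To look at just the retry settings, one option is to pipe the YAML through grep with some trailing context. The helper name is hypothetical, and the default of 11 context lines is simply the length of the retryPolicy block in the output above.

```shell
# Hypothetical helper: print each retryPolicy line plus the N lines that
# follow it (default 11) from route configuration YAML read on stdin.
show_retry_policy() {
  grep -A "${1:-11}" 'retryPolicy:'
}

# Example on a fragment of the output above, with 2 lines of context:
show_retry_policy 2 <<'EOF'
route:
  cluster: httproute/default/httpbin/rule/0
  retryPolicy:
    hostSelectionRetryMaxAttempts: "5"
    numRetries: 5
EOF
```

Against a live proxy, the same helper can take the place of bat at the end of the egctl config command shown earlier.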
Summary
To configure retries, we had to resort to a BackendTrafficPolicy, an Envoy Gateway extension to the Gateway API.
Timeouts, by contrast, are configured directly on the HTTPRoute resource, because they are part of the Kubernetes Gateway API specification.