feat: add startupProbe and replace readiness probe with liveness probe #5407

Merged 52 commits on Dec 7, 2022
Changes from 2 commits
Commits
32dcbc2
fix: run health servicer in thread to allow readinessProbe while req
JoanFM Nov 17, 2022
89d9645
test: add test with slow process
JoanFM Nov 17, 2022
56e1d53
Merge branch 'master' into health_servicer_thread
JoanFM Nov 18, 2022
3a6bc63
fix: implement async HealthCheck methods for WorkerRuntime
Nov 18, 2022
dbbea06
Merge branch 'health_servicer_thread' of github.com:jina-ai/jina into…
Nov 18, 2022
c974bad
fix: implement async HealthCheck methods for WorkerRuntime
Nov 18, 2022
04bb57d
fix: implement async HealthCheck methods for WorkerRuntime
Nov 18, 2022
bbc25ae
fix: use first port of gateway.port argument for target address
Nov 18, 2022
308a3fa
test: add test proving readinessProbe can pass while processing
JoanFM Nov 21, 2022
cfa50ad
feat: add livenessProbe
JoanFM Nov 21, 2022
fd10e2f
Merge branch 'master' of https://github.com/jina-ai/jina into health_…
JoanFM Nov 21, 2022
9974167
fix: tcpProbe as livenessProbe
JoanFM Nov 21, 2022
a779cf9
feat: run blocking endpoint in thread protected by lock
JoanFM Nov 21, 2022
0d7d75f
Merge branch 'master' into health_servicer_thread
JoanFM Nov 21, 2022
26d0142
fix: avoid error no eventloop outside MainThread
JoanFM Nov 21, 2022
ce699d6
fix: livenessProbe delayed
JoanFM Nov 21, 2022
ac9ea98
Merge branch 'master' into health_servicer_thread
JoanFM Nov 22, 2022
db45ba7
fix: fix add timeout to readiness
JoanFM Nov 22, 2022
988d03b
Merge branch 'master' of https://github.com/jina-ai/jina into health_…
JoanFM Nov 22, 2022
2ef8881
test: remove unneeded test
JoanFM Nov 22, 2022
29b2213
fix: remove the timeout from checker now
JoanFM Nov 22, 2022
d12a659
test: change test k8s failures
JoanFM Nov 22, 2022
470ab20
fix: try to downgrade grpcio
JoanFM Nov 23, 2022
22fe668
Merge branch 'master' of https://github.com/jina-ai/jina into health_…
JoanFM Nov 23, 2022
f014035
style: fix overload and cli autocomplete
jina-bot Nov 23, 2022
92ea0d5
fix: change readinessProbe for startupProbe
JoanFM Nov 23, 2022
3c7a4cc
fix: change the startupProbe values
JoanFM Nov 23, 2022
6122f04
test: try to see how many ids are sent and responded
JoanFM Nov 23, 2022
28bd253
Merge branch 'master' of https://github.com/jina-ai/jina into health_…
JoanFM Nov 23, 2022
a53cc8b
ci: fix reqs
JoanFM Nov 23, 2022
02bc738
style: fix overload and cli autocomplete
jina-bot Nov 23, 2022
bd2b8b0
test: try to see what happens with `continue_on_error`
JoanFM Nov 23, 2022
eb3c157
Merge branch 'master' into health_servicer_thread
JoanFM Nov 24, 2022
c72546b
Merge branch 'master' of https://github.com/jina-ai/jina into health_…
JoanFM Nov 24, 2022
dd61349
ci: some changes in k8s tests
JoanFM Nov 24, 2022
e73f1f1
refactor: set SERVING after start
JoanFM Nov 24, 2022
b4b8bb6
Merge branch 'ci-k8s' of https://github.com/jina-ai/jina into health_…
JoanFM Nov 25, 2022
42eef99
refactor: add prestop hook
JoanFM Nov 25, 2022
04f8fcc
Merge branch 'ci-k8s' of https://github.com/jina-ai/jina into health_…
JoanFM Nov 25, 2022
f258607
Merge remote-tracking branch 'origin/master' into health_servicer_thread
Dec 1, 2022
1a38453
Merge remote-tracking branch 'origin/master' into health_servicer_thread
Dec 1, 2022
f03ac42
Merge branch 'master' into health_servicer_thread
girishc13 Dec 2, 2022
16cd69b
Merge branch 'master' into health_servicer_thread
girishc13 Dec 6, 2022
6d8f52d
feat: retry on grpc UNKNOWN and INTERNAL error codes
Dec 7, 2022
5356ed5
ci: remove excess debug logs
Dec 7, 2022
4d4a05a
ci: replace pod portforward with service portforward
Dec 7, 2022
0820501
test: add perf tools to docker image
Dec 7, 2022
cd51bf1
Revert "ci: replace pod portforward with service portforward"
Dec 7, 2022
60abe21
Merge remote-tracking branch 'origin/master' into health_servicer_thread
Dec 7, 2022
681b081
ci: use default sleep time between requests
Dec 7, 2022
0740761
ci: remove GRPC debug flags
Dec 7, 2022
b2437d8
Merge remote-tracking branch 'origin/master' into health_servicer_thread
Dec 7, 2022
2 changes: 1 addition & 1 deletion .github/workflows/cd.yml
@@ -271,7 +271,7 @@ jobs:
           export LINKERD2_VERSION=stable-2.11.4
           curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh
           pytest -v -s --suppress-no-test-exit-code --force-flaky --min-passes 1 --max-runs 5 --cov=jina --cov-report=xml ./tests/k8s/test_k8s.py ./tests/k8s/test_graceful_request_handling.py
-        timeout-minutes: 30
+        timeout-minutes: 45
         env:
           JINA_K8S_USE_TEST_PIP: 1
       - name: Check codecov file
2 changes: 1 addition & 1 deletion .github/workflows/ci.yml
@@ -194,7 +194,7 @@ jobs:
           export LINKERD2_VERSION=stable-2.11.4
           curl --proto '=https' --tlsv1.2 -sSfL https://run.linkerd.io/install | sh
           pytest -v -s --suppress-no-test-exit-code --force-flaky --min-passes 1 --max-runs 5 --cov=jina --cov-report=xml ./tests/k8s/test_k8s.py ./tests/k8s/test_graceful_request_handling.py
-        timeout-minutes: 30
+        timeout-minutes: 45
         env:
           JINA_K8S_USE_TEST_PIP: 1
       - name: Check codecov file
3 changes: 2 additions & 1 deletion jina/checker.py
@@ -31,12 +31,13 @@ def __init__(self, args: 'argparse.Namespace'):
                 ) as tc:
                     if args.target == 'executor':
                         hostname, port, protocol, _ = parse_host_scheme(args.host)
-                        r = WorkerRuntime.is_ready(f'{hostname}:{port}')
+                        r = WorkerRuntime.is_ready(ctrl_address=f'{hostname}:{port}', timeout=args.timeout)
                     elif args.target == 'gateway':
                         hostname, port, protocol, _ = parse_host_scheme(args.host)
                         r = GatewayRuntime.is_ready(
                             f'{hostname}:{port}',
                             protocol=GatewayProtocolType.from_string(protocol),
+                            timeout=args.timeout
                         )
                     elif args.target == 'flow':
                         r = Client(host=args.host).is_flow_ready(timeout=args.timeout)
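Note on the `timeout` argument: the change above forwards `args.timeout` to both readiness checks. As a rough illustration of what a bounded readiness check does, here is a minimal standard-library sketch of a TCP-level check with a timeout. It is not Jina's `is_ready` implementation. The function name `tcp_is_ready` and the `timeout_s` parameter (in seconds) are hypothetical and chosen only for this example.

```python
import socket


def tcp_is_ready(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout_s.

    Hypothetical helper for illustration only; it checks that something is
    listening, not that the service behind the port is healthy.
    """
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

Without a timeout, a check like this can hang for as long as the operating system's connect timeout when the target is unreachable, which is the situation a caller-supplied bound avoids.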
7 changes: 4 additions & 3 deletions jina/serve/runtimes/worker/__init__.py
@@ -1,6 +1,7 @@
 import argparse
 from abc import ABC
 from typing import TYPE_CHECKING, List, Optional
+from concurrent.futures import ThreadPoolExecutor

 import grpc
 from grpc_health.v1 import health, health_pb2, health_pb2_grpc
@@ -32,7 +33,7 @@ def __init__(
         :param args: args from CLI
         :param kwargs: keyword args
         """
-        self._health_servicer = health.aio.HealthServicer()
+        self._health_servicer = health.HealthServicer(experimental_thread_pool=ThreadPoolExecutor(1))
         super().__init__(args, **kwargs)

     async def async_setup(self):
@@ -140,7 +141,7 @@ async def _async_setup_grpc_server(self):
         )

         for service in service_names:
-            await self._health_servicer.set(
+            self._health_servicer.set(
                 service, health_pb2.HealthCheckResponse.SERVING
             )
         reflection.enable_server_reflection(service_names, self._grpc_server)
@@ -164,7 +165,7 @@ async def async_cancel(self):

     async def async_teardown(self):
         """Close the data request handler"""
-        await self._health_servicer.enter_graceful_shutdown()
+        self._health_servicer.enter_graceful_shutdown()
         await self.async_cancel()
         self._request_handler.close()

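Note on the motivation: the commit titles in this PR ("run health servicer in thread to allow readinessProbe while req", "run blocking endpoint in thread protected by lock") point at a general asyncio problem, where a synchronous call made on the event loop starves every other coroutine, including anything that answers health checks. The sketch below reproduces that effect with the standard library only. It does not use gRPC or Jina, and all names in it (`blocking_request`, `health_probe`, `count_probe_ticks`) are hypothetical stand-ins.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor


def blocking_request(seconds: float) -> str:
    """Stand-in for a slow, synchronous request handler."""
    time.sleep(seconds)
    return 'done'


async def health_probe(interval: float, stop: asyncio.Event, ticks: list) -> None:
    """Stand-in for a health responder that shares the event loop."""
    while not stop.is_set():
        ticks.append(time.monotonic())
        await asyncio.sleep(interval)


async def count_probe_ticks(offload: bool) -> int:
    """Count how often the probe ran while one slow request was in flight."""
    loop = asyncio.get_running_loop()
    stop = asyncio.Event()
    ticks: list = []
    probe = asyncio.create_task(health_probe(0.01, stop, ticks))
    await asyncio.sleep(0)  # give the probe its first turn
    if offload:
        # The slow call runs in a worker thread; the loop stays responsive.
        with ThreadPoolExecutor(1) as pool:
            await loop.run_in_executor(pool, blocking_request, 0.2)
    else:
        # The slow call runs on the loop; nothing else can be scheduled.
        blocking_request(0.2)
    stop.set()
    await probe
    return len(ticks)


if __name__ == '__main__':
    print('probe ticks, blocking on the loop:', asyncio.run(count_probe_ticks(False)))
    print('probe ticks, offloaded to a thread:', asyncio.run(count_probe_ticks(True)))
```

In the blocking case the probe only gets its first turn before the request starts, while in the offloaded case it keeps ticking for the whole duration of the request. The diff above applies the same separation in the other direction, by giving the health servicer its own single-worker thread pool.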
8 changes: 3 additions & 5 deletions tests/k8s/slow-process-executor/debug_executor.py
@@ -5,15 +5,13 @@


 class SlowProcessExecutor(Executor):
-    def __init__(self, *args, **kwargs):
+    def __init__(self, time_sleep=1.0, *args, **kwargs):
         super().__init__(*args, **kwargs)
-        from jina.logging.logger import JinaLogger
-
-        self.logger = JinaLogger(self.__class__.__name__)
+        self.time_sleep = time_sleep

     @requests
     def process(self, docs: DocumentArray, *args, **kwargs):
-        time.sleep(1.0)
+        time.sleep(self.time_sleep)
         for doc in docs:
             doc.tags['replica_uid'] = os.environ['POD_UID']
             doc.tags['time'] = time.time()
46 changes: 46 additions & 0 deletions tests/k8s/test_k8s.py
@@ -1008,3 +1008,49 @@ async def test_flow_with_stateful_executor(

     assert len(resp) == 1
     assert resp[0].parameters == {'__results__': {'statefulexecutor': {'length': 10.0}}}
+
+
+@pytest.mark.asyncio
+@pytest.mark.parametrize(
+    'docker_images', [['slow-process-executor', 'jinaai/jina']], indirect=True
+)
+async def test_slow_executor_readinessProbe_works(docker_images, tmpdir, logger):
+    dump_path = os.path.join(str(tmpdir), 'test-flow-slow-process-executor')
+    namespace = f'test-flow-slow-process-executor'.lower()
+    flow = Flow(name='test-flow-slow-process-executor',).add(
+        name='slow_process_executor',
+        uses=f'docker://{docker_images[0]}',
+        uses_with={'time_sleep': 200},
+        replicas=2,
+    )
+
+    flow.to_kubernetes_yaml(dump_path, k8s_namespace=namespace)
+
+    from kubernetes import client
+
+    api_client = client.ApiClient()
+    core_client = client.CoreV1Api(api_client=api_client)
+    app_client = client.AppsV1Api(api_client=api_client)
+    await create_all_flow_deployments_and_wait_ready(
+        dump_path,
+        namespace=namespace,
+        api_client=api_client,
+        app_client=app_client,
+        core_client=core_client,
+        deployment_replicas_expected={
+            'gateway': 1,
+            'slow-process-executor': 2,
+        },
+        logger=logger
+    )
+
+    resp = await run_test(
+        flow=flow,
+        namespace=namespace,
+        core_client=core_client,
+        n_docs=10,
+        request_size=1,
+        endpoint='/',
+    )
+
+    assert len(resp) == 10