initial draft (to be reviewed) for API def. #2
docs/api.yaml
Outdated
  example: 1500
message_size:
  type: integer
  description: "Maximum message size in KBs."
I think average message size is the more important metric, but it depends on what we intend to use this info for.
I added this just for reference/discussion. We should keep only one of them.
docs/api.yaml
Outdated
  example: 10
max_storage:
  type: integer
  description: "Maximum size in MB"
If we have average message size, throughput, and retention policy, will this info be useful? It looks inferable.
I was more inclined towards max_storage than message size, as message size may vary and the average is tough to guess. Also, max storage might help with force purge or billing.
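To make the "inferable" point concrete, here is a rough sketch of how an upper bound on storage could be derived from average message size, message rate, and retention. The function name, the parameter names, and the use of hours for retention are assumptions for illustration and are not taken from the spec.

```python
def estimate_storage_mb(avg_msg_size_kb: float, msgs_per_sec: float,
                        retention_hours: float) -> float:
    """Upper-bound storage in MB, assuming every message is kept for the
    full retention window (hypothetical helper, not part of the API)."""
    retention_secs = retention_hours * 3600
    total_kb = avg_msg_size_kb * msgs_per_sec * retention_secs
    return total_kb / 1024


# 2 KB messages at 100 msg/s, retained for 24 hours
print(estimate_storage_mb(2, 100, 24))  # 16875.0
```

The estimate is only as good as the average message size fed into it, which is the concern raised in the reply above.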
docs/api.yaml
Outdated
type: object
properties:
  max_throughput:
    type: integer
Should we make it more human friendly, like 10KB, 1024B, 1MB? The units can be assumed to be per second, and we only ask for a string that can be parsed as some number of bytes.
Or do we reduce the scale? Just ask for KB or B; MB might be too large.
Merge retention and capacity, and keep only throughput.
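If the human-friendly string form for throughput were adopted, parsing could look roughly like the sketch below. The function name, the binary multiples (1 KB = 1024 B), and the case-insensitive handling of units are all assumptions; the thread does not settle any of them.

```python
import re

# Assumed unit table: bytes only, binary multiples.
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2}
_PATTERN = re.compile(r"^\s*(\d+)\s*([A-Za-z]+)\s*$")


def parse_throughput(text: str) -> int:
    """Parse strings such as '10Kb', '1024B' or '1MB' into bytes (per second)."""
    match = _PATTERN.match(text)
    if match is None:
        raise ValueError(f"not a size string: {text!r}")
    value = int(match.group(1))
    unit = match.group(2).upper()
    if unit not in _UNITS:
        raise ValueError(f"unsupported unit: {unit!r}")
    return value * _UNITS[unit]


print(parse_throughput("10Kb"))   # 10240
print(parse_throughput("1024B"))  # 1024
print(parse_throughput("1MB"))    # 1048576
```

Restricting the unit table to B and KB would implement the "reduce the scale" alternative without changing the parser.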
docs/api.yaml
Outdated
multiplier:
  type: integer
  description: "step for iteration"
  example: 10
retry_attempts:
  type: integer
  description: "maximum retry attempts allowed."
  maximum: 3
So there are "small"/frequent retries that we can do inline (i.e. during consumption itself), but there are longer retries that we can do using topics.
How do the configs map to the above behaviour?
As discussed, multiplier is just that; its description may not be correct.
So the current policy will now be backed by topics.
From an implementation point of view, there may be scope for doing some initial retries locally before we park such messages in the RQ. Do we need to make that explicit as part of the policy?
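One way to picture the question above is a retry plan in which the first few attempts run inline and the rest go through retry topics. This is a sketch only: `multiplier` and `retry_attempts` come from the spec, while `initial_delay_secs`, `local_attempts`, and the exponential use of the multiplier are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryPolicy:
    retry_attempts: int        # from the spec (maximum: 3)
    multiplier: int            # from the spec
    initial_delay_secs: int    # hypothetical: delay before the first retry
    local_attempts: int = 0    # hypothetical: retries done inline


def plan_retries(policy: RetryPolicy) -> list:
    """Return (attempt, where, delay_secs) for every allowed retry."""
    plan = []
    delay = policy.initial_delay_secs
    for attempt in range(1, policy.retry_attempts + 1):
        where = "inline" if attempt <= policy.local_attempts else "retry-topic"
        plan.append((attempt, where, delay))
        delay *= policy.multiplier  # assumed: delay grows by the multiplier
    return plan


policy = RetryPolicy(retry_attempts=3, multiplier=10,
                     initial_delay_secs=1, local_attempts=1)
print(plan_retries(policy))
# [(1, 'inline', 1), (2, 'retry-topic', 10), (3, 'retry-topic', 100)]
```

Keeping `local_attempts` at 0 gives the purely topic-backed policy described in the reply, so making inline retries explicit would be an additive change.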
docs/api.yaml
Outdated
dlq_from_recovery:
  type: number
  example: 0.5
I am not too inclined towards the "dlq_from_recovery" ratio. It makes things slightly complicated and is too fine-grained for users to manage.
Since messages in RQ and DLQ are both "failed" messages, users might want to prioritize one over the other, depending on their use case. Can't we just say that "recovery_allocation" applies generally to recovery of failed messages, whether RQ or DLQ? By default, RQ messages will be prioritized.
If need be, we can just provide a toggle to prioritize RQ or DLQ over the other (as this will probably be business use-case specific).
Also, to point out the behaviour: one nuance that a plain "0.5" ratio doesn't capture is that it is a max allocation. If there are not enough RQ/DLQ messages, the unused allocation is used for main messages. Similarly, when prioritizing RQ over DLQ, if there are not enough messages/parallelism in RQ, then DLQ will be consumed from the pending parallelism quota.
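The max-allocation behaviour described above can be sketched as a slot split. Everything here is illustrative: the function, the `prefer` toggle, and the idea of expressing demand as a number of slots each queue could use are assumptions, not part of the draft API.

```python
def split_parallelism(total: int, recovery_allocation: float,
                      rq_demand: int, dlq_demand: int,
                      prefer: str = "rq") -> dict:
    """Split `total` slots between main, RQ and DLQ consumption.

    `recovery_allocation` is treated as a cap on RQ + DLQ combined;
    whatever recovery does not use goes back to main messages.
    """
    recovery_cap = int(total * recovery_allocation)
    if prefer == "rq":
        first, second = rq_demand, dlq_demand
    else:
        first, second = dlq_demand, rq_demand
    first_slots = min(first, recovery_cap)
    # The lower-priority queue consumes from the remaining quota.
    second_slots = min(second, recovery_cap - first_slots)
    if prefer == "rq":
        rq, dlq = first_slots, second_slots
    else:
        rq, dlq = second_slots, first_slots
    return {"main": total - rq - dlq, "rq": rq, "dlq": dlq}


print(split_parallelism(10, 0.5, rq_demand=2, dlq_demand=10))
# {'main': 5, 'rq': 2, 'dlq': 3}
print(split_parallelism(10, 0.5, rq_demand=0, dlq_demand=1))
# {'main': 9, 'rq': 0, 'dlq': 1}
```

With a single `recovery_allocation` and a priority toggle, the separate "dlq_from_recovery" ratio is not needed in this sketch.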
dlq_from_recovery:
  type: number
  example: 0.5
max_error_threshold:
In case of a grouped subscription, the error_rate will be calculated in terms of failed groups, instead of messages failing to RQ/DLQ, right?
I was thinking of the latter, as primarily MTTR and also RQ/DLQ size will be determined by the number of messages.
  type: integer
  description: "wait in seconds before attempting next increase/decrease step."
  example: 10
ping_seconds:
how is ping achieved?
Maybe it needs renaming...
After a complete throttle, the consumer needs to check at a regular interval whether the service has come back, and then start the recovery process. This regular interval is being called ping_seconds.
Think of it as an attempt to deliver the message continuously, where a failure doesn't move the message to a different queue.
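The ping behaviour described above could be sketched as a probe loop. Only `ping_seconds` comes from the spec; `wait_for_recovery`, the `probe` callback, the injectable `sleep`, and `max_pings` are hypothetical names introduced for the example.

```python
import time
from typing import Callable, Optional


def wait_for_recovery(probe: Callable[[], bool], ping_seconds: float,
                      sleep: Callable[[float], None] = time.sleep,
                      max_pings: Optional[int] = None) -> int:
    """Re-attempt delivery every `ping_seconds` until `probe` succeeds.

    A failed probe leaves the message where it is (it is not moved to
    RQ/DLQ); the loop simply waits and tries again.
    Returns the number of probes made.
    """
    pings = 0
    while max_pings is None or pings < max_pings:
        pings += 1
        if probe():
            return pings
        sleep(ping_seconds)
    raise TimeoutError(f"service still down after {pings} pings")


# Simulated service that recovers on the third probe; sleeps are recorded
# instead of actually waiting.
outcomes = iter([False, False, True])
waits = []
attempts = wait_for_recovery(lambda: next(outcomes), ping_seconds=10,
                             sleep=waits.append)
print(attempts, waits)  # 3 [10, 10]
```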
  $ref: '#/components/schemas/dlq_policy'
consumption_policy:
  $ref: '#/components/schemas/consumption_policy'
version:
When we say version, we mean schema version, right?
This was object versioning, not schema versioning.
Shall we cover schema version using API versioning?
docs/api.yaml
Outdated
  description: "Not used for now. Intended for message deduplication semantics."
properties:
  $ref: '#/components/schemas/property_bag'
groupkey:
Suggested change:
- groupkey:
+ group_key:
docs/api.yaml
Outdated
  type: integer
parameters:
  tenanat_param:
Suggested change:
- tenanat_param:
+ tenant_param:
docs/api.yaml
Outdated
  $ref: '#/components/responses/error_response'
/v1/tenants/{tenant_name}/subscriptions/{sub_name}/status:
How is this status different from state?
This state vs status question led me to search for the difference between them. I think I align with this: https://softwareengineering.stackexchange.com/questions/219351/state-or-status-when-should-a-variable-name-contain-the-word-state-and-w
I can understand active, inactive (/blocked), running, and stopped being the status of the subscription. Similarly, active and inactive (/blocked) for the topic status.
Is consumer_status for internal details of the consumer, such as lag and other stuff? Internal stats?
Interesting.. I was thinking of "state" as more like a state-machine state, and status as more detailed running info: kind of a single-value versus multi-value attribute.
Both are, as of now, intended for external consumption.
stats/info instead of status.
- $ref: '#/components/parameters/from_param'
- $ref: '#/components/parameters/to_param'
- $ref: '#/components/parameters/msg_id_param'
- $ref: '#/components/parameters/group_id_param'
group_id vs group_key. we can stick to one.
Aligned. Any preferences?
put:
  description: >-
    Unsideline messages from DLT for further consumption.
How will this unsideline API interact with the normal consumer that is running for all messages present in MQ, RQ, and DLQ?
If a filter is provided, will that stop normal DLQ consumption and only consume matching messages (without moving the pointer)?
As the path includes "dlt", this is not applicable to MQ/RQ.
GET returns messages without moving the pointer, so it is kind of idempotent.
PUT modifies the DLT internal state by accepting the unsideline request, i.e. moving the pointer on consumption.
Request tracking mechanism needs to be figured out.
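The GET/PUT distinction discussed in this thread can be pictured with a toy in-memory model. This is only a sketch of the pointer semantics: the class, its method names, and the `limit` parameter are hypothetical, and filtering and request tracking (both still open above) are not modelled.

```python
class DeadLetterTopic:
    """Toy stand-in for a DLT with a single consumption pointer."""

    def __init__(self, messages):
        self._messages = list(messages)
        self._pointer = 0

    def peek(self, limit: int) -> list:
        # GET: returns messages without moving the pointer, so repeated
        # calls yield the same result.
        return self._messages[self._pointer:self._pointer + limit]

    def unsideline(self, limit: int) -> list:
        # PUT: hands messages back for consumption and advances the pointer.
        batch = self.peek(limit)
        self._pointer += len(batch)
        return batch


dlt = DeadLetterTopic(["m1", "m2", "m3"])
print(dlt.peek(2))        # ['m1', 'm2']
print(dlt.peek(2))        # ['m1', 'm2']
print(dlt.unsideline(2))  # ['m1', 'm2']
print(dlt.peek(2))        # ['m3']
```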
* adding a basic authz layer. routeDefinitions describe the required permission, and the authz handler checks that against the authz provider.
* adding LICENSE
Initial API draft for tenant/topic/subscription entity mgmt, produce and dlq consume.