initial draft (to be reviewed) for API def. #2

Merged
merged 13 commits into from
May 29, 2023

Conversation

kmrdhruv
Collaborator

Initial API draft for tenant/topic/subscription entity management, produce, and DLQ consume.

@kmrdhruv kmrdhruv requested a review from anvinjain April 20, 2023 08:07
docs/api.yaml Outdated
example: 1500
message_size:
type: integer
description: "Maximum message size in KBs."
Collaborator

@gauravAshok gauravAshok Apr 24, 2023

I think average message size is the more important metric. But it depends on what we intend to use this info for.

Collaborator Author

@kmrdhruv kmrdhruv Apr 25, 2023

I added this just for reference/discussion. We should keep only one of them.

docs/api.yaml Outdated
example: 10
max_storage:
type: integer
description: "Maximum size in MB"
Collaborator

If we have average message size, throughput, and retention policy, will this info be useful? It looks inferable.
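To make the "inferable" point concrete, here is a rough sketch of the arithmetic. The function name, the parameter names, and the choice of units (KB per message, messages per second, hours of retention) are assumptions for illustration and are not taken from api.yaml.

```python
def inferred_storage_mb(avg_msg_size_kb: float, msgs_per_sec: float, retention_hours: float) -> float:
    """Estimate retained storage in MB from average message size, rate, and retention."""
    retained_seconds = retention_hours * 3600
    total_kb = avg_msg_size_kb * msgs_per_sec * retained_seconds
    return total_kb / 1024


# Hypothetical topic: 2 KB messages at 1500 msgs/sec, retained for 24 hours.
estimate = inferred_storage_mb(avg_msg_size_kb=2, msgs_per_sec=1500, retention_hours=24)
print(f"{estimate:.1f} MB")  # 253125.0 MB
```

The estimate is only as good as the average message size, which is the variable the reply below calls hard to guess.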

Collaborator Author

I was more inclined towards max_storage than message size, as message size might vary and the average is tough to guess. Also, max storage might help with force purge or billing.

docs/api.yaml Outdated
type: object
properties:
max_throughput:
type: integer
Collaborator

@gauravAshok gauravAshok Apr 24, 2023

Should we make it more human-friendly, like 10KB, 1024B, 1MB? The units can be assumed to be per second, and we only ask for a string that can be parsed as a quantity of bytes.

Or do we reduce the scale? Just ask for KB or B; MB might be too large.
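A minimal sketch of what parsing such a human-friendly value could look like. The function name `parse_throughput`, the accepted unit set, and the 1024-based multipliers are assumptions for illustration; the spec under review does not define them.

```python
import re

# Units mentioned in the comment; binary (1024-based) scaling is an assumption.
_UNIT_BYTES = {"B": 1, "KB": 1024, "MB": 1024 ** 2}
_PATTERN = re.compile(r"^\s*(\d+)\s*([A-Za-z]+)\s*$")


def parse_throughput(value: str) -> int:
    """Parse a string such as '10KB' into bytes (read as bytes per second)."""
    match = _PATTERN.match(value)
    if match is None:
        raise ValueError(f"not a valid throughput: {value!r}")
    amount, unit = int(match.group(1)), match.group(2).upper()
    if unit not in _UNIT_BYTES:
        raise ValueError(f"unsupported unit: {unit!r}")
    return amount * _UNIT_BYTES[unit]


print(parse_throughput("10KB"))   # 10240
print(parse_throughput("1024B"))  # 1024
print(parse_throughput("1MB"))    # 1048576
```

Restricting the scale to B and KB, as the second suggestion proposes, would only mean dropping the "MB" entry from the unit table.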

Collaborator Author

Merge retention and capacity; also, keep only throughput.

docs/api.yaml Outdated
Comment on lines 548 to 555
multiplier:
type: integer
description: "step for iteration"
example: 10
retry_attempts:
type: integer
description: "maximum retry attempts allowed."
maximum: 3
Collaborator

So there are "small" / frequent retries that we can do inline (i.e. during consumption itself), but there are longer retries that we can do using topics.

How do the configs map to the above behaviour?

Collaborator

As discussed, multiplier is just that; its description may not be correct.
So the current policy will now be backed by topics.
From an implementation point of view, there may be scope for doing some initial retries locally before we park such messages in the RQ. Do we need to make that explicit as part of the policy?
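One possible reading of this, sketched below: multiplier is a geometric backoff factor, and an explicit setting decides how many of the first attempts run inline before the rest go through the RQ. `base_seconds` and `inline_attempts` are hypothetical names that do not appear in the draft, and the draft does not confirm that the delay grows geometrically.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RetryStep:
    attempt: int          # 1-based retry number
    delay_seconds: float  # wait before this retry
    where: str            # "inline" (during consumption) or "rq" (topic-backed)


def plan_retries(base_seconds: float, multiplier: int, retry_attempts: int,
                 inline_attempts: int = 0) -> list[RetryStep]:
    """Build a retry schedule whose delay grows by `multiplier` on each attempt."""
    steps = []
    for attempt in range(retry_attempts):
        where = "inline" if attempt < inline_attempts else "rq"
        steps.append(RetryStep(attempt + 1, base_seconds * multiplier ** attempt, where))
    return steps


for step in plan_retries(base_seconds=1, multiplier=10, retry_attempts=3, inline_attempts=1):
    print(step)
```

With `inline_attempts=0` the schedule is entirely topic-backed, which matches the policy as currently described; making the value configurable is what "explicit as part of the policy" would amount to.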

docs/api.yaml Outdated
Comment on lines 575 to 577
dlq_from_recovery:
type: number
example: 0.5
Collaborator

@gauravAshok gauravAshok Apr 25, 2023

I am not too inclined towards the "dlq_from_recovery" ratio. It makes things slightly complicated and is too fine-grained for users to manage.

Since messages in RQ & DLQ are "failed" messages, users might want to prioritize one over the other, depending on their use case. Can't we just say that "recovery_allocation" applies generally to recovery of failed messages, regardless of RQ / DLQ? By default RQ messages will be prioritized.
If need be, we can just provide a toggle to prioritize RQ or DLQ over the other (as this will probably be business use case specific).

Also, just to point out the behaviour: one nuance that a plain "0.5" ratio doesn't capture is that it is a max allocation. If there are not enough RQ / DLQ messages, the unused allocation is simply used for main messages. Similarly, when prioritizing RQ over DLQ, if there are not enough messages/parallelism in RQ, then DLQ will be consumed from the pending parallelism quota.
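The max-allocation behaviour described above can be sketched as follows. This is an illustration only: `allocate_parallelism`, `rq_pending`, `dlq_pending`, and `prioritize_rq` are hypothetical names, and "pending" is assumed to be expressed in the same units as parallelism.

```python
def allocate_parallelism(total: int, recovery_allocation: float,
                         rq_pending: int, dlq_pending: int,
                         prioritize_rq: bool = True) -> dict[str, int]:
    """Split consumer parallelism; recovery_allocation is a cap, not a reservation."""
    recovery_cap = int(total * recovery_allocation)
    # The prioritized queue draws from the cap first; the other gets what is left.
    first, second = (rq_pending, dlq_pending) if prioritize_rq else (dlq_pending, rq_pending)
    first_share = min(first, recovery_cap)
    second_share = min(second, recovery_cap - first_share)
    rq, dlq = (first_share, second_share) if prioritize_rq else (second_share, first_share)
    # Whatever recovery does not use goes back to main messages.
    return {"main": total - rq - dlq, "rq": rq, "dlq": dlq}


print(allocate_parallelism(10, 0.5, rq_pending=2, dlq_pending=10))
# {'main': 5, 'rq': 2, 'dlq': 3}
print(allocate_parallelism(10, 0.5, rq_pending=0, dlq_pending=1))
# {'main': 9, 'rq': 0, 'dlq': 1}
```

A single cap plus a priority toggle covers both cases in the comment without a separate dlq_from_recovery ratio.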

dlq_from_recovery:
type: number
example: 0.5
max_error_threshold:
Collaborator

In the case of a grouped subscription, the error_rate will be calculated in terms of failed groups instead of messages failing to RQ / DLQ, right?

Collaborator Author

I was thinking of the latter, as MTTR primarily, and also RQ/DLQ size, will be determined by the number of messages.

type: integer
description: "wait in seconds before attempting next increase/decrease step."
example: 10
ping_seconds:
Collaborator

How is ping achieved?

Collaborator Author

Maybe it needs renaming...
After a complete throttle, the consumer needs to check at a regular interval whether the service has come back, and then start the recovery process. This regular interval is what ping_seconds refers to.
Think of it as an attempt to deliver the message continuously, where failure doesn't move the message to a different queue.
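A rough sketch of that probing loop, under the assumption that the "ping" is just a repeated delivery attempt of the same message. `wait_for_recovery`, `try_deliver`, and `max_pings` are hypothetical names; the injectable `sleep` is there only so the example runs instantly.

```python
import time
from typing import Callable, Optional


def wait_for_recovery(try_deliver: Callable[[], bool], ping_seconds: float,
                      sleep: Callable[[float], None] = time.sleep,
                      max_pings: Optional[int] = None) -> int:
    """Retry the same delivery every ping_seconds until it succeeds; return pings used."""
    pings = 0
    while max_pings is None or pings < max_pings:
        pings += 1
        if try_deliver():
            return pings  # service is back; the caller can start recovery
        sleep(ping_seconds)  # the message stays where it is; no move to RQ/DLQ
    raise TimeoutError("service did not recover within max_pings")


# Simulated endpoint that fails twice and then recovers.
responses = iter([False, False, True])
print(wait_for_recovery(lambda: next(responses), ping_seconds=10, sleep=lambda s: None))  # 3
```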

$ref: '#/components/schemas/dlq_policy'
consumption_policy:
$ref: '#/components/schemas/consumption_policy'
version:
Collaborator

When we say version, we mean schema version, right?

Collaborator Author

This was object versioning, not schema versioning.
Shall we cover schema version using API versioning?

docs/api.yaml Outdated
description: "Not used for now. Intended for message deduplication semantics."
properties:
$ref: '#/components/schemas/property_bag'
groupkey:
Collaborator

Suggested change
groupkey:
group_key:

docs/api.yaml Outdated
type: integer

parameters:
tenanat_param:
Collaborator

Suggested change
tenanat_param:
tenant_param:

docs/api.yaml Outdated
$ref: '#/components/responses/error_response'


/v1/tenants/{tenant_name}/subscriptions/{sub_name}/status:
Collaborator

@gauravAshok gauravAshok Apr 25, 2023

How is this status different from state?

This state vs. status question led me to search for the difference between them. I think I align with this: https://softwareengineering.stackexchange.com/questions/219351/state-or-status-when-should-a-variable-name-contain-the-word-state-and-w

I can understand active, inactive (/ blocked), running, and stopped being the status of the subscription. Similarly, active and inactive (/ blocked) for the topic status.

Is consumer_status for internal details of the consumer, lag and other stuff? Internal stats?

Collaborator Author

Interesting... I was thinking of "state" as more like a state-machine state, and status as more detailed running info; kind of a single-value vs. multi-value attribute.
Both are, as of now, intended for external consumption.

Collaborator Author

stats/info instead of status.

- $ref: '#/components/parameters/from_param'
- $ref: '#/components/parameters/to_param'
- $ref: '#/components/parameters/msg_id_param'
- $ref: '#/components/parameters/group_id_param'
Collaborator

group_id vs. group_key: we can stick to one.

Collaborator Author

Aligned... any preferences?


put:
description: >-
Unsideline messages from DLT for further consumption.
Collaborator

How will this unsideline API interact with the normal consumer that is running for all messages present in MQ, RQ, and DLQ?
If a filter is provided, will that stop normal DLQ consumption and only consume matching messages (without moving the pointer)?

Collaborator Author

@kmrdhruv kmrdhruv Apr 25, 2023

As the path includes "dlt", this is not applicable to MQ/RQ.
get returns messages without moving the pointer, so it is kind of idempotent.
put modifies the DLT's internal state by accepting the unsideline request, i.e. moving the pointer on consumption.
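The get/put distinction can be illustrated with a toy in-memory model. The class and method names are hypothetical, the single integer pointer is an assumption about how the DLT tracks progress, and the filter parameters (from, to, msg_id, group_id) are left out.

```python
class DeadLetterTopic:
    """Toy in-memory DLT with a single consumption pointer."""

    def __init__(self, messages: list[str]) -> None:
        self._messages = list(messages)
        self._pointer = 0  # index of the next message to be consumed

    def get(self, limit: int) -> list[str]:
        """Read upcoming messages without moving the pointer (safe to repeat)."""
        return self._messages[self._pointer:self._pointer + limit]

    def unsideline(self, limit: int) -> list[str]:
        """Hand messages back for consumption and advance the pointer past them."""
        batch = self.get(limit)
        self._pointer += len(batch)
        return batch


dlt = DeadLetterTopic(["m1", "m2", "m3"])
print(dlt.get(2))         # ['m1', 'm2']
print(dlt.get(2))         # ['m1', 'm2'] again; the pointer has not moved
print(dlt.unsideline(2))  # ['m1', 'm2'] and the pointer advances
print(dlt.get(2))         # ['m3']
```

This model does not answer the question about filtered unsidelining, where matching messages may not be contiguous and a single pointer would no longer be enough.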

Collaborator Author

Request tracking mechanism needs to be figured out.

gauravAshok and others added 5 commits May 3, 2023 18:09
* adding a basic authz layer. routeDefinitions describe the required permission, and the authz handler checks that against the authz provider.
* adding LICENSE
@kmrdhruv kmrdhruv temporarily deployed to github-pages May 11, 2023 16:25 — with GitHub Pages Inactive
@kmrdhruv kmrdhruv temporarily deployed to github-pages May 11, 2023 17:05 — with GitHub Pages Inactive
@kmrdhruv kmrdhruv temporarily deployed to github-pages May 11, 2023 17:11 — with GitHub Pages Inactive
@kmrdhruv kmrdhruv temporarily deployed to github-pages May 22, 2023 06:07 — with GitHub Pages Inactive
@kmrdhruv kmrdhruv temporarily deployed to github-pages May 29, 2023 09:04 — with GitHub Pages Inactive
@kmrdhruv kmrdhruv temporarily deployed to github-pages May 29, 2023 09:36 — with GitHub Pages Inactive
@kmrdhruv kmrdhruv merged commit dd5db9e into master May 29, 2023
2 of 3 checks passed
2 participants