Mailing list archives

From "Jay Kreps (JIRA)" <>
Subject [jira] [Updated] (KAFKA-642) Protocol tweaks for 0.8
Date Mon, 03 Dec 2012 19:53:59 GMT


Jay Kreps updated KAFKA-642:

    Attachment: KAFKA-642-v2.patch

I should clarify that my goal here is to make the most minimal change that fixes the client
protocol for at least the user-facing APIs. I also wussed out on trying to generalize OffsetRequest/OffsetResponse.
Basically I think trying those things at this point would just take time, and we are trying to
stabilize and release.

So I made some of the changes you recommend, but some I think are bigger, and my hope was
to hold off on those.

For example, my goal is not to implement correlation id, just add it to the protocol. To properly
handle correlation id we need to make it so that we have a single counter across all requests
on a single connection which is hard to do right now.  I have some thoughts on generalizing
some of our serialization and request handling stuff which I started to discuss in KAFKA-643.
All I want to do now is fix as much of the protocol as I can while breaking as little as possible
in the process.
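
As an illustration of the "single counter across all requests on a single connection" idea, here is a minimal Python sketch. The Connection class and its send/receive methods are hypothetical names made up for this example and are not Kafka code; the sketch only shows that every request type on one connection draws from the same counter, and that a second connection starts its own.

```python
import itertools


class Connection:
    """Hypothetical client-side connection owning one correlation id counter."""

    def __init__(self):
        # One counter per connection, shared by every request type sent on it.
        self._next_id = itertools.count(0)
        # Requests that were sent but not yet answered, keyed by correlation id.
        self.in_flight = {}

    def send(self, request_name):
        """Assign the next correlation id to a request and remember it."""
        correlation_id = next(self._next_id)
        self.in_flight[correlation_id] = request_name
        return correlation_id

    def receive(self, correlation_id):
        """Match a response back to the request that carried the same id."""
        return self.in_flight.pop(correlation_id)


conn = Connection()
produce_id = conn.send("produce")
fetch_id = conn.send("fetch")
print(produce_id, fetch_id, conn.receive(fetch_id))
```

A counter kept per request type, or a fixed default value, would let two in-flight requests on the same socket carry the same id, which is the situation the message calls a bug.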

1. Agreed, fixed.

2. Ack, I missed the correlation id in OffsetResponse. I had intended to leave it out of the
non-public APIs since this was meant to be a minimal change, but it is easy to add so I will
do so. This should simplify future upgrades.

3.1 Yeah, but see above comment.
3.2 I mean properly speaking having a default correlation id doesn't really make sense does
it? Anything other than a per-connection counter is basically a bug...

4. No, it is a signed int so it should be fine for it to roll over every 4 billion requests
per connection, that will take a while.

5. Good point.

6. Done

7. See above comment on correlationId

8. Did it for DefaultEventHandler as that is easy, cowardly not attempting for consumer.

9.1 Done.
9.2 Deleted, should not duplicate protocol docs.
9.3 I chickened out on this. We will have to do it as a follow-up post 0.8 item.

10.1 Agreed, but this is several weeks of work I think. This is a pretty big refactoring.
Some thoughts on KAFKA-643.

11. Yeah, I mean basically that constructor shouldn't exist at all since it isn't setting
client id either. 
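
To make point 4 above concrete, the following Python sketch mimics how a signed counter rolls over. It assumes the correlation id is a 32-bit signed integer incremented with Java-style wraparound; the function name next_correlation_id is hypothetical. Python integers do not overflow, so the wrap is done explicitly with modular arithmetic.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1


def next_correlation_id(current):
    """Increment with signed 32-bit wraparound (assumed Java int semantics)."""
    # Shift into the range [0, 2**32), wrap with modulo, then shift back.
    return (current + 1 - INT32_MIN) % 2**32 + INT32_MIN


print(next_correlation_id(0))
print(next_correlation_id(INT32_MAX))
# Number of distinct ids before the counter revisits a value.
print(INT32_MAX - INT32_MIN + 1)
```

Under that assumption the counter passes through 2**32 (about 4.29 billion) distinct values, negative ones included, before any id repeats on a connection.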
> Protocol tweaks for 0.8
> -----------------------
>                 Key: KAFKA-642
>                 URL:
>             Project: Kafka
>          Issue Type: Bug
>            Reporter: Jay Kreps
>            Priority: Blocker
>         Attachments: KAFKA-642-v1.patch, KAFKA-642-v2.patch
> There are a couple of things in the protocol that are not ideal. It would be good to tweak
these for 0.8 so we start clean.
> Here is a set of problems and proposals:
> Problems:
> 1. Correlation id is not used across all the requests. I don't think it can work as intended
because of this.
> 2. On reflection I am not sure that we need a correlation id field. I think that since
we need to guarantee that processing is sequential on any particular socket we can correlate
with a simple queue. (e.g. as the client sends messages it adds them to a queue and as it
receives responses it just correlates to whatever is at the head of the queue).
> 3. The metadata response seems to have a number of problems. Among them is that it weirdly
repeats all the broker information many times. The response includes the ISR, leader (maybe),
and the replicas. Each of these repeats all the broker information. This is super weird. I
think what we should be doing here is including all broker information for all brokers and
then just having the appropriate ids for the isr, leader, and replicas.
> 4. For topic discovery I think we need to support the case where no topics are specified
in the metadata request and for this return information about all topics. I don't think we
do this now.
> 5. I don't understand what the creator id is.
> 6. The offset request and response is not fully thought through and should be generalized.
> Proposals:
> 1, 2. Correlation id. This is not strictly speaking needed, but it is maybe useful for
debugging to be able to trace a particular request from client to server. So we will extend
this across all the requests.
> 3. For metadata response I will try to fix this up by normalizing out the broker list
and having the isr, replicas, and leader field just have the node id.
> 4. This should be uncontroversial and easy to add.
> 5. Let's remove creator id, it isn't used.
> 6. Let's generalize offset request. My proposal is below:
> Rename TopicMetadata API to ClusterMetadata, as this will contain all the data that is
known cluster-wide. Then let's generalize the offset request to be PartitionMetadata--namely
stuff about a particular partition on a particular server.
> The format of PartitionMetadata would be the following:
> PartitionMetadataRequest => [TopicName [PartitionId MinSegmentTime MaxSegmentInfos]]
>   TopicName => string
>   PartitionId => uint32
>   MinSegmentTime => uint64
>   MaxSegmentInfos => int32
> PartitionMetadataResponse => [TopicName [PartitionMetadata]]
>   TopicName => string
>   PartitionMetadata => PartitionId LogSize NumberOfSegments LogEndOffset HighwaterMark
>   SegmentData => StartOffset LastModifiedTime
>   LogSize => uint64
>   NumberOfSegments => int32
>   LogEndOffset => int64
>   HighwaterMark => int64
> This would be general enough that we could continue to add to it for any new pieces of
data we need.
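
For readers who want to see the proposed PartitionMetadataRequest grammar as bytes, here is a rough Python sketch. The proposal gives only field names and types, so the wire details are assumptions made for this example: big-endian byte order, strings prefixed with an int16 length, and each [...] array prefixed with an int32 element count. The function names are hypothetical.

```python
import struct


def encode_string(value):
    """Assumed string encoding: int16 byte length followed by UTF-8 bytes."""
    data = value.encode("utf-8")
    return struct.pack(">h", len(data)) + data


def encode_partition_metadata_request(topics):
    """Encode [TopicName [PartitionId MinSegmentTime MaxSegmentInfos]].

    topics is a list of (topic_name, partitions) pairs, where partitions is a
    list of (partition_id, min_segment_time, max_segment_infos) tuples.
    """
    out = [struct.pack(">i", len(topics))]  # assumed int32 array count
    for topic_name, partitions in topics:
        out.append(encode_string(topic_name))
        out.append(struct.pack(">i", len(partitions)))  # assumed int32 array count
        for partition_id, min_segment_time, max_segment_infos in partitions:
            # PartitionId => uint32, MinSegmentTime => uint64, MaxSegmentInfos => int32
            out.append(struct.pack(">IQi", partition_id, min_segment_time, max_segment_infos))
    return b"".join(out)


payload = encode_partition_metadata_request([("events", [(0, 0, 1), (1, 0, 1)])])
print(len(payload), payload.hex())
```

Each partition entry is a fixed 16 bytes under these assumptions, so the request size depends only on the topic names and the number of partitions listed.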
