From "David Arthur (JIRA)" <>
Subject [jira] [Commented] (KAFKA-374) Move to java CRC32 implementation
Date Fri, 14 Dec 2012 16:10:13 GMT


David Arthur commented on KAFKA-374:

Akka seems a bit overkill for this (although it does have some nice properties). It would
be interesting to refactor the threading in Kafka with Akka and see what kind of performance
differences there are (certainly beyond the scope of this JIRA).

As for the CRC implementation, is there consensus on what to do here - Java or Scala?

I say +1 for Java since no one will need to modify this code and it doesn't really matter
that it's not Scala.
> Move to java CRC32 implementation
> ---------------------------------
>                 Key: KAFKA-374
>                 URL:
>             Project: Kafka
>          Issue Type: New Feature
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jay Kreps
>            Priority: Minor
>              Labels: newbie
>         Attachments: KAFKA-374-draft.patch, KAFKA-374.patch
> We keep a per-record crc32. This is a fairly cheap algorithm, but the Java implementation
> uses JNI and it seems to be a bit expensive for small records. I have seen this before in
> Kafka profiles, and I noticed it in another application I was working on. Basically with small
> records the native implementation can only checksum < 100MB/sec. Hadoop has done some analysis
> of this and replaced it with a Java implementation that is 2x faster for large values and
> 5-10x faster for small values. Details are in HADOOP-6148.
> We should do a quick read/write benchmark on log and message set iteration and see if
> this improves things.
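The small-record effect is easy to reproduce with a rough micro-benchmark. The sketch below (the class name CrcBench, the record sizes, and the iteration counts are all illustrative, not from the Kafka patch) times java.util.zip.CRC32 over small and large records; swapping a pure-Java checksum such as Hadoop's PureJavaCrc32 into crcOf would give the side-by-side comparison the issue describes.

```java
import java.util.zip.CRC32;

// Rough micro-benchmark sketch for per-record CRC32 cost.
// Illustrative only; a real comparison should use a proper harness like JMH.
public class CrcBench {
    // Checksum a single record with the JDK's CRC32.
    static long crcOf(byte[] record) {
        CRC32 crc = new CRC32();
        crc.update(record, 0, record.length);
        return crc.getValue();
    }

    // Throughput in MB/sec for `iterations` checksums of `record`.
    static double mbPerSec(byte[] record, int iterations) {
        long sink = 0;
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink ^= crcOf(record);
        }
        long elapsedNs = System.nanoTime() - start;
        if (sink == -1) System.out.print(""); // keep sink live so the loop isn't eliminated
        return ((double) record.length * iterations) / 1e6 / (elapsedNs / 1e9);
    }

    public static void main(String[] args) {
        byte[] small = new byte[100];     // small record, where per-call overhead dominates
        byte[] large = new byte[1 << 20]; // 1 MB record, where the inner loop dominates
        System.out.printf("small (100 B): %.1f MB/sec%n", mbPerSec(small, 200_000));
        System.out.printf("large (1 MB):  %.1f MB/sec%n", mbPerSec(large, 100));
    }
}
```

Without a warm-up phase the JIT skews the first timing, so the numbers are only indicative; the point is the relative gap between small and large records.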

This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators.
For more information on JIRA, see:
