ortizfabio_185816 asked ortizfabio_185816 commented

Is it possible to define a custom partitioner for 1 table in the cluster?

I need to change my schema so that the data is distributed evenly across the partitions. Currently one node holds about 10 times more data than all the others. The field prfl_id is essentially the primary key; the two other columns that are part of the PK almost never change (only when a new version of another field is created). Here is what my schema looks like:


prfl_id bigint,
type_cd text,
ver_nb bigint,
txn_dtl_tx text,
cre_ts timestamp,
cre_usr_id text,
last_txn_ts timestamp,
PRIMARY KEY (prfl_id, type_cd, ver_nb)


I can find out what the range and distribution of prfl_id is, so based on that I would like to use the ByteOrderedPartitioner for this table. Could you point me to an example?

Due to the way the data is accessed I can't add a dummy column to distribute on, so it has to be the ByteOrderedPartitioner. Since the source data is in Hive and prfl_id barely changes, I can work out the best ranges ahead of time. Even if they change over time, the changes are small. I am aware that this is an anti-pattern, but the client only has the prfl_id to query the table with, so changing the schema would not work in this case. The other point is that the data can be loaded from Spark very quickly, so if the ranges change we could reload the data in 30 minutes.

Basically I can't find the syntax for specifying a partitioner in the table schema. Can somebody point me to an example?



ortizfabio_185816 answered ortizfabio_185816 commented

I guess I can't change the partitioner for a single table. The partitioner is assigned at the cluster level, and all tables behave the same way.


I am going to retract my request. Unfortunately the Grafana dashboards we use here do not have good labels on the graphs. The one I used to figure out the partition sizes was the wrong one. Upon further research I looked at the right one, and I can see that with the Murmur3 partitioner the difference between the minimum size (901 MiB) and the maximum (1 GiB) is within acceptable parameters.

By the way, the reason I asked this question is that lately I have been working mostly with Hive tables and have only just started using data in Cassandra. Hive allows you to define a partitioner for each table, so there is more flexibility there. I am not sure why Cassandra's partitioner is cluster-wide; per-table partitioners could be a nice feature for the future.

Aleks Volochnev answered ortizfabio_185816 commented

Hello @ortizfabio_185816,

The best way to fix this situation is to design a proper data model for the table. Using a custom partitioner is possible, but it is not recommended and is often considered an anti-pattern.

Custom Partitioner (not recommended)
With a custom partitioner, you will have to maintain it yourself and keep it compatible with upcoming versions of Cassandra. It's technically possible, but it usually creates bigger problems than the one you have right now.

Composite Partition Key
To address the issue, it is better to redesign the application's data model. To spread the data evenly over the cluster, you can add a second component to the partition key. Use bucketing: the first part of the partition key is `prfl_id` and the second is another value, real or synthetic. As the partition key becomes 'more unique', the partitions become smaller and are distributed evenly over the cluster.
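As a concrete sketch of this bucketing idea (the table name, the `bucket_nb` column, and the bucket count of 10 are illustrative assumptions, not from the thread), the composite partition key could look like this:

```cql
-- Hypothetical bucketed redesign of the table from the question.
-- The partition key is now (prfl_id, bucket_nb); the client computes
-- bucket_nb, e.g. as ver_nb % 10, so rows for one prfl_id spread over
-- up to 10 partitions instead of a single, possibly huge, one.
CREATE TABLE profile_txn_by_bucket (
    prfl_id     bigint,
    bucket_nb   int,
    type_cd     text,
    ver_nb      bigint,
    txn_dtl_tx  text,
    cre_ts      timestamp,
    cre_usr_id  text,
    last_txn_ts timestamp,
    PRIMARY KEY ((prfl_id, bucket_nb), type_cd, ver_nb)
);
```

A client that only knows the prfl_id can then read all buckets in one statement, e.g. `SELECT * FROM profile_txn_by_bucket WHERE prfl_id = ? AND bucket_nb IN (0,1,2,3,4,5,6,7,8,9);`.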

This approach requires some work and a data migration to a new table, since a partition key cannot be changed in place, but it rewards you with a proper data model and lets you avoid forking Cassandra to solve a data modeling issue.

Bonus: I'd also recommend taking a data modeling course at DataStax Academy; it's free and extremely helpful!


Thank you for your answer. I am aware that using the ByteOrderedPartitioner is considered an anti-pattern. However, imagine that your user only has an SSN to query by. How would you create a table if that were your requirement?


Cedrick Lunven answered ortizfabio_185816 commented

Hi @ortizfabio_185816, thank you for your question.

As you stated, the partitioner is defined at the node level, in the cassandra.yaml configuration file, and you want all nodes in your cluster to use the same partitioner.

partitioner: org.apache.cassandra.dht.Murmur3Partitioner


You are correct that this is basically at the cluster level, except that you can have different keyspaces on different DCs (and it would make no sense to join those DCs into one cluster).

ByteOrderedPartitioner is an anti-pattern because it can lead to unbalanced nodes in your cluster, creating hotspots (data skew). What seems like a good idea for one table may be terrible for the other data in your keyspace.

Introducing randomness with the RandomPartitioner or the default Murmur3Partitioner ensures your tokens are distributed smoothly among your nodes, especially if you use virtual nodes (num_tokens).
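For reference, virtual nodes are enabled through num_tokens in cassandra.yaml; the value below is illustrative (256 was the long-standing default, and newer Cassandra releases default to a lower value), and it must be set before a node first joins the cluster:

```yaml
# cassandra.yaml: number of token ranges (vnodes) assigned to this node.
# Illustrative value; cannot be changed after the node has joined the ring.
num_tokens: 256
```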

What you want to do is work on your data model:

  • 1 query = (mostly) 1 table
  • Partitions too big => split them using bucketing
  • Partitions too small, or a large number of partitions scanned => consider grouping them into the same partition for that query
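As a sketch of the last point, grouping many tiny partitions under a coarser partition key lets one query hit a single partition. The table and column names below are made up for illustration, not taken from the thread:

```cql
-- Hypothetical example: instead of one partition per event, bucket
-- events by day so that "events for a day" is a single-partition read.
CREATE TABLE events_by_day (
    day       date,
    event_ts  timestamp,
    event_id  uuid,
    payload   text,
    PRIMARY KEY ((day), event_ts, event_id)
) WITH CLUSTERING ORDER BY (event_ts DESC, event_id ASC);

-- One query = one partition:
-- SELECT * FROM events_by_day WHERE day = '2020-05-01';
```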

You may want to have a look at this documentation extract from Cassandra: The Definitive Guide by @jeffreyscarpenter and, of course, the DS220 course at DataStax Academy.



It definitely does not work in Cassandra to change the partitioner per table. As you can see in my answer, the reason is that the partitioner applies to all keyspaces and tables. It would be good if it could be overridden on a table-by-table basis.
