Hello everybody,
I am new to Cassandra and I am trying to extract sketches (e.g. bloom filter) from some given data. While doing that, I came this far - my questions break down to the below:
CREATE TYPE bloomfilter_udt ( n_as_sample_size int, m_as_number_of_buckets int, p_as_next_prime_above_m bigint, hash_for_string_coefficient_a list <bigint>, hash_for_number_coefficients_a list <bigint>, hash_for_number_coefficients_b list <bigint>, bloom_filter_as_map map<int, int> ); CREATE OR REPLACE FUNCTION bloomfilter_udf ( state bloomfilter_udt, value text, sample_size int ) CALLED ON NULL INPUT RETURNS bloomfilter_udt LANGUAGE java AS $$ //fill state = bloomfilter_udt with some data return state; $$ ; CREATE OR REPLACE AGGREGATE bloomfilter_uda ( text, int ) SFUNC bloomfilter_udf STYPE bloomfilter_udt INITCOND {};
1) When I call the aggregate, I would like to pass sample_size with a sub-query, e.g.
==> "SELECT bloomfilter_uda(name, (SELECT count(*) FROM test_table)) FROM test_table;" <==
Is that possible with Cassandra?
2) When I try to register the bloomfilter_uda, I get the following error:
==> InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid set literal for (dummy) of type bloomfilter_udt" <==
Can I just pass Cassandra data types as a state (map, list, set)?
3) If I assume, all of the above is my bad, how can I access the props of the state? Like
==> state.n_as_sample_size <==
Is this somehow possible?
I'd appreciate some help/hints.
Thanks
Andreas
[EDIT] For my minimum example, I follow these steps:
CREATE KEYSPACE my_keyspace WITH replication = {'class': 'SimpleStrategy', 'replication_factor' : 3}; CREATE TABLE my_table ( id timeuuid, name text, PRIMARY KEY(id, name) ); INSERT INTO my_table (id, name) VALUES( now(), 'This is some random string'); INSERT INTO my_table (id, name) VALUES( now(), 'This is another random string'); INSERT INTO my_table (id, name) VALUES( now(), 'stringstringstring'); INSERT INTO my_table (id, name) VALUES( now(), 'test test test test'); INSERT INTO my_table (id, name) VALUES( now(), 'some final string'); CREATE OR REPLACE FUNCTION my_user_defined_function_triggered_on_each_row (state map<text,int>, value text, sample_size int) CALLED ON NULL INPUT RETURNS map<text,int> LANGUAGE java AS $$ //if sample size is not in result map, then put it if(!state.containsKey("sample_size")) {state.put("sample_size", sample_size);}; //put the hashvalue (integer represented as a string) of each row as a key in result map and set the value to 1 String hashvalue = "" + value.hashCode(); if(!state.containsKey(hashvalue)) {state.put(hashvalue, 1);}; return state; $$; CREATE OR REPLACE AGGREGATE my_user_defined_aggregate_to_get_sketch (text, int) SFUNC my_user_defined_function_triggered_on_each_row STYPE map<text, int> INITCOND {}; SELECT my_user_defined_aggregate_to_get_sketch(name, 5) FROM my_table;
It work's until here and gives me the following output:
Now instead of storing everything in one map<text,int>, I would like to use a user defined data type, where I can put some more attributes with different data types. I follow there steps:
CREATE TYPE my_sketch( sample_size int, values map<text,int> ); CREATE OR REPLACE FUNCTION my_user_defined_function_triggered_on_each_row_new(state my_sketch, value text, sample_size int) CALLED ON NULL INPUT RETURNS my_sketch LANGUAGE java AS $$ //For a minimal example let's do nothing and just return the state. //I am guessing though, accessing my_sketch will be the next problem. I already tried something like state.my_size... // I tried some versions like: return state; $$;
I test the new function calling on my_table. I expect the first parameter to be the return of each row. I call.
SELECT my_user_defined_function_triggered_on_each_row_new({"sample_size": 5, "values": {'value1': 5 }}, name, 5 ) FROM my_table ;
Gives me the expected output:
When i now try to wrap this up to a single return row with an aggregate, I implement my new aggregate like:
CREATE OR REPLACE AGGREGATE my_user_defined_aggregate_to_get_sketch_new (text, int) SFUNC my_user_defined_function_triggered_on_each_row_new STYPE my_sketch INITCOND {};
But this gives me the error:
InvalidRequest: Error from server: code=2200 [Invalid query] message="Invalid set literal for (dummy) of type my_sketch"
I am using Cassandra 3.11.6 though.