question

secretshardul avatar image
secretshardul asked Erick Ramirez commented

How is Document API data mapped to Cassandra columns?

Suppose I write JSON documents to Astra using the document API. I want to read the data using CQL afterwards. Want to know if this is possible, and what in what format is the data stored.

Consider these two points:

1. Document API has a new concept of namespaces for holding collections. This is on top of Cassandra's keyspaces. How are namespaces implemented?

2. Is Document API schema-less where arbitrary documents with different schemas can be written? If yes, then how is the data stored? For example I want to write two documents:

{
  "name": "jack"
}

and

{
  "name": [
    "first": "john",
    "last": "doe"
  ]
}

[UPDATE] I wrote the documents from above into a collection "my_collection". Then queried them using CQL:

SELECT * FROM gg.my_collection;

This is how the data is stored (skipped p1 to p44 columns for they are empty):

index key p0 bool_value dbl_value leaf text_value

0

27882e52-76a4-4a06-90e7-ec81d7c3443e

data



data

{"name": "jack"}

1

9ec92cd1-046e-4a2b-9c97-1392104b30a0

data



data

{"name": "jack", "city": "mumbai"}

2

dbc63e19-a13e-4688-8be1-b7ac0c66a336

data



data

{ "name": { "first": "john", "last": "doe" } }


Also tested this the other way round, creating tables using CQL and then fetching data using document API.

1. The table names are returned by the get collections API.

2. Get documents API does not work because column schemas don't match.

As per my understanding

1. The document UID is stored as `key`.

2. The unstructured JSON has been flattened into `text_value` column.

3. Namespace has been mapped to keyspace and collection name to table name.

astrastargate
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

1 Answer

Erick Ramirez avatar image
Erick Ramirez answered Erick Ramirez commented

The documents are not structured in a CQL schema the way you are accustomed to.

In the blog post The Stargate Cassandra Documents API, it talks about how JSON documents are broken down into individual elements and mapped to CQL columns. For example, the JSON for document "x":

{"a": { "b": 1 }, "c": 2}

is stored in rows like this:

 key | p0 | p1   | dbl_value
-----+----+------+-----------
 x   | a  | b    | 1
 x   | c  | null | 2

The blog post goes into this topic in a bit more detail.

If you wish, you can query your keyspace directly in Astra so you can see the underlying CQL table. Cheers!

2 comments Share
10 |1000

Up to 8 attachments (including images) can be used with a maximum of 1.0 MiB each and 10.0 MiB total.

secretshardul avatar image secretshardul commented ·

[Post converted to "comment" since it is not an "answer"]

[Follow up questions posted in #9940 and #9981]

0 Likes 0 ·
Erick Ramirez avatar image Erick Ramirez ♦♦ secretshardul commented ·

I've updated my answer with a bit more detail and responded to your follow up questions in separate posts.

For what it's worth, Stargate is an open-source project with contributions open to everyone. If you have any follow up questions, my recommendation is to engage with the Stargate Community directly, particularly on Discord since that is the right forum for your Stargate questions so you get access to the engineers working on the project. Cheers!

0 Likes 0 ·