question
tapan.sharma_186956 asked:

How can I connect to DSE with Spark running in a Docker container?

In order to run Spark SQL queries on DSE Graph, I want to connect to the Spark engine running inside a Docker container. I have mapped ports 9042 and 7077 so that they can be accessed from outside Docker.

Following is the Java code:

spark = SparkSession.builder()
            .appName("My application name")
            .config(new SparkConf())
            .master("dse://localhost:9042?connection.local_dc=dc1;connection.host=;")
            .getOrCreate();

I am getting this error:

Couldn't parse Master URL: dse://localhost:9042?connection.local_dc=dc1;connection.host=;

I also tried the master URL "spark://localhost:9042?connection.local_dc=dc1;connection.host=;", but in that case it gives an "Invalid Master URL" error.

What am I missing here?


1 Answer

Russell Spitzer answered:

Networking aside (that may still be an issue), the dse:// URL can only be used by an application running with all of the DSE libraries on the classpath. Usually this means submitting with dse spark-submit from the command line. If your Docker image does not have DSE installed, this will not be possible. In addition, you should not include connection.host=; if you aren't specifying additional endpoints; leave that parameter out entirely unless you actually have a value to set it to.

spark:// is the URL scheme for the OSS Spark Standalone master. It doesn't know anything about Cassandra, so parameters like "connection.local_dc" are invalid; it accepts only plain host:port URLs.

So the question here is whether you are attempting to contact a DSE Spark Master or a resource manager provided by some other Spark service. If it's DSE, you must make sure DSE is installed in the Docker image you submit from, or use another submission process such as the CQL-based RPC (which is a more advanced technique).

If the resource manager is not DSE you need to match the URL format specific to your cluster.
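To keep the two URL schemes straight, here is a small standalone sketch (plain Java, no Spark dependency). The helper names and the simplified validation rules are my own illustration, not part of any Spark or DSE API; the real parsers are stricter.

```java
// Sketch illustrating which master URL forms each submission path accepts.
// Helper names and rules are illustrative only.
public class MasterUrlCheck {
    // dse:// URLs are only understood when the DSE libraries are on the
    // classpath (e.g. submitting via dse spark-submit); Cassandra-specific
    // parameters such as connection.local_dc may appear after the "?".
    static boolean isDseUrl(String url) {
        return url.startsWith("dse://");
    }

    // Spark Standalone accepts only plain spark://host:port URLs
    // (comma-separated host:port pairs for HA masters); no query parameters.
    static boolean isValidStandaloneUrl(String url) {
        return url.matches("spark://[\\w.-]+:\\d+(,[\\w.-]+:\\d+)*");
    }

    public static void main(String[] args) {
        // A dse:// URL with connector parameters is fine *for dse spark-submit*:
        System.out.println(isDseUrl("dse://localhost:9042?connection.local_dc=dc1")); // true
        // A bare host:port is valid for a Standalone master:
        System.out.println(isValidStandaloneUrl("spark://localhost:7077"));           // true
        // Query parameters make a spark:// URL invalid for the Standalone master,
        // which is why the question's second attempt failed:
        System.out.println(isValidStandaloneUrl("spark://localhost:9042?connection.local_dc=dc1")); // false
    }
}
```

Note also that 9042 is the CQL native protocol port; a Standalone Spark master typically listens on 7077.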

1 comment

Thanks Russell. I will look into it and get back to you.
