
question

JoshPerryman asked:

In a BYOS approach, what is the minimum config required to support DseGraphFrames?

I am attempting to configure Databricks with DSE BYOS. My Databricks Spark instances can see our Postgres. I'd also like to connect them to DSE Graph so that I can set up a data load pipeline from Postgres to DSE Graph.

So I'm wondering whether I really need the DSEFS (Hadoop) functionality. Of the various Spark configuration items from the BYOS commands, which are the minimum ones required to support access to DSE Graph?

For example, my dse-byos.properties looks like the following. Does BYOS have to use DSEFS and DSE for the Hive metastore? Can we just connect our Databricks Spark to Cassandra and DSE Graph and let Databricks use its own resources for the rest, or is DseGraphFrames functionality dependent on access to DSEFS?


spark.hadoop.cassandra.host 172.xxx.xxx.xxx
spark.hadoop.cassandra.auth.kerberos.enabled false
spark.cassandra.auth.conf.factory com.datastax.bdp.spark.DseByosAuthConfFactory
spark.hadoop.fs.dsefs.impl com.datastax.bdp.fs.hadoop.DseFileSystem
spark.sql.extensions com.datastax.spark.connector.DseSparkExtensions
spark.hadoop.dse.advanced_replication.directory /path/to/cassandra/advrep
spark.hadoop.com.datastax.bdp.fs.client.authentication.factory com.datastax.bdp.fs.hadoop.DseRestClientAuthProviderBuilderFactory
spark.cassandra.connection.port 9042
spark.hadoop.cassandra.ssl.enabled false
spark.hadoop.cassandra.auth.kerberos.defaultScheme false
spark.cassandra.connection.host 172.xxx.xxx.xxx
spark.hadoop.cassandra.ssl.optional false
spark.hadoop.cassandra.connection.native.port 9042
spark.hadoop.dse.client.configuration.impl com.datastax.bdp.transport.client.HadoopBasedClientConfiguration
spark.cassandra.connection.factory com.datastax.bdp.spark.DseCassandraConnectionFactory
spark.hadoop.cassandra.config.loader com.datastax.bdp.config.DseConfigurationLoader
spark.sql.hive.metastore.sharedPrefixes com.typesafe.scalalogging
spark.hadoop.dse.system_memory_in_mb 15745
spark.cassandra.dev.customFromDriver com.datastax.spark.connector.types.DseTypeConverter
spark.hadoop.cassandra.partitioner org.apache.cassandra.dht.Murmur3Partitioner
spark.hadoop.cassandra.dsefs.port 5598
Tags: spark, byos, dse graph, databricks, dsegraphframes


1 Answer

artem.aliev_111061 answered:

BYOS includes DseGraphFrames: https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/graph/graphAnalytics/dseGraphFrameOverview.html

BYOS does not use DSEFS or DSE for the Hive metastore (but it can still access DSEFS data if needed).
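For instance, reading a file stored in DSEFS should look like an ordinary Hadoop-filesystem read (a minimal sketch, not from the original answer: it assumes the spark.hadoop.fs.dsefs.impl and dsefs port properties from the question are set, and the host and file path below are hypothetical placeholders):

// read a file kept in DSEFS via its dsefs:// URI (port 5598 as in the question's config)
val dsefsDf = spark.read.text("dsefs://172.xxx.xxx.xxx:5598/data/example.txt")
dsefsDf.show()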

So, yes, it will just connect to DSE Graph and use Databricks resources.

Most of the properties are for security and DSE-only feature configuration.

I strongly recommend just copying all properties from byos.conf into your environment; that will prevent unexpected exceptions when you try to use DSE-only features and data types.

If you do not use any authentication or DSE geo types, the bare minimum is the spark.cassandra.connection.host property only.

Example for standalone Spark:

spark-shell --conf spark.cassandra.connection.host=10.200.179.132 --jars dse-byos_2.11-6.7.4.jar
scala> import com.datastax.bdp.graph.spark.graphframe._
scala> val g = spark.dseGraph("test")
scala> g.V.show
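For a Databricks notebook the same pattern should carry over (a sketch under assumptions, not verified on Databricks: it assumes the dse-byos_2.11 jar is attached to the cluster as a library and that spark.cassandra.connection.host, spark.cassandra.connection.factory, and spark.sql.extensions are set in the cluster's Spark config; "my_graph" is a placeholder graph name):

// Databricks notebooks already expose a SparkSession as `spark`
import com.datastax.bdp.graph.spark.graphframe._

val g = spark.dseGraph("my_graph")  // wrap the DSE Graph as a DseGraphFrame
g.V.show                            // vertices as a DataFrame
g.E.show                            // edges as a DataFrame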



1 comment


Thanks Artem!

When I had all of the config properties loading, the Databricks cluster kept getting "Hive Metastore unavailable" errors. I'm all good now.
