JoshPerryman asked

In a BYOS approach, what is the minimum config required to support DseGraphFrames?

I am attempting to configure Databricks with DSE BYOS. My Databricks Spark instances can see our Postgres. I'd like to connect them also to DSE Graph so that I can set up a data load pipeline from Postgres to DSE Graph.

So I'm wondering if I really need the DSEFS (Hadoop) functionality. Of the various Spark configuration items from the BYOS commands, which are the minimum required to support access to DSE Graph?

For example, my config looks like the following. Does BYOS have to use DSEFS and DSE for the Hive metastore? Can we just connect our Databricks Spark to Cassandra and DSE Graph and let Databricks use its own resources for the rest, or does DseGraphFrames functionality depend on access to DSEFS?
spark.hadoop.cassandra.auth.kerberos.enabled false
spark.cassandra.auth.conf.factory com.datastax.bdp.spark.DseByosAuthConfFactory
spark.hadoop.fs.dsefs.impl com.datastax.bdp.fs.hadoop.DseFileSystem
spark.sql.extensions com.datastax.spark.connector.DseSparkExtensions /path/to/cassandra/advrep com.datastax.bdp.fs.hadoop.DseRestClientAuthProviderBuilderFactory
spark.cassandra.connection.port 9042
spark.hadoop.cassandra.ssl.enabled false
spark.hadoop.cassandra.auth.kerberos.defaultScheme false
spark.hadoop.cassandra.ssl.optional false
spark.hadoop.cassandra.connection.native.port 9042
spark.hadoop.dse.client.configuration.impl com.datastax.bdp.transport.client.HadoopBasedClientConfiguration
spark.cassandra.connection.factory com.datastax.bdp.spark.DseCassandraConnectionFactory
spark.hadoop.cassandra.config.loader com.datastax.bdp.config.DseConfigurationLoader
spark.sql.hive.metastore.sharedPrefixes com.typesafe.scalalogging
spark.hadoop.dse.system_memory_in_mb 15745 com.datastax.spark.connector.types.DseTypeConverter
spark.hadoop.cassandra.partitioner org.apache.cassandra.dht.Murmur3Partitioner
spark.hadoop.cassandra.dsefs.port 5598
Tags: spark, databricks, byos, dse graph, dsegraphframes

1 Answer

artem.aliev_111061 answered

BYOS includes DseGraphFrames.

BYOS does not use DSEFS or DSE for the Hive metastore, although it does allow access to DSEFS data if needed.

So, yes, it will just connect to DSE Graph and use Databricks resources for everything else.

Most of the properties configure security and DSE-only features.

I strongly recommend simply copying all properties from byos.conf into your environment. That will protect you from unexpected exceptions when you try to use DSE-only features and data types.

If you do not use any authentication and DSE geo types, the bare minimum will be property only.

Example for standalone Spark:

spark-shell --conf --jars dse-byos_2.11-6.7.4.jar
scala> import com.datastax.bdp.graph.spark.graphframe._
scala> val g = spark.dseGraph("test")
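
For the Postgres-to-DSE Graph pipeline described in the question, a minimal sketch of a vertex load might look like the following. It continues the spark-shell session above and is only an illustration under several assumptions: the JDBC URL, table name, credentials, and the "person" vertex label are all hypothetical, the Postgres JDBC driver is assumed to be on the classpath, and the graph schema is assumed to already define that label with property keys matching the table's column names.

```scala
// Continues the spark-shell session above: `spark` is the active SparkSession
// and the BYOS jar is on the classpath.
import com.datastax.bdp.graph.spark.graphframe._
import org.apache.spark.sql.functions.lit

// Hypothetical Postgres connection details; replace with your own.
val people = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://pg-host:5432/sourcedb")
  .option("dbtable", "public.people")
  .option("user", "pg_user")
  .option("password", "pg_password")
  .load()

// DseGraphFrame reads the vertex label from a "~label" column.
// "person" is an assumed label that must already exist in the graph schema.
val vertices = people.withColumn("~label", lit("person"))

// Write the rows into the graph as vertices.
val g = spark.dseGraph("test")
g.updateVertices(vertices)
```

Running g.V.show afterwards is a quick way to check what was written. Edges would follow the same pattern with updateEdges, which needs source and destination vertex IDs in addition to the label.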


JoshPerryman commented

Thanks Artem!

When I had all of the config properties loading, the Databricks cluster kept getting "Hive Metastore unavailable" errors. I'm all good now.
