
JoshPerryman asked · Erick Ramirez edited

Include application.conf with spark-submit?

Is it possible to submit an application configuration file without compiling it into the jar as a resource?

I am using the marvelous Typesafe Config (https://github.com/lightbend/config) with my data loading Spark application to load data into DSE Graph. I have tried various approaches such as:
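For context, the kind of HOCON override in play looks roughly like this — the key names below are hypothetical illustrations, not taken from the real application (whose config runs ~200 lines):

```hocon
# application.conf — hypothetical keys for illustration only
dataloader {
  graph-name = "customer_360"  # would override the default baked into resources/reference.conf
  batch-size = 500
}
```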

Just include the conf file

dse spark-submit  --class com.myorg.dataloader.MySparkLoader --files ./application.conf ./dataloader-spark-1.0.0.jar 

which gets:

Exception in thread "main" java.lang.IllegalArgumentException: Graph 'default_graphname' does not exist

indicating that it fell back to the bundled resources/reference.conf defaults.

Include conf file and tell the driver

dse spark-submit  --class com.myorg.dataloader.MySparkLoader --files ./application.conf  --driver-java-options "-Dconfig.resource=application.conf" ./dataloader-spark-1.0.0.jar 

which gets:

Exception in thread "main" java.lang.ExceptionInInitializerError
....
Caused by: java.io.IOException: resource not found on classpath: application.conf
at com.typesafe.config.impl.Parseable$ParseableResources.rawParseValue(Parseable.java:726)

indicating that the driver didn't find the actual file.

Try loading from DSEFS

dse spark-submit  --class com.myorg.dataloader.MySparkLoader --files ./application.conf  --driver-java-options "-Dconfig.url=dsefs://application.conf" ./dataloader-spark-1.0.0.jar 

which gets:

Exception in thread "main" java.lang.ExceptionInInitializerError
....
Caused by: java.net.MalformedURLException: unknown protocol: dsefs
at java.net.URL.<init>(URL.java:607)

which just made me sad.


Has anyone solved this already for DSE Analytics? If so, would you please share your syntax? I would prefer not to load the config into DSEFS, but I'm open to that if it lets me configure Spark applications from the command line.


Final note: I do use command line arguments for configuration in other places, but that is impractical with this particular application. My most basic config runs ~200 lines.

analytics

1 Answer

Russell Spitzer answered · JoshPerryman commented

Depending on what you need to do, all of these attempts are close. If the configuration only needs to be read on the driver and you are running in client mode, then it's enough to launch the app with


--driver-java-options "-Dconfig.resource=application.conf"


But you must also make sure that application.conf is on the driver's classpath:

--driver-class-path "./"

assuming you submit from the directory where application.conf is located.
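Putting the client-mode pieces together, one possible invocation (class and jar names taken from the question):

```shell
dse spark-submit \
  --class com.myorg.dataloader.MySparkLoader \
  --driver-class-path "./" \
  --driver-java-options "-Dconfig.resource=application.conf" \
  ./dataloader-spark-1.0.0.jar
```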




If you are running in Cluster Mode or require the config to also be accessible on the Executors then you'll also need

--files ./application.conf


as well as

--conf spark.executor.extraClassPath=./


You can set the driver-side equivalent, spark.driver.extraClassPath, the same way here instead of --driver-class-path.
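Combined, a cluster-mode submission might look like the sketch below — not verified on a cluster, though the flags themselves are standard spark-submit options:

```shell
dse spark-submit \
  --class com.myorg.dataloader.MySparkLoader \
  --files ./application.conf \
  --conf spark.executor.extraClassPath=./ \
  --conf spark.driver.extraClassPath=./ \
  --driver-java-options "-Dconfig.resource=application.conf" \
  ./dataloader-spark-1.0.0.jar
```

The --files flag ships application.conf into each executor's working directory, and the extraClassPath settings add that working directory to the classpath so Typesafe Config can resolve it there.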




As for the other failures: the config file cannot live on a non-local filesystem, because the Typesafe Config loader reads URLs through java.net.URL rather than an HDFS-style file API — hence the MalformedURLException for the dsefs scheme.

and


--files only copies files into the executors' working directory; I believe it does not put them on the classpath by itself.
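A further option worth noting: Typesafe Config also honors -Dconfig.file=<path>, which reads a plain filesystem path and sidesteps the classpath entirely. In client mode, where the driver runs on the machine you submit from, that could look like:

```shell
dse spark-submit \
  --class com.myorg.dataloader.MySparkLoader \
  --driver-java-options "-Dconfig.file=./application.conf" \
  ./dataloader-spark-1.0.0.jar
```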


JoshPerryman commented:

Thanks Russ! That was just what I needed.

I'm on a single-node setup at present, so the following proved sufficient:

dse spark-submit  --class com.myorg.dataloader.MySparkLoader --driver-class-path "./"  --driver-java-options "-Dconfig.resource=application.conf" ./dataloader-spark-1.0.0.jar

Eventually I'll try in clustered mode and expect that I've now got enough to make that work. Happy Holidays!
