My development team is hitting Mkdirs authorization failures when writing CSV or Parquet output from Spark to a NAS drive.
catCols.coalesce(1).write.csv("file:///mycompany/testcase/tmp/quick/pushdatazzz")

Caused by: java.io.IOException: Mkdirs failed to create file:/mycompany/testcase/tmp/quick/pushdatazzz/_temporary/0/_temporary/attempt_20191118173538_0003_m_000000_9 (exists=false, cwd=file:/apps/cassandra/data/data2/spark/rdd/app-20191118173446-0234/0)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:450)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:435)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:909)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:890)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:787)
    at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStream(CodecStreams.scala:81)
    at org.apache.spark.sql.execution.datasources.CodecStreams$.createOutputStreamWriter(CodecStreams.scala:92)
    at org.apache.spark.sql.execution.datasources.csv.CsvOutputWriter.<init>(CSVFileFormat.scala:135)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anon$1.newInstance(CSVFileFormat.scala:77)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.newOutputWriter(FileFormatWriter.scala:303)
    at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:312)
[fakeuser@fakenode quick]$ ls -al
total 36
drwxrwxrwt 8 fakeuser fakegroup 4096 Nov 18 17:35 .
drwxrwxrwx 4 500      500       8192 Nov 18 15:56 ..
drwxr-xr-x 2 fakeuser fakegroup 4096 Nov 18 17:35 pushdatazzz
No _temporary subdirectories (or anything else) are ever created inside pushdatazzz.
We would expect Spark to create temporary directories from the nodes where the query executed and then commit the output as one file. If we did not coalesce, we would expect multiple part files in this directory.
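To spell out what we run versus what we expect, here is a minimal sketch (the second path, pushdatazzz_multi, is made up purely for illustration):

// What we run today: coalesce(1) forces a single task, so we expect one part file plus a _SUCCESS marker.
catCols.coalesce(1).write.csv("file:///mycompany/testcase/tmp/quick/pushdatazzz")

// Without coalesce we would expect one part file per partition, e.g. part-00000-*.csv, part-00001-*.csv, ...
catCols.write.csv("file:///mycompany/testcase/tmp/quick/pushdatazzz_multi")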
I read through a number of posts while searching for answers; several suggested

--conf spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version=2

but this was of no help.
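In case the spark-submit flag was being dropped somewhere, the equivalent in-code form would be the following (a minimal sketch, assuming the SparkSession is built inside the job; the app name is made up):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("nas-write-test") // hypothetical app name
  .config("spark.hadoop.mapreduce.fileoutputcommitter.algorithm.version", "2")
  .getOrCreate()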
Has anyone else faced this dilemma? Are we expected to write to DSEFS first and then copy the file to local storage?
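To make that second question concrete, this is the kind of two-step workaround we are asking about: write to DSEFS, then copy the committed output down with the Hadoop FileSystem API. A rough sketch only, assuming the DSEFS Hadoop filesystem is registered in our cluster and spark is the active SparkSession; the dsefs:///tmp/pushdatazzz path is made up:

import java.net.URI
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}

// Step 1: write to DSEFS instead of the NAS mount.
catCols.coalesce(1).write.csv("dsefs:///tmp/pushdatazzz")

// Step 2: copy the committed result from DSEFS to the local (NAS) path.
val hadoopConf = spark.sparkContext.hadoopConfiguration
val srcFs = FileSystem.get(URI.create("dsefs:///"), hadoopConf)
val dstFs = FileSystem.getLocal(hadoopConf)
FileUtil.copy(srcFs, new Path("dsefs:///tmp/pushdatazzz"),
              dstFs, new Path("/mycompany/testcase/tmp/quick/pushdatazzz"),
              false /* deleteSource */, hadoopConf)

Since the copy runs only on the driver, it would sidestep the per-executor mkdirs on the NAS, but it feels heavyweight for what should be a plain local write.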