Hello,
I have a Scala Spark batch job that reads data from a C* table and writes results to another. But I want to split this job into 2 parts, because some operations require a time overlap and others don't.
The first part does the minute aggregation (without overlap, to reduce the volume of data read & processed) and the second part computes other controls stored in additional columns (with a time overlap). But this means the second job is reading and writing data from/to the same table, and for some reason it seems to be much less efficient than the previous job, which did everything at once using different tables.
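To give an idea, the second job does roughly this (just a sketch using the Spark Cassandra Connector's DataFrame API; the keyspace/table names, the overlap window and the control computation are placeholders, not my real code):

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("controls-job").getOrCreate()

// Placeholder overlap boundary; in the real job this comes from the schedule.
val windowStart = java.sql.Timestamp.valueOf("2024-01-01 00:00:00")

// Read the overlapping time window from the aggregation table.
val aggregated = spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "minute_agg"))  // placeholder names
  .load()
  .where(col("minute") >= lit(windowStart))

// Compute the extra control columns (placeholder logic).
val withControls = aggregated.withColumn("control_flag", lit(0))

// Write back into the *same* table; since C* writes are upserts on the
// primary key, the append only fills in the additional columns.
withControls.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "minute_agg"))
  .mode("append")
  .save()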
Should I avoid reading & writing from/to the same C* table in a single job? Or shouldn't it be an issue, and in that case, why can it be so slow? I tried using a temporary table between the 2 jobs, and it's much faster now (about 6 times faster).
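Continuing the sketch above, the temporary-table variant looks something like this (minute_agg_tmp is a hypothetical staging table):

// Same computation, but staged through a temporary table instead of
// writing straight back into minute_agg.
withControls.write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "minute_agg_tmp"))
  .mode("append")
  .save()

// A follow-up step then copies the staged rows into the final table,
// so no single job reads and writes the same table.
spark.read
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "minute_agg_tmp"))
  .load()
  .write
  .format("org.apache.spark.sql.cassandra")
  .options(Map("keyspace" -> "ks", "table" -> "minute_agg"))
  .mode("append")
  .save()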