Hi, I have a Python dataframe that I am trying to use explode on a column "transactions_details" that contains array of elements. It takes very long to return data. Is there an alternate for for explode
My current output:
userId username transactions 44782 eyuan object details: [{"expiration": "2022-09-08T00:00:00", "externalItemId": null, "from_sitelocationId": 1821, "itemDescription": "Avastin 2.5MG PINE SILICONE FREE", "itemId": 50679, "lot": "59529", "ndcCode": "50242-0060-01", "ndcCode10Digit": "50242-060-01", "ndcDesc": "Avastin 2.5MG PINE SILICONE FREE", "qty": 1, "to_sitelocationId": 0},{"expiration": "2022-09-08T00:00:00", "externalItemId": null, "from_sitelocationId": 1821, "itemDescription": "Avastin 2.5MG PINE SILICONE FREE", "itemId": 50679, "lot": "59530", "ndcCode": "50243-0060-01", "ndcCode10Digit": "50243-060-01", "ndcDesc": "Avastin 2.5MG PINE SILICONE FREE", "qty": 1, "to_sitelocationId": 0}] modifiedByUsername: null transactionType: 6 transactionTypeName: "Use"
My desired output:
userid username expiration externalItemId from_sitelocationId itemDescription itemId lot ndcCode ndcCode10Digit ndcDesc qty to_sitelocationId 44782 722 2022-09-08T00:00:00 null 1821 Avastin 2.5MG PINE SILICONE 50679 59529 50242-0060-01 50242-060-01 Avastin 2.5MG PINE SILICONE FREE 1 0 44782 722 2022-09-08T00:00:00 null 1821 Avastin 2.5MG PINE SILICONE 50679 59530 50243-0060-01 50243-060-01 Avastin 2.5MG PINE SILICONE FREE 1 0
Here is my code:
df2 = df1.withColumn("transactions_details.itemid", explode(col("transactions.details.itemid")))\ .withColumn("transactions_details.expiration", explode(col("transactions.details.expiration")))\ .withColumn("transactions_details.externalitemid", explode(col("transactions.details.externalitemid")))\ .withColumn("transactions_details.from_sitelocationid", explode(col("transactions.details.from_sitelocationid")))\ .withColumn("transactions_details.itemDescription", explode(col("transactions.details.itemDescription")))\ .withColumn("transactions_details.lot", explode(col("transactions.details.lot")))\ .withColumn("transactions_details.ndcCode", explode(col("transactions.details.ndcCode")))\ .withColumn("transactions_details.ndcCode10Digit", explode(col("transactions.details.ndcCode10Digit")))\ .withColumn("transactions_details.ndcDesc", explode(col("transactions.details.ndcDesc")))\ .withColumn("transactions_details.qty", explode(col("transactions.details.qty")))\ .withColumn("transactions_details.to_sitelocationid", explode(col("transactions.details.to_sitelocationid")))\ .drop("transactions") display(df2)