What is the difference between df.cache() and df.persist() in a Spark DataFrame?
A.
Both cache() and persist() can be used to set the default storage level (MEMORY_AND_DISK_SER).
B.
Both functions perform the same operation. The persist() function provides improved performance, as its default storage level is DISK_ONLY.
C.
persist() - Persists the DataFrame with the default storage level (MEMORY_AND_DISK_SER), and cache() - Can be used to set different storage levels to persist the contents of the DataFrame.
D.
cache() - Persists the DataFrame with the default storage level (MEMORY_AND_DISK), and persist() - Can be used to set different storage levels to persist the contents of the DataFrame.
df.persist() allows specifying any storage level, such as MEMORY_ONLY, DISK_ONLY, MEMORY_AND_DISK_SER, etc.
By default, persist() uses MEMORY_AND_DISK unless specified otherwise.
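The distinction can be sketched in PySpark. This is a minimal local example (it assumes PySpark is installed and uses a local SparkSession created here for illustration):

```python
from pyspark.sql import SparkSession
from pyspark import StorageLevel

# Local session just for demonstration.
spark = (SparkSession.builder
         .master("local[*]")
         .appName("cache-vs-persist")
         .getOrCreate())

df1 = spark.range(1000)
df2 = spark.range(1000)

# cache() always uses the default storage level (MEMORY_AND_DISK);
# it takes no arguments.
df1.cache()

# persist() accepts an explicit StorageLevel; with no argument it
# behaves like cache() and uses the default level.
df2.persist(StorageLevel.DISK_ONLY)

# Release the cached data when done.
df1.unpersist()
df2.unpersist()
spark.stop()
```

In short, cache() is a shorthand for persist() at the default storage level, while persist() lets you choose where and how the data is stored.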
[Reference: Spark Programming Guide - Caching and Persistence]
Chosen Answer: D