List: Data | Curated by Sofiene Ben Khemis

Feb 3, 2025
26 stories
Data
Xiang (Ivy) Li
How I Mastered the ‘Databricks Certified Associate Developer for Apache Spark 3.0’
If you know nothing about Spark but are interested in it, this article will take you from a beginner to a Spark master, step by step
Feb 3
In Dev Genius by Prem Vishnoi(cloudvala)
Top 10 Spark Tuning Techniques for Efficient Data Processing
Master Spark performance optimization with these very important tuning or optimisation techniques.
Aug 25, 2024
1
In Towards Dev by Avin Kohale
Nuances of Data Engineering ft. Spark and Databricks
My collection of bad interview experiences wrapped up in a blog🥲
Jan 23
1
Avin Kohale
Spark — Beyond basics: Required Spark memory to process 100GB file
Processing 100GBs file is a cake walk for spark ONLY if you know how to assign spark memory efficiently! Read to know more.
Aug 1, 2024
11
Jai Singh
Configuring Executors, Cores, and Memory for Spark : A Practical Visualisation Guide
Setting up a Spark application on YARN can be tricky — especially when it comes to deciding on the right numbers for executors, cores, and…
Nov 6, 2024
1
In SelectFrom by Wasurat Soontronchai
Spark Performance Tuning: Spill
What happens when data is overload your memory in Spark?
Mar 19, 2022
2
Archana Goyal
Spark Interview Guide: Must-Know Multiple-Choice Questions with Answers
My articles are open to everyone; non-member readers can read the full article by clicking this link.
Sep 1, 2024
Archana Goyal
Adaptive Query Execution (AQE) in Apache Spark 4.0 : Revolutionizing Query Optimization
As big data processing advances, the demand for smarter and more efficient query optimization has never been greater.
Aug 25, 2024
3
Naveen Kumar
Tuning Spark Optimization: A Guide to Efficiently Processing 1 TB Data
The aim of this article is to provide a practical guide on how to tune Spark for optimal performance, focusing on partitioning strategy…
Oct 3, 2024
2
Anand Satheesh
Apache Spark Commonly seen errors in production and their solutions.
Apache Spark is a powerful tool for big data processing, it uses distributed data processing in memory to reduce the execution time…
Jul 1, 2024
In Google Cloud - Community by Nathan Brami
Implementing Incremental Strategies with Dataform
Overview and Prerequisites:
Oct 16, 2024
1
In Data Engineer Things by Vu Trinh
I spent 6 hours learning how Apache Spark plans the execution for us.
Catalyst, Adaptive Query Execution, and how Airbnb leverages Spark 3.
Sep 11, 2024
1
Siva Ilango
Principles of Data layers in Data Platform
Data organizing principles are vital when we build the data platform to enable data maturity for the business.
Sep 9, 2023
2
Oindrila Chakraborty
Different Types of “Join Strategies” in “Apache Spark”
What is “Join Selection Strategy”?
Oct 6, 2023
5
Zaid Erikat
Apache Spark — Repartitioning 101
What is Repartitioning?
May 5, 2023
2
Vishal Barvaliya
How Many Partitions Will Be Created for a 10 GB File?
Access this blog for free…
Aug 18, 2024
3
Ankush Singh
What: All About Bucketing and Partitioning in Spark
Spark is an open-source distributed computing system that has gained significant traction in the big data space for its ability to handle…
Jun 13, 2023
2
In Data Engineer Things by Vu Trinh
How does Uber handle petabytes of Spark shuffle data every day?
The Remote External Service (RSS)
Jun 22, 2024
1
In Google Cloud - Community by Vishal Bulbule
Restore deleted data from BigQuery using time travel feature🚀⏰✨
Introduction
Jul 10, 2023
2
Suffyan Asad
Handling Data Skew in Apache Spark: Techniques, Tips and Tricks to Improve Performance
Discover how to detect and mitigate data-skew in Spark. Learn about the impact of data-skew and how to detect and fix it!
Jan 30, 2023
4