Show HN: FauxSpark – An Apache Spark Simulator Using SimPy

1 point | dadbod | 5 months ago | github.com

Hi HN,

I've built FauxSpark, a discrete event simulation of Apache Spark, built with SimPy.

It's designed to let users experiment with and understand the runtime characteristics of Apache Spark workloads under different cluster configurations, failures & job schedules without spawning a real cluster.

In this initial version, FauxSpark implements a simplified model of Apache Spark that includes:

- DAG scheduling with stages, tasks, and dependencies

- Automatic retries of tasks & stages on executor failure

- Stage resubmission on shuffle-fetch failures

- Basic shuffle read (like really simple)

- Runs a single job at a time

- A simple CLI with a few knobs to configure the cluster, simulate failures, scale up, etc.
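
To give a feel for the kind of question a simulator like this answers, here is a minimal sketch of stage scheduling on a fixed number of task slots. It is not FauxSpark's code and it does not use SimPy: it is a hand-rolled event loop on the standard library's `heapq`, and the names `Stage`, `simulate`, and `slots` are hypothetical. It assumes every stage has at least one task, all tasks in a stage take the same time, and nothing fails (no retries, no shuffle).

```python
import heapq
from dataclasses import dataclass, field


@dataclass
class Stage:
    """Hypothetical stage description: not FauxSpark's data model."""
    name: str
    num_tasks: int          # assumed >= 1
    task_seconds: float     # every task in the stage takes this long
    parents: list = field(default_factory=list)  # names of parent stages


def simulate(stages, slots):
    """Return (makespan, finish_times) for a DAG of stages on `slots` task slots."""
    if slots < 1:
        raise ValueError("slots must be >= 1")

    by_name = {s.name: s for s in stages}
    remaining = {s.name: s.num_tasks for s in stages}  # unfinished tasks per stage
    finished = {}      # stage name -> simulated finish time
    submitted = set()  # stages whose tasks have been queued
    pending = []       # FIFO queue of runnable tasks (stored as stage names)
    events = []        # min-heap of (completion_time, seq, stage_name)
    now, free, seq = 0.0, slots, 0

    def submit_ready():
        # A stage becomes runnable once all of its parents have finished.
        for s in stages:
            if s.name not in submitted and all(p in finished for p in s.parents):
                submitted.add(s.name)
                pending.extend([s.name] * s.num_tasks)

    submit_ready()
    while pending or events:
        # Launch queued tasks while slots are free.
        while free and pending:
            name = pending.pop(0)
            heapq.heappush(events, (now + by_name[name].task_seconds, seq, name))
            seq += 1
            free -= 1
        # Jump the clock to the next task completion.
        now, _, name = heapq.heappop(events)
        free += 1
        remaining[name] -= 1
        if remaining[name] == 0:
            finished[name] = now
            submit_ready()
    return now, finished


# Usage: a 4-task map stage feeding a 2-task reduce stage, on 2 slots.
job = [Stage("map", 4, 10.0), Stage("reduce", 2, 5.0, parents=["map"])]
makespan, finish_times = simulate(job, slots=2)
print(makespan)  # 25.0 (two waves of map tasks, then one wave of reduce tasks)
```

Changing `slots` and re-running is the toy version of trying a different cluster configuration without spawning a real cluster.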

Repo → https://github.com/fhalde/fauxspark

I'd appreciate your feedback, and tips from anyone experienced with discrete event simulation (DES).
