ignoreusernames | 1 year ago

From the announcement: “As of now, we have mined 1,580 PySpark tests from the Spark codebase, among which 838 (53.0%) are successful on Sail. We have also mined 2,230 Spark SQL statements or expressions, among which 1,396 (62.6%) can be parsed by Sail.”

Kinda early to call this a drop-in replacement with those numbers, no?

But with enough parity, this project could be a dream for anybody dealing with Spark’s dreadful performance. Kudos to the team.

Kydlaw|1 year ago

The next paragraph explains that: "When looking at the test coverage numbers alone, Sail’s capability may seem limited. But we have found that there is a long tail of failed tests due to formatting discrepancies, edge cases, and less-used SQL functions, which we will continue tackling in future releases."

I am with you that it is still very very early. I'll personally keep an eye on the project.

SpicyLemonZest|1 year ago

I'll keep an eye on it too, but for a query engine, formatting compliance and edge cases tend to be almost all of the work. It's easy to implement SELECT x FROM y WHERE z.

bburnett44|1 year ago

Yeah, but the website literally says “zero code changes”. It’s the long tail that’s dangerous, since most people don’t understand it as well as the core functions.