Add basic sql benchmark runner for running sql benchmarks#23052
Add basic sql benchmark runner for running sql benchmarks#23052Omega359 wants to merge 7 commits into
Conversation
… with either simple iterations or using criterion.
1cfedab to
1edafef
Compare
|
The test failure seems unrelated and all tests pass locally. |
|
will try and look at this one shortly |
|
CI failure will be fixed via |
alamb
left a comment
There was a problem hiding this comment.
The code looks good to e -- thank you @Omega359 . I also tried it out and it works great
Let me know if you want me to merge this PR now and iterate in follow on PRs or if you want to make changes to this one
Some minor usability feedback (for a follow on PR):
Explicit list command
Right now I can run
nice cargo run --profile=ci --bin benchmark_runner And it will print a nice list of available commands ❤️
andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ DATA_DIR=benchmarks/data nice cargo run --profile=ci --bin benchmark_runner --
Finished `ci` profile [unoptimized] target(s) in 0.21s
Running `target/ci/benchmark_runner`
SQL benchmarks:
clickbench 43 queries
...
tpch 22 queries
wide_schema 4 queriesHowever, it was not at all clear to me given the help text:
andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ DATA_DIR=benchmarks/data nice cargo run --profile=ci --bin benchmark_runner -- --help
Finished `ci` profile [unoptimized] target(s) in 0.21s
Running `target/ci/benchmark_runner --help`
Run DataFusion SQL benchmarks
Usage: benchmark_runner [OPTIONS] [BENCHMARK]
Arguments:
[BENCHMARK] SQL benchmark group to run
Options:
-q, --query <QUERY>
[env: BENCH_QUERY=]
....
-o, --output <OUTPUT>
Write simple runner results as JSON to this path
--save-baseline <BASELINE>
Save Criterion measurements to the named baseline
-h, --help
Print helpSuggestion: add an alias --list or something:
cargo run --profile=ci --bin benchmark_runner -- --listSimpler command
It would be great if I could use a simpler command line with good defaults.
For example,
nice cargo run --profile=ci --bin benchmark_runner -- clickbench --query 0However, that doesn't work:
...
Running `target/ci/benchmark_runner clickbench --query 0`
Error: Error during planning: No files found at file:///Users/andrewlamb/Software/datafusion/data/hits.parquet. Cannot infer schema from an empty location; either add data files or declare an explicit schema for the table.I had to explicitly specify the DATA_DIR
andrewlamb@Andrews-MacBook-Pro-3:~/Software/datafusion$ DATA_DIR=benchmarks/data nice cargo run --profile=ci --bin benchmark_runner -- clickbench --query 0
Finished `ci` profile [unoptimized] target(s) in 0.20s
Running `target/ci/benchmark_runner clickbench --query 0`
clickbench/Q00 iteration 0: 4.2 ms, 1 rows
clickbench/Q00 iteration 1: 4.4 ms, 1 rows
clickbench/Q00 iteration 2: 4.4 ms, 1 rows| // specific language governing permissions and limitations | ||
| // under the License. | ||
|
|
||
| //! Shared SQL benchmark runner used by `benchmark_runner` and the Criterion |
| #[tokio::main] | ||
| async fn main() { | ||
| env_logger::init(); | ||
| if let Err(error) = sql_benchmark_runner::run_cli().await { |
There was a problem hiding this comment.
Minor nit -- it might make sense to move run_cli and relevant code such as argument processing, main, etc in the actual binary rather than a function in the a crate.
I don't think the argument processing will be used by anything other than this binary
| }, | ||
| } | ||
|
|
||
| #[derive(Debug, Parser)] |
There was a problem hiding this comment.
see comment above -- I recommend moving the criterion / arg parsing out of this module (so this is only the shared implementation with the criterion runner)
| } | ||
|
|
||
| /// Runs the selected SQL benchmarks through a caller-provided Criterion instance. | ||
| pub fn run_criterion_benchmarks_impl( |
There was a problem hiding this comment.
I recommend moving this into benchmarks/benches/sql.rs so that the criterion specific stuff is separated from the core benchmark running logic
|
As usual I'll work on applying the improvements to this PR @alamb rather than followup PR's :) |
I always want to ask though :) |
Which issue does this PR close?
Rationale for this change
Running sql benchmarks using environment variables for configuration is awkward and error prone and strictly using criterion, while statistically much better, is quite slow compared to using simple iterations.
This PR is the first version of a benchmark runner for sql benchmarks that will eventually use arguments for all benchmark configuration options.
What changes are included in this PR?
A simple benchmark runner that can list out the sql benchmarks and run a benchmark using iterations or criterion allowing for specifying a single query if desired.
Future enhancements will use arguments for benchmark configuration vs just using environment variables as well as providing help and tying this into bench.sh
Are these changes tested?
Yes. I have a script that tests all current sql benchmarks both with and without criterion. Here is an portion of it for the single clickbench benchmark:
The existing
cargo benchapproach still works the same (criterion only):Are there any user-facing changes?
No.