Test Automation Framework

Testing Scope
What do we need to test? Everything possible. The testing framework needs to be able to exercise countless combinations of plugins, engines, configuration variables, performance scenarios, and more. The framework therefore needs to abstract away these variations so that the test writer can focus on writing the test rather than tailoring it to a single plugin or engine.

Types of Tests


 * Functional/Use Case: a test of a series of actions which should have a determinate behaviour
 * For instance, test that a CREATE TABLE, followed by a DROP TABLE, followed by a TRUNCATE TABLE on the same table, results in an error.
 * Unit Test: a test of a code section or class. It verifies that the publicly described interface works as documented and that the inputs and outputs of the interface are correct
 * For example, a unit test of the Lex class would test all public methods for proper input and output, and verify that state changes are consistent with the published API
 * Bug Fix test: a test which reproduces a bug's behaviour and demonstrates the correct behaviour.
 * For example, if a bug report states that GROUP_CONCAT does not function as advertised, the test case is written to first reproduce the faulty behaviour and then, once the bug is fixed in the code, to verify the corrected behaviour.
 * Performance test: a test which tracks regressions or improvements in the server's performance and scalability over time
 * Stress test: a test which attempts to overload the server or determine its breaking points

Areas to Test
Which test suites should be built?


 * SQL
 * Testing of SQL syntax
 * Testing that SELECT results match expected output
 * Storage Engine API
 * Replication
 * Client/Server Communication
 * Transactions
 * Information Schema

Test Automation Framework
When designing the framework, several elements need to be taken into consideration, including:


 * What actions need to be commonly performed per test run
 * Communication with additional automation tools
 * Communication between the test client and various other clients and servers
 * Logging of the test run
 * How errors and warnings should be handled
 * Standardized input and output of tests
 * How to deal with dependencies between tests

Directory and Filename Standard
The test automation framework will structure directories and files into the following organization:

/runner                      # Houses the test automation framework
  /runnerlib                 # Library files for the framework
    /tests                   # Unit tests for the runner library
/tests                       # Root directory for all test files
  /sql                       # The "sql" suite tests
    alias.test               # The "alias" test in the "sql" suite
    analyze.test             # The "analyze" test in the "sql" suite
    bigint.test              # The "bigint" test in the "sql" suite
  /storage                   # The "storage" suite tests
    /myisam                  # The "storage/myisam" sub-suite tests
      key_buffer.test        # The "key_buffer" test in the "storage/myisam" suite
/results                     # Root directory for all result files
  /sql                       # The "sql" suite results
    alias.result             # Results file for the "alias" test in the "sql" suite
    analyze.result           # Results file for the "analyze" test in the "sql" suite
    bigint.innodb.result     # Results file for the "bigint" test in the "sql" suite for InnoDB-specific results
    bigint.myisam.result     # Results file for the "bigint" test in the "sql" suite for MyISAM-specific results
  /storage                   # The "storage" suite results
    /myisam                  # The "storage/myisam" sub-suite results
      key_buffer.result      # Results file for the "key_buffer" test in the "storage/myisam" suite

The test runner, when given the following arguments:

$> drizzle_test_runner --suite=sql run bigint

will automatically detect that there are expected differences between runs of the bigint.test file and will open the bigint.innodb.result and bigint.myisam.result files to determine how to configure each test run.
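
As a rough illustration only (not part of this specification), the runner could discover per-engine result variations with a simple filename scan. The helper name and return shape below are assumptions made for the sketch:

import glob
import os

def result_variations(results_root, suite, test_name):
    """Map a config name to its result file for one test.

    'bigint.innodb.result' yields an 'innodb' entry, while a plain
    'bigint.result' (if present) is treated as the default config.
    """
    variations = {}
    for path in glob.glob(os.path.join(results_root, suite, test_name + ".*result")):
        parts = os.path.basename(path).split(".")
        config = parts[1] if len(parts) == 3 else "default"
        variations[config] = path
    return variations

# e.g. result_variations("results", "sql", "bigint")
#   -> {'innodb': 'results/sql/bigint.innodb.result',
#       'myisam': 'results/sql/bigint.myisam.result'}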

Functional Test Case Input Format
The existing system of using separate files for tests and for results will be kept. After considering a single file approach, it was determined that testing multiple engines, plugins, and options would make the single file approach untenable, as too many differences would need to be tested in a single test file.

The test case format is proposed as follows:

aggregate_no_rows.test:

# Tests that COUNT(*), AVG, MIN, MAX on a table
# with no rows returns correct results

SETUP {
  DROP TABLE IF EXISTS t1;
  CREATE TABLE t1 (id INT NOT NULL);
}

TEARDOWN { TRUNCATE t1; }

TEST (count) { SELECT COUNT(*) FROM t1; }

TEST (max) { SELECT MAX(id) FROM t1; }

TEST (min) { SELECT MIN(id) FROM t1; }

TEST (avg) { SELECT AVG(id) FROM t1; }

Each test case file shall contain zero or one "SETUP {}" section containing test commands that are run before each test in the test case file. Similarly, zero or one "TEARDOWN {}" section may be included; its commands are run after each test in the test case file.
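
A minimal sketch of that execution order (SETUP before every test, TEARDOWN after every test); run_sql here is a placeholder for whatever the runner uses to talk to the server, not a proposed API:

def run_test_case(setup_cmds, teardown_cmds, tests, run_sql):
    # tests is a list of (test_name, [commands]) pairs in file order
    outputs = {}
    for name, commands in tests:
        for cmd in setup_cmds:        # SETUP runs before each test
            run_sql(cmd)
        outputs[name] = [run_sql(cmd) for cmd in commands]
        for cmd in teardown_cmds:     # TEARDOWN runs after each test
            run_sql(cmd)
    return outputs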

A comment is any line which begins with a # symbol.

A "TEST(test_name) {}" section indicates a single test in the test case. Inside the parentheses, you should put a descriptive name for the single test. The name of the test will default to test_N where N is the ordinal position of the test in the test case file.

Within each test in a test case file, you may place one or more test commands along with SQL commands for the runner to execute in the test. Test commands include the following:

--execute_sql_from_file filepath; # Execute all statements in a supplied file

Other possible commands which may be issued in a test case:

--require_plugin plugin_name;    # Ensure that a plugin is loaded and available in order to run the test case
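
Hypothetical handlers for the two commands above; the function names, the run_sql callable, and the loaded_plugins collection are placeholders for illustration, not part of the proposal:

class SkipTest(Exception):
    """Raised when a test case cannot run in the current configuration."""

def execute_sql_from_file(filepath, run_sql):
    # --execute_sql_from_file filepath;
    with open(filepath) as sql_file:
        for statement in sql_file.read().split(";"):
            if statement.strip():
                run_sql(statement.strip())

def require_plugin(plugin_name, loaded_plugins):
    # --require_plugin plugin_name;
    if plugin_name not in loaded_plugins:
        raise SkipTest("required plugin %r is not loaded" % plugin_name)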

Functional Test Case Result Format
The new results format will follow a more standard xUnit style of asserting expected results. Here is an example results file which would accompany the above test:

aggregate_no_rows.result:

# SELECT COUNT(*) FROM t1;
RESULT (count) {
  --begin_result
  COUNT(*)
  0
  --end_result
}

# SELECT MAX(id) FROM t1;
RESULT (max) {
  --begin_result
  MAX(id)
  NULL
  --end_result
}

# SELECT MIN(id) FROM t1;
RESULT (min) {
  --begin_result
  MIN(id)
  NULL
  --end_result
}

# SELECT AVG(id) FROM t1;
RESULT (avg) {
  --begin_result
  AVG(id)
  NULL
  --end_result
}

"--assert_xxx" commands simply check that expected output is what we got when we ran the test case. Here is a list of possible commands to be found in a results file:

--assert_sql_error errno;                  # Checks that an expected error is generated by the server
--assert_rows num_rows;                    # Checks that the number of results in the output is a certain number
--result_diff filepath;                    # Diff a returned result with the contents of a file
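
A rough sketch of how the runner might check these assertions; the QueryResult shape (errno, rows) is an assumption made only for this example:

from collections import namedtuple

QueryResult = namedtuple("QueryResult", ["errno", "rows"])

def assert_sql_error(result, expected_errno):
    # --assert_sql_error errno;
    if result.errno != expected_errno:
        raise AssertionError("expected error %d, got %r"
                             % (expected_errno, result.errno))

def assert_rows(result, expected_num_rows):
    # --assert_rows num_rows;
    if len(result.rows) != expected_num_rows:
        raise AssertionError("expected %d rows, got %d"
                             % (expected_num_rows, len(result.rows)))

# assert_rows(QueryResult(errno=None, rows=[]), 0) passes;
# assert_sql_error(QueryResult(errno=1146, rows=[]), 1146) passes.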

Test Case and Results Case Parser
The Drizzle Test Case Syntax has the following BNF grammar:

command_block_list   : command_block_list command_block
                     | command_block
command_block        : command_block_header LBRACE command_list RBRACE
command_block_header : IDENTIFIER LPAREN IDENTIFIER RPAREN
                     | IDENTIFIER LPAREN RPAREN
                     | IDENTIFIER
command_list         : command_list command SEMI
                     | command SEMI
command              : IDENTIFIER LPAREN RPAREN
                     | IDENTIFIER LPAREN command_arg_list RPAREN
command_arg_list     : command_arg_list COMMA command_arg
                     | command_arg
command_arg          : command
                     | literal
literal              : LITERAL_STRING
                     | LITERAL_INTEGER
                     | LITERAL_BINARY_INTEGER
                     | LITERAL_HEX_INTEGER
                     | LITERAL_FLOAT

A parser has now been constructed in Python (using the PLY framework for Yacc/Lex-compatible parsing). The parser is available in lp:~jaypipes/drizzle/new-test-runner.
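
For readers unfamiliar with PLY, the following is a minimal, self-contained sketch of a PLY lexer/parser for a small fragment of the grammar above (command blocks whose commands take a single string argument). It is illustrative only and far smaller than the actual parser in the branch:

import ply.lex as lex
import ply.yacc as yacc

tokens = ('IDENTIFIER', 'LITERAL_STRING', 'LBRACE', 'RBRACE',
          'LPAREN', 'RPAREN', 'SEMI')

t_LBRACE     = r'\{'
t_RBRACE     = r'\}'
t_LPAREN     = r'\('
t_RPAREN     = r'\)'
t_SEMI       = r';'
t_IDENTIFIER = r'[A-Za-z_][A-Za-z0-9_]*'
t_ignore     = ' \t\n'

def t_LITERAL_STRING(t):
    r'"[^"]*"'
    t.value = t.value[1:-1]   # strip the surrounding quotes
    return t

def t_error(t):
    t.lexer.skip(1)

def p_command_block(p):
    'command_block : command_block_header LBRACE command_list RBRACE'
    p[0] = (p[1], p[3])

def p_command_block_header(p):
    '''command_block_header : IDENTIFIER LPAREN IDENTIFIER RPAREN
                            | IDENTIFIER'''
    p[0] = (p[1], p[3] if len(p) == 5 else None)

def p_command_list(p):
    '''command_list : command_list command SEMI
                    | command SEMI'''
    p[0] = p[1] + [p[2]] if len(p) == 4 else [p[1]]

def p_command(p):
    'command : IDENTIFIER LPAREN LITERAL_STRING RPAREN'
    p[0] = (p[1], p[3])

def p_error(p):
    raise SyntaxError("parse error near %r" % (p,))

lexer = lex.lex()
parser = yacc.yacc()

print(parser.parse('TEST (count) { EXECUTE_SQL("SELECT COUNT(*) FROM t1"); }'))
# (('TEST', 'count'), [('EXECUTE_SQL', 'SELECT COUNT(*) FROM t1')])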

Using the following test case:

# Tests that COUNT(*), AVG, MIN, MAX on a table
# with no rows returns correct results

SETUP {
  EXECUTE_SQL("DROP TABLE IF EXISTS t1");
  EXECUTE_SQL("CREATE TABLE t1 (id INT NOT NULL)");
}

TEARDOWN {
  EXECUTE_SQL("TRUNCATE t1");
  ASSERT_SQL(1,2,3,4,COUNT(5,3,4));
}

TEST (count) { EXECUTE_SQL("SELECT COUNT(*) FROM t1"); }

TEST (max) { EXECUTE_SQL("SELECT MAX(id) FROM t1"); }

TEST (min) { EXECUTE_SQL("SELECT MIN(id) FROM t1"); }

TEST (avg) { EXECUTE_SQL("SELECT AVG(id) FROM t1"); }

the parser now constructs this abstract syntax tree representation:

[511][jpipes@serialcoder: runnerlib]$ python parser.py
Command Block List
  Command Block
    Command Block Header
      Value: ('SETUP', None)
    Command List
      Command
        Value: EXECUTE_SQL
        Command Arg List
          Literal
            Value: DROP TABLE IF EXISTS t1
      Command
        Value: EXECUTE_SQL
        Command Arg List
          Literal
            Value: CREATE TABLE t1 (id INT NOT NULL)
  Command Block
    Command Block Header
      Value: ('TEARDOWN', None)
    Command List
      Command
        Value: EXECUTE_SQL
        Command Arg List
          Literal
            Value: TRUNCATE t1
      Command
        Value: ASSERT_SQL
        Command Arg List
          Literal
            Value: 1
          Literal
            Value: 2
          Literal
            Value: 3
          Literal
            Value: 4
          Command
            Value: COUNT
            Command Arg List
              Literal
                Value: 5
              Literal
                Value: 3
              Literal
                Value: 4
  Command Block
    Command Block Header
      Value: ('TEST', 'count')
    Command List
      Command
        Value: EXECUTE_SQL
        Command Arg List
          Literal
            Value: SELECT COUNT(*) FROM t1
  Command Block
    Command Block Header
      Value: ('TEST', 'max')
    Command List
      Command
        Value: EXECUTE_SQL
        Command Arg List
          Literal
            Value: SELECT MAX(id) FROM t1
  Command Block
    Command Block Header
      Value: ('TEST', 'min')
    Command List
      Command
        Value: EXECUTE_SQL
        Command Arg List
          Literal
            Value: SELECT MIN(id) FROM t1
  Command Block
    Command Block Header
      Value: ('TEST', 'avg')
    Command List
      Command
        Value: EXECUTE_SQL
        Command Arg List
          Literal
            Value: SELECT AVG(id) FROM t1

Required Actions of Test Runner
Framework development is facilitated using the same set of identified tools. The scripting language supported by the test automation tools is used to create the components; tool-extensibility utilities and components can be developed in a different language. In addition to the re-usable components, driver scripts and worker scripts need to be created. The approaches for developing re-usable utilities/components should include:


 * Record/Replay
 * Screen/Window/Transaction
 * Action/Keyword
 * Data Driven

Thoughts on a new Test Runner
Basically, the existing test runner (/tests/test-run.pl) is pretty good. A new test runner should be built on its foundation and clean it up to make it more extensible. The existing framework does the following things, which should be kept/emulated:


 * Spawn a pool of threads to run individual test cases
 * Allow a developer to run a specific test or a suite of tests

New concepts for a new Test Runner
A "suite" is a collection of tests that check a related feature or functional unit. For instance, "replication" or "transactions"

A "config" is an input to the test runner which sets or unsets a variety of parameters in the test run.

A "type" is the type of test (functional, unit, stress, etc)

Calling the test runner in various formats
./test-runner --suite=replication --type=functional
  # Run the functional replication tests

./test-runner --type=unit --suite=replication slave-api
  # Run the "slave-api" unit test in the replication suite

./test-runner --type=functional
  # Run all the functional tests

./test-runner --type=functional --config=MyISAM
  # Run all functional tests for the "MyISAM" configuration
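
As one possible way to accept the command lines above (the option names mirror the examples; the use of Python's argparse is only an assumption for illustration):

import argparse

cli = argparse.ArgumentParser(prog="test-runner")
cli.add_argument("--suite", help="suite to run, e.g. replication")
cli.add_argument("--type", default="functional",
                 choices=["functional", "unit", "performance", "stress"],
                 help="type of tests to run")
cli.add_argument("--config", help="configuration to apply, e.g. MyISAM")
cli.add_argument("tests", nargs="*", help="specific tests to run")

args = cli.parse_args(["--suite=replication", "--type=functional"])
# args.suite == 'replication', args.type == 'functional', args.tests == []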

Structure of the Testing Framework
/                  # root source directory
  /tests           # root testing directory
    /runner        # location for the actual test runner and framework
    /functional    # location of functional tests
    /performance   # location of performance tests
    /stress        # location of stress tests
    /var           # runtime location for test runner data and files

Thought: should unit tests live in a /unit directory within each source directory? For instance, server unit tests would be in /drizzled/unit.

Existing Test Runner Options
The following is the logged output of ./dtr --help:

./dtr [ OPTIONS ] [ TESTCASE ]

Options to control what engine/variation to run

compress             Use the compressed protocol between client and server
bench                Run the benchmark suite
small-bench          Run the benchmarks with --small-tests --small-tables

Options to control directories to use

benchdir=DIR         The directory where the benchmark suite is stored
                     (default: ../../mysql-bench)
tmpdir=DIR           The directory where temporary files are stored
                     (default: ./var/tmp).
vardir=DIR           The directory where files generated from the test run
                     is stored (default: ./var). Specifying a ramdisk or
                     tmpfs will speed up tests.
mem                  Run testsuite in "memory" using tmpfs or ramdisk
                     Attempts to find a suitable location using a builtin
                     list of standard locations for tmpfs (/dev/shm)
                     The option can also be set using environment variable
                     MTR_MEM=[DIR]

Options to control what test suites or cases to run

force                Continue to run the suite after failure
do-test=PREFIX or REGEX
                     Run test cases which name are prefixed with PREFIX
                     or fulfills REGEX
skip-test=PREFIX or REGEX
                     Skip test cases which name are prefixed with PREFIX
                     or fulfills REGEX
start-from=PREFIX    Run test cases starting from test prefixed with PREFIX
suite[s]=NAME1,..,NAMEN
                     Collect tests in suites from the comma separated list
                     of suite names. The default is: "main,binlog,rpl"
skip-rpl             Skip the replication test cases.
big-test             Set the environment variable BIG_TEST, which can be
                     checked from test cases.
combination="ARG1 .. ARG2"
                     Specify a set of "mysqld" arguments for one combination.
skip-combination     Skip any combination options and combinations files

Options that specify ports

master_port=PORT     Specify the port number used by the first master
slave_port=PORT      Specify the port number used by the first slave
mtr-build-thread=#   Specify unique collection of ports. Can also be set by
                     setting the environment variable MTR_BUILD_THREAD.

Options for test case authoring

record TESTNAME      (Re)genereate the result file for TESTNAME
check-testcases      Check testcases for sideeffects
mark-progress        Log line number and elapsed time to .progress

Options that pass on options

mysqld=ARGS          Specify additional arguments to "mysqld"

Options to run test on running server

extern               Use running server for tests
user=USER            User for connection to extern server

Options for debugging the product

client-ddd           Start drizzletest client in ddd
client-debugger=NAME Start drizzletest in the selected debugger
client-gdb           Start drizzletest client in gdb
ddd                  Start mysqld in ddd
debug                Dump trace output for all servers and client programs
debugger=NAME        Start mysqld in the selected debugger
gdb                  Start the mysqld(s) in gdb
manual-debug         Let user manually start mysqld in debugger, before
                     running test(s)
manual-gdb           Let user manually start mysqld in gdb, before running
                     test(s)
manual-ddd           Let user manually start mysqld in ddd, before running
                     test(s)
master-binary=PATH   Specify the master "mysqld" to use
slave-binary=PATH    Specify the slave "mysqld" to use
strace-client        Create strace output for drizzletest client
max-save-core        Limit the number of core files saved (to avoid filling
                     up disks for heavily crashing server). Defaults to 5,
                     set to 0 for no limit.

Options for coverage, profiling etc

gcov                 FIXME
gprof                See online documentation on how to use it.
valgrind             Run the "drizzletest" and "mysqld" executables using
                     valgrind with default options
valgrind-all         Synonym for --valgrind
valgrind-drizzletest Run the "drizzletest" and "drizzle_client_test"
                     executable with valgrind
valgrind-mysqld      Run the "mysqld" executable with valgrind
valgrind-options=ARGS
                     Deprecated, use --valgrind-option
valgrind-option=ARGS Option to give valgrind, replaces default option(s),
                     can be specified more then once
valgrind-path=[EXE]  Path to the valgrind executable
callgrind            Instruct valgrind to use callgrind

Misc options

comment=STR          Write STR to the output
notimer              Don't show test case execution time
script-debug         Debug this script itself
verbose              More verbose output
start-and-exit       Only initialize and start the servers, using the
                     startup settings for the specified test case (if any)
start-dirty          Only start the servers (without initialization) for
                     the specified test case (if any)
fast                 Don't try to clean up from earlier runs
reorder              Reorder tests to get fewer server restarts
help                 Get this help text

testcase-timeout=MINUTES Max test case run time (default 15)
suite-timeout=MINUTES Max test suite run time (default 180)
warnings | log-warnings Pass --log-warnings to mysqld

sleep=SECONDS        Passed to drizzletest, will be used as fixed sleep time

Bogus Tests in Existing Suite
These are tests which currently pass but serve no purpose:

1st.test:
 * We don't have a mysql database, why are we checking that here?

# Check that we haven't any strange new tables or databases
# PBXT: drop the pbxt database if it exists
--disable_warnings
--disable_query_log
drop database if exists pbxt;
--enable_query_log
--enable_warnings

show databases;
show tables in mysql;

bench_count_distinct.test:
 * What the heck is this testing? delay_key_write? or count(distinct)?

# Test of count(distinct ..)
--disable_warnings
drop table if exists t1;
--enable_warnings
create table t1(n int not null, key(n)) delay_key_write = 1;
let $1=100;
disable_query_log;
while ($1)
{
  eval insert into t1 values($1);
  eval insert into t1 values($1);
  dec $1;
}
enable_query_log;
select count(distinct n) from t1;
explain extended select count(distinct n) from t1;
drop table t1;

# End of 4.1 tests