Download the archive of our testing infrastructure or clone it as follows:
.../ae $ git clone https://github.com/plum-umd/adaptive-concretization.git
This infrastructure is tested under Python 2.7.x. If you are using Python 3.2/3.3, replace the usage of subprocess32 at line 8 in psketch.py with subprocess.
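For example, one way to make the wrapper run under both versions (this is just a sketch, not the actual content of psketch.py) is to fall back to the standard library when subprocess32 is unavailable:
try:
    import subprocess32 as subprocess  # Python 2.x with the backport installed
except ImportError:
    import subprocess                  # Python 3.x: the standard module suffices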
If you are using Python 2.x and want to examine the back-end behavior for various randdegrees (which corresponds to Table 1 in the paper), you need to install subprocess32. If you are not using Python 2.7.9, or Python 3.4 or later, you need to install pip first.
On Debian and Ubuntu,
$ sudo apt-get install python-dev python-pip
On Mac OS X,
$ sudo easy_install pip
On other OSes, download get-pip.py and run the following (which may require administrator access):
$ (sudo) python get-pip.py
Once pip is ready, install subprocess32:
$ sudo pip install subprocess32
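You can quickly verify the installation; the following command should exit silently if the module is importable:
$ python -c "import subprocess32"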
To post-analyze Sketch output, we use SciPy.
On Debian and Ubuntu,
$ sudo apt-get install python-numpy python-scipy python-matplotlib
On Mac OS X,
$ sudo pip install numpy scipy matplotlib
We thoroughly tested our algorithm by running Sketch over 12K times across different sets of benchmarks, different degrees, and different numbers of cores. Needless to say, it would be very painful to run all the test cases, post-analyze the outputs, and build tables like Tables 2 and 3 in the paper by hand. The main purpose of this testing infrastructure is to conduct intensive experimental evaluations in an automated way. Our first step toward such automation is using a database to save and retrieve experiment results easily. (You can still see the raw data under the data/ folder.)
On Debian and Ubuntu,
$ sudo apt-get install mysql-server mysql-client
On Mac OS X,
$ brew install mysql
$ mysql.server restart
The prompt (or install GUI) will ask you for a password for the MySQL root user. (This is not the system root password.) Type whatever password you want. (Note that this will be used later.)
Enter current password for root (enter for none):
OK, successfully used password, moving on...
Once you've installed MySQL, you need to activate it and run the MySQL setup script. (Mac users with brew can skip to the second command below.)
$ sudo mysql_install_db
$ sudo /usr/bin/mysql_secure_installation
Then, the prompt will ask about several options you can configure. Since we are using this database for experimental purposes only, we recommend setting it up in the most secure manner, for example, with no anonymous access and no remote access; the easiest way is just to say Yes to all options (except for changing the root password):
...
Remove anonymous users? [Y/n] Y
... Success!
...
Disallow root login remotely? [Y/n] Y
... Success!
...
Remove test database and access to it? [Y/n] Y
- Dropping test database...
... Success!
- Removing privileges on test database...
... Success!
...
Reload privilege tables now? [Y/n] Y
... Success!
Access the database server and create an account for this experiment:
$ mysql --user=root -p -h 127.0.0.1 mysql
password: *******
mysql> CREATE USER 'sketchperf'@'localhost';
mysql> GRANT ALL PRIVILEGES ON *.* TO 'sketchperf'@'localhost';
mysql> exit
Bye
Access the database server with that account and create a database:
$ mysql --user=sketchperf -h 127.0.0.1
mysql> CREATE DATABASE concretization;
mysql> SHOW DATABASES;
mysql> exit
Bye
From now on, you can connect to that database like this:
$ mysql --user=sketchperf -h 127.0.0.1 concretization
In case you want to use different account and database names, use the --user and --db options in db.py accordingly.
.../ae/adaptive-concretization $ ./db.py --user user_name --db db_name ...
To connect to the database, you need to install the MySQL connector for Python.
On Debian and Ubuntu,
$ sudo apt-get install python-mysql.connector
On Mac OS X,
$ sudo pip install --allow-external mysql-connector-python mysql-connector-python
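To make sure the connector can reach the database created above, you can try a quick interactive check like this (assuming the sketchperf account and the concretization database from the previous steps):
$ python
>>> import mysql.connector
>>> cnx = mysql.connector.connect(user='sketchperf', host='127.0.0.1', database='concretization')
>>> cnx.is_connected()
True
>>> cnx.close()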
Inside the database, we will have several tables for the experiments; see db.py for the schemas of those tables.
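You can also peek at the schemas directly from the MySQL prompt once the tables have been created (the table name below is a placeholder; the actual names are defined in db.py):
$ mysql --user=sketchperf -h 127.0.0.1 concretization
mysql> SHOW TABLES;
mysql> DESCRIBE SomeExperimentTable;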
You can easily re/define and clean tables in that database:
.../ae/adaptive-concretization $ ./db.py -c init [-v]
.../ae/adaptive-concretization $ ./db.py -c clean [-v]
.../ae/adaptive-concretization $ ./db.py -c reset [-v]
You can also see detailed queries when running db.py with the verbosity option -v. (This option is applicable to all the other commands explained below.) All those commands are guarded by either IF EXISTS or IF NOT EXISTS, and thus they are safe to execute under any conditions.
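For instance, the table-creation and table-dropping queries issued by db.py have the following guarded form (the table name and column here are illustrative only):
mysql> CREATE TABLE IF NOT EXISTS SomeExperimentTable (id INT);
mysql> DROP TABLE IF EXISTS SomeExperimentTable;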
run.py has a bunch of options through which you can specify benchmark(s), number of cores, randdegree(s) of interest, and which strategies (such as vanilla Sketch, fixed randdegree, or adaptive concretization) to use. For your (and our) convenience, there are configuration files in which we can literally configure all the experiments we want to conduct. config.short.json (for the short version) is a subset of config.full.json (for the full version of the experiments), and run.py uses config.short.json by default. One thing that is not covered in those configuration files is the number of times to repeat the experiments: -r n.
All Sketch output files will be placed under the data/ folder. You can check those outputs by running post.py; register those post-analyzed outputs to the database via the -c register command in db.py; or run and register at the same time (the --register option in run.py). Once the experiments are done, you can calculate the statistics via the -c stat command in db.py. (All detailed uses will be explained again with concrete examples.)
If you are running only the experiments specified in the configuration, just follow these instructions:
.../ae/adaptive-concretization $ ./clean.sh
.../ae/adaptive-concretization $ ./run.py -r 3
.../ae/adaptive-concretization $ ./run.py -r 3 --vanilla
.../ae/adaptive-concretization $ ./db.py -c register -s
.../ae/adaptive-concretization $ ./db.py -c register
.../ae/adaptive-concretization $ ./db.py -c stat -s
.../ae/adaptive-concretization $ ./db.py -c stat
The second-to-last command will retrieve the statistics for Table 1, while the last command will show you the statistics for Tables 2 and 3. For non-adaptive concretization, which is the last column in Table 2, please refer to the last part of the experiments below. The following instructions explain the detailed experiments step by step.
The first experiment examines Sketch back-end behavior with various randdegrees, to support the hypothesis that the optimal randdegree varies from benchmark to benchmark. run.py has an option, -s, that runs back-end instances in parallel via a Python wrapper: psketch.py. That wrapper simply forks the number of back-end instances you specify and allows them to run without any interruption. That is, it does not do anything clever to manage them, e.g., stopping all other instances if one finds a solution. (On the other hand, parallel running via the Sketch front-end, which will be used for all the other experiments below, manages the back-end instances and stops all active instances if one finds a solution.) In this way, we can properly collect empirical success rates.
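In essence, the wrapper behaves like the following simplified sketch (illustrative only, not the actual psketch.py code): it launches the requested number of independent back-end processes and waits for all of them, killing none early.
import subprocess32 as subprocess  # or plain subprocess on Python 3

def run_backends(cmd, n):
    # start n independent back-end instances
    procs = [subprocess.Popen(cmd) for _ in range(n)]
    # let every instance run to completion, so each run counts
    # toward the empirical success-rate statistics
    return [p.wait() for p in procs]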
.../ae/adaptive-concretization $ ./clean.sh
.../ae/adaptive-concretization $ ./run.py -s -r n
You can run particular benchmarks with particular degrees:
.../ae/adaptive-concretization $ ./run.py -s -r n [-b name]* [-d degree]*
.../ae/adaptive-concretization $ ./run.py -s -r 1024 -b p_color -b deriv2 -d 64 -d 1024
Check and register Sketch outputs as follows. (Notice that the -s option is required for the other scripts as well.)
.../ae/adaptive-concretization $ ./post.py -s
.../ae/adaptive-concretization $ ./db.py -c register -s [-e EID]
As mentioned earlier, you can run and register at the same time, but this is applicable to the -s option only:
.../ae/adaptive-concretization $ ./run.py -s -r n --register
You can retrieve the statistics from the database:
.../ae/adaptive-concretization $ ./db.py -c stat -s [-e EID] [--detail] [-v]
The next experiment is about the parallel scalability of our algorithm. To obtain statistics about plain Sketch (as in Tables 2 and 3), use the --vanilla option in run.py.
.../ae/adaptive-concretization $ ./run.py --vanilla -r n [...]
.../ae/adaptive-concretization $ ./post.py
.../ae/adaptive-concretization $ ./db.py -c register [-e EID]
.../ae/adaptive-concretization $ ./db.py -c stat [-e EID]
To obtain statistics about our algorithm, specify the strategy: --strategy WILCOXON. Otherwise, it will repeat the first experiment using the Sketch front-end's parallel running. (This happens in order to encompass all experiment settings in one place.)
.../ae/adaptive-concretization $ ./run.py --strategy WILCOXON -r n [...]
(post-analysis, registration, and statistics steps are the same as above.)
From the statistics, you can learn which degree was chosen most often during the adaptation phase. To test the efficiency of the adaptation phase, we can run Sketch with that particular degree, as if we had known it in advance:
.../ae/adaptive-concretization $ ./run.py --strategy NOT_SET -d degree -r n [...]
(post-analysis, registration, and statistics steps are the same as above.)
The only difference between the short and full versions of the experiments is how many (big) benchmarks are used. (If you skipped the previous section, please go back and read through it.) Simply put, config.full.json includes all the benchmarks we used for our paper. To make run.py use a different configuration file, you need to pass the following option: --config config.full.json. For example,
.../ae/adaptive-concretization $ ./run.py --config config.full.json -r n [--timeout t] [...]
(post-analysis, registration, and statistics steps are the same as above.)
Notice that you may need to set a timeout because Sketch might take too long in some cases (e.g., plain Sketch for p_menu).
In sum, the simplest way to conduct the experiments is to use the configuration files as follows:
.../ae/adaptive-concretization $ ./clean.sh
.../ae/adaptive-concretization $ ./run.py --config config.full.json -r 13
.../ae/adaptive-concretization $ ./run.py --config config.full.json -r 13 --vanilla
.../ae/adaptive-concretization $ ./db.py -c register -s
.../ae/adaptive-concretization $ ./db.py -c register
.../ae/adaptive-concretization $ ./db.py -c stat -s
.../ae/adaptive-concretization $ ./db.py -c stat
Once Sketch outputs have been collected, post-analyzed, and registered, we can retrieve detailed statistics from the database. The first experiment, about back-end behaviors for various randdegrees, shows that optimal degrees vary from benchmark to benchmark. Statistics about this experiment (with the -s option) look as follows:
benchmark: name, degree: d
success rate: x / n
lucky runs' solution time: Mean (Var)
quantiles: [ 0% | 25% | 50% | 75% | 100% ]
unlucky runs' solution time: Mean (Var)
quantiles: [ 0% | 25% | 50% | 75% | 100% ]
...
& \timeout{} & \timeout{} & \mso{Median}{SIQR}{} & ...
where SIQR stands for Semi-InterQuartile Range.
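(Concretely, SIQR = (Q3 - Q1) / 2, i.e., half the distance between the 75% and 25% quantiles shown above.)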
For each benchmark and degree, we measured the empirical success rate and the elapsed times of both successful and failed runs. Based on those, we calculated the expected running time and printed it out in a TeX format: \timeout{} means zero success rate, and \mso{}{}{} is one of our own TeX commands that represents the median and SIQR together.
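For intuition, a standard way to estimate such an expected running time (the scripts may use a slightly different estimator) treats each run as an independent trial with empirical success rate p = x/n: E[T] = ((1 - p) / p) * T_f + T_s, where T_f and T_s are the mean failed and successful running times; when p = 0, the expectation is unbounded, hence \timeout{}.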
The second experiment compares our adaptive concretization against plain Sketch and shows its scalability by varying the number of cores. Statistics about this experiment look as follows:
benchmark: name, core: c, strategy: S
success rate: x / n
TRIAL: Mean (Var)
quantiles: [ 0% | 25% | 50% | 75% | 100% ]
CTIME: Mean (Var)
quantiles: [ 0% | 25% | 50% | 75% | 100% ]
FTIME: Mean (Var)
quantiles: [ 0% | 25% | 50% | 75% | 100% ]
STIME: Mean (Var)
quantiles: [ 0% | 25% | 50% | 75% | 100% ]
TTIME: Mean (Var)
quantiles: [ 0% | 25% | 50% | 75% | 100% ]
DEGREE: Mean (Var)
quantiles: [ 0% | 25% | 50% | 75% | 100% ]
degree histogram
d: n (n')
...
where FTIME, STIME, TTIME, and CTIME stand for failed, successful, wall-clock, and CPU times, respectively. (CTIME was missing in the dumped database for the submitted version.)
In the degree histogram, n (n') means that n' out of n selections were successful.
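For example, a line such as 64: 12 (5) (numbers illustrative) would mean that degree 64 was selected 12 times, 5 of which led to successful runs.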
At the end, the script will print out statistics 'per benchmark' in a TeX format:
Strategy1 & degree1 & column1 & \mso{Median}{SIQR}{}
Strategy1 & degree1 & column2 & \mso{Median}{SIQR}{}
...
Strategy1 & degree2 & column1 & \mso{Median}{SIQR}{}
Strategy1 & degree2 & column2 & \mso{Median}{SIQR}{}
...
Strategy2 & degree1 & column1 & \mso{Median}{SIQR}{}
...
where possible strategies are: VANILLA, FIXED, and WILCOXON.
You can download our database dumps and check the data inside them:
$ mysql --user=sketchperf -h 127.0.0.1
mysql> CREATE DATABASE new_db_name;
mysql> exit
Bye
$ mysql -u sketchperf new_db_name < dump.sql
For example, for the fmsd_submission.sql dump (loaded into a database named smooth):
$ mysql -u sketchperf smooth < fmsd_submission.sql
.../ae/adaptive-concretization $ ./db.py --db smooth -c stat -e 1
.../ae/adaptive-concretization $ ./db.py --db smooth -c stat -e 11 -s
.../ae/adaptive-concretization $ ./db.py --db smooth -c stat -e 9
.../ae/adaptive-concretization $ ./db.py --db smooth -c stat -e 19 -s
(As before, statistics for the back-end behavior experiment are retrieved with the -s option.)
For the cav2015_final.sql dump:
$ mysql -u sketchperf cav2015_final < cav2015_final.sql
.../ae/adaptive-concretization $ ./db.py --db cav2015_final -c stat -e 11 -s
.../ae/adaptive-concretization $ ./db.py --db cav2015_final -c stat -e 1
.../ae/adaptive-concretization $ ./db.py --db cav2015_final -c stat -e 2
.../ae/adaptive-concretization $ ./db.py --db cav2015_final -c stat -e 3
For the cav2015_submission.sql dump:
$ mysql -u sketchperf cav2015_submission < cav2015_submission.sql
.../ae/adaptive-concretization $ ./db.py --db cav2015_submission -c stat -e 11 -s
.../ae/adaptive-concretization $ ./db.py --db cav2015_submission -c stat -e 2
.../ae/adaptive-concretization $ ./db.py --db cav2015_submission -c stat -e 3
.../ae/adaptive-concretization $ ./db.py --db cav2015_submission -c stat -e 4
.../ae/adaptive-concretization $ ./db.py --db cav2015_submission -c stat -e 9