Pypi Testing Infrastructure
- General designs
- existing solutions
- weekly meeting report
- Gsoc 2011
The Python Package Index (PyPI) allows anyone to upload projects. This testing infrastructure aims to provide a way to analyse the distributions available on PyPI using metrics such as test coverage, test results and PEP 8 compliance, as well as feedback on the installation (did it go well? were any unexpected files modified? etc.).
This wiki page defines the features that will be part of this testing infrastructure, as well as the metrics that will be used.
Two students (Boris and Yeswanth) are currently working on this project. You can follow the overall progress on their respective blogs:
- Raw data: the data generated by tasks execution.
- Report: evaluation of the different features/attributes of the data.
- Task: an execution which produces raw data and "output", e.g. build, install, unittest, pylint...
The entire project has been divided into two parts: the Environment part and the Execution part. Each part is detailed below.
- The environment part of the project is responsible for creating an abstraction for the execution part. It handles delivery of distributions (and their dependencies) to the execution part, which runs tests on them. It handles all the protocols required to communicate with the PyPI repository and with the different architectures used in the project. It subscribes to uploaded packages from PyPI so that the execution part can test them. It is also responsible for setting up the environment required for testing.
- Master-slave architecture where the master dispatches jobs to the slave and the slave executes them.
- The communication between master and slave happens through an API called the command API.
- The slave communicates with the VM, sends the distributions required for testing and receives raw data (after the distributions have been installed and the tests run) using another API called the raw data API.
- Tests are run on VMs and each VM is handled by a slave.
Raw Data API
- The task is to build a raw data API for the communication between the VM and the slave.
- The raw data API handles sending the data to the corresponding VMs.
- The raw data API also handles sending the raw data from the VM to the slave once the execution part has finished.
- The task is to build a command API to communicate between the master and the slave.
- The command API handles the task requests issued by the master and assigns them to the slave.
- Task requests can specify the configuration to apply to a VM, which distributions are to be tested, etc.
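A task request of this kind could be sketched as follows. This is only an illustration: the class name `TaskRequest`, its fields and the JSON encoding are assumptions, as the page does not define the wire format of the command API.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TaskRequest:
    """Hypothetical job description sent by the master to a slave."""
    distribution: str                               # distribution to test
    version: str                                    # version of that distribution
    vm_config: dict = field(default_factory=dict)   # e.g. OS, Python version
    tasks: list = field(default_factory=list)       # tasks to run inside the VM

    def to_json(self):
        """Serialise the request so it can be sent over the command API."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload):
        """Rebuild a request on the slave side from its JSON form."""
        return cls(**json.loads(payload))


request = TaskRequest(
    distribution="example-dist",
    version="1.0",
    vm_config={"os": "debian", "python": "2.6"},
    tasks=["build", "install", "unittest"],
)
wire = request.to_json()
received = TaskRequest.from_json(wire)
```

Keeping the request a plain data object means the same structure can be reused whatever transport is finally chosen between master and slave.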
The slave performs the following tasks
- Initialises an isolated VM and configures the VM using the configuration provided by the API call to it.
- Communicates with the PyPI repository to get the distributions to be tested.
- Gets the distribution to be tested from the repository, computes its dependencies and also gets the dependencies from the repository.
- Passes all the packages to the VM.
- Receives the raw data from the VM.
- The master subscribes to uploaded packages in PyPI.
- It dispatches jobs to the slave using the command API.
- It receives the test results from the slave.
During the preparatory discussions for the project, we split it into two parts which depend on each other. This proposal covers one of them: the execution part. The other part, the establishment of a clean environment (inside a VM instance), is the Environment part.
Work can be divided into three parts, which I will detail below:
Create an execution manager. Write the essential tasks. Write the common tasks.
Even though this proposal concerns the execution part of PyTI, choices made during the preparatory discussions have an influence on the work. I will detail these choices and their influence below:
Tests will be executed inside a virtual machine instance, because we need to execute source code from untrusted sources. We chose to cut off access to the network in order to prevent sending mail, denial of service attacks, etc. This choice can be problematic if tests need access to the network, but good practice recommends mocking such access. If several distributions cannot mock network access, we must create a method to control network access more precisely.
As the VM instance will not have access to the network, the program which starts the instance must prepare everything the instance will need: the distribution archive, of course, but also the archives of the distribution's dependencies. The dependencies must be computed before starting the VM instance; this is detailed below in the "Dependency setup" part.
The last point is about the raw data API. Tasks will generate raw data (see the terminology) which should be sent to the slave. This API belongs to the other part of the project, but as the execution part will use it, it should be designed with the participation of everyone.
Running the tests on package content may be split into different independent tasks. These tasks can be written independently, but they must all be executed. They cannot be performed in an arbitrary order, as one task may depend on another, and if a task fails the system should not perform the other tasks that depend on it.
This is the role of the execution manager: it will manage all the different tasks and execute them in the right order, checking that each task succeeds. The execution manager must handle this scenario:
Task 'A' depends on 'B' and 'B' depends on 'C'. Execution order must be: 'C', 'B' and finally 'A'. If task 'B' fails, the execution manager must mark 'A' as skipped due to an error in one of its dependencies. The execution manager must also provide a way for tasks to exchange data, so we can imagine that tasks will depend on data instead of on other tasks directly. Moreover, as tasks should generate raw data, it makes more sense to manage raw data dependencies than task dependencies. For example:
Task 'A' needs 'D' data. Task 'B' produces 'D' data during its execution. Execution order must be: 'B' and then 'A'.
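The behaviour described above can be sketched with data-based dependencies. This is a minimal illustration, not the project's execution manager: the `Task` class, the `execute` function and the status strings are assumptions, and the ordering relies on the standard library's `graphlib` (Python 3.9 or later).

```python
from graphlib import TopologicalSorter


class Task:
    """A unit of work that consumes and produces named raw data."""

    def __init__(self, name, run, needs=(), produces=()):
        self.name = name
        self.run = run                    # callable(data) -> dict of new data
        self.needs = tuple(needs)         # data keys required as input
        self.produces = tuple(produces)   # data keys generated on success


def execute(tasks):
    """Run tasks in dependency order and skip those whose input is missing."""
    by_name = {task.name: task for task in tasks}
    producer = {key: task.name for task in tasks for key in task.produces}
    # Each task depends on the tasks that produce the data it needs.
    graph = {
        task.name: {producer[key] for key in task.needs if key in producer}
        for task in tasks
    }
    data, status = {}, {}
    for name in TopologicalSorter(graph).static_order():
        task = by_name[name]
        if any(key not in data for key in task.needs):
            # A producer failed or was skipped, so the input never appeared.
            status[name] = "skipped"
            continue
        try:
            data.update(task.run(data))
            status[name] = "success"
        except Exception:
            status[name] = "failed"
    return status


def failing_build(data):
    raise RuntimeError("build failed")


# 'A' needs the data of 'B', which needs the data of 'C'; 'B' fails.
status = execute([
    Task("A", lambda data: {}, needs=["b"]),
    Task("B", failing_build, needs=["c"], produces=["b"]),
    Task("C", lambda data: {"c": 1}, produces=["c"]),
])
```

Because a failed or skipped task never publishes its data, skipping propagates to every dependent task without any extra bookkeeping.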
Some of the tasks are essential and must be implemented as soon as possible.
The second one is the installation. It is one of the first tasks that will be executed. It is the most important task, as it is one of the concerns that motivated the entire project. Indeed, older packaging libraries designed the setup script as an ordinary Python module, which causes problems because packagers can add any valid Python code to their script and nothing prevents them from adding an "os.system('rm -Rf /')" statement. The idea is to allow it and trace all access to the file system, the network and system calls. This behaviour can be detected using a tracing tool such as strace or SystemTap. Using such tools will produce raw data that we must process to make it readable for humans and usable for detecting harmful behaviour.
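As an illustration of processing such raw data, the sketch below scans strace-style lines for files opened for writing outside an allowed directory. The function name, the set of flags treated as writes and the sample lines are assumptions made for this example; the sample is hand-written in the usual strace line format, not captured from a real installation.

```python
import re

# Matches open/openat calls and captures the path and the open flags.
OPEN_CALL = re.compile(
    r'^(?:\d+\s+)?open(?:at)?\((?:AT_FDCWD, )?'
    r'"(?P<path>[^"]+)", (?P<flags>[A-Z_|]+)'
)

# Flags that indicate the file may be created or modified.
WRITE_FLAGS = {"O_WRONLY", "O_RDWR", "O_CREAT", "O_TRUNC", "O_APPEND"}


def suspicious_writes(trace_lines, allowed_prefixes):
    """Return paths opened for writing outside the allowed directories."""
    found = []
    for line in trace_lines:
        match = OPEN_CALL.match(line)
        if not match:
            continue
        flags = set(match.group("flags").split("|"))
        path = match.group("path")
        if flags & WRITE_FLAGS and not path.startswith(tuple(allowed_prefixes)):
            found.append(path)
    return found


# Illustrative input only.
sample = [
    'openat(AT_FDCWD, "/usr/lib/python2.6/site-packages/foo.py", O_WRONLY|O_CREAT, 0644) = 3',
    'openat(AT_FDCWD, "/etc/passwd", O_RDONLY) = 4',
    'openat(AT_FDCWD, "/etc/cron.d/evil", O_WRONLY|O_CREAT|O_TRUNC, 0644) = 5',
]
flagged = suspicious_writes(sample, ["/usr/lib/python2.6/site-packages/"])
```

A real report would also need to cover network and process-related system calls; this sketch only covers file opens.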
The third one is the build. Some Python distributions include non-Python source code, which must be compiled before the distribution is installed, and this compilation can fail. Currently the packaging libraries take care of this, but it may be difficult to get detailed raw data from them. So we need to study how these libraries work and how we can get raw data from them and, if that is not possible, we should improve the build part of these libraries.
Moreover, some tasks may be considered standard, such as the following:
Test execution, i.e. collecting the results of unittest and doctest, is a very common task and should also be included. Test execution may also include code coverage measurement. Test execution raises some problems, as there is more than one testing library. The work must include an analysis of existing testing libraries and a task for each of them. This task may rely on external dependencies (specific databases, libraries that are not easy to install, ...) and we need to choose which external dependencies we will install on our VM instances.
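For the unittest case, a test execution task could turn the result object into plain raw data along these lines. This is a sketch under assumptions: the `run_tests` function and the keys of the returned dictionary are illustrative, and the example test case is wrapped in a function only to keep the block self-contained.

```python
import io
import unittest


def run_tests(suite):
    """Run a unittest suite and summarise the outcome as plain data."""
    stream = io.StringIO()
    result = unittest.TextTestRunner(stream=stream, verbosity=2).run(suite)
    return {
        "run": result.testsRun,
        "failures": len(result.failures),
        "errors": len(result.errors),
        "skipped": len(result.skipped),
        "success": result.wasSuccessful(),
        "output": stream.getvalue(),    # raw runner output, kept for the report
    }


def example_suite():
    """Build a tiny suite with one passing and one failing test."""
    class ExampleTests(unittest.TestCase):
        def test_ok(self):
            self.assertEqual(1 + 1, 2)

        def test_broken(self):
            self.assertEqual(1 + 1, 3)

    return unittest.defaultTestLoader.loadTestsFromTestCase(ExampleTests)


raw_data = run_tests(example_suite())
```

Other testing libraries would need their own task producing the same kind of dictionary, so that reports can treat all of them uniformly.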
Another common task is the quality check. Quality checking is a vast subject, so we have thought about pylint and pychecker to begin with. Other quality tools can be added case by case, depending on the features they offer.
A final task that can be added if we have time is a performance task. The idea is to measure the time taken by the other tasks. This measurement is not very important but could be implemented if we have time and if the community really wants it.
Master Slave Architecture
What we need:
- A real time feed
- Master to send jobs to the slave
- an API to communicate between master and slave
Existing solutions (just the master slave architecture):
- Buildbot is a CI tool that automates builds and reports their results. Brief notes on the architecture:
- The master polls the VCS to detect changes.
- There is a scheduler module within the master which maintains a queue of build requests.
- Build requests are managed by a builder (on the master) which tells the slave when and how a build should run.
- Uses a TCP connection which remains open for many hours (should be avoided).
- Bitten is a Python CI tool for automatic builds which collects various metrics.
- The master initiates a build by sending a request to the slave.
- Usually the request is made by writing a configuration file (in this case in XML format).
- Simple peer-to-peer communication protocol to exchange information.
- No SSL or authentication for slave registration.
- Condor is a research project aimed at using idle CPU cycles through distributed computing.
- Usually a job is submitted to Condor (master) with some specifications.
- Condor looks for a suitable machine (slave) and runs the job on it while that machine is idle.
- Has the ability to preempt the job and move it to another machine.
- There should be a queuing system on the slave which manages requests for a particular platform.
- Since we will have dedicated slaves, distributed computing like Condor is unnecessary.
- A simple peer-to-peer protocol should suffice (i.e. for slave registration and message passing).
- A message passing architecture should be used (unlike Buildbot, which keeps a TCP connection open, which is unnecessary).
- A simple API should be enough: the master sends commands to the slave.
What we need:
- Dependencies between tasks
- Mark tasks as skipped if at least one dependency has failed or skipped
- Unit tests
- (Communication between tasks)
Additional things to take into account:
- Is the solution available as a separate package?
- If it is not, it will require some work to extract it as a separate package (negative point). If the program that the solution is part of is widely used, we can assume that the solution is reasonably bug-free and works fairly well (positive point).
- Communication between tasks: the output of some tasks (raw data) can be used by other tasks as input.
- Dependencies may follow two modes:
- Direct dependencies: task A depends on B, so B must be executed before A.
- Output dependencies: task A needs D input (or raw data), and B produces D data during execution. The task order is the same (B -> A), but it is more modular. For example, B and C can both produce D data, but B and C do not work on the same platform.
- Resource control is a way to stop a task if it reaches a resource limit (CPU time, amount of memory).
- Fine configuration, as found in Buildbot: when you configure a task you can configure how failures and warnings are caught: http://buildbot.net/buildbot/docs/latest/Common-Parameters.html#Common-Parameters.
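The resource control criterion listed above could be approached on Unix with the standard `resource` module, as in the following sketch. The helper names `make_limiter` and `run_limited` are assumptions, the approach only works on Unix-like systems, and none of the task managers compared here is claimed to work this way.

```python
import resource
import subprocess
import sys


def make_limiter(cpu_seconds=None, memory_bytes=None):
    """Return a function that lowers soft limits in the child process."""
    def apply_limits():
        if cpu_seconds is not None:
            # Exceeding the soft CPU limit makes the kernel send SIGXCPU.
            _, hard = resource.getrlimit(resource.RLIMIT_CPU)
            resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, hard))
        if memory_bytes is not None:
            # Exceeding the address space limit makes allocations fail.
            _, hard = resource.getrlimit(resource.RLIMIT_AS)
            resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, hard))
    return apply_limits


def run_limited(argv, cpu_seconds=None, memory_bytes=None):
    """Run a command with the given limits applied before it starts."""
    return subprocess.run(
        argv,
        preexec_fn=make_limiter(cpu_seconds, memory_bytes),
        capture_output=True,
    )


# A task that never finishes by itself is stopped after about 1s of CPU time.
result = run_limited([sys.executable, "-c", "while True: pass"], cpu_seconds=1)
killed_by_signal = result.returncode < 0
```

A negative return code from `subprocess.run` means the child was terminated by a signal, which the task manager can record as a resource-limit failure.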
Pony-build is the continuous integration tool used last year for PyTI. Characteristics:
- No dependencies between tasks: the task manager takes a list of tasks and executes them one by one.
- The task manager stops when a task fails; it cannot mark a task as skipped and does not mark dependent tasks as skipped.
- No communication between tasks.
- No tests.
- No documentation.
- Part of pony-build, but seems easy to extract.
- No dependencies (0/4).
- Stop on fail (1/3).
- No communication (0/1)
- No tests (0/2).
- No documentation (0/2).
- Part of pony-build, easy to extract (2/3).
- Slightly used (1/4).
Narval is a task manager library written by Logilab and used by apycot, a continuous integration tool also written by Logilab. Characteristics:
- Seems complex.
- Some parts of the API do not seem well suited; an OOP approach seems better than the existing one.
- Tasks are grouped in a recipe module.
- Dependencies: direct and output modes supported (4/4).
- Fine configuration of what to do if a step fails or emits warnings (3/3).
- Communications (1/1).
- No tests (0/2).
- No specific documentation (0/2).
- Part of narval, hard to extract (2/3).
- Slightly used (1/3).
Buildbot is the reference continuous integration tool in the Python world. Characteristics:
- In Buildbot vocabulary, steps are what we call tasks.
- Fine configuration of what to do if a step fails or emits warnings.
- Steps seem to be registered in two different places (master and slave).
- The whole Buildbot system seems heavily dependent on Twisted.
- Part of Buildbot; a moderate amount of work is needed to extract it.
- No dependencies between tasks: the scheduler takes a list of builder identifiers (0/4).
- Fine configuration of what to do on failure or warning (3/3).
- No communication between tasks (0/1).
- Unittests (2/2).
- Good documentation (2/2).
- Part of buildbot (1/3).
- Very used (3/3).
- Dependencies on twisted (-1).
PythonTasks is one of Boris's projects; it is still in beta and no documentation has been written yet. Characteristics:
- Dependencies are either direct dependencies (task A depends on B) or output dependencies (task A needs D input, B produces D, so B must be executed before A). For the output dependencies mode, some code must be written, as it is not (for the moment) part of PythonTasks.
- Tasks can communicate with each other: all of them receive the same Communication object, which records all the output of tasks.
- When a task fails, all other tasks which depend on it are marked as skipped before it is marked as failed (if B depends on A and A fails, B is marked as skipped before A is marked as failed), and so on recursively. It works the same way when a task is skipped.
- Unittest and code coverage.
- No documentation.
- Dependencies (direct or output), but the output dependencies mode requires some work (3/4).
- Stops on failure and marks dependent tasks as skipped, but no fine configuration of tasks (as in Buildbot) (2/3).
- Communication between tasks (1/1).
- Unittests (2/2).
- No documentation (0/2).
- Separated package (3/3).
- Not used (0/3).
- We can send an archive containing the distributions and their dependencies, along with a configuration file, over an FTP or SFTP connection.
- Packages can be uploaded to a fixed folder (probably /tmp), or we can specify the folder in the configuration file (preferred solution: the first one).
- At the end of the upload, the slave launches a command through SSH in order to start the execution part, or a server is launched on the VM when the VM's OS starts and the slave sends a message to this server.
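Building the archive mentioned in the first point could look like the sketch below. The layout (a `packages/` directory plus a `config.json` file), the function name and the configuration keys are assumptions for illustration; the transfer over FTP or SFTP is not shown.

```python
import io
import json
import os
import tarfile
import tempfile


def build_job_archive(archive_path, distribution_files, config):
    """Pack distribution archives plus a JSON configuration into one tarball."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in distribution_files:
            # Store every distribution under a flat "packages/" directory.
            tar.add(path, arcname="packages/" + os.path.basename(path))
        payload = json.dumps(config, indent=2).encode("utf-8")
        info = tarfile.TarInfo("config.json")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return archive_path


with tempfile.TemporaryDirectory() as workdir:
    # Stand-in for a source distribution downloaded from PyPI.
    sdist = os.path.join(workdir, "example-1.0.tar.gz")
    with open(sdist, "wb") as handle:
        handle.write(b"placeholder")
    archive = build_job_archive(
        os.path.join(workdir, "job.tar.gz"),
        [sdist],
        {"tasks": ["build", "install", "unittest"]},
    )
    with tarfile.open(archive) as tar:
        names = tar.getnames()
        config_member = tar.extractfile("config.json").read()
```

Shipping a single archive keeps the upload step to one file, which fits the preference for a fixed upload folder on the VM.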
Raw Data API
- Reports are sent once all tasks are finished (to keep the implementation simple), or raw data is sent at the end of each task (preferred solution: the first one).
For the format, we both agreed on JSON-RPC (http://groups.google.com/group/json-rpc/web/json-rpc-2-0).
- A JSON-RPC request is sent from the VM to a method on the slave with two parameters: the id of the task, and a dictionary with all the raw data plus any additional information (time, memory).
- Raw data can be the following:
- Solution 1 :
- stdout and stderr
- parsed form of stdout. Would result in less data
- Solution 1 :
faulthandler: a Python module which dumps the Python traceback to stderr or to a file when a fatal signal (e.g. SIGSEGV) is received. The output is easy to parse, but it only traces signals, which are already traced by the other solutions. (http://blog.python.org/2011/05/new-faulthandler-module-in-python-33.html).
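Coming back to the JSON-RPC request described for the raw data API (task id plus a dictionary of raw data), a minimal sketch of both ends could be the following. The method name `store_raw_data`, the helper names and the keys inside the raw data dictionary are assumptions; only the envelope fields (`jsonrpc`, `method`, `params`, `id`, `result`) come from JSON-RPC 2.0.

```python
import json


def make_raw_data_request(request_id, task_id, raw_data, method="store_raw_data"):
    """VM side: build a JSON-RPC 2.0 request carrying the raw data of a task."""
    return json.dumps({
        "jsonrpc": "2.0",
        "method": method,                 # hypothetical method name
        "params": [task_id, raw_data],    # the two parameters described above
        "id": request_id,
    })


def handle_request(payload, storage):
    """Slave side: store the raw data and build a JSON-RPC 2.0 response."""
    request = json.loads(payload)
    task_id, raw_data = request["params"]
    storage[task_id] = raw_data
    return json.dumps({"jsonrpc": "2.0", "result": "ok", "id": request["id"]})


storage = {}
payload = make_raw_data_request(
    request_id=1,
    task_id="install",
    raw_data={"stdout": "", "stderr": "", "time": 2.5, "memory": 1024},
)
response = json.loads(handle_request(payload, storage))
```

Sending everything in one request per task works for either option above: one call at the end of each task, or a batch of calls once all tasks are finished.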
Comparison of VMs
Most(Kernel need to be modified)
Performance loss on disk intensive operations
Qemu is usually slow. But Kqemu has good performance
Virtual Hard Disk Image Format
Community not very active in the past few years
Support dropped for KQemu
RedHat is currently supporting it
Oracle is currently working on it
Clearly the competition comes down to VirtualBox and KVM. Reasons for choosing VirtualBox over KVM:
1) VirtualBox supports multiple platforms.
2) KVM's native disk image format, QCOW2, is very slow. Of course I could use other disk image formats, but that would require external tools.
3) KVM depends on the Linux kernel.
So VirtualBox will be used.
Different modules in the VM :
1) Starting the VM (simple version implemented, unit tests remaining)
2) Stopping the VM (simple version implemented, unit tests remaining)
3) XML handler (simple version implemented, unit tests remaining)
4) Rollback
5) Virtual hard disk handler
weekly meeting report
- (update once in a week, and whenever there is something to say)
General mailing list: email@example.com
Specific: distutils-sig / python-testing / catalog-sig / python-dev, depending on the case
Vanilla Mercurial + BitBucket
- Python 2.5; if there is a need for newer features, moving to a different Python version (up to Python 2.6) can be discussed.
For the next week (Boris):
- Code: Finish some little things on PyTasks (need it for a project) and compare it to other task managers included in tools like waf and narval (not sure about pony-build)
- Design: Will read some of my bookmarks about distributed computing
- Communication: Write a summary about task managers and about which protocol we can use for the API
For the next week (Yeswanth):
- Code: Not much this week (will read about Python coding conventions)
- Other tools/design: Will go through Condor or other equivalents and see how they fit in this project
- Communication: Report on Condor (http://www.cs.wisc.edu/condor/) or other equivalents
Interaction Between Slave and VM
- Interaction between the slave and the VM happens through a shared folder.
- The slave downloads the distributions.
- The slave creates a VM image with the specific OS to be tested on, and specifies a folder (i.e. the shared folder) where the downloaded distributions are placed along with the configuration file.
- The VM performs the tasks listed in the configuration file.
- The VM copies the results and generates a raw data file (containing all the raw data produced by the tests), which is placed in the shared folder.
- The slave sends the report file to "Result Storage".
- Master handles "general task scheduling"
- Slave handles "task execution" (into vm)
- Result Storage handles "storing and viewing results"
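The shared folder exchange between slave and VM described above can be sketched as follows. The file names `config.json` and `raw_data.json`, the function names and the task registry are assumptions; a temporary directory stands in for the folder shared with the VM.

```python
import json
import tempfile
from pathlib import Path


def vm_run(shared_dir, available_tasks):
    """VM side: run the tasks named in config.json and write raw_data.json."""
    shared = Path(shared_dir)
    config = json.loads((shared / "config.json").read_text())
    raw_data = {name: available_tasks[name]() for name in config["tasks"]}
    (shared / "raw_data.json").write_text(json.dumps(raw_data))


def slave_collect(shared_dir):
    """Slave side: read the raw data file left by the VM."""
    return json.loads((Path(shared_dir) / "raw_data.json").read_text())


with tempfile.TemporaryDirectory() as shared:
    # The slave prepares the configuration before starting the VM.
    Path(shared, "config.json").write_text(json.dumps({"tasks": ["install"]}))
    # Stand-in for the work done inside the VM.
    vm_run(shared, {"install": lambda: {"returncode": 0}})
    report = slave_collect(shared)
```

With this arrangement the VM needs no network access at all, which matches the decision to cut it off during task execution.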
Focus for the near future
- Managing the VM on the slave side (i.e. configuring and managing the VM)
- Getting a task manager that runs tasks
For the next week (Yeswanth)
- Study which VMs are compatible with libvirt, how, and which one matches the needs of PyTI
- Make a report of the findings and discuss it on the ML
- Start working with libvirt to have a VM Manager
For the next week (Boris)
- Look at result structure of Bitten and Apycot
- Work on Python Tasks.
- See if anything is missing which might be important for PyTI
- Be sure it can accomplish what it is supposed to do
- Add a doc for Python Tasks.