Project Description

Enables a central server to execute arbitrary jobs on client nodes and collect the results. It was originally designed for executing test harnesses on multiple virtual machines and collecting the harness log and coverage files. This project is not intended to compete against "professional" schedulers like Condor or Microsoft Cluster technology. It is designed for the simple cases of parallel work. It uses WCF (.NET 3.5 version) for communication and a client-side polling approach, in which each client node asks the server for work, to limit firewall and NAT issues.
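
To illustrate the client-side polling idea, here is a minimal, language-neutral sketch in Python. It is not the project's WCF code: the names Server, next_job, report, and poll_until_idle are hypothetical, and the "server" is an in-memory object rather than a network service. The point it shows is that the client always initiates the exchange, so the server never needs to open a connection to a node behind a firewall or NAT.

```python
import queue
from typing import Callable, Dict, Optional


class Server:
    """Holds pending jobs and collected results; it never contacts a client."""

    def __init__(self) -> None:
        self._pending: "queue.Queue[str]" = queue.Queue()
        self.results: Dict[str, int] = {}

    def submit(self, command: str) -> None:
        """Queue a command line for some client to pick up later."""
        self._pending.put(command)

    def next_job(self) -> Optional[str]:
        """Called by a client; returns a job, or None when there is no work."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None

    def report(self, command: str, exit_code: int) -> None:
        """Called by a client to hand back the result of a finished job."""
        self.results[command] = exit_code


def poll_until_idle(server: Server, run: Callable[[str], int]) -> int:
    """Client loop: ask for work until the server has none; return jobs run."""
    done = 0
    while (job := server.next_job()) is not None:
        server.report(job, run(job))
        done += 1
    return done
```

A real client would presumably sleep for a while and poll again instead of stopping when the queue is empty; that interval is a tuning choice this sketch leaves out.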

Status of the Project

The current version of this project is 0.80. It is a point-release to coincide with my honors thesis defense.

The project is written in C# and tested with .NET 3.5 on Vista Enterprise, Windows XP, and Server 2003.

Because it is not a version-1 release, not all functionality is complete and ready to use for everyone. I used this project in my research work, and it met my needs.

Let me set expectations low: this is a simple project designed to meet simple needs. I suspect that there are others out there like me with these simple needs, and I hope this is useful to them. However, it is not all things to all people. If you need serious scheduling, check out a professional project like Condor (http://www.cs.wisc.edu/condor/).

Please see the screencast for an overview of the functionality.

Screenshots

The following is the server with one node and 4 batches:
ServerNodeWithBatch.png

The following is the idle client node:
ClientNodeIdle.png

The following is the client node running notepad:
ClientNodeRunning.png

Scenarios

These are sample scenarios for which this product is useful.

Execute GUI tests in parallel

This is the scenario we needed the product to solve.

We wanted to shorten the (real) time to test a system. Our test suite contained GUI test cases, and executing a test case locked up the entire machine. Executing the test suite took a little more than one hour. We also needed to run the test suite across various configurations to increase code coverage. We wrote a test harness that set the INI file of the test subject, ran the test cases, and then gathered the code coverage. The harness takes a command-line parameter to determine which configuration to use.

We created a Virtual PC image to execute the test cases. We then duplicated the Virtual PC image and ran the copies in parallel on Virtual Server. That way, when a test case executed, it locked up only its own virtual machine without affecting the other machines. With 14 nodes in parallel, we needed an efficient way to update harness files and test case definitions and to collect coverage data.

We gave the scheduler a batch template, which included the list of files necessary to test, the command with arguments to execute the test harness, and the location of the coverage file. Our last command was a call to 7z to zip up all of the coverage files for quicker file transfer.

The scheduler took the template and generated one numbered batch instance for each configuration.
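
The template expansion described above can be sketched roughly as follows. This is an illustration in Python, not the project's C# implementation: the BatchTemplate and Batch types, the "{n}" placeholder, the file names, and numbering from 1 are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class BatchTemplate:
    files: List[str]      # files the client needs before it can run
    commands: List[str]   # command lines; "{n}" marks the configuration number
    result_file: str      # file to send back, e.g. the zipped coverage data


@dataclass
class Batch:
    number: int
    files: List[str]
    commands: List[str]
    result_file: str


def expand(template: BatchTemplate, configurations: int) -> List[Batch]:
    """Create one batch per configuration, numbered 1..configurations."""
    return [
        Batch(
            number=n,
            files=list(template.files),
            commands=[c.replace("{n}", str(n)) for c in template.commands],
            result_file=template.result_file.replace("{n}", str(n)),
        )
        for n in range(1, configurations + 1)
    ]


# Hypothetical template: run the harness, then zip the coverage output with 7z.
template = BatchTemplate(
    files=["harness.exe", "testcases.xml"],
    commands=["harness.exe {n}", "7z a coverage{n}.zip coverage"],
    result_file="coverage{n}.zip",
)
batches = expand(template, 3)
```

Here each generated batch carries its own copy of the file list and command lines, so a node can run it without knowing about the other configurations.
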
  • Only a sequential pattern is supported, and it is hard-coded into the template.
  • I store everything in DataTables and view them with DataGrids. Long-term I should use an embedded database.
  • The client does not always kill all processes when killing a batch. This occurs when a process the client spawns creates its own process. I haven't yet solved identifying all child processes and killing them.
  • I use the terms batch and job interchangeably, which is likely confusing to others. The batch was originally designed to replace actual bat files; I then started using the term job, following what I heard around the lab with Condor.
  • Not all FxCop violations are resolved. Many of them are localization issues, but a good-sized number of them are not.
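
For the process-tree item in the list above, one possible approach (not something the project implements) is to take a snapshot of (process id, parent process id) pairs and walk it from the spawned process downward. The sketch below, in Python, shows only that walk. How the snapshot is obtained is platform-specific and left out; on Windows the parent id is exposed, for example, through the ParentProcessId property of the WMI Win32_Process class. The function name and the sample pids are hypothetical.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple


def descendants(root: int, processes: Iterable[Tuple[int, int]]) -> List[int]:
    """Return every pid below root, breadth-first, from (pid, parent_pid) pairs."""
    children: Dict[int, List[int]] = defaultdict(list)
    for pid, parent_pid in processes:
        children[parent_pid].append(pid)

    found: List[int] = []
    seen = {root}
    todo = deque([root])
    while todo:
        for child in children.get(todo.popleft(), []):
            if child not in seen:  # guards against loops caused by reused pids
                seen.add(child)
                found.append(child)
                todo.append(child)
    return found


# Hypothetical snapshot: 10 is the process the client spawned.
snapshot = [(10, 1), (11, 10), (12, 10), (13, 11), (20, 1)]
to_kill = descendants(10, snapshot)
```

Killing the pids in reverse order (deepest first) and then the root keeps a parent from spawning replacements mid-way, though a snapshot can still go stale between reading it and acting on it.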

Last edited Apr 17, 2008 at 5:21 PM by joes, version 11