PSPLIB J30 scheduling benchmark | Results

Abstract

This note reports a deterministic dispatch rule for the Resource-Constrained Project Scheduling Problem (RCPSP). On a frozen public subset of PSPLIB J30, the evolutionary chain reduced the lower-is-better acceptance score from 14.312 to 12.087, a 15.55% reduction, while preserving feasibility on all 80 evaluated instances.

For audit clarity, the public claim is the curated evolutionary chain, not a claim about the final isolated diagnostic run.

The benchmark is deliberately external and bounded. Each portfolio instance has a proven optimal makespan, so the reported score measures schedule quality against a reference optimum rather than machine speed. The result should be read as a scheduling benchmark result on this fixed PSPLIB J30 subset, not as a claim that every RCPSP instance or production planning workflow is solved.

1. Problem formulation

RCPSP schedules a project made of activities, precedence constraints, and renewable resource limits. An activity can start only after its predecessors finish, and the resource demand of all activities active at the same time must fit within the available capacity. The schedule objective is the project makespan: the finish time of the terminal activity.

\[ C_{\max}(S) = \max_i F_i \]

(1)

where \(F_i\) is the finish time of activity \(i\) in schedule \(S\).

Figure 1. RCPSP schedule readout. The standard schedule view combines the precedence network, Gantt placement, and makespan marker.

Figure 2. Renewable-resource load profile. The evaluator only accepts placements that keep every active resource demand under the capacity line.

The evaluator uses serial schedule generation. At each step it finds eligible activities, scores each one with the candidate rule, asks the selector to choose one eligible activity, and places that activity at the earliest resource-feasible start time. This keeps the optimization surface narrow: the candidate changes priority logic, not the scheduling validator.

2. Benchmark and evaluation contract

The public portfolio contains 80 frozen PSPLIB J30 single-mode instances. The selection is parameters \(\{1, 7, 13, 19, 25, 31, 37, 43\}\) crossed with instances 1 through 10. Every instance has 32 jobs including dummy source and sink jobs, renewable-resource capacities, precedence arcs, and a proven optimal makespan.

Field	Public contract
Dataset	PSPLIB J30 single-mode RCPSP
Portfolio	80 frozen instances
Selection	parameters [1, 7, 13, 19, 25, 31, 37, 43] crossed with instances 1..10
Reference	480 J30 instances with proven optima in the source set
Objective	`mean_gap_pct + 0.35 * p95_gap_pct + feasibility_penalty`

Table 1. Frozen PSPLIB J30 benchmark surface.

For one instance \(k\), the evaluator computes a makespan gap against the proven optimum:

\[ g_k = 100 \cdot \frac{ C_{\max,k}^{\mathrm{cand}} - C_{\max,k}^{\star} }{ C_{\max,k}^{\star} }. \]

(2)

Here \(C_{\max,k}^{\mathrm{cand}}\) is the candidate makespan on instance \(k\), and \(C_{\max,k}^{\star}\) is the proven optimum for the same instance.

The retained objective combines portfolio mean quality with a tail-risk term:

\[ J = \operatorname{mean}_{k}(g_k) + 0.35\,\operatorname{p95}_{k}(g_k) + \lambda_{\mathrm{feas}}. \]

(3)

A candidate must keep \(\lambda_{\mathrm{feas}}=0\). It must return finite priority scores, choose only eligible activities, schedule every activity exactly once, respect all precedences, and never exceed renewable resource capacities.

3. Accepted candidate

The accepted candidate keeps the serial schedule generator unchanged and changes only the activity-ranking policy. The final rule combines critical-path urgency, successor-unlocking pressure, bottleneck resource pressure, resource pressure, wait pressure, and remaining-work pressure. The selector then prefers activities that can start earlier and uses the priority score as the second-order decision.

1def priority_score(2    activity: ActivityView,3    state: ScheduleStateView,4    instance: InstanceView,5) -> float:6    """Return a deterministic priority score for one eligible RCPSP activity."""7    # Critical path urgency: prioritize jobs that are on the critical path8    cp_score = activity.critical_path_tail * 3.09    # Downstream impact: prioritize jobs that unlock more work10    unlock_score = (11        activity.successor_work * 0.1512        + activity.transitive_successor_count * 1.513    )14    # Resource contention: prioritize jobs that are bottleneck-heavy15    resource_score = (16        activity.bottleneck_ratio * 100.017        + activity.resource_pressure * 10.018    )19    # Urgency: penalize jobs that have been waiting past their earliest possible start20    wait_time = max(0, state.current_makespan - state.earliest_precedence_start)21    wait_score = wait_time * 1.522    # Remaining work pressure: scale priority based on total remaining work to maintain throughput23    remaining_score = state.remaining_work * 0.0524    # Final score with tie-breaker based on job ID25    return (26        cp_score27        + unlock_score28        + resource_score29        + wait_score30        + remaining_score31        - (0.01 * activity.id)32    )33 34def select_activity(35    eligible_activities: tuple[EligibleActivityView, ...],36    instance: InstanceView,37) -> int:38    """Choose one eligible activity after all local priority scores are available."""39    # Prioritize earlier start times to keep resources busy, then use the refined priority score.40    selected = min(41        eligible_activities,42        key=lambda item: (43            item.state.earliest_resource_feasible_start,44            -item.priority,45        ),46    )47    return int(selected.activity.id)

Listing 1. Accepted candidate implementation, formatted for inspection. Full executable artifact.

This is intentionally inspectable. The candidate is not a solver replacement or a hidden search procedure. It is a deterministic dispatch rule inside a fixed evaluator.

4. Results

The public chain moved from the seed score of 14.312 to an accepted score of 12.087. Figure 3 shows the full scored-candidate trace; Figure 4 summarizes the reported seed-versus-accepted comparison.

Figure 3. Best-so-far score across the curated evolutionary chain, rendered in the same paper style as the quadrature objective trace. Faint points are scored candidates from the sanitized public trace; scores above 16 are clipped at the top of the plotting area.

Figure 4. Seed versus accepted acceptance-score readout; lower values are better.

Figure 5. Real schedule-compression readout for PSPLIB J30 instance j3025_9. All executable jobs are rendered as unlabeled bars; seed finishes at 112, accepted at 90, and the proven optimum is 84.

Figure 5 translates the aggregate result into one real portfolio instance, j3025_9. Unlike the introductory schematic, it uses all executable jobs from the instance and the same instance's renewable-resource load buckets.

Metric	Seed	Accepted	Change
Acceptance score	14.312	12.087	-2.226
Relative score change	reference	15.55%	15.55% reduction
Validated schedules	80 / 80 target	80 / 80	100% valid

Table 2. Reported score comparison across the curated evolutionary chain; lower values are better.

Figure 6 keeps the accepted-candidate diagnostics separate from the chain objective. In absolute benchmark terms, this is not a 99% approximation claim: the mean accepted makespan is about 106.04% of the proven optimum, or roughly 94.31% if expressed as optimum divided by candidate makespan.

Figure 6. Accepted candidate diagnostics on the frozen 80-instance portfolio.

The worst residual gaps remain visible because they matter operationally. Figure 7 is the tail readout: it shows the largest accepted-candidate residual gaps and their makespan-versus-optimum comparison.

Figure 7. Worst-instance tail behavior for the accepted candidate. Each bar shows residual gap against the proven optimum.

5. Limitations

This result is limited to the frozen 80-instance PSPLIB J30 subset. PSPLIB contains 480 J30 instances with proven optima; this report does not claim evaluation over all 480. It also does not measure wall-clock optimization speed, human planner usability, robustness under changed project distributions, or production integration behavior.

The accepted rule is deterministic and feasible under this evaluator, but it is still a dispatch heuristic. A different project class, different resource model, multi-mode activity model, stochastic arrivals, or operational priority policy would require a new evaluation contract.

6. Reproducibility

The bundle includes the accepted candidate, evaluation contract, curated evolution chain, metrics, provenance, replay confirmation, and public figures. A curated animated replay of the same public trace is available at the run page. It is a presentation layer over the same artifacts, not a separate result, and excludes prompts, raw logs, telemetry, and non-public proposal context.

Replaying the result should use the same PSPLIB J30 portfolio, the same proven optimal makespans, the same score formula, and the same lower-is-better direction. Changing the instance set or objective creates a new evaluation, not a replay of this result.

The source bundle is available in Göther Labs results repository.