PSPLIB J30 scheduling benchmark

A deterministic resource-constrained project scheduling rule improved across a frozen PSPLIB J30 subset with proven optimal makespans. The public chain reduced the lower-is-better score from 14.312 to 12.087 while keeping feasibility at 100%.

Abstract

This note reports a deterministic dispatch rule for the Resource-Constrained Project Scheduling Problem (RCPSP). On a frozen public subset of PSPLIB J30, the evolutionary chain reduced the lower-is-better acceptance score from 14.312 to 12.087, a 15.55% reduction, while preserving feasibility on all 80 evaluated instances.

For audit clarity, the public claim is the curated evolutionary chain, not a claim about the final isolated diagnostic run.

The benchmark is deliberately external and bounded. Each portfolio instance has a proven optimal makespan, so the reported score measures schedule quality against a reference optimum rather than machine speed. The result should be read as a scheduling benchmark result on this fixed PSPLIB J30 subset, not as a claim that every RCPSP instance or production planning workflow is solved.

1. Problem formulation

RCPSP schedules a project made of activities, precedence constraints, and renewable resource limits. An activity can start only after its predecessors finish, and the resource demand of all activities active at the same time must fit within the available capacity. The schedule objective is the project makespan: the finish time of the terminal activity.

\[ C_{\max}(S) = \max_i F_i \]
(1)

where \(F_i\) is the finish time of activity \(i\) in schedule \(S\).

RCPSP schedule readout Eligible activities are ranked first; the evaluator then commits them to the earliest feasible placement. 0 A B C T A task becomes eligible only after all predecessor arcs are satisfied. row 1 row 2 row 3 J1 J2 J4 J3 J5 Cmax 0 20 40 60
Figure 1. RCPSP schedule readout. The standard schedule view combines the precedence network, Gantt placement, and makespan marker.
Renewable resource load A schedule is feasible only while every time bucket stays at or below capacity. 0 1 2 3 0 20 40 60 time bucket active demand The fixed evaluator rejects any placement that crosses the capacity line.
Figure 2. Renewable-resource load profile. The evaluator only accepts placements that keep every active resource demand under the capacity line.

The evaluator uses serial schedule generation. At each step it finds eligible activities, scores each one with the candidate rule, asks the selector to choose one eligible activity, and places that activity at the earliest resource-feasible start time. This keeps the optimization surface narrow: the candidate changes priority logic, not the scheduling validator.

2. Benchmark and evaluation contract

The public portfolio contains 80 frozen PSPLIB J30 single-mode instances. The selection is parameters \(\{1, 7, 13, 19, 25, 31, 37, 43\}\) crossed with instances 1 through 10. Every instance has 32 jobs including dummy source and sink jobs, renewable-resource capacities, precedence arcs, and a proven optimal makespan.

Field Public contract
Dataset PSPLIB J30 single-mode RCPSP
Portfolio 80 frozen instances
Selection parameters [1, 7, 13, 19, 25, 31, 37, 43] crossed with instances 1..10
Reference 480 J30 instances with proven optima in the source set
Objective mean_gap_pct + 0.35 * p95_gap_pct + feasibility_penalty
Table 1. Frozen PSPLIB J30 benchmark surface.

For one instance \(k\), the evaluator computes a makespan gap against the proven optimum:

\[ g_k = 100 \cdot \frac{ C_{\max,k}^{\mathrm{cand}} - C_{\max,k}^{\star} }{ C_{\max,k}^{\star} }. \]
(2)

Here \(C_{\max,k}^{\mathrm{cand}}\) is the candidate makespan on instance \(k\), and \(C_{\max,k}^{\star}\) is the proven optimum for the same instance.

The retained objective combines portfolio mean quality with a tail-risk term:

\[ J = \operatorname{mean}_{k}(g_k) + 0.35\,\operatorname{p95}_{k}(g_k) + \lambda_{\mathrm{feas}}. \]
(3)

A candidate must keep \(\lambda_{\mathrm{feas}}=0\). It must return finite priority scores, choose only eligible activities, schedule every activity exactly once, respect all precedences, and never exceed renewable resource capacities.

3. Accepted candidate

The accepted candidate keeps the serial schedule generator unchanged and changes only the activity-ranking policy. The final rule combines critical-path urgency, successor-unlocking pressure, bottleneck resource pressure, resource pressure, wait pressure, and remaining-work pressure. The selector then prefers activities that can start earlier and uses the priority score as the second-order decision.

1def priority_score(2    activity: ActivityView,3    state: ScheduleStateView,4    instance: InstanceView,5) -> float:6    """Return a deterministic priority score for one eligible RCPSP activity."""7    # Critical path urgency: prioritize jobs that are on the critical path8    cp_score = activity.critical_path_tail * 3.09    # Downstream impact: prioritize jobs that unlock more work10    unlock_score = (11        activity.successor_work * 0.1512        + activity.transitive_successor_count * 1.513    )14    # Resource contention: prioritize jobs that are bottleneck-heavy15    resource_score = (16        activity.bottleneck_ratio * 100.017        + activity.resource_pressure * 10.018    )19    # Urgency: penalize jobs that have been waiting past their earliest possible start20    wait_time = max(0, state.current_makespan - state.earliest_precedence_start)21    wait_score = wait_time * 1.522    # Remaining work pressure: scale priority based on total remaining work to maintain throughput23    remaining_score = state.remaining_work * 0.0524    # Final score with tie-breaker based on job ID25    return (26        cp_score27        + unlock_score28        + resource_score29        + wait_score30        + remaining_score31        - (0.01 * activity.id)32    )33 34def select_activity(35    eligible_activities: tuple[EligibleActivityView, ...],36    instance: InstanceView,37) -> int:38    """Choose one eligible activity after all local priority scores are available."""39    # Prioritize earlier start times to keep resources busy, then use the refined priority score.40    selected = min(41        eligible_activities,42        key=lambda item: (43            item.state.earliest_resource_feasible_start,44            -item.priority,45        ),46    )47    return int(selected.activity.id)
Listing 1. Accepted candidate implementation, formatted for inspection. Full executable artifact.

This is intentionally inspectable. The candidate is not a solver replacement or a hidden search procedure. It is a deterministic dispatch rule inside a fixed evaluator.

4. Results

The public chain moved from the seed score of 14.312 to an accepted score of 12.087. Figure 3 shows the full scored-candidate trace; Figure 4 summarizes the reported seed-versus-accepted comparison.

Best-so-far acceptance objective (lower is better) scored candidate best-so-far objective baseline accepted 16+ 14 12 0 40 80 119 generation k J(r)
Figure 3. Best-so-far score across the curated evolutionary chain, rendered in the same paper style as the quadrature objective trace. Faint points are scored candidates from the sanitized public trace; scores above 16 are clipped at the top of the plotting area.
Acceptance objective lower is better 15.55% reduction seed 14.312 accepted 12.087 0 7.5 15
Figure 4. Seed versus accepted acceptance-score readout; lower values are better.
Schedule compression seed accepted capacity 0 30 60 90 120 accepted Cmax seed Cmax capacity 14 0 cap 0 time 120 demand
Figure 5. Real schedule-compression readout for PSPLIB J30 instance j3025_9. All executable jobs are rendered as unlabeled bars; seed finishes at 112, accepted at 90, and the proven optimum is 84.

Figure 5 translates the aggregate result into one real portfolio instance, j3025_9. Unlike the introductory schematic, it uses all executable jobs from the instance and the same instance's renewable-resource load buckets.

Metric Seed Accepted Change
Acceptance score 14.312 12.087 -2.226
Relative score change reference 15.55% 15.55% reduction
Validated schedules 80 / 80 target 80 / 80 100% valid
Table 2. Reported score comparison across the curated evolutionary chain; lower values are better.

Figure 6 keeps the accepted-candidate diagnostics separate from the chain objective. In absolute benchmark terms, this is not a 99% approximation claim: the mean accepted makespan is about 106.04% of the proven optimum, or roughly 94.31% if expressed as optimum divided by candidate makespan.

Accepted candidate diagnostics 80 frozen PSPLIB J30 instances 6.04% 17.28% 22.41% 19 / 80 Feasibility penalty: 0.000 | invalid priority count: 0 | schedules validated: 80 / 80
Figure 6. Accepted candidate diagnostics on the frozen 80-instance portfolio.

The worst residual gaps remain visible because they matter operationally. Figure 7 is the tail readout: it shows the largest accepted-candidate residual gaps and their makespan-versus-optimum comparison.

Tail behavior kept visible largest accepted-candidate gaps 22.41% | 71 vs 58 18.84% | 82 vs 69 18.67% | 89 vs 75 18.06% | 85 vs 72 17.24% | 68 vs 58 17.24% | 68 vs 58 17.11% | 89 vs 76 16.44% | 85 vs 73 0% 8% 16% 24% Worst residual gaps remain part of the public result definition.
Figure 7. Worst-instance tail behavior for the accepted candidate. Each bar shows residual gap against the proven optimum.

5. Limitations

This result is limited to the frozen 80-instance PSPLIB J30 subset. PSPLIB contains 480 J30 instances with proven optima; this report does not claim evaluation over all 480. It also does not measure wall-clock optimization speed, human planner usability, robustness under changed project distributions, or production integration behavior.

The accepted rule is deterministic and feasible under this evaluator, but it is still a dispatch heuristic. A different project class, different resource model, multi-mode activity model, stochastic arrivals, or operational priority policy would require a new evaluation contract.

6. Reproducibility

The bundle includes the accepted candidate, evaluation contract, curated evolution chain, metrics, provenance, replay confirmation, and public figures. A curated animated replay of the same public trace is available at the run page. It is a presentation layer over the same artifacts, not a separate result, and excludes prompts, raw logs, telemetry, and non-public proposal context.

Replaying the result should use the same PSPLIB J30 portfolio, the same proven optimal makespans, the same score formula, and the same lower-is-better direction. Changing the instance set or objective creates a new evaluation, not a replay of this result.

The source bundle is available in Göther Labs results repository.