Note

This example is available as a Jupyter notebook. Download it and all necessary data files here.

Project-Team Allocation#

Consider the following decision problem: a number of teams are available to work on a collection of new projects in your organisation. Due to resourcing constraints, each team has a finite capacity, so not all projects will be completed. Furthermore, the teams are heterogeneous: not all teams are capable of completing all projects. Your task is to allocate projects to teams so that the total value of completed projects is maximised.

The mathematical model is defined based on two indices: the projects \(i \in I\) and the teams \(j \in J\). The model data is also defined on these indices: each project has a resource requirement \(w_i\) and a value \(p_i\). Each team has a capacity \(c_j\). We will define binary variables \(x_{ij}\) over possible assignment pairs \((i,j)\) in order to build the constraints and objective in the model.

The mathematical model is formulated as

\[\begin{split}\begin{alignat}{2} \max \quad & \sum_{i \in I} \sum_{j \in J} p_{i} x_{ij} \\ \mbox{s.t.} \quad & \sum_{i \in I} w_{i} x_{ij} \le c_{j} & \forall j \in J \\ & \sum_{j \in J} x_{ij} \le 1 & \forall i \in I \\ & x_{ij} \in \lbrace 0, 1 \rbrace & \forall i \in I, j \in J \\ \end{alignat}\end{split}\]

and is more commonly known as the Multiple Knapsack Problem (MKP).

This model has a slight twist: since the teams are heterogeneous, not all teams can complete all projects. We will deal with this requirement before constructing the model, by filtering the input data. In this way we leverage the inherent sparsity of the model variables to avoid adding redundant variables and reduce the overall model size.

For all gurobipy_pandas applications, we start with the standard imports:

[1]:
import pandas as pd
import gurobipy as gp
from gurobipy import GRB
import gurobipy_pandas as gppd

gppd.set_interactive()

We will first read in the data for both projects and teams. To match our mathematical model, which is indexed based on projects \(i\) and teams \(j\), we will set the indexes of our pandas dataframes in the same way.

[3]:
projects = pd.read_csv("data/projects.csv", index_col="project")
projects.head()
[3]:
         resource  difficulty  value
project
0             1.1           3    0.4
1             1.4           3    1.3
2             1.2           1    1.7
3             1.1           3    1.3
4             0.9           1    1.3
[4]:
teams = pd.read_csv("data/teams.csv", index_col="team")
teams.head()
[4]:
      skill  capacity
team
0         1       2.4
1         1       1.8
2         1       1.1
3         2       1.9
4         3       1.4

We first do some quick quality and size checks on the data. Clearly, it will not be possible to complete all projects with the available resource.

[5]:
print(f"There are {len(projects)} projects and {len(teams)} teams.")
print(f"Total project resource required: {projects.resource.sum():.1f}")
print(f"Total team resource available: {teams.capacity.sum():.1f}")
There are 30 projects and 5 teams.
Total project resource required: 35.2
Total team resource available: 8.6

We also check the heterogeneous nature of the projects and teams. This model follows a simple rule: team \(j\) can complete project \(i\) if \(\text{skill}_j \ge \text{difficulty}_i\). As a result, the number of teams (and therefore the capacity) available for projects with difficulty 3 is far smaller than for projects with difficulty 1.

[6]:
projects.difficulty.value_counts()
[6]:
difficulty
3    10
1    10
2    10
Name: count, dtype: int64
[7]:
teams.skill.value_counts()
[7]:
skill
1    3
2    1
3    1
Name: count, dtype: int64
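Using the team data shown above, a quick calculation makes this imbalance concrete. The following sketch reconstructs the small `teams` table from the output above (a stand-in for reading `data/teams.csv`) and totals the capacity of the teams eligible for each difficulty level:

```python
import pandas as pd

# Team data as printed above (skill, capacity) -- a small
# reconstruction of data/teams.csv for illustration only.
teams = pd.DataFrame(
    {"skill": [1, 1, 1, 2, 3], "capacity": [2.4, 1.8, 1.1, 1.9, 1.4]},
    index=pd.Index(range(5), name="team"),
)

# For each difficulty level, the total capacity of teams able to
# take it on (those with skill >= difficulty).
capacity_by_difficulty = pd.Series(
    {d: teams.loc[teams["skill"] >= d, "capacity"].sum() for d in (1, 2, 3)},
    name="available_capacity",
)
print(capacity_by_difficulty)
# 1    8.6
# 2    3.3
# 3    1.4
```

So while all 8.6 units of capacity can serve difficulty-1 projects, only 1.4 units can serve difficulty-3 projects.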

When formulating the model, we could create a variable for every index pair \((i, j)\) and later disallow the invalid assignments using constraints. But this would introduce redundant variables, making the model larger than necessary. Instead, we exploit the natural sparsity of the problem by filtering the data down to the allowed pairs before creating any variables.

[8]:
# Pandas does not have a conditional join, but we can use a
# cross join + query to create the list of allowed pairs.
# For larger datasets, this data filtering might be better
# done before loading into python/pandas (e.g. in SQL, which
# will handle this more efficiently).

allowed_pairs = (
    pd.merge(
        projects.reset_index(),
        teams.reset_index(),
        how='cross',
    )
    .query("difficulty <= skill")
    .set_index(["project", "team"])
)

print(
    f"Model will have {len(allowed_pairs)} variables"
    f" (out of a possible {len(projects) * len(teams)})"
)

allowed_pairs
Model will have 80 variables (out of a possible 150)
[8]:
              resource  difficulty  value  skill  capacity
project team
0       4          1.1           3    0.4      3       1.4
1       4          1.4           3    1.3      3       1.4
2       0          1.2           1    1.7      1       2.4
        1          1.2           1    1.7      1       1.8
        2          1.2           1    1.7      1       1.1
...                ...         ...    ...    ...       ...
28      2          1.1           1    1.0      1       1.1
        3          1.1           1    1.0      2       1.9
        4          1.1           1    1.0      3       1.4
29      3          0.9           2    1.3      2       1.9
        4          0.9           2    1.3      3       1.4

[80 rows x 5 columns]

The new dataframe we have constructed has a sparse index, which represents the small set of allowed pairs (roughly half the set of total possible pairs). This approach reduces the model size, and correspondingly the time taken to formulate it.
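The cross join + `query` pattern is easy to check in isolation. The following sketch applies the same pattern to a tiny made-up dataset (three projects, two teams), so the filtering behaviour can be verified by inspection:

```python
import pandas as pd

# Toy data, invented for this check: project 1 (difficulty 3) can be
# served by no team, project 2 (difficulty 2) only by team 1.
projects = pd.DataFrame(
    {"difficulty": [1, 3, 2]}, index=pd.Index([0, 1, 2], name="project")
)
teams = pd.DataFrame({"skill": [1, 2]}, index=pd.Index([0, 1], name="team"))

# Same pattern as above: cross join all pairs, keep the feasible ones.
allowed_pairs = (
    pd.merge(projects.reset_index(), teams.reset_index(), how="cross")
    .query("difficulty <= skill")
    .set_index(["project", "team"])
)

print(allowed_pairs.index.tolist())  # [(0, 0), (0, 1), (2, 1)]
```

Every surviving pair satisfies the skill rule, and no feasible pair is lost.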

Model Formulation#

From the allowed_pairs dataframe, gurobipy_pandas provides a free function to create a corresponding series of variables on our new index. We first create a gurobipy Model, then call gppd.add_vars to create a Gurobi variable for every entry in the index. Since the value column exists in this dataframe, we can reference this column directly to use it as the linear objective coefficient for each variable.

[9]:
model = gp.Model()
model.ModelSense = GRB.MAXIMIZE
x = gppd.add_vars(model, allowed_pairs, vtype=GRB.BINARY, obj="value", name="x")
x
[9]:
project  team
0        4        <gurobi.Var x[0,4]>
1        4        <gurobi.Var x[1,4]>
2        0        <gurobi.Var x[2,0]>
         1        <gurobi.Var x[2,1]>
         2        <gurobi.Var x[2,2]>
                         ...
28       2       <gurobi.Var x[28,2]>
         3       <gurobi.Var x[28,3]>
         4       <gurobi.Var x[28,4]>
29       3       <gurobi.Var x[29,3]>
         4       <gurobi.Var x[29,4]>
Name: x, Length: 80, dtype: object

To add the necessary constraints to the model, we use pandas native groupby and aggregate operations to group our binary assignment variables along with their resource requirements. The result is a Series of expressions capturing the total resource requirement allocated to a team based on the selected assignments:

[10]:
total_resource = (
    (projects["resource"] * x)
    .groupby("team").sum()
)
total_resource
[10]:
team
0    1.2 x[2,0] + 0.9 x[4,0] + 1.3 x[5,0] + x[6,0] ...
1    1.2 x[2,1] + 0.9 x[4,1] + 1.3 x[5,1] + x[6,1] ...
2    1.2 x[2,2] + 0.9 x[4,2] + 1.3 x[5,2] + x[6,2] ...
3    1.2 x[2,3] + 0.9 x[4,3] + 1.3 x[5,3] + x[6,3] ...
4    1.1 x[0,4] + 1.4 x[1,4] + 1.2 x[2,4] + 1.1 x[3...
dtype: object
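The key pandas mechanics here are worth noting: multiplying a series defined on the `(project, team)` MultiIndex by per-project data aligns the two on the shared `project` level, and `groupby("team")` then aggregates over the other level. A minimal sketch with plain numbers (invented values, no Gurobi objects) shows the same alignment and grouping:

```python
import pandas as pd

# A stand-in for the variable series x: one entry per allowed pair.
idx = pd.MultiIndex.from_tuples(
    [(0, 0), (0, 1), (1, 0)], names=["project", "team"]
)
x = pd.Series([1, 1, 1], index=idx)

# Per-project data, indexed only by "project" -- pandas aligns the
# multiplication on the shared index level.
resource = pd.Series([2.0, 3.0], index=pd.Index([0, 1], name="project"))

# Group the aligned products by the "team" level and sum.
per_team = (resource * x).groupby("team").sum()
print(per_team.to_dict())  # {0: 5.0, 1: 2.0}
```

With Gurobi variables in place of the numbers, the same two steps produce the linear expressions shown above.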

We then use the free function gppd.add_constrs to create constraints by aligning these expressions with capacity data:

[11]:
capacity_constraints = gppd.add_constrs(
    model, total_resource, GRB.LESS_EQUAL, teams["capacity"],
    name='capacity',
)
capacity_constraints.head()
[11]:
team
0    <gurobi.Constr capacity[0]>
1    <gurobi.Constr capacity[1]>
2    <gurobi.Constr capacity[2]>
3    <gurobi.Constr capacity[3]>
4    <gurobi.Constr capacity[4]>
Name: capacity, dtype: object

We also need to constrain that each project is allocated to at most one team. This is done using the same function:

[12]:
allocate_once = gppd.add_constrs(
    model, x.groupby('project').sum(),
    GRB.LESS_EQUAL, 1.0, name="allocate_once",
)
allocate_once.head()
[12]:
project
0    <gurobi.Constr allocate_once[0]>
1    <gurobi.Constr allocate_once[1]>
2    <gurobi.Constr allocate_once[2]>
3    <gurobi.Constr allocate_once[3]>
4    <gurobi.Constr allocate_once[4]>
Name: allocate_once, dtype: object

Extracting Solutions#

With the model formulated, we solve it using the Gurobi Optimizer:

[13]:
model.optimize()
Gurobi Optimizer version 11.0.3 build v11.0.3rc0 (linux64 - "Ubuntu 22.04.3 LTS")

CPU model: Intel(R) Xeon(R) Platinum 8259CL CPU @ 2.50GHz, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 1 physical cores, 2 logical processors, using up to 2 threads

Optimize a model with 35 rows, 80 columns and 160 nonzeros
Model fingerprint: 0x39911703
Variable types: 0 continuous, 80 integer (80 binary)
Coefficient statistics:
  Matrix range     [3e-01, 2e+00]
  Objective range  [3e-01, 2e+00]
  Bounds range     [1e+00, 1e+00]
  RHS range        [1e+00, 2e+00]
Found heuristic solution: objective 7.6000000
Presolve removed 18 rows and 37 columns
Presolve time: 0.00s
Presolved: 17 rows, 43 columns, 78 nonzeros
Found heuristic solution: objective 12.2000000
Variable types: 0 continuous, 43 integer (43 binary)

Root relaxation: objective 1.381744e+01, 12 iterations, 0.00 seconds (0.00 work units)

    Nodes    |    Current Node    |     Objective Bounds      |     Work
 Expl Unexpl |  Obj  Depth IntInf | Incumbent    BestBd   Gap | It/Node Time

     0     0   13.81744    0    5   12.20000   13.81744  13.3%     -    0s
     0     0   13.55000    0    9   12.20000   13.55000  11.1%     -    0s
H    0     0                      13.4000000   13.55000  1.12%     -    0s
     0     0   13.55000    0    9   13.40000   13.55000  1.12%     -    0s

Cutting planes:
  Gomory: 1
  Cover: 3
  MIR: 2
  StrongCG: 1
  RLT: 1

Explored 1 nodes (22 simplex iterations) in 0.02 seconds (0.00 work units)
Thread count was 2 (of 2 available processors)

Solution count 3: 13.4 12.2 7.6

Optimal solution found (tolerance 1.00e-04)
Best objective 1.340000000000e+01, best bound 1.340000000000e+01, gap 0.0000%

To extract the solution, we use the series accessor .gppd.X to retrieve solution values for all variables in the series x:

[14]:
x.gppd.X
[14]:
project  team
0        4       0.0
1        4       0.0
2        0       1.0
         1      -0.0
         2       0.0
                ...
28       2      -0.0
         3       0.0
         4       0.0
29       3       1.0
         4      -0.0
Name: x, Length: 80, dtype: float64

Notice that the result is returned on the same index as x, so we can immediately apply pandas methods to transform it into a more readable form. We could also write the result directly to another file or a results API. In this example, we perform a simple aggregation to show the list of projects allocated to each team:

[15]:
(
    x.gppd.X.to_frame()
    .query("x >= 0.9").reset_index()
    .groupby("team").agg({"project": list})
)
[15]:
           project
team
0          [2, 11]
1              [5]
2              [6]
3          [4, 29]
4     [14, 15, 26]

To aid in follow-up analysis, we can also use the series accessor .gppd.Slack on any constraint series to determine the slack in inequality constraints. For example, the following shows the spare capacity of each team in the current assignment:

[16]:
capacity_constraints.gppd.Slack
[16]:
team
0    0.1
1    0.5
2    0.1
3    0.1
4    0.1
Name: capacity, dtype: float64
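As a sanity check, these slacks can be reproduced by hand for the teams whose assigned projects all have resource values visible in the outputs above (projects 4, 5, 6 and 29; the capacities come from the teams table and the allocation from the aggregation printed earlier):

```python
import pandas as pd

# Values copied from the outputs above; teams 0 and 4 are omitted
# because not all of their projects' resource values are shown.
resource = {4: 0.9, 5: 1.3, 6: 1.0, 29: 0.9}
capacity = {1: 1.8, 2: 1.1, 3: 1.9}
allocation = {1: [5], 2: [6], 3: [4, 29]}

# Slack = capacity minus the resource consumed by assigned projects.
slack = pd.Series(
    {j: capacity[j] - sum(resource[i] for i in allocation[j]) for j in capacity}
)
print(slack.round(1))
# 1    0.5
# 2    0.1
# 3    0.1
```

These match the `.gppd.Slack` values reported by the solver for teams 1, 2 and 3.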