Note
This example is available as a Jupyter notebook. Download it and all necessary data files here.
Project-Team Allocation¶
Consider the following decision problem: a number of teams are available to work on a collection of new projects in your organisation. Due to resourcing constraints, each team has a finite capacity, so not all projects will be completed. Furthermore, the teams are heterogeneous: not all teams are capable of completing all projects. Your task is to allocate projects to teams so that the total value of completed projects is maximised.
The mathematical model is defined based on two indices: the projects \(i \in I\) and the teams \(j \in J\). The model data is also defined on these indices: each project has a resource requirement \(w_i\) and a value \(p_i\). Each team has a capacity \(c_j\). We will define binary variables \(x_{ij}\) over possible assignment pairs \((i,j)\) in order to build the constraints and objective in the model.
The mathematical model is formulated as

\[
\begin{alignat}{2}
\max \quad & \sum_{i \in I} \sum_{j \in J} p_i x_{ij} \\
\text{s.t.} \quad & \sum_{i \in I} w_i x_{ij} \le c_j \quad && \forall j \in J \\
& \sum_{j \in J} x_{ij} \le 1 && \forall i \in I \\
& x_{ij} \in \{0, 1\} && \forall i \in I, j \in J
\end{alignat}
\]
and is more commonly known as the Multiple Knapsack Problem (MKP).
This model has a slight twist: since the teams are heterogeneous, not all teams can complete all projects. We will deal with this requirement before constructing the model, by filtering the input data. In this way we leverage the inherent sparsity of the model variables to avoid adding redundant variables and reduce the overall model size.
For all gurobipy_pandas
applications, we start with the standard imports:
[1]:
import pandas as pd
import gurobipy as gp
from gurobipy import GRB
import gurobipy_pandas as gppd
gppd.set_interactive()
We will first read in the data for both projects and teams. To match our mathematical model, which is indexed based on projects \(i\) and teams \(j\), we will set the indexes of our pandas dataframes in the same way.
[3]:
projects = pd.read_csv("data/projects.csv", index_col="project")
projects.head()
[3]:
| project | resource | difficulty | value |
|---|---|---|---|
| 0 | 1.1 | 3 | 0.4 |
| 1 | 1.4 | 3 | 1.3 |
| 2 | 1.2 | 1 | 1.7 |
| 3 | 1.1 | 3 | 1.3 |
| 4 | 0.9 | 1 | 1.3 |
[4]:
teams = pd.read_csv("data/teams.csv", index_col="team")
teams.head()
[4]:
| team | skill | capacity |
|---|---|---|
| 0 | 1 | 2.4 |
| 1 | 1 | 1.8 |
| 2 | 1 | 1.1 |
| 3 | 2 | 1.9 |
| 4 | 3 | 1.4 |
We first do some quick quality and size checks on the data. Clearly, it will not be possible to complete all projects with the available resource.
[5]:
print(f"There are {len(projects)} projects and {len(teams)} teams.")
print(f"Total project resource required: {projects.resource.sum():.1f}")
print(f"Total team resource available: {teams.capacity.sum():.1f}")
There are 30 projects and 5 teams.
Total project resource required: 35.2
Total team resource available: 8.6
We also check the heterogeneous nature of the projects and teams. This model follows a simple rule: team \(j\) can complete project \(i\) if \(\text{skill}_j \ge \text{difficulty}_i\). As a result, the number of teams (and therefore the capacity) available for projects with difficulty 3 is far smaller than the capacity available for projects with difficulty 1.
[6]:
projects.difficulty.value_counts()
[6]:
difficulty
3 10
1 10
2 10
Name: count, dtype: int64
[7]:
teams.skill.value_counts()
[7]:
skill
1 3
2 1
3 1
Name: count, dtype: int64
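To quantify this imbalance, we can sum the capacity of teams eligible for each difficulty level. A quick standalone check, with the team data from data/teams.csv rebuilt inline so the snippet runs on its own:

```python
import pandas as pd

# Team data as shown in the teams dataframe above.
teams = pd.DataFrame(
    {"skill": [1, 1, 1, 2, 3], "capacity": [2.4, 1.8, 1.1, 1.9, 1.4]},
    index=pd.Index(range(5), name="team"),
)

# Capacity available to projects of difficulty d: only teams
# with skill >= d are eligible to take them.
for d in (1, 2, 3):
    cap = teams.loc[teams["skill"] >= d, "capacity"].sum()
    print(f"difficulty {d}: {cap:.1f} capacity available")
```

Only 1.4 of the 8.6 total capacity units can serve difficulty-3 projects, so many of those projects will necessarily go unallocated.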
When formulating the model, we could create a variable for every index pair \((i, j)\), and later disallow the invalid assignments using constraints. But this would introduce redundant variables, creating a model which is larger than needed. We should instead exploit the natural sparsity of the problem by filtering on the data to find only the variables we need, before we start to create variables.
[8]:
# Pandas does not have a conditional join, but we can use a
# cross join + query to create the list of allowed pairs.
# For larger datasets, this data filtering might be better
# done before loading into python/pandas (e.g. in SQL, which
# will handle this more efficiently).
allowed_pairs = (
    pd.merge(
        projects.reset_index(),
        teams.reset_index(),
        how='cross',
    )
    .query("difficulty <= skill")
    .set_index(["project", "team"])
)
print(
    f"Model will have {len(allowed_pairs)} variables"
    f" (out of a possible {len(projects) * len(teams)})"
)
allowed_pairs
Model will have 80 variables (out of a possible 150)
[8]:
| project | team | resource | difficulty | value | skill | capacity |
|---|---|---|---|---|---|---|
| 0 | 4 | 1.1 | 3 | 0.4 | 3 | 1.4 |
| 1 | 4 | 1.4 | 3 | 1.3 | 3 | 1.4 |
| 2 | 0 | 1.2 | 1 | 1.7 | 1 | 2.4 |
| 2 | 1 | 1.2 | 1 | 1.7 | 1 | 1.8 |
| 2 | 2 | 1.2 | 1 | 1.7 | 1 | 1.1 |
| ... | ... | ... | ... | ... | ... | ... |
| 28 | 2 | 1.1 | 1 | 1.0 | 1 | 1.1 |
| 28 | 3 | 1.1 | 1 | 1.0 | 2 | 1.9 |
| 28 | 4 | 1.1 | 1 | 1.0 | 3 | 1.4 |
| 29 | 3 | 0.9 | 2 | 1.3 | 2 | 1.9 |
| 29 | 4 | 0.9 | 2 | 1.3 | 3 | 1.4 |
80 rows × 5 columns
The new dataframe we have constructed has a sparse index, which represents the small set of allowed pairs (roughly half the set of total possible pairs). This approach reduces the model size, and correspondingly the time taken to formulate it.
Model Formulation¶
From the allowed_pairs
dataframe, gurobipy_pandas
provides a free function to create a corresponding series of variables on our new index. We first create a gurobipy Model, then call gppd.add_vars
to create a Gurobi variable for every entry in the index. Since the value
column exists in this dataframe, we can reference this column directly to use it as the linear objective coefficient for each variable.
[9]:
model = gp.Model()
model.ModelSense = GRB.MAXIMIZE
x = gppd.add_vars(model, allowed_pairs, vtype=GRB.BINARY, obj="value", name="x")
x
[9]:
project team
0 4 <gurobi.Var x[0,4]>
1 4 <gurobi.Var x[1,4]>
2 0 <gurobi.Var x[2,0]>
1 <gurobi.Var x[2,1]>
2 <gurobi.Var x[2,2]>
...
28 2 <gurobi.Var x[28,2]>
3 <gurobi.Var x[28,3]>
4 <gurobi.Var x[28,4]>
29 3 <gurobi.Var x[29,3]>
4 <gurobi.Var x[29,4]>
Name: x, Length: 80, dtype: object
To add the necessary constraints to the model, we use pandas native groupby and aggregate operations to group our binary assignment variables along with their resource requirements. The result is a Series of expressions capturing the total resource requirement allocated to a team based on the selected assignments:
[10]:
total_resource = (
    (projects["resource"] * x)
    .groupby("team").sum()
)
total_resource
[10]:
team
0 1.2 x[2,0] + 0.9 x[4,0] + 1.3 x[5,0] + x[6,0] ...
1 1.2 x[2,1] + 0.9 x[4,1] + 1.3 x[5,1] + x[6,1] ...
2 1.2 x[2,2] + 0.9 x[4,2] + 1.3 x[5,2] + x[6,2] ...
3 1.2 x[2,3] + 0.9 x[4,3] + 1.3 x[5,3] + x[6,3] ...
4 1.1 x[0,4] + 1.4 x[1,4] + 1.2 x[2,4] + 1.1 x[3...
dtype: object
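The multiplication `projects["resource"] * x` works because pandas aligns the project-indexed resource series against the "project" level of the MultiIndex on x. A minimal pure-pandas sketch of the same alignment, with plain numbers standing in for the Gurobi variables (all names here are illustrative):

```python
import pandas as pd

# A series indexed by project, and a series on (project, team)
# pairs, mirroring the shapes of projects["resource"] and x.
resource = pd.Series([1.1, 1.4], index=pd.Index([0, 1], name="project"))
pairs = pd.MultiIndex.from_tuples(
    [(0, 4), (1, 3), (1, 4)], names=["project", "team"]
)
x_stub = pd.Series([1, 1, 1], index=pairs)

# pandas broadcasts `resource` over the "project" level, then
# groupby("team") aggregates one expression per team.
total = (resource * x_stub).groupby("team").sum()
print(total)
```

The same mechanics apply when the series holds Gurobi variables: multiplication produces linear terms on the (project, team) index, and the groupby-sum collects them into one linear expression per team.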
We then use the free function gppd.add_constrs
to create constraints by aligning these expressions with capacity data:
[11]:
capacity_constraints = gppd.add_constrs(
    model, total_resource, GRB.LESS_EQUAL, teams["capacity"],
    name='capacity',
)
capacity_constraints.head()
[11]:
team
0 <gurobi.Constr capacity[0]>
1 <gurobi.Constr capacity[1]>
2 <gurobi.Constr capacity[2]>
3 <gurobi.Constr capacity[3]>
4 <gurobi.Constr capacity[4]>
Name: capacity, dtype: object
We also need to constrain that each project is allocated to at most one team. This is done using the same function:
[12]:
allocate_once = gppd.add_constrs(
    model, x.groupby('project').sum(),
    GRB.LESS_EQUAL, 1.0, name="allocate_once",
)
allocate_once.head()
[12]:
project
0 <gurobi.Constr allocate_once[0]>
1 <gurobi.Constr allocate_once[1]>
2 <gurobi.Constr allocate_once[2]>
3 <gurobi.Constr allocate_once[3]>
4 <gurobi.Constr allocate_once[4]>
Name: allocate_once, dtype: object
Extracting Solutions¶
With the model formulated, we solve it using the Gurobi Optimizer:
[13]:
model.optimize()
Gurobi Optimizer version 12.0.0 build v12.0.0rc1 (linux64 - "Ubuntu 22.04.3 LTS")
CPU model: Intel(R) Xeon(R) Platinum 8175M CPU @ 2.50GHz, instruction set [SSE2|AVX|AVX2|AVX512]
Thread count: 1 physical cores, 2 logical processors, using up to 2 threads
Optimize a model with 35 rows, 80 columns and 160 nonzeros
Model fingerprint: 0x39911703
Variable types: 0 continuous, 80 integer (80 binary)
Coefficient statistics:
Matrix range [3e-01, 2e+00]
Objective range [3e-01, 2e+00]
Bounds range [1e+00, 1e+00]
RHS range [1e+00, 2e+00]
Found heuristic solution: objective 7.6000000
Presolve removed 18 rows and 37 columns
Presolve time: 0.00s
Presolved: 17 rows, 43 columns, 78 nonzeros
Found heuristic solution: objective 6.3000000
Variable types: 0 continuous, 43 integer (43 binary)
Found heuristic solution: objective 10.6000000
Root relaxation: objective 1.381744e+01, 12 iterations, 0.00 seconds (0.00 work units)
Nodes | Current Node | Objective Bounds | Work
Expl Unexpl | Obj Depth IntInf | Incumbent BestBd Gap | It/Node Time
0 0 13.81744 0 5 10.60000 13.81744 30.4% - 0s
H 0 0 11.9000000 13.81744 16.1% - 0s
H 0 0 12.3000000 13.81744 12.3% - 0s
H 0 0 12.5000000 13.81744 10.5% - 0s
0 0 13.60000 0 4 12.50000 13.60000 8.80% - 0s
H 0 0 13.4000000 13.60000 1.49% - 0s
0 0 infeasible 0 13.40000 13.40000 0.00% - 0s
Cutting planes:
Gomory: 1
Cover: 2
MIR: 1
StrongCG: 1
RLT: 1
Explored 1 nodes (24 simplex iterations) in 0.03 seconds (0.00 work units)
Thread count was 2 (of 2 available processors)
Solution count 7: 13.4 12.5 12.3 ... 6.3
Optimal solution found (tolerance 1.00e-04)
Best objective 1.340000000000e+01, best bound 1.340000000000e+01, gap 0.0000%
To extract the solution, we use the series accessor .gppd.X to retrieve solution values for all variables in the series x:
[14]:
x.gppd.X
[14]:
project team
0 4 0.0
1 4 0.0
2 0 1.0
1 -0.0
2 0.0
...
28 2 -0.0
3 0.0
4 0.0
29 3 1.0
4 0.0
Name: x, Length: 80, dtype: float64
Notice that the result is returned on the same index as x, so we can immediately use pandas methods to transform it into a more readable form, or write it directly to another file or result API. In this example, we will perform a simple aggregation to show the list of projects allocated to each team:
[15]:
(
    x.gppd.X.to_frame()
    .query("x >= 0.9").reset_index()
    .groupby("team").agg({"project": list})
)
[15]:
| team | project |
|---|---|
| 0 | [2, 11] |
| 1 | [5] |
| 2 | [6] |
| 3 | [4, 29] |
| 4 | [14, 15, 26] |
To aid in follow-up analysis, we can also use the series accessor .gppd.Slack
on any constraint series to determine the slack in inequality constraints. For example, the following shows the spare capacity of each team in the current assignment:
[16]:
capacity_constraints.gppd.Slack
[16]:
team
0 0.1
1 0.5
2 0.1
3 0.1
4 0.1
Name: capacity, dtype: float64
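As a quick cross-check, capacity minus slack gives the resource each team actually committed. Using the capacities from data/teams.csv and the slacks printed above:

```python
# Capacities from the teams data; slacks from the solved model above.
capacity = [2.4, 1.8, 1.1, 1.9, 1.4]
slack = [0.1, 0.5, 0.1, 0.1, 0.1]
used = [round(c - s, 1) for c, s in zip(capacity, slack)]
print(used)  # resource committed by teams 0..4
```

For example, team 3 was allocated projects 4 and 29 (0.9 resource each), which matches its committed 1.8 units against a capacity of 1.9.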