A. What is a Graph?

🌀 Expand for a short explanation of Graphs

Before we continue, it might be good to briefly explain what a Graph even is!

A Graph is a data structure consisting of:

  • Nodes: Individual elements in the graph
  • Edges: Connections between nodes

The graph is typically represented by:

  • Adjacency matrix: Shows connections between nodes
  • Node features: Attributes or properties of each node
  • Edge features: Attributes of the connections between nodes
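
As a rough illustration of these three components, the sketch below builds a toy graph with two players and a ball using plain NumPy. All values, the feature choice (only x and y coordinates) and the single edge feature (distance) are made up for this example and are not what the converter produces.

```python
import numpy as np

# Toy graph with 3 nodes: player 0, player 1 and the ball (node 2).
# Adjacency matrix: a[i, j] = 1 means node i is connected to node j.
# Here both players are connected to the ball, but not to each other.
a = np.array([
    [0, 0, 1],
    [0, 0, 1],
    [1, 1, 0],
])

# Node features: one row per node, one column per feature (x, y).
x = np.array([
    [0.1, 0.2],
    [-0.3, 0.4],
    [0.0, 0.0],
])

# Edge features: one row per connection (non-zero entry in `a`),
# listed in row-major order. The only feature here is the distance.
rows, cols = np.nonzero(a)
e = np.linalg.norm(x[rows] - x[cols], axis=1, keepdims=True)

print(a.shape, x.shape, e.shape)
```

The number of rows in `e` equals the number of ones in `a`, which is the same relationship the larger matrices in section D follow.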

The image on the right represents a stylized version of a frame of tracking data in soccer.

In section 6.1 we can see what this looks like in Python.

Graph representation

B. What are all GraphConverter settings?

🌀 ⚽ 🏈 Expand for a full table of additional optional GraphConverter parameters
| Parameter | Type | Description | Default | Sport |
| --- | --- | --- | --- | --- |
| prediction | bool | When True, use the converter to create a Graph dataset to apply a pre-trained model to; no labels are required. | False | ⚽ 🏈 |
| adjacency_matrix_connect_type | str | The type of connection used in the adjacency matrix, typically related to the ball. Choose from 'ball', 'ball_carrier' or 'no_connection'. | 'ball' | ⚽ 🏈 |
| adjacency_matrix_type | str | The type of adjacency matrix, indicating how connections are structured, such as split by team. Choose from 'delaunay', 'split_by_team', 'dense', 'dense_ap' or 'dense_dp'. | 'split_by_team' | ⚽ 🏈 |
| self_loop_ball | bool | Whether the ball node should have a self-loop, i.e. be connected to itself and not only to player(s). | True | ⚽ 🏈 |
| label_type | str | The type of prediction label used. Currently only 'binary' is supported. | 'binary' | ⚽ 🏈 |
| random_seed | int, bool | When a random_seed is given, each individual Graph is randomly shuffled without changing the underlying structure. When set to True, every frame is shuffled differently; False disables shuffling. Setting this to True is advised when creating an actual dataset, to support Permutation Invariance. | False | ⚽ 🏈 |
| pad | bool | When True, pads every graph to 22 players plus the ball (a 23x23 adjacency matrix). The edge feature padding size changes dynamically based on the combination of AdjacencyMatrixConnectType, AdjacencyMatrixType and self_loop_ball. Padding is not required, because smaller and larger graphs can all be used in the same dataset. | False | ⚽ 🏈 |
| verbose | bool | The converter logs warnings / error messages when specific frames have no coordinates or other information is missing. False mutes all of these warnings. | False | ⚽ 🏈 |
| defending_team_node_value | float | Value for the node feature when the player is on the defending team. Should be between 0 and 1, inclusive. | 0.1 | ⚽ 🏈 |
| attacking_non_qb_node_value | float | Value for the node feature when the player is NOT the QB, but is on the attacking team. | 0.1 | 🏈 |
| chunk_size | int | Determines the size of conversions from Polars to Graphs. The preferred setting depends on available computing power. | 2_000 | 🏈 |
| non_potential_receiver_node_value | float | Value for the node feature when the player is NOT a potential receiver of a pass (when on the opposing team or in possession of the ball). Should be between 0 and 1, inclusive. | 0.1 | |
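
To make the random_seed description more concrete, here is a minimal sketch of what "shuffling a Graph without changing the underlying structure" can look like. It is not the converter's implementation: the toy matrices and the seed are assumptions, and the edge feature rows (which would need to be reordered as well) are left out.

```python
import numpy as np

rng = np.random.default_rng(42)  # hypothetical seed for this example

# Toy adjacency matrix and node features for 3 nodes.
a = np.array([
    [1, 1, 0],
    [1, 1, 1],
    [0, 1, 1],
])
x = np.array([[0.1], [0.2], [0.3]])

# Draw one permutation and apply it to rows AND columns of `a`,
# and to the rows of `x`, so every node keeps its own connections.
perm = rng.permutation(a.shape[0])
a_shuffled = a[np.ix_(perm, perm)]
x_shuffled = x[perm]

# The inverse permutation restores the original ordering.
inverse = np.argsort(perm)
a_restored = a_shuffled[np.ix_(inverse, inverse)]
x_restored = x_shuffled[inverse]
```

Because the same permutation is applied to both axes, the shuffled graph describes exactly the same connections, only with the nodes listed in a different order.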

C. What features does each Graph have?

🌀 ⚽ Expand for a full list of Soccer features (note: `SoccerGraphConverter` has slightly different features)
| Variable | Datatype | Index | Features |
| --- | --- | --- | --- |
| a | np.array of shape (players+ball, players+ball) | - | |
| x | np.array of shape (n_nodes, n_node_features) | 0 | normalized x-coordinate |
| | | 1 | normalized y-coordinate |
| | | 2 | x component of the velocity unit vector |
| | | 3 | y component of the velocity unit vector |
| | | 4 | normalized speed |
| | | 5 | normalized angle of velocity vector |
| | | 6 | normalized distance to goal |
| | | 7 | normalized angle to goal |
| | | 8 | normalized distance to ball |
| | | 9 | normalized angle to ball |
| | | 10 | attacking (1) or defending team (defending_team_node_value) |
| | | 11 | potential receiver (1) else non_potential_receiver_node_value |
| e | np.array of shape (np.count_nonzero(a), n_edge_features) | 0 | normalized inter-player distance |
| | | 1 | normalized inter-player speed difference |
| | | 2 | inter-player angle cosine |
| | | 3 | inter-player angle sine |
| | | 4 | inter-player velocity vector cosine |
| | | 5 | inter-player velocity vector sine |
| | | 6 | optional: 1 if the two players are connected according to the Delaunay adjacency matrix, else 0. Only present if adjacency_matrix_type is NOT 'delaunay' |
| y | np.array | - | |
| id | int, str, None | - | |
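
As a hedged sketch of how node features 2 to 4 relate to each other, the snippet below derives a velocity unit vector and a normalized speed from raw velocity components. The function name and the maximum speed used for normalization are assumptions made for this example; the converter's actual normalization may differ.

```python
import numpy as np

MAX_SPEED = 12.0  # assumed normalization cap in m/s, not taken from the library


def velocity_features(vx, vy, max_speed=MAX_SPEED):
    """Return (unit_x, unit_y, normalized_speed) for one velocity vector."""
    speed = float(np.hypot(vx, vy))
    if speed == 0.0:
        # A stationary node has no direction, so the unit vector is zero.
        return 0.0, 0.0, 0.0
    # Clip the normalized speed so it stays within [0, 1].
    return vx / speed, vy / speed, min(speed / max_speed, 1.0)


unit_x, unit_y, norm_speed = velocity_features(3.0, 4.0)
print(unit_x, unit_y, norm_speed)
```

Splitting velocity into a direction (unit vector) and a magnitude (speed) keeps both pieces on a bounded scale.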

🌀 🏈 Expand for a full list of American Football features
| Variable | Datatype | Index | Features |
| --- | --- | --- | --- |
| a | np.array of shape (players+ball, players+ball) | - | |
| x | np.array of shape (n_nodes, n_node_features) | 0 | normalized x-coordinate |
| | | 1 | normalized y-coordinate |
| | | 2 | x component of the speed unit vector |
| | | 3 | y component of the speed unit vector |
| | | 4 | normalized speed |
| | | 5 | x component of the acceleration unit vector |
| | | 6 | y component of the acceleration unit vector |
| | | 7 | normalized acceleration |
| | | 8 | sine of the normalized direction (angle) |
| | | 9 | cosine of the normalized direction (angle) |
| | | 10 | sine of the normalized orientation (angle) |
| | | 11 | cosine of the normalized orientation (angle) |
| | | 12 | normalized distance to goal |
| | | 13 | normalized distance to ball |
| | | 14 | normalized distance to end zone |
| | | 15 | possession team or defending team (defending_team_node_value) indicator |
| | | 16 | quarterback indicator or attacking_non_qb_node_value or 0 (defending team) |
| | | 17 | ball indicator |
| | | 18 | normalized weight |
| | | 19 | normalized height |
| e | np.array of shape (np.count_nonzero(a), n_edge_features) | 0 | inter-player distance |
| | | 1 | inter-player speed difference |
| | | 2 | inter-player acceleration difference |
| | | 3 | cosine of the inter-player positional angle |
| | | 4 | sine of the inter-player positional angle |
| | | 5 | cosine of the inter-player direction angle |
| | | 6 | sine of the inter-player direction angle |
| | | 7 | cosine of the inter-player orientation angle |
| | | 8 | sine of the inter-player orientation angle |
| y | np.array | - | |
| id | int, str, None | - | |
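
Several of the features above store an angle as a sine and cosine pair. The sketch below illustrates the general idea behind that encoding. It is not the converter's code: the helper name and the use of degrees are assumptions.

```python
import math


def encode_angle(degrees):
    """Encode an angle as (sine, cosine) so that 359 and 1 degree end up close."""
    radians = math.radians(degrees)
    return math.sin(radians), math.cos(radians)


sin_a, cos_a = encode_angle(359.0)
sin_b, cos_b = encode_angle(1.0)

# As raw numbers the two angles differ by 358, yet they point in almost
# the same direction. In (sine, cosine) space they are close together.
raw_gap = abs(359.0 - 1.0)
encoded_gap = math.hypot(sin_a - sin_b, cos_a - cos_b)
print(raw_gap, encoded_gap)
```

Using both the sine and the cosine avoids the jump at the 0/360 degree boundary while still identifying every direction uniquely.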

D. What is a CustomGraphDataset?

🌀 Expand for a short explanation on GraphDataset

Let's have a look at the internals of our GraphDataset. This dataset class contains a list of graphs, available through dataset.graphs.

The first item in our dataset has 23 nodes, 12 features per node and 7 features per edge.

dataset.graphs[0]

>>> Graph(n_nodes=23, n_node_features=12, n_edge_features=7, n_labels=1)

The GraphDataset also allows us to split our data into train and test sets (and validation set if required) by using either:

  • dataset.split_test_train_validation()
  • dataset.split_test_train()
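
The snippet below is a generic sketch of what a ratio-based train/test/validation split does, written with plain NumPy. It does not call the dataset methods listed above; the helper name, the 4:1:1 ratio and the seed are assumptions made for illustration only.

```python
import numpy as np


def split_indices(n_items, ratios=(4, 1, 1), seed=42):
    """Shuffle item indices and cut them into parts proportional to `ratios`."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_items)
    # Integer arithmetic keeps the cut points exact, e.g. [400, 500] for 600 items.
    cut_points = np.cumsum(ratios)[:-1] * n_items // np.sum(ratios)
    return np.split(order, cut_points)


train_idx, test_idx, val_idx = split_indices(600)
print(len(train_idx), len(test_idx), len(val_idx))  # 400 100 100
```

In practice it can also matter how frames are grouped before splitting (for example keeping frames from the same match together), which this sketch does not handle.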


🌀 Expand for a short explanation on the representation of adjacency matrix
Adjacency Matrix

The adjacency matrix is represented as a compressed sparse row matrix, as required by Spektral. A 'normal' (dense) version of this same matrix would be of shape 23x23, filled with zeros except for ones in the places where two players (or a player and the ball) are connected.

Because we set adjacency_matrix_type='split_by_team' and adjacency_matrix_connect_type="ball" this results in a total of 287 connections (ones), namely between:

  • adjacency_matrix_type='split_by_team':
    • All players on team A (11 * 11)
    • All players on team B (11 * 11)
    • Ball connected to ball (1)
  • adjacency_matrix_connect_type="ball"
    • Each player connected to the ball (22)
    • The ball connected to each player (22)
dataset.graphs[0].a
>>> <Compressed Sparse Row sparse matrix of dtype 'float64'
	    with 287 stored elements and shape (23, 23)>
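
The count of 287 can be reproduced with a short sketch that fills a dense 23x23 matrix block by block and converts it to a compressed sparse row matrix. The node ordering (team A first, then team B, ball last) is an assumption for this example, and the code does not use the converter itself.

```python
import numpy as np
from scipy.sparse import csr_matrix

n_team = 11
n_nodes = 2 * n_team + 1   # 22 players + ball
ball = n_nodes - 1         # assume the ball is the last node

a = np.zeros((n_nodes, n_nodes))

# adjacency_matrix_type='split_by_team'
a[:n_team, :n_team] = 1            # all players on team A (11 * 11)
a[n_team:ball, n_team:ball] = 1    # all players on team B (11 * 11)
a[ball, ball] = 1                  # ball connected to ball (1)

# adjacency_matrix_connect_type='ball'
a[:ball, ball] = 1                 # each player to the ball (22)
a[ball, :ball] = 1                 # the ball to each player (22)

a_sparse = csr_matrix(a)
print(a_sparse.nnz)    # 287
print(a_sparse.shape)  # (23, 23)
```

The total is 121 + 121 + 1 + 22 + 22 = 287 stored elements.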


🌀 Expand for a short explanation on the representation of node feature matrix
Node Features

The node features are stored in a regular NumPy array. Each column represents one feature and every row represents one node (a player or the ball).

The ball is in the last row, unless we set random_seed=True, in which case every Graph is randomly shuffled (while leaving connections intact).

See the bullet points in 5. Load Kloppy Data, Convert and Store to learn which column represents which feature.

The rows filled with zeros are 'empty' players created because we set pad=True. Graph Neural Networks are flexible enough to deal with different graph shapes in the same dataset, so it is normally not necessary to add these empty players, even for incomplete data with only a couple of players in frame.

dataset.graphs[0].x
>>> [[-0.163 -0.135  0.245 -0.97   0.007  0.289  0.959  0.191  0.059  0.376  1.     1.   ]
 [-0.332  0.011 -0.061  0.998  0.02   0.76   1.015  0.177  0.029  0.009  1.     0.1  ]
 [ 0.021 -0.072  0.987 -0.162  0.017  0.474  0.88   0.203  0.121  0.468  1.     1.   ]
 [-0.144  0.232  0.343  0.939  0.024  0.694  0.924  0.186  0.077  0.638  1.     1.   ]
 [-0.252  0.302  0.99   0.141  0.032  0.523  0.964  0.176  0.078  0.741  1.     1.   ]
 [ 0.012  0.573  0.834 -0.551  0.035  0.407  0.842  0.191  0.19   0.646  1.     1.   ]
 [-0.293  0.686  0.999 -0.045  0.044  0.493  0.966  0.163  0.182  0.761  1.     1.   ]
 [ 0.     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.   ]
 [ 0.     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.   ]
 [ 0.     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.   ]
 ...
 [ 0.202  0.124 -0.874  0.486  0.024  0.919  0.791  0.214  0.197  0.524  0.1    0.1  ]
 [ 0.404  0.143 -0.997  0.08   0.029  0.987  0.709  0.23   0.281  0.519  0.1    0.1  ]
 [ 0.195 -0.391  0.48  -0.877  0.014  0.33   0.847  0.218  0.222  0.417  0.1    0.1  ]
 [ 0.212 -0.063  0.982 -0.187  0.009  0.47   0.804  0.217  0.2    0.483  0.1    0.1  ]
 [-0.03   0.248 -0.996  0.091  0.021  0.986  0.876  0.194  0.116  0.591  0.1    0.1  ]
 [ 0.     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.   ]
 [ 0.     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.   ]
 [ 0.     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.   ]
 [ 0.     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.     0.   ]
 [-0.262  0.016  0.937 -0.35   0.036  0.443  0.986  0.044  0.     0.     0.     0.   ]]

 
dataset.graphs[0].x.shape
>>> (23, 12)
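
The zero rows in the output above can be mimicked with a small padding sketch. It assumes each team is padded to 11 rows separately and the ball is appended last, which matches the layout shown but is not taken from the converter's source; the helper name and the player counts are made up.

```python
import numpy as np


def pad_team(team_x, n_players=11):
    """Append all-zero rows until the team has `n_players` rows."""
    n_missing = n_players - team_x.shape[0]
    if n_missing < 0:
        raise ValueError("team has more rows than n_players")
    return np.pad(team_x, ((0, n_missing), (0, 0)))


n_features = 12
team_a = np.ones((7, n_features))   # 7 tracked players (placeholder values)
team_b = np.ones((9, n_features))   # 9 tracked players (placeholder values)
ball = np.ones((1, n_features))

x = np.vstack([pad_team(team_a), pad_team(team_b), ball])
print(x.shape)  # (23, 12)
```

Rows 7 to 10 and 20 to 21 of `x` are the zero-filled 'empty' players in this toy example.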


🌀 Expand for a short explanation on the representation of edge feature matrix
Edge Features

The edge features are also represented in a regular NumPy array. Again, each column represents one feature, and every row describes the connection between two players, or between a player and the ball.

We saw before that the adjacency matrix is stored as a compressed sparse row matrix with 287 stored elements. It is no coincidence that this lines up perfectly with the 287 rows of the edge feature matrix.

dataset.graphs[0].e
>>> [[ 0.     0.     1.     0.5    0.5    1.     0.   ]
 [ 0.081  0.006  0.936  0.255  0.21   0.907  1.   ]
 [ 0.079  0.004  0.012  0.391  0.     0.515  1.   ]
 [ 0.1    0.007  0.46   0.002  0.005  0.571  1.   ]
 [ 0.125  0.011  0.65   0.023  0.474  0.999  0.   ]
 [ 0.206  0.012  0.322  0.033  0.535  0.999  0.   ]
 [ 0.23   0.016  0.619  0.014  0.567  0.996  0.   ]
 [ 0.     0.     0.     0.     0.     0.     0.   ]
 [ 0.     0.     0.     0.     0.     0.     0.   ]
 [ 0.     0.     0.     0.     0.     0.     0.   ]
 ...
 [ 0.197 -0.025  0.005  0.426  0.929  0.757  1.   ]
 [ 0.281 -0.023  0.004  0.439  0.959  0.699  1.   ]
 [ 0.222 -0.03   0.067  0.75   0.979  0.643  1.   ]
 [ 0.2   -0.032  0.003  0.554  0.982  0.633  1.   ]
 [ 0.116 -0.026  0.08   0.229  0.82   0.884  1.   ]
 [ 0.     0.     0.     0.     0.     0.     1.   ]
 [ 0.     0.     0.     0.     0.     0.     1.   ]
 [ 0.     0.     0.     0.     0.     0.     1.   ]
 [ 0.     0.     0.     0.     0.     0.     1.   ]
 [ 0.     0.     1.     0.5    0.5    1.     1.   ]]

dataset.graphs[0].e.shape
>>> (287, 7)
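
The link between the stored elements of the adjacency matrix and the rows of the edge feature matrix can be sketched on a tiny graph. The row-major ordering of edges used here is an assumption for the example, not a statement about the converter's internals, and the edge features are left as zeros.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy 3-node graph: node 1 is connected to nodes 0 and 2 (both directions).
a = csr_matrix(np.array([
    [0, 1, 0],
    [1, 0, 1],
    [0, 1, 0],
]))

# One (row, col) pair per stored connection, in row-major order.
rows, cols = a.nonzero()
edges = list(zip(rows.tolist(), cols.tolist()))

# One edge feature row per connection; 7 columns as in the soccer example.
n_edge_features = 7
e = np.zeros((a.nnz, n_edge_features))

print(edges)    # [(0, 1), (1, 0), (1, 2), (2, 1)]
print(e.shape)  # (4, 7)
```

Row k of `e` would describe the connection `edges[k]`, which is why the number of rows in `e` always equals the number of stored elements in `a`.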