  • Classification with Dota2 Game Results Dataset (PyTorch)
    CS/Machine Learning 2020. 12. 22. 14:30

    Project Introduction

    Dota 2 is a very popular multiplayer online battle arena game with over 11 million monthly users. For those of you who don't know the game (like me, lol), each match consists of two teams of five players, and each player chooses a hero from a pool of about 120. Each hero has its own strengths and weaknesses, so the combination of heroes on each team can affect which team wins.

     

    This project aims to train a deep neural network that predicts which team won a game, using the "Dota2 Games Results Data Set".

     

    Dataset

    The dataset used to train the model in this project is the "Dota2 Games Results Data Set" from the UCI Machine Learning Repository. It can be obtained from the link below.

     

    UCI Machine Learning Repository: Dota2 Games Results Data Set

    Abstract: Dota 2 is a popular computer game with two teams of 5 players. At the start of the game each player chooses a unique hero with different strengths and weaknesses.

    archive.ics.uci.edu

    Each row of this dataset has 117 columns: one label followed by 116 feature attributes.

    The first column records which team won the game, so its value is either 1 or -1.

    The second column is the cluster ID, which is related to the location of the game.

    The third column is the game mode (e.g. All Pick, Captains Mode).

    The fourth column is the game type (e.g. Ranked). These fields may be easier to understand if you play Dota 2.

    That leaves 113 columns, which record which heroes were used in the game. There are 113 of them because this dataset was collected in 2016. Each column represents one hero. If a player on team 1 chose the hero, the value is 1. If a player on team -1 chose it, the value is -1. If none of the players chose it, the value stays 0. Since only 10 of the 113 heroes are picked in any one game, the data is very sparse.
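
To make the encoding concrete, here is a minimal sketch of how one game's hero columns could be built. The function name `encode_picks` and the hero indices are made up for illustration; they are not part of the dataset or of my project code.

```python
N_HEROES = 113  # number of hero columns described above


def encode_picks(team_pos, team_neg, n_heroes=N_HEROES):
    """Return one row of hero columns: +1 for team 1 picks, -1 for team -1 picks, 0 otherwise."""
    row = [0] * n_heroes
    for hero in team_pos:
        row[hero] = 1
    for hero in team_neg:
        row[hero] = -1
    return row


# Hypothetical 0-based hero indices, five per team.
row = encode_picks(team_pos=[0, 7, 21, 45, 100], team_neg=[3, 9, 30, 61, 112])
nonzero = sum(1 for v in row if v != 0)

print(len(row), nonzero)            # 113 10
print(f"{nonzero / len(row):.1%}")  # 8.8%
```

Fewer than one in ten hero columns is non-zero in any row, which is what I mean by sparse.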

     

    Dota 2 Characters

    To check whether the classes are balanced, I compared the number of games won by team 1 and by team -1.

    sns.countplot(x='win', data=df)  # df is the DataFrame loaded in the preprocessing step below

    We can observe that the data isn't heavily skewed towards either outcome.
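
The same check can be done numerically. The sketch below uses made-up labels, not the real data, and the helper name `class_balance` is my own. It also reports the accuracy of always guessing the majority class, which is a useful reference point when reading the accuracy numbers later in this post.

```python
from collections import Counter


def class_balance(labels):
    """Return the share of each class and the accuracy of always predicting the majority class."""
    counts = Counter(labels)
    total = len(labels)
    shares = {label: n / total for label, n in counts.items()}
    return shares, max(shares.values())


toy_labels = [1, -1, 1, 1, -1, -1, 1, -1, 1, 1]  # made-up labels for illustration
shares, baseline = class_balance(toy_labels)

print(shares)    # {1: 0.6, -1: 0.4}
print(baseline)  # 0.6
```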

    Data Preprocessing

    The original dataset does not contain a header row. To make the data easier to work with, I assigned column names when reading the csv file.

    import pandas as pd

    # 'win' is the label; col1..col116 are the 116 feature columns
    names = ['win'] + ['col' + str(i) for i in range(1, 117)]
    df = pd.read_csv("data/dota2Train.csv", names=names)
    df.head()

    The first column, labeled 'win', will be used as y (the target). Before separating the data, I changed the value of win from -1 to 0, so that the labels are 0 and 1.

    # Map the label from {1, -1} to {1, 0}
    encode_map = {
        1: 1,
        -1: 0
    }
    df['win'] = df['win'].replace(encode_map)
    df.head()

    Also, if you look at the original data, the values in col1 (the cluster ID) are far larger than those of the other attributes. This difference in scale may cause the model to be biased towards the cluster ID, so normalization is needed. Normalization was performed with sklearn's MinMaxScaler, as the code below demonstrates.

    from sklearn.preprocessing import MinMaxScaler

    # Fit the scaler on every column and print the per-column maxima it found
    min_max_scaler = MinMaxScaler()
    fitted = min_max_scaler.fit(df)
    print(fitted.data_max_)
    # Scale each column to [0, 1] and wrap the result back into a DataFrame
    dft = min_max_scaler.transform(df)
    dft = pd.DataFrame(dft, columns=df.columns, index=list(df.index.values))

    Now the values of win are either 0 or 1, and every column is scaled to a value between 0 and 1. Now, we are ready to build a model.
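
For readers who want to see what min-max scaling does to a single column, here is a small framework-free sketch of the formula (x - min) / (max - min). The helper `min_max_scale` and the sample values are made up for illustration; in the project itself I relied on sklearn's MinMaxScaler.

```python
def min_max_scale(values):
    """Scale a list of numbers to [0, 1] with (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        # A constant column carries no information; map it to 0.0 in this sketch.
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


cluster_ids = [100, 150, 200, 300]  # made-up values on a large scale
hero_column = [-1, 0, 1, 0]         # a hero column as described above

print(min_max_scale(cluster_ids))  # [0.0, 0.25, 0.5, 1.0]
print(min_max_scale(hero_column))  # [0.0, 0.5, 1.0, 0.5]
```

Note that a hero column ends up with the values 0, 0.5 and 1 after scaling, so "not picked" becomes 0.5 rather than 0.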

     

    Next, we need to divide the dataset into X and y. X holds the features used to train the model, and y holds the labels, so that the predicted y can be compared against the true y.

    # Use the normalized frame: every column except the first is a feature
    X = dft.iloc[:, 1:]
    y = dft.iloc[:, 0]
    

    Last but not least, I split the dataset into a training set and a test set.

    from sklearn.model_selection import train_test_split
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=69)

     

    The number of epochs, the batch size and the learning rate are defined up front.

    EPOCHS = 10000
    BATCH_SIZE = 32
    LEARNING_RATE = 0.01

    A Dataset wrapper and a DataLoader are used to feed the data to the PyTorch neural network in batches.

    import torch
    from torch.utils.data import Dataset, DataLoader

    # Training set: returns (features, label) pairs
    class trainData(Dataset):
        def __init__(self, X_data, y_data):
            self.X_data = X_data
            self.y_data = y_data

        def __getitem__(self, index):
            return self.X_data[index], self.y_data[index]

        def __len__(self):
            return len(self.X_data)

    # Convert the pandas objects to NumPy arrays before building float tensors
    train_data = trainData(torch.tensor(X_train.to_numpy(), dtype=torch.float32),
                           torch.tensor(y_train.to_numpy(), dtype=torch.float32))

    # Test set: returns features only
    class testData(Dataset):
        def __init__(self, X_data):
            self.X_data = X_data

        def __getitem__(self, index):
            return self.X_data[index]

        def __len__(self):
            return len(self.X_data)

    test_data = testData(torch.tensor(X_test.to_numpy(), dtype=torch.float32))
    train_loader = DataLoader(dataset=train_data, batch_size=BATCH_SIZE, shuffle=True)
    test_loader = DataLoader(dataset=test_data, batch_size=1)
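
To see what the loader actually hands to the model, here is a small sketch with random stand-in data. It uses PyTorch's built-in TensorDataset instead of my own trainData class, and the sample count of 70 is arbitrary.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

torch.manual_seed(0)
X = torch.randn(70, 116)                 # 70 made-up samples with 116 features
y = torch.randint(0, 2, (70,)).float()   # made-up 0/1 labels

loader = DataLoader(TensorDataset(X, y), batch_size=32, shuffle=True)
shapes = [(tuple(xb.shape), tuple(yb.shape)) for xb, yb in loader]

for x_shape, y_shape in shapes:
    print(x_shape, y_shape)
# (32, 116) (32,)
# (32, 116) (32,)
# (6, 116) (6,)
```

The last batch is smaller than the others whenever the number of samples is not a multiple of the batch size. If that last batch ever contains a single sample, batch normalization in training mode cannot compute batch statistics, so passing drop_last=True to the DataLoader may be worth considering.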

    Model

    The model I chose is a feed-forward neural network with three hidden layers. Each sample has 116 input features, so the input size of the first layer is 116. The hidden layers reduce this to 64, then 32, then 10 units, and the output layer produces a single value. Since this project is a binary classification, one output is enough.

    The activation function used between layers is ReLU.

    Batch normalization is applied after the activation function, and its output is passed to the next layer. It normalizes each layer's activations over the mini-batch, which helps stabilize training.

    Dropout is used before the last layer. It has the effect of reducing overfitting and improving generalization.

    The neural network model is built with the code below. 

    class binaryClassification(nn.Module):
        def __init__(self):
            super(binaryClassification, self).__init__()
            
            self.layer_1 = nn.Linear(116, 64) 
            self.layer_2 = nn.Linear(64, 32)
            self.layer_3 = nn.Linear(32, 10)
            self.layer_out = nn.Linear(10, 1) 
            
            self.relu = nn.ReLU()
            self.dropout = nn.Dropout(p=0.1)
            self.batchnorm1 = nn.BatchNorm1d(64)
            self.batchnorm2 = nn.BatchNorm1d(32)
            self.batchnorm3 = nn.BatchNorm1d(10)
            
        def forward(self, inputs):
            x = self.relu(self.layer_1(inputs))
            x = self.batchnorm1(x)
            x = self.relu(self.layer_2(x))
            x = self.batchnorm2(x)
            x = self.relu(self.layer_3(x))
            x = self.batchnorm3(x)
            x = self.dropout(x)
            x = self.layer_out(x)
            
            return x
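
A quick way to sanity-check an architecture like this is to push a dummy batch through it and count the trainable parameters. The sketch below rebuilds the same stack of layers with nn.Sequential as a stand-in for the class above; the batch of zeros is made up.

```python
import torch
import torch.nn as nn

# Same layer order as the forward pass above: Linear -> ReLU -> BatchNorm, dropout before the output
model = nn.Sequential(
    nn.Linear(116, 64), nn.ReLU(), nn.BatchNorm1d(64),
    nn.Linear(64, 32), nn.ReLU(), nn.BatchNorm1d(32),
    nn.Linear(32, 10), nn.ReLU(), nn.BatchNorm1d(10),
    nn.Dropout(p=0.1),
    nn.Linear(10, 1),
)

model.eval()                      # use running statistics and disable dropout
with torch.no_grad():
    logits = model(torch.zeros(8, 116))

n_params = sum(p.numel() for p in model.parameters())
print(tuple(logits.shape))  # (8, 1)
print(n_params)             # 10121
```

The output has one raw score (logit) per sample. It is not a probability until a sigmoid is applied.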

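    With the model defined, we construct the loss function and the optimizer. The learning rate is the value defined earlier. The training loop further down refers to a device, so it is defined here as well.

    # Run on the GPU when one is available, otherwise on the CPU
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    model = binaryClassification().to(device)
    criterion = nn.BCEWithLogitsLoss()
    optimizer = optim.Adam(model.parameters(), lr=LEARNING_RATE)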
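
BCEWithLogitsLoss takes the raw model output and combines the sigmoid and the binary cross-entropy in one step, which is why the model above has no sigmoid at the end. The sketch below is a plain-Python illustration of that idea for a single sample. The function names are my own, and the stable formula shown is one common way to write it, not necessarily how PyTorch implements it internally.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def bce_from_probability(z, y):
    """Textbook form: apply the sigmoid first, then binary cross-entropy."""
    p = sigmoid(z)
    return -(y * math.log(p) + (1 - y) * math.log(1 - p))


def bce_with_logits(z, y):
    """Algebraically equivalent form that works directly on the logit z."""
    return max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))


for z in (-3.0, 0.0, 2.5):
    for y in (0, 1):
        assert abs(bce_from_probability(z, y) - bce_with_logits(z, y)) < 1e-9

print(round(bce_with_logits(0.0, 1), 4))  # 0.6931
print(bce_with_logits(800.0, 0))          # 800.0
```

The last line shows why working on logits is safer: for a very large logit the sigmoid rounds to exactly 1, and the textbook form would have to take the log of 0.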

    The function below measures the accuracy of a batch of predictions, so that accuracy can be tracked in each epoch.

    def binary_acc(y_pred, y_test):
        y_pred_tag = torch.round(torch.sigmoid(y_pred))
    
        correct_results_sum = (y_pred_tag == y_test).sum().float()
        acc = correct_results_sum/y_test.shape[0]
        acc = torch.round(acc * 100)
        
        return acc
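
One detail worth knowing: the function above rounds the accuracy of each batch to a whole percent, and the training loop then averages those per-batch numbers. That is close to, but not exactly, the overall accuracy, because a small last batch counts as much as a full one. The sketch below contrasts the two on made-up logits; `count_correct` is a hypothetical helper, not part of my project code.

```python
def count_correct(logits, labels):
    """Count correct predictions; a logit above 0 corresponds to a sigmoid above 0.5."""
    return sum(1 for z, y in zip(logits, labels) if (1 if z > 0 else 0) == y)


# Two made-up batches of (logits, labels) with different sizes
batches = [
    ([2.0, -1.0, 0.5], [1, 0, 0]),
    ([-0.3, 1.2, 0.8, -2.0, 0.1, -0.7, 3.0], [0, 1, 1, 0, 1, 0, 1]),
]

# Approach used in the post: round each batch to a whole percent, then average
per_batch = [round(100 * count_correct(z, y) / len(y)) for z, y in batches]
averaged = sum(per_batch) / len(batches)

# Alternative: accumulate counts and divide once at the end
total_correct = sum(count_correct(z, y) for z, y in batches)
total_seen = sum(len(y) for _, y in batches)
exact = 100 * total_correct / total_seen

print(averaged)  # 83.5
print(exact)     # 90.0
```

With equal batch sizes and many batches the two numbers are close, so this matters more for small datasets than for this one.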

    Finally, the model is trained. For each mini-batch in every epoch:

    1. reset the gradients left over from the previous step

    2. predict y with the current model

    3. compute the loss and accuracy of the prediction

    4. carry out the backward pass to compute the gradients

    5. update the weights

    model.train()
    for e in range(1, EPOCHS+1):
        epoch_loss = 0
        epoch_acc = 0
        for X_batch, y_batch in train_loader:
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)
            optimizer.zero_grad()
            
            y_pred = model(X_batch)
            
            loss = criterion(y_pred, y_batch.unsqueeze(1))
            acc = binary_acc(y_pred, y_batch.unsqueeze(1))
            
            loss.backward()
            optimizer.step()
            
            epoch_loss += loss.item()
            epoch_acc += acc.item()
            
    
        print(f'Epoch {e+0:03}: | Loss: {epoch_loss/len(train_loader):.5f} | Acc: {epoch_acc/len(train_loader):.3f}')

    I wanted to see how high the accuracy could go given enough epochs. Therefore, I set the number of epochs to 10,000 and trained the model. It took over 20 hours. The training accuracy started at 58.07% and rose to 77.85% towards the end.

    [Screenshot: training log, epochs 1-19]
    [Screenshot: training log, epochs 9982-10000]

    However, one thing to note is that the accuracy had already reached 77% by around epoch 3600, so there was little improvement over the remaining 6,400 or so epochs. It would have been more reasonable to stop training once the accuracy stopped improving significantly.
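
Stopping when the metric stops improving is usually called early stopping. Here is a minimal sketch of the idea; the class, the patience and min_delta values, and the loss numbers are all made up for illustration and are not what I ran. Ideally the monitored value would be a loss on held-out data rather than the training loss.

```python
class EarlyStopping:
    """Signal a stop when the loss has not improved by min_delta for `patience` epochs in a row."""

    def __init__(self, patience=20, min_delta=0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, loss):
        """Record one epoch's loss and return True when training should stop."""
        if loss < self.best - self.min_delta:
            self.best = loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience


stopper = EarlyStopping(patience=3, min_delta=0.001)
losses = [0.70, 0.60, 0.55, 0.5495, 0.5497, 0.5499, 0.40]  # made-up epoch losses

stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if stopper.step(loss):
        stopped_at = epoch
        break

print(stopped_at)  # 6
```

In a real training loop, the call to step would go at the end of each epoch, followed by a break when it returns True.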

    [Screenshot: training log, epochs 3654-3672]

    Experiment

    I was curious about the effect of the number of layers on the accuracy of the model. Therefore, I varied the number of layers from one to five and compared how the accuracy evolved during training. Given the limited amount of time, I set the number of epochs to 200.

     

    1. Single layer

    class binaryClassification1(nn.Module):
        def __init__(self):
            super(binaryClassification1, self).__init__()
            
            self.layer_out = nn.Linear(116, 1) 
            
        def forward(self, inputs):
            x = self.layer_out(inputs)
            
            return x

     

    2. 2 layers

    class binaryClassification2(nn.Module):
        def __init__(self):
            super(binaryClassification2, self).__init__()
            
            self.layer_1 = nn.Linear(116, 64) 
            self.layer_out = nn.Linear(64, 1) 
            
            self.relu = nn.ReLU()
            self.dropout = nn.Dropout(p=0.1)
            self.batchnorm1 = nn.BatchNorm1d(64)
            
        def forward(self, inputs):
            x = self.relu(self.layer_1(inputs))
            x = self.batchnorm1(x)
            x = self.dropout(x)
            x = self.layer_out(x)
            
            return x

    3. 3 layers

    class binaryClassification3(nn.Module):
        def __init__(self):
            super(binaryClassification3, self).__init__()
            
            self.layer_1 = nn.Linear(116, 64) 
            self.layer_2 = nn.Linear(64, 32)
            self.layer_out = nn.Linear(32, 1) 
            
            self.relu = nn.ReLU()
            self.dropout = nn.Dropout(p=0.1)
            self.batchnorm1 = nn.BatchNorm1d(64)
            self.batchnorm2 = nn.BatchNorm1d(32)
            
        def forward(self, inputs):
            x = self.relu(self.layer_1(inputs))
            x = self.batchnorm1(x)
            x = self.relu(self.layer_2(x))
            x = self.batchnorm2(x)
            x = self.dropout(x)
            x = self.layer_out(x)
            
            return x

     

    4. 4 layers

    class binaryClassification4(nn.Module):
        def __init__(self):
            super(binaryClassification4, self).__init__()
            
            self.layer_1 = nn.Linear(116, 64) 
            self.layer_2 = nn.Linear(64, 32)
            self.layer_3 = nn.Linear(32, 10)
            self.layer_out = nn.Linear(10, 1) 
            
            self.relu = nn.ReLU()
            self.dropout = nn.Dropout(p=0.1)
            self.batchnorm1 = nn.BatchNorm1d(64)
            self.batchnorm2 = nn.BatchNorm1d(32)
            self.batchnorm3 = nn.BatchNorm1d(10)
            
        def forward(self, inputs):
            x = self.relu(self.layer_1(inputs))
            x = self.batchnorm1(x)
            x = self.relu(self.layer_2(x))
            x = self.batchnorm2(x)
            x = self.relu(self.layer_3(x))
            x = self.batchnorm3(x)
            x = self.dropout(x)
            x = self.layer_out(x)
            
            return x

     

    5. 5 layers

    class binaryClassification5(nn.Module):
        def __init__(self):
            super(binaryClassification5, self).__init__()
            
            self.layer_1 = nn.Linear(116, 64) 
            self.layer_2 = nn.Linear(64, 32)
            self.layer_3 = nn.Linear(32, 10)
            self.layer_4 = nn.Linear(10, 5)
            self.layer_out = nn.Linear(5, 1) 
            
            self.relu = nn.ReLU()
            self.dropout = nn.Dropout(p=0.1)
            self.batchnorm1 = nn.BatchNorm1d(64)
            self.batchnorm2 = nn.BatchNorm1d(32)
            self.batchnorm3 = nn.BatchNorm1d(10)
            self.batchnorm4 = nn.BatchNorm1d(5)
            
        def forward(self, inputs):
            x = self.relu(self.layer_1(inputs))
            x = self.batchnorm1(x)
            x = self.relu(self.layer_2(x))
            x = self.batchnorm2(x)
            x = self.relu(self.layer_3(x))
            x = self.batchnorm3(x)
            x = self.relu(self.layer_4(x))
            x = self.batchnorm4(x)
            x = self.dropout(x)
            x = self.layer_out(x)
            
            return x

    I trained these five models and tracked their accuracy to see whether the number of layers affects training.
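
When comparing depths, it helps to know how many weights each model actually has. The sketch below counts the weights and biases of the Linear layers only (batch normalization parameters are left out), using the layer sizes from the five classes above. The helper name `linear_params` is my own.

```python
def linear_params(sizes):
    """Weights plus biases of a chain of fully connected layers with the given sizes."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes, sizes[1:]))


# Layer sizes taken from the five model definitions above
architectures = {
    1: [116, 1],
    2: [116, 64, 1],
    3: [116, 64, 32, 1],
    4: [116, 64, 32, 10, 1],
    5: [116, 64, 32, 10, 5, 1],
}

for depth, sizes in architectures.items():
    print(depth, linear_params(sizes))
# 1 117
# 2 7553
# 3 9601
# 4 9909
# 5 9959
```

The parameter count jumps between one and two layers but barely changes from three to five, so any difference between the deeper models is probably not just a matter of having more weights.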

    Result

    The plot above illustrates the result of the experiment. For the models with one and two layers, the accuracy did not change significantly from its initial value. For the models with 3, 4 and 5 layers, however, the accuracy rose steadily, and it rose faster the more layers there were.

    Conclusion

    Considering that this is a binary classification, the accuracy did not rise as much as I had hoped, even after data normalization.
    The maximum accuracy I achieved was 77% after 20 hours of training.

    However, I am glad I ran this experiment, because I had been curious about how to decide on the number of layers in a deep neural network, and this small experiment gave me some insight into its effect. Given more time, it would have been nice to repeat the experiment under more varied settings and with longer training.

     

    I will end this post with my favorite meme. Thank you for reading!

    The full code is available in this GitHub repository.

    Reference

    All of the text is my own, but I referenced much of the code from the links below.

    towardsdatascience.com/pytorch-tabular-binary-classification-a0368da5bb89 

     

    Pytorch [Tabular] — Binary Classification

    This blog post takes you through an implementation of binary classification on tabular data using PyTorch.

    towardsdatascience.com

    teddylee777.github.io/scikit-learn/sklearn%EC%99%80-pandas%EB%A5%BC-%ED%99%9C%EC%9A%A9%ED%95%9C-%EA%B0%84%EB%8B%A8-%EB%8D%B0%EC%9D%B4%ED%84%B0%EB%B6%84%EC%84%9D

     

    A few really simple pre-processing tips with Pandas and scikit-learn

    Let's take a look at a few really simple pre-processing tips using pandas and scikit-learn.

    teddylee777.github.io

     
