# A state based learning approach to winning a game of hockey with cars in SuperTuxKart

## Group Members

Albin Shrestha, Andrew Wu, Bruce Moe, John Mackie, and Varad Thorat

Placed first among all state based agents.

final_win.mp4: Final game played between the top 2 teams
trial.mp4: Trial game where we put our agent against the TA agent

## Introduction

In this project, we attempt to learn to play hockey in SuperTuxKart with a neural network, using a state based agent. The goal of our AI is to defeat pre-coded opponents and to challenge other teams' AIs. Our team decided on a state based approach that learns state-action relationships, because it gave us access to game state information that is not provided to the image agent. We took four different implementation approaches for our project: imitation learning, reinforcement learning, gradient-free optimization, and imitation learning with DAGGER.

We started by implementing a basic off-policy imitation learning model on the Jurgen agent. First, we collected games where Jurgen plays itself, fed them into a network, and learned state-action pairs. This initial implementation scored poorly, but it gave us ideas on how to continue. The first issue we encountered was improper shuffling: training on sequential, per-game data produced very poor models. To fix this, we shuffled the data and trained on 150 games taken equally from each agent matchup. We also realized that we were only training from a single side of the field; when we had the network play the other side, it just drove in circles. So the first main optimization was to collect action-state pairs from both sides of the field. Both fixes are sketched below.
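As a concrete illustration of this setup, here is a minimal behavioral-cloning sketch in PyTorch. The network architecture, feature dimensions, and hyperparameters are placeholders for illustration, since our actual training code is not reproduced in this write-up; the key detail is `shuffle=True`, which breaks up the sequential per-game ordering that produced the poor early models.

```python
import torch
import torch.nn as nn
from torch.utils.data import TensorDataset, DataLoader

# Hypothetical shapes: each state is a feature vector (kart pose, puck
# position, goal location) and each action is (acceleration, steer, brake).
STATE_DIM, ACTION_DIM = 17, 3

class ImitationNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(STATE_DIM, 128), nn.ReLU(),
            nn.Linear(128, 128), nn.ReLU(),
            nn.Linear(128, ACTION_DIM),
        )

    def forward(self, x):
        return self.net(x)

def train(states, actions, epochs=20):
    # shuffle=True is the fix described above: sequential (per-game) data
    # gives highly correlated batches and a very poor model.
    loader = DataLoader(TensorDataset(states, actions),
                        batch_size=256, shuffle=True)
    model = ImitationNet()
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)
    # Regress Jurgen's recorded actions; a binary field like brake
    # could instead use a BCE loss.
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        for s, a in loader:
            opt.zero_grad()
            loss = loss_fn(model(s), a)
            loss.backward()
            opt.step()
    return model
```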
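We addressed the single-side problem by collecting demonstrations from both sides of the field. An alternative that achieves the same coverage, shown here only as an illustration and not what our pipeline did, is to mirror one side's recorded data across the field's axis of symmetry. The state layout and index arguments below are hypothetical.

```python
import torch

def mirror(states, actions, x_indices, steer_index=1):
    """Reflect recorded (state, action) pairs across the field's long axis
    so one side's demonstrations also cover the other side.

    x_indices:   positions in the state vector holding x-coordinates
                 (kart, puck, goal, velocities); hypothetical layout.
    steer_index: position of the steering value in the action vector.
    """
    s, a = states.clone(), actions.clone()
    s[:, x_indices] *= -1.0    # flip every x-coordinate
    a[:, steer_index] *= -1.0  # left steering becomes right steering
    return s, a

# Doubling a dataset with mirrored copies (X_IDX is the hypothetical
# list of x-coordinate indices):
# s_m, a_m = mirror(states, actions, X_IDX)
# states_all = torch.cat([states, s_m]); actions_all = torch.cat([actions, a_m])
```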
Our next plan was to create a hand-tuned controller AI that outperformed Jurgen, so that we could learn from something better than the bots we would be competing against. Our initial strategy was to set both agents to constantly chase the puck, only steering towards it and driving forward. However, we immediately ran into a problem: our agents would eventually get stuck against a wall and be unable to reverse, since we had coded them to only drive forward. To fix this, we added an instance variable that tracks how long an agent's velocity stays below a certain threshold; once we determine the agent is stuck, it reverses for a set number of frames before resuming its chase of the puck.

While this fix solved the issue of the karts getting stuck, we quickly realized that two chaser agents were not adequate to consistently beat the TA agents, or even to score one goal per game. The next improvement to the naive chasers was a "goalie" agent that chases the puck whenever the puck is between itself and the enemy goal; if the agent ever passes the puck, it reverses towards its own goal until it is behind the puck again, then resumes chasing. The advantage of this approach is that the agents only ever hit the puck towards the enemy goal, never their own, and have a better chance of a good angle on the enemy goal. After creating this goalie behavior, we tried different combinations: two naive chasers, one goalie with one chaser, and two goalies. The goalie/chaser combination performed best against the TA agents, but it was still not good enough to beat them consistently, especially the stronger ones such as Jurgen.

After many hours of hand-tuning the controller, we could not get it to perform much better than Jurgen. We theorize that one reason for the controller's poor performance was that the agent always drove straight at the puck and hit it regardless of the angle, rather than first driving to a position with a better angle, or steering to strike the puck at an angle from which it is more likely to score. However, we could not figure out how to implement that angle-aware approach given our limited time and our lack of experience with projective geometry. The controller also acted less fluidly than Jurgen, and we suspected its actions and strategies would be a harder set for an imitation network to copy. So, ultimately, we scrapped the idea in favor of imitating Jurgen.

Then, we decided to implement reinforcement learning on top of our imitation model: we would first imitate Jurgen, then adapt its behavior with a reward function on top of that knowledge. We planned to reward the agent for its closeness to the puck, for the puck's velocity vector, and for scoring. Sketches of the chaser with its stuck-escape logic, the goalie logic, and this planned reward follow.
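The chaser and its stuck-recovery fit in a few lines. This is a sketch rather than our tuned controller: the action dictionary layout, the 2D (x, z) coordinates, and all thresholds are assumptions made for illustration.

```python
import numpy as np

class ChaserController:
    """Hand-tuned chaser sketch: steer at the puck, drive forward, and
    back up for a fixed number of frames when stuck against a wall."""

    STUCK_SPEED = 0.2    # below this speed we may be stuck (illustrative)
    STUCK_FRAMES = 30    # frames below threshold before declaring "stuck"
    REVERSE_FRAMES = 20  # frames to back up once stuck

    def __init__(self):
        self.low_speed_frames = 0
        self.reverse_left = 0

    def act(self, kart_pos, kart_front, kart_speed, puck_pos):
        # Track how long the kart has been (nearly) motionless.
        self.low_speed_frames = self.low_speed_frames + 1 if kart_speed < self.STUCK_SPEED else 0
        if self.low_speed_frames > self.STUCK_FRAMES:
            self.reverse_left = self.REVERSE_FRAMES
            self.low_speed_frames = 0

        if self.reverse_left > 0:  # escape maneuver: back away from the wall
            self.reverse_left -= 1
            return {"acceleration": 0.0, "brake": True, "steer": 0.0}

        # Steer towards the puck: signed angle between heading and puck.
        heading = kart_front - kart_pos
        to_puck = puck_pos - kart_pos
        angle = np.arctan2(to_puck[1], to_puck[0]) - np.arctan2(heading[1], heading[0])
        angle = (angle + np.pi) % (2 * np.pi) - np.pi  # wrap to [-pi, pi]
        return {"acceleration": 1.0, "brake": False,
                "steer": float(np.clip(2.0 * angle, -1, 1))}
```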
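The goalie behavior wraps the same chaser: chase only while still behind the puck, otherwise back up towards our own goal. "Behind the puck" is approximated here by comparing distances to our own goal, a simplification of the logic described above.

```python
import numpy as np

def goalie_act(chaser, kart_pos, kart_front, kart_speed, puck_pos, own_goal_pos):
    """Goalie sketch: chase while the puck is between us and the enemy goal;
    once we overshoot the puck, reverse towards our own goal until we are
    behind it again. Uses the assumed 2D layout from the chaser sketch."""
    # Behind the puck = closer to our own goal than the puck is.
    behind = np.linalg.norm(kart_pos - own_goal_pos) < np.linalg.norm(puck_pos - own_goal_pos)
    if behind:
        return chaser.act(kart_pos, kart_front, kart_speed, puck_pos)
    # We passed the puck: back up towards our own goal.
    return {"acceleration": 0.0, "brake": True, "steer": 0.0}
```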
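Finally, the planned reward can be written directly from its description: a term for closeness to the puck, a term for the puck's velocity towards the enemy goal, and a bonus for scoring. The weights below are illustrative placeholders, not tuned values.

```python
import numpy as np

def reward(kart_pos, puck_pos, puck_vel, enemy_goal_pos, scored, conceded,
           w_close=0.1, w_vel=0.5, w_goal=10.0):
    """Sketch of the planned RL reward: puck closeness, puck velocity
    towards the enemy goal, and scoring events."""
    r = -w_close * np.linalg.norm(puck_pos - kart_pos)  # stay near the puck
    to_goal = enemy_goal_pos - puck_pos
    to_goal = to_goal / (np.linalg.norm(to_goal) + 1e-6)
    r += w_vel * float(np.dot(puck_vel, to_goal))       # puck moving goalward
    r += w_goal * (float(scored) - float(conceded))     # terminal events
    return r
```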