Learning, Evaluating and Optimizing Behavior Policies for Autonomous Vehicles

Hart, Patrick Christopher

Original title:: Learning, Evaluating and Optimizing Behavior Policies for Autonomous Vehicles
Translated title:: Lernen, Evaluieren und Optimieren von Verhaltensstrategien für Autonome Fahrzeuge
Author:: Hart, Patrick Christopher
Year:: 2021
Document type:: Dissertation
Faculty/School:: Fakultät für Informatik (Faculty of Informatics)
Advisor:: Knoll, Alois (Prof. Dr. habil.)
Referee:: Knoll, Alois (Prof. Dr. habil.); Kochenderfer, Mykel (Prof., Ph.D.)
Language:: en
Subject group:: DAT Datenverarbeitung, Informatik (data processing, computer science)
Keywords:: reinforcement learning, machine learning, autonomous vehicles
TUM classification:: DAT 260; DAT 815
Abstract:: This thesis applies reinforcement learning to learn behavior policies for autonomous vehicles, evaluates these policies, and post-optimizes them to obtain smooth behaviors. It proposes reward shaping functions, discusses input representations, and introduces a graph neural network actor-critic architecture that is invariant to the number and order of vehicles. The behavior policies are evaluated at runtime using a counterfactual behavior policy evaluation. A post-optimization smooths the behavior while preserving the interactions with others and guaranteeing the same constraints.
Translated abstract:: This thesis applies reinforcement learning to learn behavior strategies for autonomous vehicles, evaluates them, and post-optimizes them to obtain smooth behaviors. It proposes reward shaping functions, discusses input representations, and introduces a graph neural network actor-critic architecture that is invariant to the number and order of vehicles. The behavior strategies are evaluated at runtime using Counterfactual Behavior Policy Evaluation. The behavior is smoothed by a post-optimization that preserves the interactions with others while guaranteeing the same constraints.
WWW:: https://mediatum.ub.tum.de/?id=1601897
Date of submission:: 29.03.2021
Oral examination:: 28.09.2021
File size:: 2072601 bytes
Pages:: 136
URN (citable URL):: https://nbn-resolving.de/urn/resolver.pl?urn:nbn:de:bvb:91-diss-20210928-1601897-1-2
Last change:: 15.11.2021