Autonomous Navigation Robot

As part of my certificate (aka minor) in Robotics and Intelligent Systems, I took a course called MAE345: Introduction to Robotics. This course covered a broad range of topics relevant to many modern applications of robotics, such as optical flow algorithms, state estimation, Simultaneous Localization and Mapping (SLAM), and more. The class culminated in a final project, in which we each built our own autonomous navigation vehicle.


Figure 1: Bill of Materials

Figure 2: Nvidia Jetson Nano board

Hardware Overview

This vehicle, referred to as a "Jetbot," was built around the Nvidia Jetson Nano, a developer kit with GPIO pins and a CPU and GPU powerful enough for mobile machine learning and computer vision applications. The Jetson runs Ubuntu Linux and can be accessed via SSH; from there, I wrote and executed code in Python-based Jupyter notebooks. The Jetson Nano was attached to a chassis featuring two geared DC motors with appropriate motor drivers, an adjustable wide-angle 8 MP IMX219-160 camera with a 160-degree field of view, a cooling fan for the GPU, and two omnidirectional caster wheels.


Figure 3: Robot training arena and workstation

1. Basic Motion

After assembling everything and connecting via SSH, the first step was to make the robot move. Nvidia provides a class called "Robot" within the "Jetbot" package that takes care of the GPIO settings and PWM motor controls, allowing you to set the speed of each motor. While this worked to a degree, I found that the specific DC motors being used were low quality, and making the robot drive forward in a straight line was nearly impossible. To fix this, I wrote a new class to control the motors that included alpha and beta values, experimentally calibrated to correct for performance differences between the two motors. I also wrote a method that lets the motors speed up and slow down according to a ramp function to avoid jerky starts and stops.
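Below is a minimal sketch of that motor wrapper, assuming the jetbot package's Robot class and its set_motors() method as in NVIDIA's examples; the class name, alpha/beta defaults, and ramp parameters are illustrative placeholders rather than the exact values I used.

```python
# Sketch of a calibrated motor wrapper around the jetbot Robot class (assumed API).
import time
from jetbot import Robot

class CalibratedRobot:
    def __init__(self, alpha=1.0, beta=0.95, ramp_step=0.05, ramp_delay=0.02):
        self.robot = Robot()
        self.alpha = alpha            # experimentally found scale factor for the left motor
        self.beta = beta              # experimentally found scale factor for the right motor
        self.ramp_step = ramp_step    # maximum speed change per ramp iteration
        self.ramp_delay = ramp_delay  # pause between ramp iterations (seconds)
        self._left = 0.0
        self._right = 0.0

    def set_speed(self, left, right):
        """Ramp both motors toward the target speeds to avoid jerky starts and stops."""
        target_left, target_right = self.alpha * left, self.beta * right
        while abs(self._left - target_left) > self.ramp_step or \
              abs(self._right - target_right) > self.ramp_step:
            self._left += max(min(target_left - self._left, self.ramp_step), -self.ramp_step)
            self._right += max(min(target_right - self._right, self.ramp_step), -self.ramp_step)
            self.robot.set_motors(self._left, self._right)
            time.sleep(self.ramp_delay)
        self._left, self._right = target_left, target_right
        self.robot.set_motors(self._left, self._right)

    def stop(self):
        self.set_speed(0.0, 0.0)
```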

After running into issues with the small wheels and casters getting stuck in the grooves of the tile floors, I decided to clear out my entire living room and create a robot training arena.

Figure 4: Object detection on various objects from the COCO dataset.

2. Object Detection and Goal Navigation

The next step was to train a model to detect and properly label objects. I started with a neural network pre-trained on the COCO dataset; the original model, a MobileNet SSD, comes from the TensorFlow Object Detection API. From there, I optimized it to run more efficiently on the Jetson Nano using NVIDIA TensorRT, an SDK built on CUDA that includes a deep learning inference optimizer.

This model worked well enough on a variety of stationary objects. In the pictures to the left I show the model detecting an orange, a tennis racket, and a keyboard. The model had two main issues. First, it was unreliable, detecting some objects much better than others; as you can see, it did not detect the banana even though it was trained on thousands of pictures of bananas. Second, there was too much latency, meaning the robot had to stop for a brief moment between capturing pictures to process them. To fix this, I trained my own model on one object (myself) using the camera on the robot and a variety of modified parameters, which ended up working much better.

Once the model reliably detected objects, I could have the robot navigate to the desired goal object. The "tracked label" is the ID number of the COCO dataset object I set as the goal. The object detection method returns the coordinates of each bounding box, and I combined these coordinates with some logic statements and loops to get the robot to (a) rotate in place until it sees the goal object, (b) navigate toward the goal object by calculating how far it is from the center of the image (dead ahead of the robot) and turning proportionally to that horizontal distance, and (c) stop when it reaches the goal object.
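The sketch below condenses that goal-seeking loop, following the ObjectDetector and Camera interfaces from NVIDIA's Jetbot examples (each detection carries a 'label' and a normalized 'bbox'); the tracked label, gains, stop threshold, engine filename, and the CalibratedRobot wrapper from the earlier sketch are illustrative assumptions, not my exact values.

```python
# Sketch of the rotate / navigate / stop logic driven by the detector's bounding boxes.
from jetbot import Camera, ObjectDetector

TRACKED_LABEL = 53     # COCO ID of the goal object (illustrative)
TURN_GAIN = 0.4        # proportional gain on horizontal offset
SEARCH_SPEED = 0.3     # rotation speed while scanning for the goal
STOP_WIDTH = 0.6       # stop once the box fills this fraction of the frame

camera = Camera.instance(width=300, height=300)
detector = ObjectDetector('ssd_mobilenet_v2_coco.engine')   # optimized TensorRT engine
robot = CalibratedRobot()                                   # wrapper from the earlier sketch

def find_goal(detections):
    return next((d for d in detections if d['label'] == TRACKED_LABEL), None)

while True:
    detections = detector(camera.value)[0]
    goal = find_goal(detections)
    if goal is None:
        robot.set_speed(SEARCH_SPEED, -SEARCH_SPEED)   # (a) rotate in place until the goal appears
        continue
    x1, y1, x2, y2 = goal['bbox']                      # normalized box corners
    if (x2 - x1) > STOP_WIDTH:
        robot.stop()                                   # (c) close enough: stop at the goal
        break
    offset = (x1 + x2) / 2.0 - 0.5                     # (b) horizontal distance from image center
    robot.set_speed(0.3 + TURN_GAIN * offset, 0.3 - TURN_GAIN * offset)
```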

Figure 5: Training the collision avoidance model. Example of a "free" and "blocked" position with image captured to use in training.

3. Collision Avoidance

The last step for the autonomous operation of the vehicle was to train a model to detect collisions and design logic to avoid potential collisions as the robot navigates to the goal.

To do this, I first collected training data. I created a virtual "safety bubble" around the robot and then exposed the robot to a wide variety of angles, positions, and orientations, with multiple different types of obstacles and collision scenarios. If the robot was in a position where there was an obstacle inside its safety bubble, I would capture the image from the robot's camera feed, label it with a Universally Unique Identifier (UUID) generated automatically from the timestamp and MAC address of the Jetson, and save it to a folder called "blocked." If there was no obstacle in the safety bubble, I would capture the image, label it with a UUID, and store it in a folder marked "free." To make this task easier, I used a video game controller to remotely control the movement of the robot and programmed one button to capture and save "free" images and a different button to capture and save "blocked" images. This greatly increased the efficiency of training data collection, allowing me to capture 500+ images of each category.
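A minimal sketch of that capture-and-label step, assuming the jetbot Camera and bgr8_to_jpeg helpers; the folder layout and the gamepad button wiring shown in the comments are illustrative.

```python
# Sketch of saving labeled snapshots for the collision-avoidance dataset.
import os
from uuid import uuid1               # uuid1 is built from the timestamp and MAC address
from jetbot import Camera, bgr8_to_jpeg

camera = Camera.instance(width=224, height=224)
for d in ('dataset/free', 'dataset/blocked'):
    os.makedirs(d, exist_ok=True)

def save_snapshot(label):
    """Save the current camera frame as dataset/<label>/<uuid>.jpg."""
    path = os.path.join('dataset', label, str(uuid1()) + '.jpg')
    with open(path, 'wb') as f:
        f.write(bgr8_to_jpeg(camera.value))

# In the notebook, two gamepad buttons were wired to these calls, e.g.:
#   controller.buttons[4].observe(lambda c: save_snapshot('free') if c['new'] else None, names='value')
#   controller.buttons[5].observe(lambda c: save_snapshot('blocked') if c['new'] else None, names='value')
```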


Next, I used the training data to train a neural network. I started by preparing the training data using the image dataset transforms provided by the popular deep learning library PyTorch, then split the data into training and testing sets. I used a method called "transfer learning," in which a pre-trained model (trained on over 1 million images) is repurposed and trained for a new task using much less training data (on the order of a few hundred images). In my first attempt, I used PyTorch to repurpose an existing AlexNet model. AlexNet is a Convolutional Neural Network (CNN) featuring 8 layers (5 convolutional and 3 fully connected) with a reputation for high accuracy on challenging datasets. I trained the network on the Nvidia Jetson GPU (via CUDA) for 30 epochs and achieved test accuracies between 93% and 99%. In practice, however, this collision avoidance model was only mediocre, leading the robot to crash into some objects at full speed.
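The sketch below outlines that transfer-learning setup using the standard torchvision recipe; the specific transforms, the 50-image test split, the batch size, and the optimizer settings are assumptions, while the two-class AlexNet head and the 30 training epochs follow the description above.

```python
# Sketch of transfer learning on the "free"/"blocked" dataset with a pre-trained AlexNet.
import torch
import torch.optim as optim
from torchvision import datasets, transforms, models

dataset = datasets.ImageFolder(
    'dataset',
    transforms.Compose([
        transforms.ColorJitter(0.1, 0.1, 0.1, 0.1),
        transforms.Resize((224, 224)),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225]),
    ]))

# Hold out 50 images for testing; train on the rest.
train_set, test_set = torch.utils.data.random_split(dataset, [len(dataset) - 50, 50])
train_loader = torch.utils.data.DataLoader(train_set, batch_size=16, shuffle=True)
test_loader = torch.utils.data.DataLoader(test_set, batch_size=16)

# Start from ImageNet-pretrained AlexNet and replace the final layer with a
# two-class head (free vs. blocked), then train on the Jetson's GPU via CUDA.
device = torch.device('cuda')
model = models.alexnet(pretrained=True)
model.classifier[6] = torch.nn.Linear(model.classifier[6].in_features, 2)
model = model.to(device)

optimizer = optim.SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = torch.nn.CrossEntropyLoss()

for epoch in range(30):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

    # Evaluate accuracy on the held-out test set after each epoch.
    model.eval()
    correct = 0
    with torch.no_grad():
        for images, labels in test_loader:
            preds = model(images.to(device)).argmax(1)
            correct += (preds == labels.to(device)).sum().item()
    print(f'epoch {epoch}: test accuracy {correct / len(test_set):.2%}')
```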


While I could have explored collecting more or better training data, I took the opportunity to explore other image recognition models. I chose ResNet18, one of AlexNet's successors from the architecture family behind the winning entries in the 2015 ImageNet and COCO competitions. To speed up training, I trained the model on Google Colab, which lets anyone use a powerful 12 GB Nvidia Tesla K80 GPU for free. Google Colab has become one of my favorite tools for anything related to data science or machine learning; if you haven't used it before, I highly recommend checking it out. In practice, the ResNet18 model performed marginally better, but the robot still had a hard time detecting objects and avoiding collisions in real time at full speed. I believe that with a combination of more reliable motors, increased onboard processing power, and optimized algorithms, this robot could achieve impressive performance. I intend to revisit this project when time permits.
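Relative to the AlexNet sketch above, the only structural change this experiment needs is swapping the backbone and its final layer; a minimal sketch of that swap (the rest of the training loop carries over unchanged):

```python
# Replace the AlexNet backbone with an ImageNet-pretrained ResNet18 and a two-class head.
import torch
from torchvision import models

model = models.resnet18(pretrained=True)
model.fc = torch.nn.Linear(model.fc.in_features, 2)   # two classes: free / blocked
```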