# Map gaze onto body parts using DensePose
Act 3, Scene 1: "To be or not to be?" proclaims Prince Hamlet while holding a skull in his hand. But, where is the audience looking? At the hand, at the arm, or the face?
# Introduction
Have you ever wondered which body parts we gaze upon while conversing with others? Where a professional basketball player looks just before passing? Does hand movement play a role when delivering a speech?
Here we introduce a powerful and time-saving automation that can help answer these questions. It works by identifying body parts of people in the scene video and recording whether they were gazed at!
This sounds really complicated...
Don't worry! We've made this powerful tool easy to use, so you can get started right away – no coding required! By the end, you'll have a video like the one above, and CSV files documenting which body parts were gazed at.
# Powered by DensePose
This tool employs DensePose (detectron2, Meta AI).
DensePose is a method for dense human pose estimation and dense human body part segmentation. It's based on the Mask R-CNN architecture and is trained on the COCO dataset. DensePose is now a part of Detectron2's framework. You can read all the details in their paper DensePose: Dense Human Pose Estimation In The Wild.
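To make this concrete, here is a minimal sketch of the underlying idea, not the tool's actual code: DensePose labels every pixel of a detected person with a body-part index, so mapping gaze onto a body part boils down to reading the index under the gaze pixel. The array, coordinates, and part names below are made up purely for illustration.

```python
import numpy as np

# Hypothetical per-pixel body-part indices for one scene-video frame:
# 0 = background, 1..24 = DensePose body parts (values here are made up).
part_index_map = np.zeros((1080, 1088), dtype=np.uint8)
part_index_map[200:400, 450:650] = 2          # pretend this region was segmented as torso

# Illustrative names for a couple of indices only.
PART_NAMES = {0: "background", 2: "torso"}

# Gaze position in scene-camera pixel coordinates (made up).
gaze_x, gaze_y = 550.3, 310.7

# Look up which body part the gaze lands on.
part_id = part_index_map[int(gaze_y), int(gaze_x)]
print(PART_NAMES.get(part_id, f"part {part_id}"))   # -> "torso"
```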
# What you'll need
- A raw data export from your project in Pupil Cloud. The downloaded folder contains one subfolder for every recording in the project. Each subfolder contains a scene video, gaze.csv, and other files. We will work with those subfolders, so have them at hand.
- A Google account (optional). We set up a Google Colab notebook so you can execute all of this from your browser. This way, Google will lend you some extra computing power and the required dependencies. See the Google Colab FAQ to learn what it is, how it's free, and some of its limitations. (If you do fancy running things locally, check out this section.)
# Running in your browser
First, upload (uncompressed) one of the recording subfolders you're interested in from the raw data export to Google Drive.
Now for the Google Colab part: click here to access the notebook, and carefully follow the instructions there.
The notebook contains lots of cells, and this can seem overwhelming. It needn't be; it's actually really simple. Just click on each of the play buttons...
# Outputs
After the code is executed, new files will be generated. Check the new DensePoseColab folder. Let's see what's in those files!
# Video
You've already seen the accompanying video above. There's no mystery – you have a bounding box delimiting each person detected, a blue shaded mask over the body parts, a yellow highlighted body part when it's gazed at, and the typical red circle for the gaze position.
# Gaze map image
You'll also find an image reporting the body segments and the number of frames in which they were gazed at, like the one below.
# CSV files
Two files are stored as well: 1) a simple parts_count.csv showing the number of times each body part was gazed at, and 2) a densepose.csv following a structure similar to gaze.csv, with an additional column indicating the gazed body part.
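If you'd like to double-check or extend these outputs programmatically, here is a hypothetical sketch that recomputes the per-part counts from densepose.csv with pandas. The column name "gazed body part" is an assumption for illustration; check the actual header of your export.

```python
import pandas as pd

# Recompute a parts count from densepose.csv (column name is an assumption).
df = pd.read_csv("DensePoseColab/densepose.csv")

counts = (
    df["gazed body part"]          # hypothetical column with the gazed part per sample
    .dropna()                      # ignore samples where gaze hit no detected person
    .value_counts()
    .rename_axis("body part")
    .reset_index(name="count")
)
print(counts)
```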
# Running locally
Feeling brave? Here's how to run the tool locally on your computer.
Note: this option is only for Linux and Mac users, as detectron2 doesn't support Windows 😕
If you don't have a GPU on your computer, we strongly recommend not running it locally.
# Requirements
- Hardware: Linux or macOS; we recommend a CUDA-capable GPU
- Python 3.7 or higher
- Dependencies
You only need to install two packages, as we have bundled almost all the dependencies into a single package. Just run the following command:
python -m pip install torch 'git+https://github.com/pupil-labs/densepose-module.git'
Then, run the following command in your terminal to get the DensePose output:
pl-densepose --no-vis --device "cuda"
You can also check out the arguments below.
# Optional Arguments
We can't build a shoe that fits everyone, so we also allow you to pass arguments to the code:
- Device
The device on which to run the DensePose model. You can choose between `cpu` and `cuda`. The default is `--device "cpu"`, but this can be a bit slow, so we recommend using `cuda` if you have a GPU with CUDA support. Even running on `cuda` can be slow; be aware that we estimate inference time to be around 2.5 FPS on an Nvidia RTX 3060.
- Visualize
Use the flag `--vis` to enable live visualization of the output. By default, visualization is turned off to save resources, but even with it off, you'll still get the final video output.
- Input and output paths
Specify the input and output paths using `--input_path` and `--output_path`. If none are given, a UI will open so you can select them. The input path should be the subdirectory of the raw download containing the video, world, and gaze data. The output path should be the directory where the output files will be saved.
- Confidence threshold
The default confidence threshold is 0.7. You can tune it by passing `--confidence` followed by a number between 0 and 1.
- Start and end
If you want to run it on only one specific section of your recording, you can pass the start and end event annotations to be used, like this: `--start "recording.begin" --end "recording.end"`. A sketch of how such event annotations translate into a time window follows this list.
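For context on the start/end option, here is a rough sketch of what selecting a section via event annotations amounts to: look up the timestamps of the two named events and keep only the gaze samples in between. This is not the tool's internal code; the file and column names (events.csv, "timestamp [ns]", "name") follow the Pupil Cloud raw export format, so double-check them against your own recording folder.

```python
import pandas as pd

# Sketch: turn two event annotations into a time window and filter gaze data.
# File/column names follow the Pupil Cloud raw export; verify them on your data.
events = pd.read_csv("my_recording/events.csv")
gaze = pd.read_csv("my_recording/gaze.csv")

t_start = events.loc[events["name"] == "recording.begin", "timestamp [ns]"].iloc[0]
t_end = events.loc[events["name"] == "recording.end", "timestamp [ns]"].iloc[0]

section = gaze[gaze["timestamp [ns]"].between(t_start, t_end)]
print(f"{len(section)} gaze samples fall between the two events")
```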
# Behind the scenes
Model details & FAQ
- Model weights: `densepose_rcnn_R_50_FPN_DL_s1x`
- Why is there no gaze recorded on the back of the head, hands or feet?
There is no definition for those parts in DensePose. Likewise, the frontal view of the arms in the picture refers to the inside of the arms, not the front.