ASL Finger Spelling Dataset
We provide two datasets for American Sign Language (ASL) finger spelling recognition.
The datasets contain a set of RGB and depth images for each letter of the alphabet, organized by subject so that generalization to new users can be evaluated. You can see a demo of our fingerspelling system here.
Samples from dataset A
Dataset A: 5 users (easy)
The first dataset comprises 24 static signs (the letters j and z are excluded because they involve motion). It was captured in 5 different sessions with similar lighting and background, and is the dataset used in [1].
download
Dataset B: 9 users (hard)
The second dataset (depth images only) was captured from 9 different subjects in two very different environments, with very different lighting.
download
Performance baseline
Baseline performance of the method described in [1].
For both datasets, the evaluation methodology was to train the random forest on all but one of the subjects and test on the remaining one. This measures generalization to unseen users, arguably the most relevant performance criterion. The results are then averaged over all subjects; a minimal sketch of the protocol is given below the table.
Generalization:

Dataset | depth | intensity | combined
A       | 0.49  | 0.35      | 0.47
B       | 0.41  | -         | -
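To make the protocol concrete, here is a minimal leave-one-subject-out sketch. The function trainAndTest is a hypothetical stand-in, not part of the dataset or of [1]: it should train a classifier (e.g. a random forest) on the listed subjects and return its score on the held-out one.

```cpp
#include <cstdio>
#include <vector>

// Hypothetical stand-in: train a classifier on `trainSubjects` and
// return its score on `testSubject`. Replace with your own code.
double trainAndTest(const std::vector<int>& trainSubjects, int testSubject)
{
    (void)trainSubjects; (void)testSubject;
    return 0.0;  // placeholder score
}

// Leave-one-subject-out: hold out each subject in turn, train on the
// rest, and average the per-subject scores.
double leaveOneSubjectOut(int numSubjects)
{
    double sum = 0.0;
    for (int held = 0; held < numSubjects; ++held) {
        std::vector<int> train;
        for (int s = 0; s < numSubjects; ++s)
            if (s != held) train.push_back(s);
        sum += trainAndTest(train, held);
    }
    return sum / numSubjects;
}

int main()
{
    // Dataset A has 5 subjects; dataset B has 9.
    std::printf("dataset A score: %f\n", leaveOneSubjectOut(5));
    return 0;
}
```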
FAQ
Q: How do I decode the depth image?
A: The depth images are saved as single-channel images of unsigned 16-bit (short) integers - this is the format provided by the Kinect.
You can read one with the OpenCV C API using
cvLoadImage(filename.c_str(), CV_LOAD_IMAGE_UNCHANGED)
which returns a pointer to an IplImage of depth
IPL_DEPTH_16U.
How to do the same thing with the OpenCV 2.x C++ or Python APIs is left as an exercise :)
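For reference, a minimal C++ sketch, assuming OpenCV 2.4 or later (where cv::imread and the IMREAD_UNCHANGED flag are available); the file name is a placeholder:

```cpp
#include <opencv2/opencv.hpp>
#include <cstdio>

int main(int argc, char** argv)
{
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <depth_image>\n", argv[0]);
        return 1;
    }
    // IMREAD_UNCHANGED (CV_LOAD_IMAGE_UNCHANGED in the C API) loads the
    // image in its stored format instead of converting it to 8-bit BGR.
    cv::Mat depth = cv::imread(argv[1], cv::IMREAD_UNCHANGED);
    if (depth.empty() || depth.type() != CV_16UC1) {
        std::fprintf(stderr, "expected a single-channel 16-bit image\n");
        return 1;
    }
    // Pixels are raw Kinect depth readings, stored as unsigned 16-bit ints.
    unsigned short d = depth.at<unsigned short>(depth.rows / 2, depth.cols / 2);
    std::printf("depth at the centre pixel: %u\n", (unsigned)d);
    return 0;
}
```

The Python equivalent is a one-liner: cv2.imread(filename, cv2.IMREAD_UNCHANGED) returns the depth map as a uint16 NumPy array.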
References
[1] Pugeault, N., and Bowden, R. (2011). Spelling It Out: Real-Time ASL Fingerspelling Recognition. In Proceedings of the 1st IEEE Workshop on Consumer Depth Cameras for Computer Vision, jointly with ICCV 2011. (pdf)