MSR Action 3D


The MSR-Action3D dataset is an action dataset of depth sequences captured by a depth camera. It contains twenty actions: high arm wave, horizontal arm wave, hammer, hand catch, forward punch, high throw, draw x, draw tick, draw circle, hand clap, two hand wave, side-boxing, bend, forward kick, side kick, jogging, tennis swing, tennis serve, golf swing, pick up & throw. It was created by Wanqing Li during his time at Microsoft Research Redmond.

The dataset can be found in

MSR DailyActivity 3D Dataset


The DailyActivity3D dataset is a daily activity dataset captured by a Kinect device. There are 16 activity types: drink, eat, read book, call cellphone, write on a paper, use laptop, use vacuum cleaner, cheer up, sit still, toss paper, play game, lay down on sofa, walk, play guitar, stand up, sit down. Where possible, each subject performs each activity in two different poses: “sitting on sofa” and “standing”. The total number of activity sequences is 320.

This dataset was created by me during my time at Microsoft Research Redmond.

The dataset can be found in

I have also created a cropped version of this dataset, which contains only the cropped human regions. It can be found here: cropped version of MSRDailyAction Dataset.

Subtle Walking From CMU Mocap Dataset

This is a subset of the subtle walking activities in the CMU Mocap Dataset. The dataset was originally collected in the paper L. Han, X. Wu, W. Liang, G. Hou, and Y. Jia, “Discriminative human action recognition in the learned hierarchical manifold space”, Image and Vision Computing, vol. 28, no. 5, pp. 836-849, May 2010. I replicated the collection and organization process of that paper and created a dataset accordingly.

The dataset can be found in MoCap Walking Dataset.

MSR Gesture 3D Dataset


The dataset was captured with a Kinect device by Alex Kurakin. There are 12 dynamic American Sign Language (ASL) gestures and 10 people. Each person performs each gesture 2-3 times. There are 336 files in total, each corresponding to a depth sequence. The hand portion (above the wrist) has been segmented. The file name has the format sub_depth_m_n, where m is the person index and n ranges from 1 to 36. Note that for some (m,n), the file sub_depth_m_n does not exist. For example, there is no “sub_depth_02_03”. This is because some bad sequences were excluded from the dataset. I cropped the hand using the Kinect skeleton tracker and subsampled all the data to a fixed size.
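Because some (m,n) pairs are absent, it can help to check which sequences are actually present before looping over the files. The following is a minimal Python sketch, not part of the dataset distribution. The function names `parse_name` and `missing_sequences` are hypothetical, the pattern is inferred from the single example name “sub_depth_02_03”, and the person index is assumed to run from 1 to 10. The file extension is not stated above, so any extension is ignored.

```python
import re
from pathlib import Path

# Pattern inferred from the example name "sub_depth_02_03".
# Assumption: both indices are plain decimal numbers (zero padding allowed).
NAME_RE = re.compile(r"^sub_depth_(\d+)_(\d+)$")


def parse_name(filename):
    """Return (person_index, sequence_index), or None if the name does not match."""
    # Path(...).stem drops a trailing extension if there is one.
    match = NAME_RE.match(Path(filename).stem)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def missing_sequences(filenames, num_people=10, num_sequences=36):
    """List the (m, n) pairs that have no corresponding file.

    Assumption: person indices run from 1 to num_people and sequence
    indices from 1 to num_sequences.
    """
    present = {parse_name(name) for name in filenames}
    present.discard(None)  # ignore files that do not follow the naming scheme
    return [
        (m, n)
        for m in range(1, num_people + 1)
        for n in range(1, num_sequences + 1)
        if (m, n) not in present
    ]
```

If the indices are laid out as assumed, running `missing_sequences` on the full list of 336 file names would report the excluded pairs, such as (2, 3).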

The dataset can be found in

Northwestern-UCLA Multiview Action 3D Dataset


The Multiview 3D event dataset was captured by Xiaohan Nie and me at UCLA. It contains RGB, depth and human skeleton data captured simultaneously by three Kinect cameras. The dataset includes 10 action categories: pick up with one hand, pick up with two hands, drop trash, walk around, sit down, stand up, donning, doffing, throw, carry. Each action is performed by 10 actors. The dataset contains data taken from a variety of viewpoints.

The dataset can be found in part-1, part-2, part-3, part-4, part-5, part-6, part-7, part-8, part-9, part-10, part-11, part-12, part-13, part-14, part-15, part-16.

We also created a version of the dataset that only contains RGB videos: RGB videos only.

Image Similarity Triplet Dataset


This dataset was created by engineers on the image search team at Google. I used this dataset in my DeepRanking paper. It characterizes fine-grained image similarity with a large number of image triplets. In that paper, I used this dataset only for evaluation.
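As a rough illustration of how a triplet dataset can be used for evaluation, the Python sketch below scores a similarity model by the fraction of triplets it ranks correctly. It is an assumption-laden example, not the evaluation code from the paper. It assumes each triplet is ordered (query, positive, negative), with the positive image being the one judged more similar to the query. The function `triplet_precision`, the `embed` callable, and the Euclidean distance are hypothetical stand-ins for whatever model and metric are being tested.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_precision(triplets, embed, distance=euclidean):
    """Fraction of triplets where the query is closer to the positive image.

    triplets: iterable of (query, positive, negative) image identifiers
              (assumed ordering, see the note above).
    embed:    hypothetical callable mapping an identifier to a feature vector.
    """
    triplets = list(triplets)
    if not triplets:
        raise ValueError("at least one triplet is required")
    correct = 0
    for query, positive, negative in triplets:
        q = embed(query)
        # A triplet counts as correct only when the positive is strictly closer.
        if distance(q, embed(positive)) < distance(q, embed(negative)):
            correct += 1
    return correct / len(triplets)
```

In practice `embed` would wrap the image model under evaluation; for a quick check it can be a dictionary lookup over a few hand-made vectors.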