Gesture Recognition

The goal of this assignment is to design and implement a program that recognizes at least three different gestures of a person in front of a web camera.

We have chosen to implement three interactive “games”. The first is a 3D world visualization tool that uses your relative position to orient the map. The second is a Balloon PoP game in which the player pops balloons by hitting them. The third is a Rock Paper Scissors game in which the player competes against the computer for rock-paper-scissors supremacy.

3D World View

This part of the software waits until it can see someone, then displays a 3D view of Earth (from Google Earth). As you move your head around, the program rotates the image to give you the illusion of a 3D view. It also handles two levels of depth as you move back and forth.

The idea for this came from one of Johnny Chung Lee’s Wiimote projects. I figured it would be cool if we could do the same thing, but without all the extra gadgets.

On the most basic level, this application runs a face detection algorithm. If it sees a face, it starts tracking it (by running the same algorithm on every frame). It then finds the position of the face relative to the camera and, depending on this position, displays one of 98 images.

How it works:

  1. Face detection and tracking use OpenCV’s cvHaarDetectObjects. I started with some example code and worked my way from there. The algorithm uses a pre-trained XML file that contains information about people’s facial characteristics. For each detected face it returns an object with a position, and I use this position to calculate where the person is relative to the camera.
  2. The image rotations work in a very rudimentary fashion. I took 98 screenshots (I think) of the world from different angles in Google Earth: 7 shots per row and column, at 2 different depth levels (7 × 7 × 2 = 98). The program then picks a specific image according to the coordinates of the user’s face.
  3. The GUI takes advantage of OpenCV’s HighGUI functionality. Again, I started with some sample code and worked my way from there.
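To make step 2 more concrete, here is a minimal sketch of how a face rectangle could be mapped to one of the 98 images. It is written in Python for brevity and is not the project's actual code. The 7 × 7 × 2 layout comes from the description above; the function name `select_image_index`, the 640 × 480 frame size, the `near_face_width` threshold, and the ordering of the images are all assumptions made for illustration.

```python
GRID_COLS = 7      # shots per row (from the description above)
GRID_ROWS = 7      # shots per column (from the description above)
DEPTH_LEVELS = 2   # two depth levels (from the description above)


def select_image_index(face_x, face_y, face_w, face_h,
                       frame_w=640, frame_h=480, near_face_width=200):
    """Map a detected face rectangle to an image index in 0..97.

    Assumed (hypothetical) layout: images are stored depth by depth,
    row by row, column by column. A face wider than `near_face_width`
    pixels is treated as "close to the camera".
    """
    # Center of the face rectangle, normalized to the 0..1 range.
    cx = (face_x + face_w / 2) / frame_w
    cy = (face_y + face_h / 2) / frame_h

    # Quantize the center into a grid cell, clamping to the valid range.
    col = min(GRID_COLS - 1, max(0, int(cx * GRID_COLS)))
    row = min(GRID_ROWS - 1, max(0, int(cy * GRID_ROWS)))

    # A larger face means the user is nearer, so pick the second depth level.
    depth = 1 if face_w >= near_face_width else 0

    return depth * GRID_ROWS * GRID_COLS + row * GRID_COLS + col
```

With these assumptions, a small face in the middle of the frame selects the center image of the first depth level, and the same position with a large face selects the center image of the second depth level. Using the face width as a stand-in for distance is only one possible choice; the original program may decide depth differently.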

Problems:

It is very difficult to calculate any kind of statistical data for this program. There are some bugs, especially with the face detection algorithm, that make the program feel glitchy at times. However, taking into account that I had two weeks to develop this, and that when I began I didn’t even know how to open a video feed, I am satisfied with the results. The face detection algorithm in this program works best when the face is directly facing the camera; if you tilt your face to the side, it has lots of problems. Its accuracy also varies from person to person, so if it doesn’t work well for you, I sincerely apologize. Lastly, the program is a little sluggish at the beginning, when it has to load the images into memory; performance gets better after that.

Performance:

This program uses around 15 MB of RAM. It does, though, take a significant share of the CPU.

Further Development:

I would like to make this an add-on for the Google Earth API or Bing 3D maps. But this will come in the future, when I get some time off school.

Additional Comments:

This program can also recognize yes-no gestures. It does this by keeping a history of the x and y positions of the face and calculating the average direction of motion.
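As a rough illustration of the yes-no idea, the sketch below classifies a short history of face positions. It is in Python and is not the project's actual code. It interprets "average direction of motion" as the average absolute movement per frame along each axis, which is an assumption; the function name `classify_head_gesture` and the `min_motion` and `dominance` thresholds are also hypothetical.

```python
def classify_head_gesture(positions, min_motion=4.0, dominance=2.0):
    """Classify a history of (x, y) face positions as "yes", "no", or None.

    Assumed rule (not taken from the original program): a nod ("yes") is
    mostly vertical motion and a head shake ("no") is mostly horizontal.
    """
    if len(positions) < 2:
        return None  # not enough history to measure any motion

    # Average absolute displacement per frame along each axis.
    steps = len(positions) - 1
    pairs = list(zip(positions, positions[1:]))
    avg_dx = sum(abs(b[0] - a[0]) for a, b in pairs) / steps
    avg_dy = sum(abs(b[1] - a[1]) for a, b in pairs) / steps

    # Require enough motion, and one axis clearly dominating the other.
    if avg_dy >= min_motion and avg_dy >= dominance * avg_dx:
        return "yes"
    if avg_dx >= min_motion and avg_dx >= dominance * avg_dy:
        return "no"
    return None
```

In a live program the history would probably be a fixed-length buffer of the most recent face centers, refreshed on every frame. The thresholds here are placeholders and would need tuning against a real camera feed.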

If you have any questions or comments, there is a contact form at the bottom of the page. You can also email me at luiscarrascob@gmail.com. All questions or comments are very welcome.

Lastly, in the downloads section you can find the source code, as well as a working executable inside the Debug folder of the .zip file. If you download the source and need any help or explanation, please let me know!

Download Source Here


