Upgrade pose detection from PoseNet MobileNetV1 to MoveNet, PoseNet 2.0 ResNet50, or BlazePose #5

Open
ivelin opened this issue Dec 18, 2020 · 10 comments
Labels
enhancement New feature or request

Comments


ivelin commented Dec 18, 2020

UPDATE: June 11, 2021

Is your feature request related to a problem? Please describe.
Currently we use MobileNetV2 with a 300x300 input tensor (image) by default for object detection.

It struggles with poses of people lying down on the floor. We experimented with rotating images ±90°, which improves overall detection rates, but it still misses poses of fallen people, even when the full body is clearly visible to the human eye.

Clearly the model has not been trained on poses of fallen people.
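The ±90° rotation experiment means keypoints detected on a rotated frame have to be mapped back to the original frame before they can be compared or drawn. Below is a minimal Python sketch of that coordinate mapping. It assumes continuous pixel coordinates with the origin at the top-left, x to the right and y downward; the function names are hypothetical and not part of any existing library.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def original_to_rotated(points: List[Point], width: float, height: float,
                        clockwise: bool) -> List[Point]:
    """Map (x, y) points from the original frame onto the frame rotated by 90 degrees.

    width and height describe the ORIGINAL frame. Coordinates are continuous,
    origin at the top-left corner, x grows to the right and y grows downward.
    """
    if clockwise:
        return [(height - y, x) for x, y in points]
    return [(y, width - x) for x, y in points]


def rotated_to_original(points: List[Point], width: float, height: float,
                        clockwise: bool) -> List[Point]:
    """Inverse of original_to_rotated: bring keypoints detected on the rotated frame back."""
    if clockwise:
        return [(yr, height - xr) for xr, yr in points]
    return [(width - yr, xr) for xr, yr in points]


# A keypoint detected at (430, 100) on a clockwise-rotated 640x480 frame
print(rotated_to_original([(430.0, 100.0)], width=640, height=480, clockwise=True))
# prints [(100.0, 50.0)]
```

With this mapping, one option is to run detection on the original frame and on both rotations, then keep whichever pose scores highest, expressed in original-frame coordinates.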

Describe the solution you'd like

  • Google AI introduced MoveNet on May 17, 2021:
    it runs at 30fps on a mobile phone. It is initially available for TensorFlow.js, with a TFLite model release to follow.

  • In 2020, Google AI released PoseNet 2.0 with a ResNet50 base model, which runs at 5-6fps on a desktop CPU and has noticeably better detection rates. Interactive web demo here. However, testing shows that even with these improvements it still misses some poses of people lying down (fallen poses) that are easy for the human eye to recognize. See the example recorded video below for reference on situations where ResNet misses poses.

  • Google AI MediaPipe released a new iteration of BlazePose, which detects 33 (vs 17 for PoseNet) keypoints at 25-55fps on a desktop CPU (5-10 times faster than PoseNet 2 ResNet50). Testing shows that BlazePose does a better job with horizontal poses, although it still misses some lying positions. See the attached video for reference. BlazePose interactive web demo here. Pose detection TFLite model here.

Additional context

  • Other 2D pose detection models

See TensorFlow 2 Detection Model Zoo
https://github.com/tensorflow/models/blob/master/research/object_detection/g3doc/tf2_detection_zoo.md

Notice the high performance and dual purpose (object + keypoints) for CenterNet Resnet50 V1 FPN Keypoints 512x512 and CenterNet Resnet50 V2 Keypoints 512x512.

More on CenterNet and its various applications for object detection, pose detection and object motion tracking:
https://github.com/xingyizhou/CenterNet

  • 3D pose detection
    There are new models being developed for 3D pose estimation, which could further increase fall detection performance in the future.

ivelin commented Jan 8, 2021

The various JS models are easy to see in the demo console log:

https://storage.googleapis.com/tfjs-models/demos/posenet/camera.html

[Screenshot, 2021-01-08: demo console log with model parameters]

For example, a good ResNet model for single-pose detection with balanced parameters for CPU (see screenshot for parameter details):
https://storage.googleapis.com/tfjs-models/savedmodel/posenet/resnet50/quant2/group1-shard11of12.bin


ivelin commented Jan 8, 2021

TF saved model checkpoints for PoseNet 2 are also listed here: https://github.com/tensorflow/tfjs-models/tree/master/posenet/src


ivelin commented Jan 8, 2021

My testing shows that the ResNet50 model is noticeably more accurate than the MobileNet model, although it's slower (6fps vs 10fps, i.e. a 40% lower frame rate). Surprisingly, the multi-person model runs about as fast as the single-person one. Also, the single-person model can be confused when there are multiple people in the image.

With these findings, I think it's more important to upgrade to a ResNet model (e.g. input 250, stride 32, int or 2-byte float quantization) and less important whether it's multi-pose or single-pose.
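The fps figures from the comparison above can be stated two ways, as lost throughput or as added time per frame, and the two percentages differ. A small Python sketch of that arithmetic; `compare_fps` is a hypothetical helper, and 10fps / 6fps are the MobileNet and ResNet50 measurements reported in this thread.

```python
def compare_fps(fast_fps: float, slow_fps: float) -> dict:
    """Express a frame-rate gap as lost throughput and as added per-frame latency."""
    fast_ms = 1000.0 / fast_fps  # time budget per frame for the faster model
    slow_ms = 1000.0 / slow_fps  # time budget per frame for the slower model
    return {
        "fast_ms_per_frame": fast_ms,
        "slow_ms_per_frame": slow_ms,
        # fraction of frames per second lost by switching to the slower model
        "throughput_drop": (fast_fps - slow_fps) / fast_fps,
        # fractional increase in the time spent on each frame
        "latency_increase": (slow_ms - fast_ms) / fast_ms,
    }


result = compare_fps(fast_fps=10, slow_fps=6)  # MobileNet vs ResNet50 measurements
print(f"{result['throughput_drop']:.0%} fewer frames per second, "
      f"{result['latency_increase']:.0%} more time per frame")
# prints: 40% fewer frames per second, 67% more time per frame
```

For a fall detector that samples frames periodically, the per-frame latency (100ms vs about 167ms here) is usually the more useful number to budget against.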


ivelin commented Jan 8, 2021

@bhavikapanara ^^^


ivelin commented Jan 12, 2021

@bhavikapanara thoughts on this one?

I've done more testing comparing the current MobileNetV1 model with the single-person ResNet50 model, using the parameters from the previous comment (250x250, stride 32, 2-byte float quantization).

I find that the ResNet50 model has slightly slower inference but performs a lot better in several important areas:

  • Correctly detects pose keypoints in more practical situations, such as:
    • When the person is facing away from the camera (and face keypoints are not there to be detected).
    • When there are obstacles in the way (chairs, tables) which prevent the model from seeing parts of the body.
    • Low ambient lighting conditions. ResNet50 seems able to correctly detect keypoints in a dark photo even when the person is very hard to see with the naked eye.
  • Has fewer false positives. ResNet50 doesn't get tricked as easily as MobileNetV1 by paintings, pets and other objects that have some semblance of a human body.

I would like to know what your own experiments show.

If you are able to verify my findings on your data sets, I think upgrading to a resnet model should be the next priority on the roadmap for improving fall detection.

@ivelin ivelin changed the title Upgrade object detection to a more recent DNN Upgrade pose detection from MobilnetV1 to ResNet50 Jan 12, 2021
@ivelin ivelin changed the title Upgrade pose detection from MobilnetV1 to ResNet50 Upgrade pose detection from MobileNetV1 to ResNet50 Jan 12, 2021
@ivelin ivelin transferred this issue from ambianic/ambianic-edge Feb 1, 2021
@ivelin ivelin changed the title Upgrade pose detection from MobileNetV1 to ResNet50 Upgrade pose detection from MobileNetV1 to ResNet50 or BlazeNet Feb 11, 2021

ivelin commented Feb 11, 2021

PoseNet 2.0 ResNet 50 testing video

https://youtu.be/6Dz12WtpWuM

PoseNet.Resnet50.Screen.Recording.2021-02-10.at.5.53.02.PM.mov


ivelin commented Feb 11, 2021

BlazePose testing video

https://youtu.be/mpqsm1aXUVc

@ivelin ivelin added the enhancement New feature or request label Feb 11, 2021
@ivelin ivelin changed the title Upgrade pose detection from MobileNetV1 to ResNet50 or BlazeNet Upgrade pose detection from PoseNet MobileNetV1 to PoseNet 2.0 ResNet50 or BlazeNet Feb 11, 2021

ivelin commented Feb 11, 2021

BlazePose model card: https://drive.google.com/file/d/1zhYyUXhQrb_Gp0lKUFv1ADT3OCxGEQHS/view?usp=drivesdk

TFLite models for pose detection (phase 1) and keypoint estimation (phase 2):

https://google.github.io/mediapipe/solutions/models.html#pose

An interesting detail worth investigating further is the fact that BlazePose estimates the body vector as part of the first phase (pose detection), before it runs the second phase for keypoint estimation.

Since for fall detection we are mainly interested in the spinal vector, this could mean an even faster performing inference.

See this text from the blog:

http://ai.googleblog.com/2020/08/on-device-real-time-body-pose-tracking.html

“for the human pose tracking we explicitly predict two additional virtual keypoints that firmly describe the human body center, rotation and scale as a circle. Inspired by Leonardo’s Vitruvian man, we predict the midpoint of a person's hips, the radius of a circle circumscribing the whole person, and the incline angle of the line connecting the shoulder and hip midpoints. This results in consistent tracking even for very complicated cases, like specific yoga asanas.”
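BlazePose predicts the center, radius and incline directly as virtual keypoints. As a rough point of comparison, the same three quantities can be approximated from the regular 2D keypoints of any pose model. The Python sketch below does that and adds a naive "spine looks horizontal" check. The keypoint names, the sample coordinates and the 60° threshold are all hypothetical assumptions, not values from BlazePose or from this project.

```python
import math
from typing import Dict, Tuple

Point = Tuple[float, float]


def midpoint(a: Point, b: Point) -> Point:
    return ((a[0] + b[0]) / 2.0, (a[1] + b[1]) / 2.0)


def vitruvian_circle(keypoints: Dict[str, Point]) -> Tuple[Point, float, float]:
    """Approximate the center/scale/rotation encoding from named 2D keypoints.

    Returns (hip_midpoint, radius, incline_deg). Image coordinates are assumed:
    origin at the top-left, y grows downward. incline_deg is the angle between the
    hip->shoulder line and the upward vertical: 0 = upright, 90 = horizontal.
    """
    hips = midpoint(keypoints["left_hip"], keypoints["right_hip"])
    shoulders = midpoint(keypoints["left_shoulder"], keypoints["right_shoulder"])
    dx = shoulders[0] - hips[0]
    dy = shoulders[1] - hips[1]
    incline_deg = math.degrees(math.atan2(abs(dx), -dy))
    # Radius of the smallest hip-centered circle containing every detected keypoint.
    radius = max(math.dist(hips, p) for p in keypoints.values())
    return hips, radius, incline_deg


def spine_looks_horizontal(incline_deg: float, threshold_deg: float = 60.0) -> bool:
    # Hypothetical threshold; it would need tuning on real fall footage.
    return incline_deg >= threshold_deg


standing = {"left_shoulder": (90, 100), "right_shoulder": (110, 100),
            "left_hip": (95, 200), "right_hip": (105, 200), "left_ankle": (97, 380)}
lying = {"left_shoulder": (300, 395), "right_shoulder": (300, 405),
         "left_hip": (200, 395), "right_hip": (200, 405), "left_ankle": (20, 398)}
for name, pose in (("standing", standing), ("lying", lying)):
    center, radius, incline = vitruvian_circle(pose)
    print(name, center, round(radius), round(incline), spine_looks_horizontal(incline))
```

Since only the shoulder and hip keypoints feed the incline, this illustrates why a model that reports the spinal vector early could be enough for a fall heuristic.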


ivelin commented Feb 11, 2021

More test data with the MobileNetV1 model showing situations where it is not able to detect a human pose on the ground even though it's easy for the human eye to see.

@ivelin ivelin changed the title Upgrade pose detection from PoseNet MobileNetV1 to PoseNet 2.0 ResNet50 or BlazeNet Upgrade pose detection from PoseNet MobileNetV1 to PoseNet 2.0 ResNet50 or BlazePose Feb 11, 2021

ivelin commented Feb 27, 2021

@bhavikapanara Google AI MediaPipe just released a 3D update to BlazePose (One step closer to 3D pose detection: https://google.github.io/mediapipe/solutions/pose) with a Z axis value for depth. This can help in cases where a person falls along the Z axis: the change in the X,Y vector angle remains small, so the 2D keypoints alone do not tell the whole story.
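To illustrate why depth matters, here is a small Python sketch comparing the spine's incline from vertical with and without the Z component, for a person who has fallen toward the camera. It assumes x to the right, y downward and z along the camera's line of sight on the same scale as x and y; a real model's z output may need normalization first. The function name and the sample vector are hypothetical.

```python
import math
from typing import Tuple

Vec3 = Tuple[float, float, float]


def incline_from_vertical(spine: Vec3, use_depth: bool) -> float:
    """Angle in degrees between a hip->shoulder vector and the upward vertical.

    Assumed axes: x to the right, y downward (image convention), z along the
    camera's line of sight, on the same scale as x and y.
    use_depth=False drops z, which is what a 2D-only model effectively sees.
    """
    x, y, z = spine
    if not use_depth:
        z = 0.0
    horizontal = math.hypot(x, z)  # length of the spine's component off the vertical axis
    return math.degrees(math.atan2(horizontal, -y))


# Hypothetical spine vector of a person who has fallen toward the camera:
# no sideways (x) component, short along y, long along z.
fallen_toward_camera = (0.0, -10.0, -95.0)
print(round(incline_from_vertical(fallen_toward_camera, use_depth=False)))  # 0: looks upright in 2D
print(round(incline_from_vertical(fallen_toward_camera, use_depth=True)))   # 84: nearly horizontal in 3D
```

In 2D the only hint is a much shorter projected spine (foreshortening); with Z the angle itself reveals the fall.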

@ivelin ivelin changed the title Upgrade pose detection from PoseNet MobileNetV1 to PoseNet 2.0 ResNet50 or BlazePose Upgrade pose detection from PoseNet MobileNetV1 to PoseNet 2.0 ResNet50, BlazePose or MoveNet May 18, 2021
@ivelin ivelin changed the title Upgrade pose detection from PoseNet MobileNetV1 to PoseNet 2.0 ResNet50, BlazePose or MoveNet Upgrade pose detection from PoseNet MobileNetV1 to MobileNet v2, PoseNet 2.0 ResNet50, or BlazePose Jun 17, 2021
@ivelin ivelin changed the title Upgrade pose detection from PoseNet MobileNetV1 to MobileNet v2, PoseNet 2.0 ResNet50, or BlazePose Upgrade pose detection from PoseNet MobileNetV1 to MoveNet, PoseNet 2.0 ResNet50, or BlazePose Jul 3, 2021