mindpose.models

class mindpose.models.EvalNet(net, decoder, output_raw=True)[source]

Bases: Cell

Create network for forward propagation and decoding only.

Parameters:
  • net (Net) – Network used for forward and backward propagation

  • decoder (Decoder) – Decoder

  • output_raw (bool) – Additionally return the net’s raw output. Default: True

Inputs:
inputs: List of tensors
Outputs:
result: Decoded result
raw_result (optional): Raw result if output_raw is True
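
A minimal usage sketch, assuming a trained Net instance named net and a top-down pipeline (the decoder choice is illustrative):

    from mindpose.models import EvalNet
    from mindpose.models.decoders import TopDownHeatMapDecoder

    # wrap the trained network with a decoder for inference
    decoder = TopDownHeatMapDecoder()
    eval_net = EvalNet(net, decoder, output_raw=False)
    # result = eval_net(*inputs)  # inputs: list of tensors, see Inputs above
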
class mindpose.models.Net(backbone, head, neck=None)[source]

Bases: Cell

Create network for forward and backward propagation.

Parameters:
  • backbone (Backbone) – Model backbone

  • head (Head) – Model head

  • neck (Optional[Neck]) – Model neck. Default: None

Inputs:
x: Tensor
Outputs:
result: Tensor
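
A minimal composition sketch using the concrete modules documented later on this page (hrnet_w32 and HRNetHead); the input shape is illustrative:

    import numpy as np
    import mindspore as ms
    from mindpose.models import Net
    from mindpose.models.backbones import hrnet_w32
    from mindpose.models.heads import HRNetHead

    backbone = hrnet_w32(pretrained=False)
    head = HRNetHead(in_channels=backbone.out_channels, num_joints=17)
    net = Net(backbone=backbone, head=head)

    x = ms.Tensor(np.zeros((1, 3, 256, 192), np.float32))  # dummy input batch
    heatmap = net(x)  # heatmap tensor, e.g. [1, 17, 64, 48] for this input
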
class mindpose.models.NetWithLoss(net, loss, has_extra_inputs=False)[source]

Bases: Cell

Create network with loss.

Parameters:
  • net (Net) – Network used for forward and backward propagation

  • loss (Loss) – Loss cell

  • has_extra_inputs (bool) – Whether there are extra inputs in the loss calculation. Default: False

Inputs:
data: Tensor fed into the network
label: Label tensor
extra_inputs: List of extra tensors used in loss calculation
Outputs:
loss: Loss value
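
A minimal training-wrapper sketch, assuming net is a Net instance (the loss choice is illustrative):

    from mindpose.models import NetWithLoss
    from mindpose.models.loss import JointsMSELoss

    # pair the network with a loss cell; the wrapper returns the loss value
    net_with_loss = NetWithLoss(net, JointsMSELoss())
    # loss = net_with_loss(data, label)  # see Inputs above
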
mindpose.models.create_backbone(name, pretrained=False, ckpt_url='', in_channels=3, **kwargs)[source]

Create model backbone.

Parameters:
  • name (str) – Name of the backbone

  • pretrained (bool) – Whether the backbone is pretrained. Default: False

  • ckpt_url (str) – URL of the pretrained checkpoint. Default: “”

  • in_channels (int) – Number of channels in the input data. Default: 3

  • **kwargs (Any) – Arguments passed to the backbone

Return type:

Backbone

Returns:

Model backbone
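
A hedged example; the registry name “resnet50” is assumed to match the factory function names under mindpose.models.backbones:

    from mindpose.models import create_backbone

    backbone = create_backbone("resnet50", pretrained=False, in_channels=3)
    print(backbone.out_channels)  # e.g. 2048 for ResNet-50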

mindpose.models.create_decoder(name, **kwargs)[source]

Create model decoder.

Parameters:
  • name (str) – Name of the decoder

  • **kwargs (Any) – Arguments passed to the decoder

Return type:

Decoder

Returns:

Model decoder

mindpose.models.create_eval_network(net, decoder, output_raw=True)[source]

Create network for inference or evaluation.

Parameters:
  • net (Net) – Network used for forward and backward propagation

  • decoder (Decoder) – Decoder

  • output_raw (bool) – Additionally return the net’s raw output. Default: True

Return type:

EvalNet

Returns:

Network for inference or evaluation

mindpose.models.create_head(name, in_channels, num_joints=17, **kwargs)[source]

Create model head.

Parameters:
  • name (str) – Name of the head

  • in_channels (int) – Number of channels in the input tensor

  • num_joints (int) – Number of joints. Default: 17

  • **kwargs (Any) – Arguments passed to the head

Return type:

Head

Returns:

Model head

mindpose.models.create_loss(name, **kwargs)[source]

Create model loss.

Parameters:
  • name (str) – Name of the loss

  • **kwargs (Any) – Arguments passed to the loss

Return type:

Loss

Returns:

Loss

mindpose.models.create_neck(name, in_channels, out_channels, **kwargs)[source]

Create model neck.

Parameters:
  • name (str) – Name of the neck

  • in_channels (int) – Number of channels in the input tensor

  • out_channels (int) – Number of channels in the output tensor

  • **kwargs (Any) – Arguments passed to the neck

Return type:

Neck

Returns:

Model neck

mindpose.models.create_network(backbone_name, head_name, neck_name='', backbone_pretrained=False, backbone_ckpt_url='', in_channels=3, neck_out_channels=256, num_joints=17, backbone_args=None, neck_args=None, head_args=None)[source]

Create network for training.

Parameters:
  • backbone_name (str) – Backbone name

  • head_name (str) – Head name

  • neck_name (str) – Neck name. Default: “”

  • backbone_pretrained (bool) – Whether backbone is pretrained. Default: False

  • backbone_ckpt_url (str) – URL of the backbone’s pretrained checkpoint. Default: “”

  • in_channels (int) – Number of channels in the input data. Default: 3

  • neck_out_channels (int) – Number of output channels in the neck. Default: 256

  • num_joints (int) – Number of joints in the output. Default: 17

  • backbone_args (Optional[Dict[str, Any]]) – Arguments for backbone. Default: None

  • neck_args (Optional[Dict[str, Any]]) – Arguments for neck. Default: None

  • head_args (Optional[Dict[str, Any]]) – Arguments for head. Default: None

Return type:

Net

Returns:

Network for training
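
A hedged one-call example; the registry names below are assumptions and should be checked against the implementation:

    from mindpose.models import create_network

    net = create_network(
        backbone_name="resnet50",
        head_name="simple_baseline_head",  # assumed registry name
        backbone_pretrained=False,
        in_channels=3,
        num_joints=17,
    )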

mindpose.models.create_network_with_loss(net, loss, has_extra_inputs=False)[source]

Create network with loss for training.

Parameters:
  • net (Net) – Network used for forward and backward propagation

  • loss (Loss) – Loss cell

  • has_extra_inputs (bool) – Whether there are extra inputs in the loss calculation. Default: False

Return type:

NetWithLoss

Returns:

Network with loss for training

mindpose.models.backbones

class mindpose.models.backbones.Backbone(auto_prefix=True, flags=None)[source]

Bases: Cell

Abstract class for all backbones.

Note

Child classes must implement the forward_feature method and the out_channels property.

forward_feature(x)[source]

Perform the feature extraction.

Parameters:

x (Tensor) – Input tensor

Return type:

Tensor

Returns:

Extracted feature

property out_channels: Union[List[int], int]

Get number of output channels.

Returns:

Output channels.
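
A minimal subclass sketch following the note above; the layer choices are illustrative:

    from mindspore import Tensor, nn
    from mindpose.models.backbones import Backbone

    class TinyBackbone(Backbone):
        """Toy backbone for illustration only."""

        def __init__(self, in_channels: int = 3) -> None:
            super().__init__()
            self.conv = nn.Conv2d(in_channels, 64, 3)  # pad_mode="same" by default
            self.relu = nn.ReLU()

        def forward_feature(self, x: Tensor) -> Tensor:
            # extract the feature map consumed by the neck / head
            return self.relu(self.conv(x))

        @property
        def out_channels(self) -> int:
            return 64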

class mindpose.models.backbones.HRNet(stage_cfg, in_channels=3)[source]

Bases: Backbone

HRNet Backbone, based on “Deep High-Resolution Representation Learning for Human Pose Estimation”.

Parameters:
  • stage_cfg (Dict[str, Dict[str, int]]) – Configuration of the extra blocks. It accepts a dictionary storing the detailed config of each block, which includes num_modules, num_branches, block, num_blocks, num_channels and multiscale_output. For a detailed example, please check the implementations of hrnet_w32 and hrnet_w48.

  • in_channels (int) – Number of channels in the input. Default: 3

Inputs:
x: Input Tensor
Outputs:
feature: Feature Tensor
forward_feature(x)[source]

Perform the feature extraction.

Parameters:

x (Tensor) – Input tensor

Return type:

Tensor

Returns:

Extracted feature

property out_channels: int

Get number of output channels.

Returns:

Output channels.

class mindpose.models.backbones.ResNet(block, layers, in_channels=3, groups=1, base_width=64, norm=None)[source]

Bases: Backbone

ResNet model class, based on “Deep Residual Learning for Image Recognition”.

Parameters:
  • block (Type[Union[BasicBlock, Bottleneck]]) – Block type of the ResNet

  • layers (List[int]) – Number of blocks in each stage

  • in_channels (int) – Number of channels in the input. Default: 3

  • groups (int) – Number of groups for group conv in blocks. Default: 1

  • base_width (int) – Base width of per-group hidden channels in blocks. Default: 64

  • norm (Optional[Cell]) – Normalization layer in blocks. Default: None

Inputs:
x: Input Tensor
Outputs:
feature: Feature Tensor
forward_feature(x)[source]

Perform the feature extraction.

Parameters:

x (Tensor) – Input tensor

Return type:

Tensor

Returns:

Extracted feature

property out_channels: int

Get number of output channels.

Returns:

Output channels.

mindpose.models.backbones.hrnet_w32(pretrained=False, ckpt_url='', in_channels=3)[source]

Get the HRNet model with width=32.

Parameters:
  • pretrained (bool) – Whether the model is pretrained. Default: False

  • ckpt_url (str) – URL of the pretrained weight. Default: “”

  • in_channels (int) – Number of input channels. Default: 3

Return type:

HRNet

Returns:

HRNet model

mindpose.models.backbones.hrnet_w48(pretrained=False, ckpt_url='', in_channels=3)[source]

Get the HRNet model with width=48.

Parameters:
  • pretrained (bool) – Whether the model is pretrained. Default: False

  • ckpt_url (str) – URL of the pretrained weight. Default: “”

  • in_channels (int) – Number of input channels. Default: 3

Return type:

HRNet

Returns:

HRNet model

mindpose.models.backbones.resnet101(pretrained=False, ckpt_url='', in_channels=3, **kwargs)[source]

Get the 101-layer ResNet model.

Parameters:
  • pretrained (bool) – Whether the model is pretrained. Default: False

  • ckpt_url (str) – URL of the pretrained weight. Default: “”

  • in_channels (int) – Number of input channels. Default: 3

  • kwargs – Arguments passed to the ResNet class

Return type:

ResNet

Returns:

ResNet model

mindpose.models.backbones.resnet152(pretrained=False, ckpt_url='', in_channels=3, **kwargs)[source]

Get the 152-layer ResNet model.

Parameters:
  • pretrained (bool) – Whether the model is pretrained. Default: False

  • ckpt_url (str) – URL of the pretrained weight. Default: “”

  • in_channels (int) – Number of input channels. Default: 3

  • kwargs – Arguments passed to the ResNet class

Return type:

ResNet

Returns:

ResNet model

mindpose.models.backbones.resnet50(pretrained=False, ckpt_url='', in_channels=3, **kwargs)[source]

Get the 50-layer ResNet model.

Parameters:
  • pretrained (bool) – Whether the model is pretrained. Default: False

  • ckpt_url (str) – URL of the pretrained weight. Default: “”

  • in_channels (int) – Number of input channels. Default: 3

  • kwargs – Arguments passed to the ResNet class

Return type:

ResNet

Returns:

ResNet model

mindpose.models.necks

class mindpose.models.necks.Neck(auto_prefix=True, flags=None)[source]

Bases: Cell

Abstract class for all necks. Child classes must implement the construct method and the out_channels property.

property out_channels: Union[List[int], int]

Get number of output channels.

Returns:

Output channels.
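
A minimal subclass sketch following the requirement above:

    from mindspore import Tensor
    from mindpose.models.necks import Neck

    class IdentityNeck(Neck):
        """Pass-through neck for illustration only."""

        def __init__(self, in_channels: int) -> None:
            super().__init__()
            self._out_channels = in_channels

        def construct(self, x: Tensor) -> Tensor:
            return x

        @property
        def out_channels(self) -> int:
            return self._out_channels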

mindpose.models.heads

class mindpose.models.heads.HRNetHead(in_channels=32, num_joints=17, final_conv_kernel_size=1)[source]

Bases: Head

HRNet Head, based on “Deep High-Resolution Representation Learning for Human Pose Estimation”. It is a 1x1 convolution layer applied to the feature output.

Parameters:
  • in_channels (int) – Number of channels in the input. Default: 32

  • num_joints (int) – Number of joints in the final output. Default: 17

  • final_conv_kernel_size (int) – The kernel size in the final convolution layer. Default: 1

Inputs:
x: Input Tensor
Outputs:
result: Result Tensor
class mindpose.models.heads.Head(auto_prefix=True, flags=None)[source]

Bases: Cell

Abstract class for all heads.

class mindpose.models.heads.HigherHRNetHead(in_channels=32, num_joints=17, with_ae_loss=[True, False], tag_per_joint=True, final_conv_kernel_size=1, num_deconv_layers=1, num_deconv_filters=[32], num_deconv_kernels=[4], cat_outputs=[True], num_basic_blocks=4)[source]

Bases: Head

HigherHRNet Head, based on “HigherHRNet: Scale-Aware Representation Learning for Bottom-Up Human Pose Estimation”.

Parameters:
  • in_channels (int) – Number of channels in the input. Default: 32

  • num_joints (int) – Number of joints in the final output. Default: 17

  • with_ae_loss (List[bool]) – Output the associative embedding for each resolution. Default: [True, False]

  • tag_per_joint (bool) – Whether each joint has its own coordinate encoding. Default: True

  • final_conv_kernel_size (int) – The kernel size in the final convolution layer. Default: 1

  • num_deconv_layers (int) – Number of deconvolution layers. Default: 1

  • num_deconv_filters (List[int]) – Number of filters in each deconvolution layer. Default: [32]

  • num_deconv_kernels (List[int]) – Kernel size in each deconvolution layer. Default: [4]

  • cat_outputs (List[bool]) – Whether to concatenate the features before the deconvolution layer at each resolution. Default: [True]

  • num_basic_blocks (int) – Number of basic blocks after deconvolution. Default: 4

Inputs:
x: Input Tensor
Outputs:
result: Tuple of tensors at different resolutions
class mindpose.models.heads.SimpleBaselineHead(num_deconv_layers=3, num_deconv_filters=[256, 256, 256], num_deconv_kernels=[4, 4, 4], in_channels=2048, num_joints=17, final_conv_kernel_size=1)[source]

Bases: Head

SimpleBaseline Head, based on “Simple Baselines for Human Pose Estimation and Tracking”. It consists of a few deconvolution layers followed by a 1x1 convolution layer.

Parameters:
  • num_deconv_layers (int) – Number of deconvolution layers. Default: 3

  • num_deconv_filters (List[int]) – Number of filters in each deconvolution layer. Default: [256, 256, 256]

  • num_deconv_kernels (List[int]) – Kernel size in each deconvolution layer. Default: [4, 4, 4]

  • in_channels (int) – Number of channels in the input. Default: 2048

  • num_joints (int) – Number of joints in the final output. Default: 17

  • final_conv_kernel_size (int) – The kernel size in the final convolution layer. Default: 1

Inputs:
x: Input Tensor
Outputs:
result: Result Tensor
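
A pairing sketch: ResNet-50’s final feature map has 2048 channels, which matches the head’s default in_channels:

    from mindpose.models import Net
    from mindpose.models.backbones import resnet50
    from mindpose.models.heads import SimpleBaselineHead

    backbone = resnet50(pretrained=False)
    head = SimpleBaselineHead(in_channels=backbone.out_channels, num_joints=17)
    net = Net(backbone=backbone, head=head)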

mindpose.models.decoders

class mindpose.models.decoders.BottomUpHeatMapAEDecoder(num_joints=17, num_stages=2, with_ae_loss=[True, False], use_nms=False, nms_kernel=5, max_num=30, tag_per_joint=True, shift_coordinate=False)[source]

Bases: Decoder

Decode the heatmaps with associative embedding into coordinates.

Parameters:
  • num_joints (int) – Number of joints. Default: 17

  • num_stages (int) – Number of resolutions in the heatmap outputs. If it is larger than one, heatmap aggregation is performed. Default: 2

  • with_ae_loss (List[bool]) – Output the associative embedding for each resolution. Default: [True, False]

  • use_nms (bool) – Apply NMS for the heatmap output. Default: False

  • nms_kernel (int) – NMS kernel size. Default: 5

  • max_num (int) – Maximum number (K) of instances in the image. Default: 30

  • tag_per_joint (bool) – Whether each joint has its own coordinate encoding. Default: True

  • shift_coordinate (bool) – Perform a ±0.25 pixel coordinate shift based on the heatmap value. Default: False

Inputs:
model_output: Model output. It is a list of tensors whose length equals num_stages.
mask: Heatmap mask of the valid region.
Outputs:
val_k, tag_k, ind_k: Tuple containing the maximum value of the heatmap for each joint, with the corresponding tag values and locations.
class mindpose.models.decoders.Decoder(auto_prefix=True, flags=None)[source]

Bases: Cell

Abstract class for all decoders.

class mindpose.models.decoders.TopDownHeatMapDecoder(pixel_std=200.0, to_original=True, shift_coordinate=False, use_udp=False, dark_udp_refine=False, kernel_size=11)[source]

Bases: Decoder

Decode the heatmaps into coordinates with bounding boxes.

Parameters:
  • pixel_std (float) – The scaling factor used in decoding. Default: 200.0

  • to_original (bool) – Convert the coordinates back to the raw image space. Default: True

  • shift_coordinate (bool) – Perform a ±0.25 pixel coordinate shift based on the heatmap value. Default: False

  • use_udp (bool) – Use Unbiased Data Processing (UDP) decoding. Default: False

  • dark_udp_refine (bool) – Use post-refinement based on DARK / UDP. It cannot be used together with shift_coordinate. Default: False

  • kernel_size (int) – Gaussian kernel size for UDP post-refinement; it should match the heatmap Gaussian sigma used in training: K=17 for sigma=3 and K=11 for sigma=2. Default: 11

Inputs:
heatmap: The ordinary output of the heatmap-based model, in shape [N, C, H, W]
center: Center of the bounding box (x, y) in raw image, in shape [N, C, 2]
scale: Scale of the bounding box with respect to the raw image, in shape [N, C, 2]
score: Score of the bounding box, in shape [N, C, 1]
Outputs:
coordinate: The coordinates of the C joints, in shape [N, C, 3(x_coord, y_coord, score)]
boxes: The corresponding bounding boxes, in shape [N, 6(center_x, center_y, scale_x, scale_y, area, bounding_box_score)]
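
A shape-level sketch with dummy values, following the Inputs and Outputs above (the call signature is assumed to match the listed inputs):

    import numpy as np
    import mindspore as ms
    from mindpose.models.decoders import TopDownHeatMapDecoder

    decoder = TopDownHeatMapDecoder(to_original=True)
    N, C, H, W = 2, 17, 64, 48
    heatmap = ms.Tensor(np.random.rand(N, C, H, W).astype(np.float32))
    center = ms.Tensor(np.full((N, C, 2), 100.0, np.float32))  # box centers (x, y)
    scale = ms.Tensor(np.ones((N, C, 2), np.float32))          # box scales
    score = ms.Tensor(np.ones((N, C, 1), np.float32))          # box scores
    coordinate, boxes = decoder(heatmap, center, scale, score)
    # coordinate: [N, C, 3]; boxes: [N, 6]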

mindpose.models.loss

class mindpose.models.loss.AELoss(tag_per_joint=True, reduction='mean')[source]

Bases: Loss

Associative embedding loss, also known as grouping loss. Based on “End-to-End Learning for Joint Detection and Grouping”.

Parameters:
  • tag_per_joint (bool) – Whether each joint has its own coordinate encoding. Default: True

  • reduction (Optional[str]) – Type of the reduction to be applied to the loss. The optional values are “mean”, “sum” and “none”. Default: “mean”

Inputs:
pred: Predicted tags. In shape [N, K, H, W] if tag_per_joint is True; in shape [N, H, W] otherwise. Where K stands for the number of joints.
target: Ground truth of tag mask. In shape [N, M, K, 2] if tag_per_joint is True; in shape [N, M, 2] otherwise. Where M stands for number of instances.
Outputs:
loss: Loss tensor containing the push loss and the pull loss.
class mindpose.models.loss.AEMultiLoss(num_joints=17, num_stages=2, stage_sizes=[(128, 128), (256, 256)], mse_loss_factor=[1.0, 1.0], ae_loss_factor=[0.001, 0.001], with_mse_loss=[True, True], with_ae_loss=[True, False], tag_per_joint=True)[source]

Bases: Loss

Combined MSE and AE loss over multiple resolution levels.

Parameters:
  • num_joints (int) – Number of joints. Default: 17

  • num_stages (int) – Number of resolution levels. Default: 2

  • stage_sizes (List[Tuple[int, int]]) – The heatmap size at each stage. Default: [(128, 128), (256, 256)]

  • mse_loss_factor (List[float]) – Weighting for MSE loss at each level. Default: [1.0, 1.0]

  • ae_loss_factor (List[float]) – Weighting for Associative embedding loss at each level. Default: [0.001, 0.001]

  • with_mse_loss (List[bool]) – Whether to calculate MSE loss at each level. Default: [True, True]

  • with_ae_loss (List[bool]) – Whether to calculate AE loss at each level. Default: [True, False]

  • tag_per_joint (bool) – Whether each joint has its own coordinate encoding. Default: True

Inputs:
pred: List of prediction results at each resolution level. In shape [N, aK, H, W], where K stands for the number of joints and a=2 if the corresponding with_ae_loss is True.
target: Ground truth of heatmap. In shape [N, S, K, H, W]. Where S stands for the number of resolution levels.
mask: Ground truth of the heatmap mask. In shape [N, S, H, W].
tag_ind: Ground truth of tag position. In shape [N, S, M, K, 2]. Where M stands for number of instances.
Outputs:
loss: Single loss value
class mindpose.models.loss.JointsMSELoss(use_target_weight=False, reduction='mean')[source]

Bases: Loss

Joint mean square error loss. It is the MSE loss of heatmaps with an extra weight for each channel.

Parameters:
  • use_target_weight (bool) – Use extra weight in loss calculation. Default: False

  • reduction (Optional[str]) – Type of the reduction to be applied to the loss. The optional values are “mean”, “sum” and “none”. Default: “mean”

Inputs:
pred: Predictions, in shape [N, K, H, W]
target: Ground truth, in shape [N, K, H, W]
target_weight: Loss weight, in shape [N, K]
Outputs:
loss: Loss value
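
A minimal sketch with dummy heatmaps, following the Inputs above:

    import numpy as np
    import mindspore as ms
    from mindpose.models.loss import JointsMSELoss

    loss_fn = JointsMSELoss(use_target_weight=True)
    N, K, H, W = 2, 17, 64, 48
    pred = ms.Tensor(np.zeros((N, K, H, W), np.float32))
    target = ms.Tensor(np.zeros((N, K, H, W), np.float32))
    target_weight = ms.Tensor(np.ones((N, K), np.float32))
    loss = loss_fn(pred, target, target_weight)  # scalar with reduction="mean"
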
class mindpose.models.loss.JointsMSELossWithMask(reduction='mean')[source]

Bases: Loss

Joint mean square error loss with mask. Masked-out positions do not contribute to the loss.

Parameters:

reduction (Optional[str]) – Type of the reduction to be applied to the loss. The optional values are “mean”, “sum” and “none”. Default: “mean”

Inputs:
pred: Predictions, in shape [N, K, H, W]
target: Ground truth, in shape [N, K, H, W]
mask: Ground truth Mask, in shape [N, H, W]
Outputs:
loss: Loss value
class mindpose.models.loss.Loss(reduction='mean')[source]

Bases: LossBase

Abstract class for all losses.