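This post records the pitfalls I ran into while training models, along with how I fixed them.
I forgot to write most of them down back when I was training on AutoDL... all I remember is that I had to fix five or six bugs before almost every run.
So I'm keeping notes from now on, starting with PaddleX.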
TypeError: Argument 'bb' has incorrect type (expected numpy.ndarray, got list)
Problem description
Training on a COCO dataset that I had annotated myself raised the following error:
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
/tmp/ipykernel_128/738473502.py in <module>
15 warmup_start_lr=0.0,
16 save_dir='output/mask_rcnn_r50_fpn',
---> 17 use_vdl=True)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/models/detector.py in train(self, num_epochs, train_dataset, train_batch_size, eval_dataset, optimizer, save_interval_epochs, log_interval_steps, save_dir, pretrain_weights, learning_rate, warmup_steps, warmup_start_lr, lr_decay_epochs, lr_decay_gamma, metric, use_ema, early_stop, early_stop_patience, use_vdl, resume_checkpoint)
289 early_stop=early_stop,
290 early_stop_patience=early_stop_patience,
--> 291 use_vdl=use_vdl)
292
293 def quant_aware_train(self,
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/models/base.py in train_loop(self, num_epochs, train_dataset, train_batch_size, eval_dataset, save_interval_epochs, log_interval_steps, save_dir, ema, early_stop, early_stop_patience, use_vdl)
331 outputs = self.run(ddp_net, data, mode='train')
332 else:
--> 333 outputs = self.run(self.net, data, mode='train')
334 loss = outputs['loss']
335 loss.backward()
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/cv/models/detector.py in run(self, net, inputs, mode)
102
103 def run(self, net, inputs, mode):
--> 104 net_out = net(inputs)
105 if mode in ['train', 'eval']:
106 outputs = net_out
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py in __call__(self, *inputs, **kwargs)
900 self._built = True
901
--> 902 outputs = self.forward(*inputs, **kwargs)
903
904 for forward_post_hook in self._forward_post_hooks.values():
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/ppdet/modeling/architectures/meta_arch.py in forward(self, inputs)
24
25 if self.training:
---> 26 out = self.get_loss()
27 else:
28 out = self.get_pred()
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/ppdet/modeling/architectures/mask_rcnn.py in get_loss(self)
121
122 def get_loss(self, ):
--> 123 bbox_loss, mask_loss, rpn_loss = self._forward()
124 loss = {}
125 loss.update(rpn_loss)
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/ppdet/modeling/architectures/mask_rcnn.py in _forward(self)
98 # Mask Head needs bbox_feat in Mask RCNN
99 mask_loss = self.mask_head(body_feats, rois, rois_num, self.inputs,
--> 100 bbox_targets, bbox_feat)
101 return rpn_loss, bbox_loss, mask_loss
102 else:
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/dygraph/layers.py in __call__(self, *inputs, **kwargs)
900 self._built = True
901
--> 902 outputs = self.forward(*inputs, **kwargs)
903
904 for forward_post_hook in self._forward_post_hooks.values():
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/ppdet/modeling/heads/mask_head.py in forward(self, body_feats, rois, rois_num, inputs, targets, bbox_feat, feat_func)
244 if self.training:
245 return self.forward_train(body_feats, rois, rois_num, inputs,
--> 246 targets, bbox_feat)
247 else:
248 im_scale = inputs['scale_factor']
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/ppdet/modeling/heads/mask_head.py in forward_train(self, body_feats, rois, rois_num, inputs, targets, bbox_feat)
182 tgt_labels, _, tgt_gt_inds = targets
183 rois, rois_num, tgt_classes, tgt_masks, mask_index, tgt_weights = self.mask_assigner(
--> 184 rois, tgt_labels, tgt_gt_inds, inputs)
185
186 if self.share_bbox_feat:
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/ppdet/modeling/proposal_generator/target_layer.py in __call__(self, rois, tgt_labels, tgt_gt_inds, inputs)
256
257 outs = generate_mask_target(gt_segms, rois, tgt_labels, tgt_gt_inds,
--> 258 self.num_classes, self.mask_resolution)
259
260 # mask_rois, mask_rois_num, tgt_classes, tgt_masks, mask_index, tgt_weights
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/ppdet/modeling/proposal_generator/target.py in generate_mask_target(gt_segms, rois, labels_int32, sampled_gt_inds, num_classes, resolution)
351 results.append(
352 rasterize_polygons_within_box(new_segm[j], boxes[j],
--> 353 resolution))
354 else:
355 results.append(
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/ppdet/modeling/proposal_generator/target.py in rasterize_polygons_within_box(poly, box, resolution)
306
307 # 3. Rasterize the polygons with coco api
--> 308 mask = polygons_to_mask(polygons, resolution, resolution)
309 mask = paddle.to_tensor(mask, dtype='int32')
310 return mask
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddlex/ppdet/modeling/proposal_generator/target.py in polygons_to_mask(polygons, height, width)
282 import pycocotools.mask as mask_util
283 assert len(polygons) > 0, "COCOAPI does not support empty polygons"
--> 284 rles = mask_util.frPyObjects(polygons, height, width)
285 rle = mask_util.merge(rles)
286 return mask_util.decode(rle).astype(np.bool)
pycocotools/_mask.pyx in pycocotools._mask.frPyObjects()
TypeError: Argument 'bb' has incorrect type (expected numpy.ndarray, got list)
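Solution
The cause is that the segmentation data in the JSON file is malformed. Each polygon should be a flat, ordered list of points such as [x, y, x, y, ..., x, y]: the number of values must be even, and there must be enough points to enclose an area, i.e. more than 2 points (4 is the safest).
I had labeled with rectangles, so only the two diagonal corners were recorded and each entry looked like [x1, y1, x2, y2]. That is only 4 values (2 points), which does not meet the requirement. As far as I can tell, pycocotools' frPyObjects treats a 4-value list as a bounding box rather than a polygon, which is where the 'bb' in the error message comes from. So the data has to be converted into a valid form.
A small script that turns [x1, y1, x2, y2] into [x1, y1, x2, y1, x2, y2, x1, y2] is enough to fix it.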
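import json

# Load the COCO annotation file that contains the rectangle-only segmentations.
with open('instances_val2017.json') as f:
    data = json.load(f)

for ann in data['annotations']:
    seg = ann['segmentation']
    # Only touch polygon lists whose first polygon is [x1, y1, x2, y2].
    if isinstance(seg, list) and seg and len(seg[0]) == 4:
        x1, y1, x2, y2 = seg[0]
        # Expand the two diagonal corners into all four corners of the rectangle.
        seg[0] = [x1, y1, x2, y1, x2, y2, x1, y2]

# Overwrite the original file with the corrected annotations.
with open('instances_val2017.json', 'w') as f:
    json.dump(data, f, indent=4)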
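Before retraining, it may be worth scanning the annotations for any polygons that still break the rules described above (even number of values, at least 3 points). The following is only a minimal sketch: the helper name find_bad_polygons and the sample data are made up for illustration, and it checks the shape of the lists only, not whether the coordinates make sense.

```python
def find_bad_polygons(annotations):
    """Return (annotation id, polygon index, reason) for suspicious polygons."""
    problems = []
    for ann in annotations:
        seg = ann.get('segmentation')
        # RLE masks are stored as dicts; only polygon lists are checked here.
        if not isinstance(seg, list):
            continue
        if not seg:
            problems.append((ann.get('id'), None, 'empty segmentation'))
            continue
        for idx, poly in enumerate(seg):
            if len(poly) % 2 != 0:
                problems.append((ann.get('id'), idx, 'odd number of coordinates'))
            elif len(poly) < 6:
                problems.append((ann.get('id'), idx, 'fewer than 3 points'))
    return problems


# Tiny hand-made example (hypothetical data, not from a real dataset).
sample = [
    {'id': 1, 'segmentation': [[0, 0, 10, 10]]},                # two corners only
    {'id': 2, 'segmentation': [[0, 0, 10, 0, 10, 10, 0, 10]]},  # valid rectangle polygon
    {'id': 3, 'segmentation': [[0, 0, 10, 0, 10]]},             # odd length
]
print(find_bad_polygons(sample))
```

To run it on a real file, pass data['annotations'] after json.load; an empty result means no polygon tripped these particular checks.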
SystemError: (Fatal) Blocking queue is killed because the data reader raises an exception.
Solution
See this issue.
AssertionError: Results do not correspond to current coco set
assert set(annsImgIds)== (set(annsImgIds)& set(self.getImgIds())),\
'Results do not correspond to current coco set'
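Solution
The COCO dataset is malformed: check whether any empty JSON entries slipped in when it was assembled, i.e. entries that have an id but no coordinates.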
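That kind of check can be scripted. The sketch below is one possible way to do it, assuming the usual COCO layout with top-level 'images' and 'annotations' lists; the function name check_coco_consistency and the sample data are hypothetical. It reports annotations whose image_id does not match any image, and annotations that have no usable bbox.

```python
def check_coco_consistency(data):
    """Collect annotations that point at unknown images or lack coordinates."""
    image_ids = {img['id'] for img in data.get('images', [])}
    orphan, no_coords = [], []
    for ann in data.get('annotations', []):
        # An annotation is orphaned if its image is not listed under 'images'.
        if ann.get('image_id') not in image_ids:
            orphan.append(ann.get('id'))
        # A COCO bbox is [x, y, width, height]; anything else counts as missing.
        bbox = ann.get('bbox')
        if not bbox or len(bbox) != 4:
            no_coords.append(ann.get('id'))
    return {'orphan_annotations': orphan, 'missing_bbox': no_coords}


# Tiny hand-made example (hypothetical data, not from a real dataset).
sample = {
    'images': [{'id': 1}, {'id': 2}],
    'annotations': [
        {'id': 10, 'image_id': 1, 'bbox': [0, 0, 5, 5]},  # fine
        {'id': 11, 'image_id': 3, 'bbox': [0, 0, 5, 5]},  # image 3 does not exist
        {'id': 12, 'image_id': 2},                         # id only, no coordinates
    ],
}
print(check_coco_consistency(sample))
```

Two empty lists in the result suggest the annotation file itself is consistent, in which case the problem probably lies elsewhere.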
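The real solution
In the end it turned out that simply commenting out eval_dataset.add_negative_samples(image_dir='o_Natural_empty_light')
was enough... and by then I had regenerated the COCO dataset several times. I give up. Could Paddle please test its heavily modified libraries a bit more?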