Detectron2 Basics

I am jotting down the basics of how to use the model. For the full tutorial, visit https://detectron2.readthedocs.io/en/latest/
Detectron2, developed by Facebook AI Research (FAIR), enables rapid implementation and evaluation of novel computer vision research. It provides implementations of these object detection algorithms:
  • Mask R-CNN
  • RetinaNet
  • Faster R-CNN
  • RPN
  • Fast R-CNN
  • TensorMask
  • PointRend
  • DensePose
  • and more...

Running inference with a pre-trained detectron2 model

Once we load an image and store it in the variable im, we create a detectron2 config and a detectron2 DefaultPredictor to run inference on this image.
 
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
# add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
# Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(im)
merge_from_file & get_checkpoint_url
get_checkpoint_url retrieves the pre-trained model weights from Detectron2's model zoo, while merge_from_file loads the model architecture and hyperparameter configurations from a YAML file.
In this code:
  • merge_from_file defines how the model should be structured
  • get_checkpoint_url sets the actual trained parameters that go into that structure
Think of merge_from_file as the blueprint and get_checkpoint_url as the building materials.
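A minimal sketch of that split, reusing the imports and config names from the snippet above:

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
print(cfg.MODEL.META_ARCHITECTURE)  # e.g. "GeneralizedRCNN" -- the blueprint, no trained weights yet
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
print(cfg.MODEL.WEIGHTS)  # now a concrete URL pointing at the trained parameters (the materials)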
SCORE_THRESH_TEST refers to:
# Minimum score threshold (assuming scores in a [0, 1] range); a value chosen to
# balance obtaining high recall with not having too many low precision
# detections that will slow down inference post processing steps (like NMS)
# A default threshold of 0.0 increases AP by ~0.2-0.3 but significantly slows down
# inference.
I guess they set it to 0.5 here to make inference faster.
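One way to sanity-check the threshold (a sketch, reusing predictor, im, and cfg from above): every detection that survives should score at least 0.5.

instances = predictor(im)["instances"]
print(instances.scores)  # all remaining confidence scores should be >= 0.5
assert (instances.scores >= cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST).all()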
 

Visualizing the output

# look at the outputs. See https://detectron2.readthedocs.io/tutorials/models.html#model-output-format for specification
print(outputs["instances"].pred_classes)
print(outputs["instances"].pred_boxes)
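The pred_classes values are just integer indices into the training dataset's class list; to turn them into names, look them up in the metadata (a sketch; thing_classes is the metadata field that stores class names):

from detectron2.data import MetadataCatalog

class_names = MetadataCatalog.get(cfg.DATASETS.TRAIN[0]).thing_classes
for c in outputs["instances"].pred_classes.tolist():
    print(class_names[c])  # e.g. "person", "dog", ...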
from detectron2.utils.visualizer import Visualizer
from google.colab.patches import cv2_imshow

# We can use `Visualizer` to draw the predictions on the image.
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2_imshow(out.get_image()[:, :, ::-1])
The ::-1 reverses the channel axis: cv2 loads images in BGR order while Visualizer expects RGB, so we flip to RGB before drawing and flip back to BGR before displaying with cv2_imshow.
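The flip is easy to see in isolation (a sketch with a dummy one-pixel image):

import numpy as np

bgr = np.array([[[255, 0, 0]]])  # one pure-blue pixel in cv2's BGR order
rgb = bgr[:, :, ::-1]            # reverse the channel axis
print(rgb)                       # [[[  0   0 255]]] -- same pixel, now in RGB order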
 

Train on a custom dataset

Prepare the dataset

When using custom datasets in detectron2, we need to
  1. Register your dataset (i.e., tell detectron2 how to obtain your dataset).
  2. Optionally, register metadata for your dataset (a small sketch of this step follows).
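Step 2 on its own is just one call to MetadataCatalog (a minimal sketch; "my_dataset" and the class list are placeholders matching the examples below):

from detectron2.data import MetadataCatalog

MetadataCatalog.get("my_dataset").set(thing_classes=["balloon"])  # names the Visualizer will display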
 
Register a Dataset
To let detectron2 know how to obtain a dataset named “my_dataset”, implement a function that returns the items in the dataset, then tell detectron2 about this function:
 
from typing import Dict, List

def my_dataset_function():
    ...
    # returns a list[dict] in the standard format described below

from detectron2.data import DatasetCatalog
DatasetCatalog.register("my_dataset", my_dataset_function)

# later, to access the data:
data: List[Dict] = DatasetCatalog.get("my_dataset")
 
Basically, this function should return the data in the consistent format detectron2 expects. In the colab notebook, this is how they build a balloon dataset (the aim was to train a model to detect balloons):
import json
import os

import cv2
import numpy as np
from detectron2.data import DatasetCatalog, MetadataCatalog
from detectron2.structures import BoxMode

def get_balloon_dicts(img_dir):
    json_file = os.path.join(img_dir, "via_region_data.json")
    with open(json_file) as f:
        imgs_anns = json.load(f)

    dataset_dicts = []
    for idx, v in enumerate(imgs_anns.values()):
        record = {}

        filename = os.path.join(img_dir, v["filename"])
        height, width = cv2.imread(filename).shape[:2]

        record["file_name"] = filename
        record["image_id"] = idx
        record["height"] = height
        record["width"] = width

        annos = v["regions"]
        objs = []
        for _, anno in annos.items():
            assert not anno["region_attributes"]
            anno = anno["shape_attributes"]
            px = anno["all_points_x"]
            py = anno["all_points_y"]
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = [p for x in poly for p in x]

            obj = {
                "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": 0,
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts

for d in ["train", "val"]:
    DatasetCatalog.register("balloon_" + d, lambda d=d: get_balloon_dicts("balloon/" + d))
    MetadataCatalog.get("balloon_" + d).set(thing_classes=["balloon"])
balloon_metadata = MetadataCatalog.get("balloon_train")
 
To check that the conversion to detectron2's standard dataset format worked, use the Visualizer provided by detectron2, for example:
import random

dataset_dicts = get_balloon_dicts("balloon/train")
for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=balloon_metadata, scale=0.5)
    out = visualizer.draw_dataset_dict(d)
    cv2_imshow(out.get_image()[:, :, ::-1])
 

Train the model

from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("balloon_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2  # This is the real "batch size" commonly known to deep learning people
cfg.SOLVER.BASE_LR = 0.00025  # pick a good LR
cfg.SOLVER.MAX_ITER = 300  # 300 iterations seems good enough for this toy dataset; you will need to train longer for a practical dataset
cfg.SOLVER.STEPS = []  # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128  # The "RoIHead batch size". 128 is faster, and good enough for this toy dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (balloon). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
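Note that DefaultTrainer writes its checkpoints to cfg.OUTPUT_DIR (./output by default), and the final weights end up in model_final.pth, which is exactly the file the inference step below loads.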
 

Inference using the trained model

# Inference should use the config with parameters that are used in training
# cfg now already contains everything we've set previously. We changed it a little bit for inference:
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # path to the model we just trained
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7  # set a custom testing threshold
predictor = DefaultPredictor(cfg)
How to use this predictor?
from detectron2.utils.visualizer import ColorMode

dataset_dicts = get_balloon_dicts("balloon/val")
for d in random.sample(dataset_dicts, 3):
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)  # format is documented at https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
    v = Visualizer(im[:, :, ::-1],
                   metadata=balloon_metadata,
                   scale=0.5,
                   instance_mode=ColorMode.IMAGE_BW)  # remove the colors of unsegmented pixels. This option is only available for segmentation models
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2_imshow(out.get_image()[:, :, ::-1])
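Beyond visualization, the raw predictions can be pulled out as plain numpy arrays (a sketch; the field names follow the model output format documented at the link above):

inst = outputs["instances"].to("cpu")
boxes = inst.pred_boxes.tensor.numpy()  # (N, 4) boxes in absolute XYXY coordinates
scores = inst.scores.numpy()            # (N,) confidence scores
masks = inst.pred_masks.numpy()         # (N, H, W) boolean masks, one per detected balloon
print(boxes.shape, scores.shape, masks.shape)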