I am jotting down the basics of how to use Detectron2. For the full tutorial, visit https://detectron2.readthedocs.io/en/latest/
Detectron2, developed by Facebook AI Research (FAIR), enables rapid implementation and evaluation of novel computer vision research. It provides implementations of these object detection and segmentation algorithms:
- Mask R-CNN
- RetinaNet
- Faster R-CNN
- RPN
- Fast R-CNN
- TensorMask
- PointRend
- DensePose
- and more...
Running inference with a pre-trained detectron2 model
Once we load an image into the variable `im`, we create a detectron2 config and a detectron2 `DefaultPredictor` to run inference on this image.

```python
from detectron2 import model_zoo
from detectron2.config import get_cfg
from detectron2.engine import DefaultPredictor

cfg = get_cfg()
# add project-specific config (e.g., TensorMask) here if you're not running a model in detectron2's core library
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.5  # set threshold for this model
# Find a model from detectron2's model zoo. You can use the https://dl.fbaipublicfiles... url as well
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
predictor = DefaultPredictor(cfg)
outputs = predictor(im)
```
merge_from_file & get_checkpoint_url
get_checkpoint_url retrieves the pre-trained model weights from Detectron2's model zoo, while merge_from_file loads the model architecture and hyperparameter configurations from a YAML file.
In this code:
- merge_from_file defines how the model should be structured
- get_checkpoint_url sets the actual trained parameters that go into that structure
Think of merge_from_file as the blueprint and get_checkpoint_url as the building materials.
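To make the blueprint/materials distinction concrete, here's a quick sanity check (a sketch using only the two `model_zoo` calls from above): the config resolves to a local YAML file, while the checkpoint is a download URL.

```python
from detectron2 import model_zoo

# The "blueprint": a local YAML file shipped with detectron2
cfg_path = model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")
# The "building materials": a URL to the pre-trained weights
ckpt_url = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")

print(cfg_path)  # ends with .../COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml
print(ckpt_url)  # starts with https://dl.fbaipublicfiles.com/detectron2/...
```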
SCORE_THRESH_TEST
refers to (quoting the comment in detectron2's config):

```python
# Minimum score threshold (assuming scores in a [0, 1] range); a value chosen to
# balance obtaining high recall with not having too many low precision
# detections that will slow down inference post processing steps (like NMS)
# A default threshold of 0.0 increases AP by ~0.2-0.3 but significantly slows down
# inference.
```
I guess they set it to 0.5 here to speed up inference, at the cost of a little AP.
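In other words, the threshold simply drops low-confidence detections before post-processing. A small sketch of the same effect done by hand (assuming `outputs` from the predictor above; `Instances` supports boolean-mask indexing):

```python
# Keep only detections the model is fairly confident about.
instances = outputs["instances"]
keep = instances.scores > 0.7       # as if we had set SCORE_THRESH_TEST = 0.7
filtered = instances[keep]
print(len(instances), "->", len(filtered), "instances kept")
```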
Visualizing the output
```python
# look at the outputs. See https://detectron2.readthedocs.io/tutorials/models.html#model-output-format for specification
print(outputs["instances"].pred_classes)
print(outputs["instances"].pred_boxes)
```
```python
from detectron2.utils.visualizer import Visualizer
from detectron2.data import MetadataCatalog
from google.colab.patches import cv2_imshow  # colab-only display helper

# We can use `Visualizer` to draw the predictions on the image.
v = Visualizer(im[:, :, ::-1], MetadataCatalog.get(cfg.DATASETS.TRAIN[0]), scale=1.2)
out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
cv2_imshow(out.get_image()[:, :, ::-1])
```
I know the `::-1` reverses the channel axis: OpenCV loads images in BGR order while the Visualizer expects RGB, so we flip to RGB before drawing and flip back to BGR for `cv2_imshow`.
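A tiny standalone check of what that slice does, using only numpy:

```python
import numpy as np

# A 1x1 "image": pure blue in OpenCV's BGR channel order
bgr = np.array([[[255, 0, 0]]], dtype=np.uint8)
rgb = bgr[:, :, ::-1]  # reverse the channel axis: BGR -> RGB
print(rgb)             # [[[  0   0 255]]] -- the same blue, now in RGB order
```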
Train on a custom dataset
Prepare the dataset
When using custom datasets in detectron2, we need to:
- Register your dataset (i.e., tell detectron2 how to obtain your dataset).
- Optionally, register metadata for your dataset (see the one-line sketch after this list).
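Step 2 is a single call once the dataset is registered. A minimal sketch, using a placeholder dataset name and class list rather than anything from the tutorial itself:

```python
from detectron2.data import MetadataCatalog

# Attach human-readable metadata (here, class names) to a registered dataset.
# "my_dataset" and "my_class" are placeholders; the balloon example below does this for real.
MetadataCatalog.get("my_dataset").set(thing_classes=["my_class"])
```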
Register a Dataset
To let detectron2 know how to obtain a dataset named “my_dataset”, users need to implement a function that returns the items in your dataset and then tell detectron2 about this function:
```python
from detectron2.data import DatasetCatalog

def my_dataset_function():
    ...
    return list[dict]  # in detectron2's standard dataset format

DatasetCatalog.register("my_dataset", my_dataset_function)
# later, to access the data:
data: List[Dict] = DatasetCatalog.get("my_dataset")
```
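For reference, a single record in that standard format might look like the following. This is a made-up example: the field names come from detectron2's dataset documentation, but the path and values are invented.

```python
from detectron2.structures import BoxMode

# Hypothetical record -- file name, sizes, and annotation values are invented
record = {
    "file_name": "images/0001.jpg",  # full path to the image file
    "image_id": 0,                   # unique image id
    "height": 480,
    "width": 640,
    "annotations": [                 # one dict per object instance
        {
            "bbox": [10.0, 20.0, 110.0, 220.0],
            "bbox_mode": BoxMode.XYXY_ABS,
            "segmentation": [[10.5, 20.5, 110.5, 20.5, 110.5, 220.5]],  # polygon as [x0, y0, x1, y1, ...]
            "category_id": 0,
        }
    ],
}
```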
Basically, this function should return the data in the consistent format that detectron2 expects. In the colab notebook, this is how they build a balloon dataset (the aim was to train the model to detect balloons):
```python
import os, json
import cv2
import numpy as np
from detectron2.structures import BoxMode

def get_balloon_dicts(img_dir):
    json_file = os.path.join(img_dir, "via_region_data.json")
    with open(json_file) as f:
        imgs_anns = json.load(f)

    dataset_dicts = []
    for idx, v in enumerate(imgs_anns.values()):
        record = {}

        filename = os.path.join(img_dir, v["filename"])
        height, width = cv2.imread(filename).shape[:2]

        record["file_name"] = filename
        record["image_id"] = idx
        record["height"] = height
        record["width"] = width

        annos = v["regions"]
        objs = []
        for _, anno in annos.items():
            assert not anno["region_attributes"]
            anno = anno["shape_attributes"]
            px = anno["all_points_x"]
            py = anno["all_points_y"]
            poly = [(x + 0.5, y + 0.5) for x, y in zip(px, py)]
            poly = [p for x in poly for p in x]

            obj = {
                "bbox": [np.min(px), np.min(py), np.max(px), np.max(py)],
                "bbox_mode": BoxMode.XYXY_ABS,
                "segmentation": [poly],
                "category_id": 0,
            }
            objs.append(obj)
        record["annotations"] = objs
        dataset_dicts.append(record)
    return dataset_dicts

for d in ["train", "val"]:
    DatasetCatalog.register("balloon_" + d, lambda d=d: get_balloon_dicts("balloon/" + d))
    MetadataCatalog.get("balloon_" + d).set(thing_classes=["balloon"])
balloon_metadata = MetadataCatalog.get("balloon_train")
```
To verify that the conversion to detectron2's standard format is correct, use the Visualizer provided by detectron2 to spot-check a few samples, for example:
```python
import random

dataset_dicts = get_balloon_dicts("balloon/train")
for d in random.sample(dataset_dicts, 3):
    img = cv2.imread(d["file_name"])
    visualizer = Visualizer(img[:, :, ::-1], metadata=balloon_metadata, scale=0.5)
    out = visualizer.draw_dataset_dict(d)
    cv2_imshow(out.get_image()[:, :, ::-1])
```
Train the model
```python
from detectron2.engine import DefaultTrainer

cfg = get_cfg()
cfg.merge_from_file(model_zoo.get_config_file("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml"))
cfg.DATASETS.TRAIN = ("balloon_train",)
cfg.DATASETS.TEST = ()
cfg.DATALOADER.NUM_WORKERS = 2
cfg.MODEL.WEIGHTS = model_zoo.get_checkpoint_url("COCO-InstanceSegmentation/mask_rcnn_R_50_FPN_3x.yaml")  # Let training initialize from model zoo
cfg.SOLVER.IMS_PER_BATCH = 2   # This is the real "batch size" commonly known to deep learning people
cfg.SOLVER.BASE_LR = 0.00025   # pick a good LR
cfg.SOLVER.MAX_ITER = 300      # 300 iterations seems good enough for this toy dataset; you will need to train longer for a practical dataset
cfg.SOLVER.STEPS = []          # do not decay learning rate
cfg.MODEL.ROI_HEADS.BATCH_SIZE_PER_IMAGE = 128  # The "RoIHead batch size". 128 is faster, and good enough for this toy dataset (default: 512)
cfg.MODEL.ROI_HEADS.NUM_CLASSES = 1  # only has one class (balloon). (see https://detectron2.readthedocs.io/tutorials/datasets.html#update-the-config-for-new-datasets)
# NOTE: this config means the number of classes, but a few popular unofficial tutorials incorrectly use num_classes+1 here.

os.makedirs(cfg.OUTPUT_DIR, exist_ok=True)
trainer = DefaultTrainer(cfg)
trainer.resume_or_load(resume=False)
trainer.train()
```
Inference using the trained model
```python
# Inference should use the config with parameters that are used in training
# cfg now already contains everything we've set previously. We changed it a little bit for inference:
cfg.MODEL.WEIGHTS = os.path.join(cfg.OUTPUT_DIR, "model_final.pth")  # path to the model we just trained
cfg.MODEL.ROI_HEADS.SCORE_THRESH_TEST = 0.7   # set a custom testing threshold
predictor = DefaultPredictor(cfg)
```
How to use this predictor?
```python
from detectron2.utils.visualizer import ColorMode

dataset_dicts = get_balloon_dicts("balloon/val")
for d in random.sample(dataset_dicts, 3):
    im = cv2.imread(d["file_name"])
    outputs = predictor(im)  # format is documented at https://detectron2.readthedocs.io/tutorials/models.html#model-output-format
    v = Visualizer(
        im[:, :, ::-1],
        metadata=balloon_metadata,
        scale=0.5,
        instance_mode=ColorMode.IMAGE_BW,  # remove the colors of unsegmented pixels. This option is only available for segmentation models
    )
    out = v.draw_instance_predictions(outputs["instances"].to("cpu"))
    cv2_imshow(out.get_image()[:, :, ::-1])
```
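Beyond eyeballing a few predictions, the official colab also scores the trained model with COCO metrics. A sketch of that step, reusing the `cfg` and `predictor` from above and the `balloon_val` dataset we registered:

```python
from detectron2.evaluation import COCOEvaluator, inference_on_dataset
from detectron2.data import build_detection_test_loader

# Evaluate AP on the validation split registered earlier
evaluator = COCOEvaluator("balloon_val", output_dir="./output")
val_loader = build_detection_test_loader(cfg, "balloon_val")
print(inference_on_dataset(predictor.model, val_loader, evaluator))
```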