Class Activation Mapping


CAM
We perform global average pooling on the convolutional feature maps and use those as features for a fully-connected layer that produces the desired output (categorical or otherwise). Given this simple connectivity structure, we can identify the importance of the image regions by projecting the weights of the output layer back onto the convolutional feature maps, a technique called class activation mapping.




By simply upsampling the class activation map to the size of the input image, we can identify the image regions most relevant to a particular category.
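
A minimal sketch of this projection, assuming PyTorch and a torchvision ResNet-18, whose global-average-pooling + single fully-connected head already matches the structure CAM requires; the `layer4` hook point and the 224×224 output size are choices made here for illustration:

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

# Cache the last convolutional feature maps during the forward pass.
feature_maps = {}
model.layer4.register_forward_hook(
    lambda m, i, o: feature_maps.update(last_conv=o.detach()))

def class_activation_map(image, class_idx=None, out_size=(224, 224)):
    """image: (1, 3, H, W) tensor, already normalized for the model."""
    with torch.no_grad():
        logits = model(image)
        if class_idx is None:
            class_idx = logits.argmax(dim=1).item()

        fmaps = feature_maps["last_conv"][0]    # (C, h, w) last conv features
        weights = model.fc.weight[class_idx]    # (C,) output-layer weights for the class

        # Project the output-layer weights back onto the feature maps:
        # M_c(x, y) = sum_k w_k^c * f_k(x, y)
        cam = torch.einsum("c,chw->hw", weights, fmaps)
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)   # scale to [0, 1]

        # Upsample to the input resolution to localize the class-relevant regions.
        cam = F.interpolate(cam[None, None], size=out_size,
                            mode="bilinear", align_corners=False)[0, 0]
    return cam, class_idx
```

Because the class score is this same weighted sum taken after spatial pooling, the pre-pooling map directly shows where the evidence for the class comes from in the image.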


Grad-CAM
Why does interpretability matter? The Grad-CAM paper argues that transparency serves a different goal at each stage of AI capability:
- First, when AI is significantly weaker than humans and not yet reliably deployable (e.g., visual question answering), the goal of transparency and explanations is to identify failure modes, helping researchers focus their efforts on the most fruitful research directions.
- Second, when AI is on par with humans and reliably deployable (e.g., image classification trained on sufficient data), the goal is to establish appropriate trust and confidence in users.
- Third, when AI is significantly stronger than humans (e.g., chess or Go), the goal of explanations is machine teaching, i.e., a machine teaching a human how to make better decisions.
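
A minimal sketch of Grad-CAM itself, under the same PyTorch / torchvision ResNet-18 assumptions as the CAM example above. The difference from CAM is that the channel weights come from global-average-pooling the gradients of the class score with respect to the last convolutional feature maps, so no GAP + single-FC head is required of the architecture:

```python
import torch
import torch.nn.functional as F
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT).eval()

# Cache the activations and gradients of the last convolutional block.
store = {}
model.layer4.register_forward_hook(
    lambda m, i, o: store.update(acts=o.detach()))
model.layer4.register_full_backward_hook(
    lambda m, gi, go: store.update(grads=go[0].detach()))

def grad_cam(image, class_idx=None, out_size=(224, 224)):
    """image: (1, 3, H, W) tensor, already normalized for the model."""
    logits = model(image)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()

    # Gradients of the class score w.r.t. the last conv feature maps.
    model.zero_grad()
    logits[0, class_idx].backward()

    a = store["acts"][0]                    # (C, h, w) activations
    g = store["grads"][0]                   # (C, h, w) gradients

    with torch.no_grad():
        weights = g.mean(dim=(1, 2))        # global-average-pool the gradients -> (C,)
        cam = F.relu(torch.einsum("c,chw->hw", weights, a))
        cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
        cam = F.interpolate(cam[None, None], size=out_size,
                            mode="bilinear", align_corners=False)[0, 0]
    return cam, class_idx
```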

3D Conv & Grad-CAM
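
A hedged sketch of the same gradient-based recipe applied to a 3D CNN on voxel input, in the spirit of the referenced 3D ConvNet paper; the `Tiny3DNet` model, the number of classes, and the `grad_cam_3d` helper below are illustrative placeholders, not the paper's architecture. The only mechanical changes versus 2D Grad-CAM are pooling the gradients over three spatial dimensions and upsampling the map with trilinear interpolation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Tiny3DNet(nn.Module):
    """Toy 3D CNN over a voxel occupancy grid (placeholder, not the paper's network)."""
    def __init__(self, num_classes=8):   # number of feature classes is a placeholder
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(1, 16, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
            nn.Conv3d(16, 32, 3, padding=1), nn.ReLU(), nn.MaxPool3d(2),
        )
        self.fc = nn.Linear(32, num_classes)

    def forward(self, x):
        f = self.features(x)                          # (N, 32, D/4, H/4, W/4)
        return self.fc(f.mean(dim=(2, 3, 4))), f      # GAP head + feature maps

def grad_cam_3d(model, voxels, class_idx=None):
    """voxels: (1, 1, D, H, W) occupancy grid."""
    logits, fmaps = model(voxels)
    fmaps.retain_grad()                   # keep d(score)/d(feature maps)
    if class_idx is None:
        class_idx = logits.argmax(dim=1).item()
    model.zero_grad()
    logits[0, class_idx].backward()

    weights = fmaps.grad[0].mean(dim=(1, 2, 3))       # pool gradients over D, H, W
    cam = F.relu(torch.einsum("c,cdhw->dhw", weights, fmaps[0]))
    cam = (cam - cam.min()) / (cam.max() - cam.min() + 1e-8)
    cam = F.interpolate(cam[None, None], size=voxels.shape[2:],
                        mode="trilinear", align_corners=False)[0, 0]
    return cam.detach(), class_idx

# Smoke test with random weights and a random 64^3 grid (output is meaningless,
# but it exercises the shapes):
# cam, cls = grad_cam_3d(Tiny3DNet(), torch.rand(1, 1, 64, 64, 64))
```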


References
Papers
- Learning Deep Features for Discriminative Localization (CAM)
- Grad-CAM: Visual Explanations from Deep Networks via Gradient-based Localization (Grad-CAM)
- 3D convolutional neural network for machining feature recognition with gradient-based visual explanations from 3D CAD models (3D ConvNet)