
Improve image recognition accuracy: Finer-CAM allows AI to understand images more accurately

Author: LoRA Time: 10 Mar 2025

Competition in AI image recognition has become fierce. Telling cats from dogs went out of fashion long ago. The popular version now is a matching game on "Plus" difficulty, such as recognizing which of this year's sports car models is in the picture, or whether a bird's eyebrow stripe is as thick as the eyebrows of Old Wang next door.

The problem is that a neural network may be smart, but when it is asked to explain "Why do I say this is what it is?", it behaves like a struggling student asked to explain how they solved a problem: it hesitates for a long time and cannot come up with an answer. The traditional Class Activation Map (CAM) is like a glowing spotlight placed on the neural network's head, telling you "Well, it mainly looks at this part", but what exactly is it looking at?

And why that place? When it meets "twin"-level nuances, it is stumped: it points to a bunch of similar regions and says, "Probably... it's here... maybe..."


Finer-CAM debuts: AI bids farewell to "face blindness"

At critical moments, a hero always appears! The researchers at Ohio State University could not stand it anymore and created a powerful tool: Finer-CAM. It is like fitting a neural network with high-definition night-vision goggles plus a microscope. Its core trick is "What to look at? Look at the differences!"

Traditional CAM fights alone, staring only at the target class. Finer-CAM is a team face-off: it pulls out the target class together with the classes that look most like it (the "Old Wang next door" of the dataset) and makes them confront each other directly.


By computing the difference between the prediction results for the target class and its look-alikes, Finer-CAM can pick out the features that are truly distinctive and strongly suppress the "common-face" features that the classes share. It feels like playing a spot-the-difference game. In the past, CAM would point to a few places and say "I think it's here." Now, with Finer-CAM, it can tell you: "Wrong! The real difference is this one feather!"
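The comparison idea can be illustrated with a toy example. The sketch below is not the authors' implementation: it assumes a classic CAM setting in which each class score is a weighted sum of feature channels, so the difference between two class scores is again a weighted sum, with weights `w_target - strength * w_similar`. The names `class_activation_map`, `contrastive_map`, and `strength`, and all the numbers, are made up for illustration.

```python
def relu(x):
    """Keep positive evidence only, as CAM-style maps usually do."""
    return x if x > 0 else 0.0


def class_activation_map(features, weights):
    """Weighted sum of feature channels at each location, followed by ReLU.

    features: list of channels, each a flat list of activations per location
    weights:  one weight per channel for the class being explained
    """
    n_locations = len(features[0])
    return [
        relu(sum(w * channel[i] for w, channel in zip(weights, features)))
        for i in range(n_locations)
    ]


def contrastive_map(features, w_target, w_similar, strength=1.0):
    """Explain the score difference (target - strength * similar).

    Channels that both classes rely on cancel out, so only the
    discriminative channels keep a positive weight.
    """
    diff = [wt - strength * ws for wt, ws in zip(w_target, w_similar)]
    return class_activation_map(features, diff)


# Toy data: 2 channels over 4 locations (a flattened 2x2 feature map).
features = [
    [1.0, 1.0, 1.0, 0.0],  # channel 0: generic "bird body", shared by both classes
    [0.0, 0.0, 0.0, 1.0],  # channel 1: distinctive detail, present at location 3 only
]
w_target = [1.0, 1.0]   # the target class uses both channels
w_similar = [1.0, 0.0]  # the look-alike class uses only the shared channel

plain = class_activation_map(features, w_target)
finer = contrastive_map(features, w_target, w_similar)

print("plain CAM:      ", plain)
print("contrastive map:", finer)
```

In this toy setup the plain map lights up all four locations, while the contrastive map keeps only the single location where the two classes actually differ.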

"Fiery eyes": finer detail, better understanding, more reliable

Finer-CAM arrives with a halo of its own, and its features are impressive enough to make people say "Wow":

  • Good news for detail obsessives: Finer-CAM can lock onto the key features where "the devil is in the details", such as distinctive patterns on a bird's feathers, the characteristic lines of a car seen from a certain angle, or small variations on an aircraft's wings that are easy to miss without a careful look. In the past, a neural network's explanation might only tell you "This is a bird"; with Finer-CAM, it can point to the bird's toes and say "No! This is a red-legged stilt!"
  • Built-in "noise reduction": earlier CAM methods often produce blurry maps in which cluttered background regions light up. Finer-CAM works like a beauty filter that removes irrelevant background interference, making the explanations cleaner and clearer so that the key points are visible at a glance.
  • Results that speak for themselves: don't let the name "Finer" fool you, its performance is anything but slight. On hard metrics such as relative confidence drop and localization accuracy, Finer-CAM outperforms older CAM methods (such as Grad-CAM, Layer-CAM, and Score-CAM). Whether the backbone is the "tall, rich and handsome" DINOv2 or the "kid from the slums" CLIP, Finer-CAM delivers.
  • A "crossover" expert: Finer-CAM also supports multimodal zero-shot learning. Simply put, it can not only recognize objects in a picture but also take a text description and locate the corresponding thing in the image. It is like saying "that red convertible sports car" to a stranger: they not only find a sports car, they point to exactly the red convertible you meant.

Such a fun and practical tool deserves a hands-on try. The Imageomics team has released the Finer-CAM source code and a Colab demo. You only need to install a package called grad-cam, run the generate_cam.py script they provide to generate the "spot the difference" results, and then use visualize.py to view them.

The arrival of Finer-CAM is like installing a more advanced image analysis system in neural networks, allowing them to see clearly when faced with subtle differences.

In the future, when AI is asked to recognize things that "look exactly the same", it can finally say with confidence: "Hmph! I spotted the difference between you two long ago!" This technology not only improves the accuracy of image interpretation, but also gives us a deeper understanding of how AI makes its decisions.
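For readers who want to experiment before opening the repository, the comparison can also be written as a custom "target". The sketch below assumes the grad-cam package's convention that a target is a callable applied to one sample's model output, returning the score to explain. `LogitDifferenceTarget` and its `strength` parameter are hypothetical names introduced here; this is not the repository's code, and the arguments of generate_cam.py are not shown because the article does not list them.

```python
class LogitDifferenceTarget:
    """Hypothetical target: score of the target class minus a scaled
    score of a visually similar class.

    Assumption: the CAM library calls the target with the model output
    for a single sample (indexable by class), as grad-cam's built-in
    classifier target does.
    """

    def __init__(self, target_index, similar_index, strength=1.0):
        self.target_index = target_index
        self.similar_index = similar_index
        self.strength = strength

    def __call__(self, model_output):
        return (
            model_output[self.target_index]
            - self.strength * model_output[self.similar_index]
        )


# Stand-in for one sample's class scores; a real run would use the
# model's output tensor instead of a plain list.
scores = [2.0, 1.5, 0.1]
target = LogitDifferenceTarget(target_index=0, similar_index=1)
print(target(scores))

# With grad-cam installed, such an object would be passed in the usual
# way, e.g. cam(input_tensor=batch, targets=[target]); that step is
# left out here so the sketch runs without any dependencies.
```

Because the target only indexes and subtracts, the same class works on a plain list here and should work on a one-dimensional tensor in a real pipeline.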

Project: https://github.com/Imageomics/Finer-CAM

Demo: https://colab.research.google.com/drive/1plLrL7vszVD5r71RGX3YOEXEBmITkT90