Detect and extract images surrounded by a frame

fmw42 2020-02-01 14:18

This can be done in ImageMagick (6) using -connected-components.

Here I convert to HSV colorspace and extract the Saturation channel. White and black have no saturation, but the pink and blue do. I then threshold so that the pink and blue become white on a black background, and use morphology erode to remove the effects of your border. Then I use connected components, with an area threshold that merges small regions into their surroundings, to fill in any holes in the white regions, get the bounding boxes of the white regions, and store them in an array. Finally, I loop over each bounding box and crop the original image.

See https://imagemagick.org/script/connected-components.php

Input:

Unix Syntax:

# Collect the bounding box (WxH+X+Y) of every white region.
bboxArr=($(convert wikipedia.png \
-colorspace HSV -channel 1 -separate +channel \
-threshold 0 -type bilevel \
-morphology erode square:3 \
-define connected-components:verbose=true \
-define connected-components:mean-color=true \
-define connected-components:area-threshold=1000 \
-connected-components 4 null: | grep "gray(255)" | awk '{print $2}'))

num=${#bboxArr[@]}

# Crop the original image once per bounding box.
for ((i=0; i<num; i++)); do
convert wikipedia.png -crop "${bboxArr[$i]}" +repage "wikipedia_$i.png"
done

Results:

If using ImageMagick 7, then change convert to magick.

For Windows syntax, remove the \ before any ( and ), and change the end-of-line \ to ^. grep and awk are Unix tools, so you may need to install them on Windows or find another way to do that step.
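
To make it clearer what the grep and awk step is doing, here is a minimal sketch that does the same filtering in a single awk call. The function name extract_white_boxes is hypothetical, and the sketch assumes each object line of the verbose listing has the form "id: WxH+X+Y centroid area mean-color" with the color as the last field.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read the verbose -connected-components listing on stdin
# and print the bounding box (WxH+X+Y) of every region whose mean color is white.
extract_white_boxes() {
  # Assumed line layout: "id: WxH+X+Y centroid area mean-color".
  # Keep lines whose last field is exactly gray(255); field 2 is the geometry.
  awk '$NF == "gray(255)" { print $2 }'
}
```

It would replace the `| grep "gray(255)" | awk '{print $2}'` part of the pipeline, that is, `... -connected-components 4 null: | extract_white_boxes`. Matching on the last field is a little stricter than grep, since it ignores the header line and any other line that merely contains the text gray(255).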

zono 2020-02-01 16:57:42

It worked perfectly, but please give me some more time. As you mentioned, I've confirmed that it does not work when the background color is white. That color is also possible in my requirements (I did not mention this in my question, sorry). I'm now trying to find a solution.

fmw42 2020-02-02 01:51:43

Do you need all the text paragraphs? If so, then a slightly different approach is needed. First, make all the text black on a white background. Then blur the text or use morphology open to connect the text in each paragraph. Then threshold. Then use connected components to find the text region bounding boxes. Then use the bounding boxes to crop the input.
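
A rough sketch of those steps, not tested on your image. It assumes dark text on a light page and uses the blur variant rather than morphology. The name text_boxes is hypothetical, and the 50% and 99% thresholds, the blur sigma and the area threshold are guesses that would need tuning. It also assumes the merged text blobs are reported with mean color gray(0).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print one WxH+X+Y box per dark text block in image $1.
# Assumes dark text on a light page; every numeric value is a guess to tune.
text_boxes() {
  local img=${1:-}
  [ -f "$img" ] || { echo "usage: text_boxes image" >&2; return 1; }
  # 1. gray + threshold: text becomes black on white
  # 2. blur + threshold: nearby letters and lines merge into solid black blobs
  # 3. connected components: list each blob with its bounding box
  # 4. awk: keep only the black blobs (assumed to be reported as gray(0))
  convert "$img" -colorspace gray -threshold 50% \
    -blur 0x5 -threshold 99% -type bilevel \
    -define connected-components:verbose=true \
    -define connected-components:mean-color=true \
    -define connected-components:area-threshold=1000 \
    -connected-components 4 null: |
    awk '$NF == "gray(0)" { print $2 }'
}
```

Each printed box can then be passed to -crop followed by +repage, one crop per box. With ImageMagick 7, convert would become magick.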

zono 2020-02-02 15:18:56

Hi. No, I don't need the text paragraphs. I need to extract output1.png and output2.png. I added the detail in my question. (Update 2)
