If the image stands alone such that sighted users must parse it, then your logic is correct. But if the image is merely decorative because adjacent text already conveys the same information, so that sighted users gloss over it, aria-hidden gives screen reader users that same efficiency instead of wasting their time.
In other words, eyes can skip over decorations without the developer needing to flag them as such, but audio can't auto-skip.
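To make the distinction concrete, here is a minimal markup sketch. The file names, alt text, and button label are made up for illustration; only the `alt` and `aria-hidden` attributes are the point.

```html
<!-- Stands alone: the image carries information, so it needs a text alternative. -->
<img src="sales-chart.png" alt="Bar chart: sales rose in each of the last four quarters">

<!-- Decorative: the adjacent text already says "Save", so the icon is hidden
     from assistive technology and the button is announced once, as "Save". -->
<button type="button">
  <img src="save-icon.png" alt="" aria-hidden="true">
  Save
</button>
```

An empty `alt=""` on its own already tells screen readers to skip an `<img>`; `aria-hidden="true"` does the same job for elements that have no `alt` attribute, such as inline SVG or icon-font spans.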
Skipping all text unless the author tags it for inclusion doesn't seem like a fail-safe default, assuming that too much information is generally preferable to too little.