Noita uses herringbone Wang tiles where each Wang tile pixel corresponds to 16x16 simulated in-game CA (cellular automaton) pixels. Tiles are selected from a per-biome pool, and biome-specific scripts can also override certain areas. When each Wang tile pixel is expanded into 16x16 pixels, some noise is applied to the terrain to give it the curvy look, and another layer of (thresholded Perlin?) noise controls which bits of inner terrain get variations (gold veins etc.).
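A rough sketch of just the expansion step (not the herringbone tile selection), to make the two noise layers concrete. Everything here is an assumption for illustration: the material IDs, `expand_tile`, `value_noise`, the warp amount and the vein threshold are all hypothetical. Simple value noise stands in for Perlin, and domain warping (nudging each sample point by noise before reading the coarse tile) is one plausible way to get curvy edges, not confirmed to be what Noita actually does.

```python
import math

AIR, ROCK, GOLD = 0, 1, 2  # hypothetical material IDs


def _lattice(ix: int, iy: int, seed: int) -> float:
    """Deterministic pseudo-random value in [0, 1) for an integer lattice point."""
    h = (ix * 374761393 + iy * 668265263 + seed * 2147483647) & 0xFFFFFFFF
    h = ((h ^ (h >> 13)) * 1274126177) & 0xFFFFFFFF
    h ^= h >> 16
    return h / 2**32


def value_noise(x: float, y: float, seed: int) -> float:
    """Smoothly interpolated value noise in [0, 1) (a stand-in for Perlin)."""
    x0, y0 = math.floor(x), math.floor(y)
    tx, ty = x - x0, y - y0
    sx, sy = tx * tx * (3 - 2 * tx), ty * ty * (3 - 2 * ty)  # smoothstep fade
    v00, v10 = _lattice(x0, y0, seed), _lattice(x0 + 1, y0, seed)
    v01, v11 = _lattice(x0, y0 + 1, seed), _lattice(x0 + 1, y0 + 1, seed)
    top = v00 + sx * (v10 - v00)
    bottom = v01 + sx * (v11 - v01)
    return top + sy * (bottom - top)


def expand_tile(tile, scale=16, warp=0.35, vein_threshold=0.72, seed=0):
    """Expand a coarse Wang tile (rows of material IDs) into scale x scale pixels each."""
    h, w = len(tile), len(tile[0])
    out = [[AIR] * (w * scale) for _ in range(h * scale)]
    for fy in range(h * scale):
        for fx in range(w * scale):
            # Centre of this fine pixel in coarse (Wang-pixel) coordinates.
            cx, cy = (fx + 0.5) / scale, (fy + 0.5) / scale
            # Layer 1: domain warp. Shift the sample point by up to `warp`
            # coarse pixels so blocky 16x16 edges come out curvy.
            dx = (value_noise(cx * 2, cy * 2, seed) - 0.5) * 2 * warp
            dy = (value_noise(cx * 2, cy * 2, seed + 1) - 0.5) * 2 * warp
            # Clamp to the tile (simplification; see note below).
            sx = min(max(math.floor(cx + dx), 0), w - 1)
            sy = min(max(math.floor(cy + dy), 0), h - 1)
            material = tile[sy][sx]
            # Layer 2: an independent, thresholded noise field picks which
            # solid pixels become veins; air is never touched.
            if material == ROCK and value_noise(fx / 8, fy / 8, seed + 2) > vein_threshold:
                material = GOLD
            out[fy][fx] = material
    return out
```

Usage: `expand_tile([[ROCK, AIR], [AIR, ROCK]], seed=42)` gives a 32x32 grid. Clamping at the tile edges is a shortcut; in a chunked world you'd probably sample the neighbouring tiles instead so seams line up.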
Source: I'm working on a Noita-like, so I've spent a bit of time looking at prior art. The Noita wiki on wiki.gg explains a lot of it, though (warning: many spoilers).
Would it be better to do something like what streamers do, with a camera that tracks the real you but replaces you on screen? Seems like it would handle all the natural movements without extras.