It's very cheap on any shader-capable card. The per-pixel sin is slightly expensive (although the shader compiler will typically optimize it to a single sincos instead of two sins), but that balances against a lower vertex submission cost because no subdivision is needed (not to mention the ability to keep everything in a static VBO if you want to go that far).
Doing shader-based versions of both water and sky means that every single surf can go into a single static VBO, in fact, as no per-vertex data needs to be updated on the CPU at all. You'd use a cubemap for skyboxes in this setup: Doom 3 has some code for converting Q1/Q2 skybox layouts to alignments suitable for cubemaps, and I independently wrote the same conversion myself a few years ago, so I'm reasonably certain that it isn't rocket science. Get some draw call batching in and you're suddenly running up to 5 times as fast, or more, in big, heavy scenes. That does require a higher GL_VERSION than Fitz aims for, as well as some massive surgery to the current codebase. Probably not reasonable to expect in any kind of short/medium timeframe, if ever.
The fullscreen warp has some complexities in how you handle the edges. You can use GL_CLAMP_TO_EDGE but it won't handle the bottom edge, which butts up against the sbar. There are solutions: I used a "control texture" that fades off the warp effect at the edges; another option is to just destroy and recreate the texture at the appropriate size if it needs to change, which generally doesn't happen at any performance-critical time.
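As an illustration of the control texture idea, the following C sketch fills a single-channel image that is 0 at the edges and ramps up to 255 over a few texels. The function name build_warp_control and the border parameter are hypothetical, and it assumes the warp shader multiplies its texcoord offset by the sampled value, so the warp dies away to nothing at the screen edges.

```c
/*
 * Fill a w*h single-channel mask: 0 on the outermost texels, rising
 * linearly to 255 once a texel is 'border' texels away from the
 * nearest edge. dst must hold at least w*h bytes.
 * Assumption: the shader scales its warp offset by this value.
 */
void build_warp_control(unsigned char *dst, int w, int h, int border)
{
    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            /* distance in texels to the nearest edge on each axis */
            int dx = (x < w - 1 - x) ? x : (w - 1 - x);
            int dy = (y < h - 1 - y) ? y : (h - 1 - y);
            int d = (dx < dy) ? dx : dy;

            int v = (border > 0) ? (d * 255) / border : 255;
            if (v > 255) v = 255;

            dst[y * w + x] = (unsigned char)v;
        }
    }
}
```

The mask can be tiny (something like 16x16 or 32x32) since linear filtering smooths the ramp when it's stretched over the screen.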
Generally glCopyTexSubImage is going to be slower than using an FBO for this because it needs to shift more data around. Overall you can expect to lose maybe one third of your performance owing to increased fillrate even in the best case.
Another way to do it that doesn't need shaders (and will even work on GL1.1 hardware) is to use a grid of quads (much the same as what Fitz does for r_oldwater 0). That can give acceptable enough quality.
The third way is just to rotate the projection matrix a little based on time, which gives a reasonable enough effect at no performance cost. It's like Fitz's current stretch'n'squeeze effect but with some rotation added in too. You need to extract your frustum from the combined MVP rather than calculating it separately if you do this though. Easy enough (and a little more robust than calculating it separately).
Despite all this seeming complexity and multiple options it's probably cleaner and easier to integrate with any engine as it doesn't need to touch any code outside of its own subsystem. So - detect if we're underwater (via r_viewleaf->contents) at the start of the scene, and either (a) just set a flag and/or (b) switch the render target. At the end of the scene, either capture the scene to a texture or put the old render target back, then draw using the appropriate method depending on what hardware capabilities are available.
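The decision made at the start of the scene might look roughly like this in C. The CONTENTS_* values are the ones from Quake's bspfile.h; everything else (warpmode_t, glcaps_t, choose_warp_mode) is a hypothetical name for illustration, and the fallback order is just one reasonable choice.

```c
/* Leaf contents values as defined in Quake's bspfile.h */
#define CONTENTS_WATER -3
#define CONTENTS_SLIME -4
#define CONTENTS_LAVA  -5

/* Hypothetical: which underwater path to use for this frame */
typedef enum {
    WARP_NONE,         /* not underwater, draw normally */
    WARP_GRID_COPY,    /* copy scene to texture, draw a grid of quads */
    WARP_SHADER_COPY,  /* copy scene to texture, draw with a shader */
    WARP_SHADER_FBO    /* render to an FBO, draw with a shader */
} warpmode_t;

/* Hypothetical capability flags, filled in once at startup */
typedef struct { int has_shaders; int has_fbo; } glcaps_t;

/* contents would come from r_viewleaf->contents */
warpmode_t choose_warp_mode(int contents, glcaps_t caps)
{
    if (contents != CONTENTS_WATER &&
        contents != CONTENTS_SLIME &&
        contents != CONTENTS_LAVA)
        return WARP_NONE;

    if (!caps.has_shaders)
        return WARP_GRID_COPY;

    return caps.has_fbo ? WARP_SHADER_FBO : WARP_SHADER_COPY;
}
```

At the start of the scene, WARP_SHADER_FBO switches the render target and the other modes just set a flag; at the end of the scene, the FBO path puts the old render target back and the copy paths capture the scene before drawing it with the chosen method.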
For an engine like Fitz I'd recommend asm shaders (ARB_vertex_program/ARB_fragment_program) rather than full GLSL. The main reason is that they're available on a much wider range of hardware, so they seem more in keeping with the Fitz philosophy.