One of the reasons I always suggest "Use Unicode!" is that converting between code pages creates security problems. In short, if data is converted between code pages after some sort of security validation has been done, that validation can be negated. This is true of many data transformations, but it seems to surprise people when it applies to code page transformations.
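As a rough illustration of validation being negated by a later conversion, here is a minimal Python sketch. The validator `is_safe_filename` and the file name are hypothetical, made up for this example. It relies on the fact that, in code page 932 (Shift-JIS), the katakana letter U+30BD is encoded as the two bytes 0x83 0x5C, and 0x5C is also the ASCII backslash.

```python
def is_safe_filename(name: str) -> bool:
    """Hypothetical validator: reject names containing a path separator."""
    return "\\" not in name and "/" not in name


# U+30BD (KATAKANA LETTER SO) followed by an ordinary extension.
name = "\u30bd.txt"

# The check runs on the Unicode string and passes: there is no backslash here.
validated = is_safe_filename(name)

# Later the string is converted to code page 932 for some legacy consumer.
encoded = name.encode("cp932")

# The trail byte of the katakana character is 0x5C, so any code that scans
# the bytes without tracking lead bytes will see a backslash.
contains_backslash_byte = b"\\" in encoded

print(validated, encoded, contains_backslash_byte)
```

The string that passed the check and the bytes that reach a byte-oriented consumer disagree about whether a backslash is present, which is exactly the kind of gap the validation was supposed to close.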
There are lots of reasons for this, but some are:
A related problem is the IDN and code page parsing that browsers sometimes do. Named and numeric character entities (the & forms) in HTML can end up looking quite different once they are rendered. % escaping is common in URLs, and IDN xn-- encoding appears in domain names. An application may decode any of these, sometimes at unexpected times, which causes problems if earlier code assumed the data was in a different state before the decoding.
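The decodings mentioned above can be sketched with Python's standard library. This is only an illustration: the `looks_safe` check and the sample strings are assumptions for the example, not something a real browser or application necessarily uses.

```python
import html
from urllib.parse import unquote


def looks_safe(text: str) -> bool:
    """Hypothetical check: no markup characters and no parent-directory steps."""
    return "<" not in text and ".." not in text


# Each raw value passes the check in its encoded form.
raw_entity = "&lt;script&gt;"
raw_url = "%2e%2e%2fetc%2fpasswd"
raw_idn = b"xn--bcher-kva.example"

entity_ok_before = looks_safe(raw_entity)
url_ok_before = looks_safe(raw_url)

# Decoding changes what the data actually says.
decoded_entity = html.unescape(raw_entity)   # HTML character entities
decoded_url = unquote(raw_url)               # URL % escaping
decoded_idn = raw_idn.decode("idna")         # IDN xn-- (Punycode) labels

entity_ok_after = looks_safe(decoded_entity)
url_ok_after = looks_safe(decoded_url)

print(decoded_entity, decoded_url, decoded_idn)
```

The IDN case is slightly different in kind: the encoded name is plain ASCII, while the decoded name is not, so any check that assumed an ASCII-only host name no longer holds after decoding.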
So the moral is: do any security tests after all conversions have been done. If you have to retransmit the data, try to use an encoding like Unicode that has fewer edge-case behaviors that could trip you up. If the data has to be decoded again after transmission, revalidate it at that point if possible.
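A minimal sketch of "convert first, then test" might look like the following. The function names `contains_markup` and `accept` are hypothetical, and UTF-7 is used only because it makes the problem easy to see: its encoded form of `<` and `>` contains neither character.

```python
def contains_markup(text: str) -> bool:
    """Hypothetical security test: flag angle brackets."""
    return "<" in text or ">" in text


def accept(raw: bytes, encoding: str) -> str:
    """Decode to Unicode first, then run the security test on the result."""
    text = raw.decode(encoding)
    if contains_markup(text):
        raise ValueError("markup is not allowed")
    return text


raw = b"+ADw-script+AD4-"  # UTF-7 form of "<script>"

# Testing the undecoded bytes finds nothing objectionable.
premature_check_passed = b"<" not in raw and b">" not in raw

# Testing after the conversion catches it.
try:
    accept(raw, "utf-7")
    rejected = False
except ValueError:
    rejected = True

print(premature_check_passed, rejected)
```

The same ordering applies on the receiving side: if the data is decoded again after transmission, running `accept` (or its equivalent) at that point covers whatever the transport did to it.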