Here is how I created the files:

$ echo ä > umlaut-utf8.txt

But convince yourself:

$ hexdump -C umlaut-utf8.txt

Convert to the other encodings:

$ iconv -f utf8 -t iso88591 umlaut-utf8.txt > umlaut-iso88591.txt
$ iconv -f utf8 -t utf16 umlaut-utf8.txt > umlaut-utf16.txt

Check the hex dump:

$ hexdump -C umlaut-iso88591.txt

Create something "invalid" by mixing all three:

$ cat umlaut-iso88591.txt umlaut-utf8.txt umlaut-utf16.txt > umlaut-mixed.txt

The file command makes "best guesses" about the encoding:

$ file umlaut-utf16.txt
umlaut-utf16.txt: Little-endian UTF-16 Unicode text, with no line terminators

Use the -i parameter to make file print the encoding as a MIME charset:

$ file -i umlaut-utf8.txt umlaut-utf16.txt umlaut-mixed.txt
umlaut-utf8.txt: text/plain; charset=utf-8
umlaut-utf16.txt: text/plain; charset=utf-16le
umlaut-mixed.txt: application/octet-stream; charset=binary

The file command has no idea of "valid" or "invalid". It just sees some bytes and tries to guess what the encoding might be. As humans, we might recognize that a file is a text file with some umlauts in a "wrong" encoding; a computer would need some sort of artificial intelligence to do the same.
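The whole experiment can be reproduced as one script. This is a sketch assuming GNU iconv, file and od are available (standard on most Linux systems); the hex bytes noted in the comments are what glibc's iconv produces, and its UTF-16 output typically starts with a byte-order mark:

```shell
# Work in a scratch directory so we don't clobber anything.
cd "$(mktemp -d)"

echo ä > umlaut-utf8.txt                                          # bytes: c3 a4 0a
iconv -f utf8 -t iso88591 umlaut-utf8.txt > umlaut-iso88591.txt   # bytes: e4 0a
iconv -f utf8 -t utf16 umlaut-utf8.txt > umlaut-utf16.txt         # BOM, then e4 00 0a 00
cat umlaut-iso88591.txt umlaut-utf8.txt umlaut-utf16.txt > umlaut-mixed.txt

od -A x -t x1z umlaut-mixed.txt   # inspect the raw bytes
file -i umlaut-utf8.txt umlaut-utf16.txt umlaut-mixed.txt   # file's best guess per file

# Unlike file, iconv actually validates its input: decoding the mixed
# file as UTF-8 fails on the first ISO-8859-1 byte (0xe4 0x0a is not a
# well-formed UTF-8 sequence).
iconv -f utf8 -t utf8 umlaut-mixed.txt >/dev/null || echo "not valid UTF-8"
```

The last line is the closest you can get to asking "is this valid?": iconv decodes the bytes and reports an illegal input sequence, whereas file only guesses and never rejects anything.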