Question:
C++ window application problem?
2014-01-23 19:34:55 UTC
I have a problem in a C++ Windows application. First of all, here is the sample code:

int WINAPI WinMain(HINSTANCE hIns, HINSTANCE hPrev, LPSTR lpCmdLine, int nCmdShow)
{
char* msg;
msg = "Hello World!";
MessageBox(NULL, LPCWSTR(msg), L"Caption", MB_OK);
return 0;
}

The message box pops up something written in Chinese.
Can someone help me?
How can I get the exact, clean string into the message box from this "msg" variable, without changing its type from char*?
Three answers:
Ratchetr
2014-01-23 20:25:09 UTC
Welcome to the wonderful world of Character Sets.



C and C++ are fairly old programming languages (C is downright Ancient).



Back in the day when C was being developed, you could represent any character you would ever need in 7 bits. All the characters in the Latin alphabet, and all the common symbols like + and # and @ could all be encoded in 7 bits. And, on most systems, a char was 8 bits, meaning you had twice as many possible values. Life was good.



Then something bad happened. People in Japan and Israel and other places around the world that don't use the Latin alphabet wanted to use computers too. But there was no way to encode all the funny squiggly characters they wanted to see in 8 bits, let alone 7.



There were (still are) a number of workarounds for this problem. One solution is to use 8 bits for the common characters, but have an escape mechanism for the other characters that uses 16 (or more) bits for the oddball characters. That gets kinda complicated...some characters are 8 bits, some are 16. Yuck!!!



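Another solution is Unicode. In Unicode (almost) every character everyone on the planet could ever need can be represented. The original plan was that *every* character gets 16 bits, not just 7 or 8. (Unicode has since outgrown 16 bits; Windows wide strings are UTF-16, where the less common characters take two 16-bit units.)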



Modern programming languages like Java or C# solve this by simply stating up front that characters are 16 bits, and they are Unicode.



But Microsoft, and the C/C++ community in general, had a problem. Windows was originally written in C, and in C a char was 8 bits. But Microsoft wanted to sell its OS in Japan. And Israel. And a bunch of other countries. So they adopted Unicode, but in a rather strange way. They created 2 versions of every API call that takes a string (a char *). The version whose name ends in W works with Unicode. The version whose name ends in A works with plain old 8-bit chars (what everyone loosely calls ASCII; strictly it is the system's ANSI code page). And they pick between the two with wacky macros.



And they added a compiler setting that determines whether you get the 8-bit ASCII API or the Unicode API.
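
A rough sketch of that macro trick, in case it helps. Everything named Demo* or DEMO_* here is made up for illustration; the real header does the same kind of thing with MessageBoxA, MessageBoxW and the UNICODE macro, and is more involved than this.

```cpp
#include <cassert>

// Hypothetical stand-ins for a narrow/wide API pair such as
// MessageBoxA / MessageBoxW. They only report which one was called.
char DemoBoxA(const char* text)    { assert(text != nullptr); return 'A'; }
char DemoBoxW(const wchar_t* text) { assert(text != nullptr); return 'W'; }

// Hypothetical switch, playing the role of the real UNICODE macro:
// the unsuffixed name is just a macro that picks one of the two.
#ifdef DEMO_UNICODE
#define DemoBox DemoBoxW
#else
#define DemoBox DemoBoxA
#endif
```

With DEMO_UNICODE undefined, DemoBox("Hello World!") compiles and calls DemoBoxA. Define DEMO_UNICODE and the same line stops compiling, because a char* is not a wchar_t*. That is the error the cast in the question was hiding.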



You are mixing both worlds here. For your code to compile, I'm thinking you MUST have the 'Use Unicode Character Set' option set. (It is under general options for the project).



But then you mindlessly, without knowing why, tried to coerce msg (which is a simple 8 bit C style string) into a Unicode string with LPCWSTR(msg). You probably had to do that to make it compile.



But when you did that, you LIED (not on purpose, of course). You said that the 13 bytes that msg points to are Unicode (aka wide; that is what the W stands for in LPCWSTR). So the first Unicode character would be... "He". The next is "ll". Fail. "He" in Unicode is not He. The bytes 0x48 0x65, read as one little-endian 16-bit value, give U+6548, which sits in the CJK ideograph block. That is why the message box looks Chinese.
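
To see where the Chinese-looking text comes from, here is a small portable sketch that pairs up the bytes of a narrow string the way a UTF-16 little-endian reader would. The helper names are hypothetical, and this only imitates what the wide API sees; it does not call any Windows function.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical helper: read a narrow string two bytes at a time,
// low byte first, as a UTF-16LE consumer would.
std::vector<std::uint16_t> bytes_as_utf16le(const char* narrow) {
    assert(narrow != nullptr);
    const std::size_t n = std::strlen(narrow);
    std::vector<std::uint16_t> units;
    for (std::size_t i = 0; i < n; i += 2) {
        const unsigned char lo = static_cast<unsigned char>(narrow[i]);
        // For an odd length, narrow[i + 1] is the terminating '\0'.
        const unsigned char hi = static_cast<unsigned char>(narrow[i + 1]);
        units.push_back(static_cast<std::uint16_t>(lo | (hi << 8)));
    }
    return units;
}

// True if the 16-bit value falls in the main CJK ideograph block.
bool is_cjk_ideograph(std::uint16_t unit) {
    return unit >= 0x4E00 && unit <= 0x9FFF;
}
```

For "Hello World!" the first two units come out as 0x6548 ("He") and 0x6C6C ("ll"), both inside the CJK block. The real call is worse than this sketch: the wide API keeps reading until it finds a 16-bit zero, so it can run past the end of the 13 bytes.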



It's a mess, it really is. If you want to use C++ on Windows, you will need to learn what a wide char is. What a WCHAR is. What TCHAR and wchar_t are. What the L prefix before a string literal means. And a bunch of other details. One of the many reasons I really dislike C and C++, despite the fact that they were once my favorite programming languages.
husoski
2014-01-23 20:36:52 UTC
You probably already know this, but you can compile Windows code in either ASCII or Unicode mode, depending on whether or not the UNICODE macro was defined when <windows.h> is included. Your code won't compile in ASCII mode, though, so you must have -DUNICODE on the command line or #define UNICODE in your source.



You have ASCII and Unicode modes mixed here, passing a narrow character string pointer (msg) cast to be a wide string pointer (LPCWSTR). That's where the Chinese comes from. If you are compiling in Unicode mode, make that:



const wchar_t *msg = L"Hello World";
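
If msg really has to stay a char*, there are two ways out that I know of. The simplest is to call the narrow entry point by name, MessageBoxA(NULL, msg, "Caption", MB_OK), which works whatever the UNICODE setting is. The other is to convert the text to a wide string before handing it to the wide API. Below is a minimal sketch of the second route. The widen and show_message helpers are hypothetical names, and the sketch uses the standard std::mbstowcs with the current C locale so it runs anywhere; on Windows, MultiByteToWideChar is the native call for this job and gives you control over the code page.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: convert a narrow string to a wide one using
// the current C locale. Plain ASCII text converts in any locale.
std::wstring widen(const char* narrow) {
    assert(narrow != nullptr);
    // The wide result never has more characters than the input has bytes.
    std::vector<wchar_t> buffer(std::strlen(narrow) + 1);
    const std::size_t written =
        std::mbstowcs(buffer.data(), narrow, buffer.size());
    if (written == static_cast<std::size_t>(-1)) {
        return std::wstring();  // input was not valid in this locale
    }
    return std::wstring(buffer.data(), written);
}

#ifdef _WIN32
#include <windows.h>

// Windows only: msg keeps its char* type, the wide API gets real wide text.
int show_message(const char* msg) {
    const std::wstring wide = widen(msg);
    return MessageBoxW(NULL, wide.c_str(), L"Caption", MB_OK);
}
#endif
```

Either way the point is the same: the characters get converted, not just relabeled with a cast.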



---------



You can also use the TCHAR type and the TEXT() macro to make string constants that will work in either mode:



const TCHAR *msg = TEXT("Hello World!");

MessageBox(NULL, msg, TEXT("Caption"), MB_OK);
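
For the curious, this is roughly how a TCHAR and TEXT() pair can work. It is a simplified sketch with hypothetical names (DemoTChar, DEMO_TEXT, DEMO_UNICODE, demo_length), not the actual definitions from the Windows headers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical switch, playing the role of the real UNICODE macro.
#ifdef DEMO_UNICODE
typedef wchar_t DemoTChar;       // wide build: 16-bit characters on Windows
#define DEMO_TEXT(s) L##s        // pastes an L prefix onto the literal
#else
typedef char DemoTChar;          // narrow build: 8-bit characters
#define DEMO_TEXT(s) s           // leaves the literal alone
#endif

// Counts characters up to the terminator; the same source works in
// both modes because the character type follows the switch.
std::size_t demo_length(const DemoTChar* s) {
    assert(s != nullptr);
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}
```

Because the literal and the pointer type change together, code written this way never needs a cast between narrow and wide strings.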



This is the style recommended in Petzold's classic "Programming Windows" books, at least in the editions that cover Win32 programming for NT and 2000. (Sadly, this hasn't been updated for modern OSes, probably because it was a Microsoft Press book, and MS is emphasizing .NET for application programming since about the same time as XP was released. It's still quite good, and pertinent.)
Daniel
2014-01-23 19:37:24 UTC
Check out this website:

http://msdn.microsoft.com/en-us/library/windows/desktop/ms645505(v=vs.85).aspx


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.