You don't really want a GUI toolkit, but a graphics engine. To name a few: OpenGL, DirectX, SDL, Allegro. There are some higher-level ones like OGRE and Irrlicht, which are powerful enough to be considered complete game engines. Graphics engines interface closely with the hardware to produce efficient code where possible. They also come with plenty of already-optimised functions for handling graphics, which saves you the time of writing them yourself.
You'll need some GUI knowledge to tie the GUI in with the graphics engine, but integrating them usually just means passing a widget's ID (typically its native window handle) to the graphics engine. As for Jordon's reply, I'd agree about the cross-platform GUI toolkits: wxWidgets, Qt and GTK+ are the main ones to point out. But there are some lesser-known ones that might suit you too, for example Crazy Eddie's GUI.
I'd recommend a Rapid Application Development (RAD) style for your GUI. An IDE or separate application lets you drag and drop widgets onto a form and generates the code for you, in the same way Visual Studio does for C# and VB. One I always recommend, for many reasons, is Code::Blocks: it has built-in support for wxWidgets RAD, and code templates for several project types (OGRE included).
I would suggest you try out an engine in C++ first, but consider moving to C# if it proves difficult. A month of learning C++ isn't really enough to know the language; I've been using it for 12 years and it still catches me out from time to time. So don't let such a short investment dissuade you from switching to C#. Most of the ideas you've already learned from C++ will carry over to C# with slight variations.
XNA is a nice option, but similar alternatives exist in C++ (e.g. OGRE). If you take a look at the XNA tutorial videos, you'll see how much it can do in just a few hundred lines of code. I personally have a few dislikes about it, though: it somewhat forces a design strategy on you, and does some things differently from how I prefer to do them. One good point about using .NET is that you aren't forced to use a single language. You can use XNA with F#, for instance, which is an ideal language for the math formulae you'll put in a game (among many other things).
On Jordon's note about an approximate 5% performance difference: I've no idea where he pulled that figure from, but it's complete crap. There is no single performance difference you can quote; at best you can measure an average difference over lots of code. You might, for example, have one frequently iterated loop that performs slowly, and it would cripple your app's overall performance far more than a per-function average suggests.
The main performance issue in JIT-compiled code is register allocation. A CPU has only a few registers, so all other variables need to be held in memory. Memory access is significantly slower than register access, and deciding when to move data in and out of registers is non-trivial. A normal C++ compiler optimizes this at compile time (which slows down compilation but greatly improves code performance). A JIT compiler, however, has to compile the code while the program is running, so it can't afford the more expensive allocation strategies, and the resulting code is slower.
Whatever its source, the 5% figure seems wrong. The last I read, improvements to JIT register allocation could narrow the average gap to around 10% slower than native code. Even that could become significant if the slower code is something you iterate through frequently.
Java tends to perform better than CIL because the JRE's HotSpot VM applies several runtime optimizations that improve code performance dynamically during execution; this kind of support is much weaker in the CLR.
On the plus side, some of the more performance-intensive functions will be those inside the graphics API and other libraries that are written in lower-level code and imported into .NET. These don't suffer the performance loss mentioned above, because they are native DLLs compiled ahead of time by an optimizing compiler.
An argument some C++ elitists might raise is about garbage collection. Ignore these fallacies, because you can manage resources much like you do in C++: implement the IDisposable interface, whose Dispose method plays the role of a destructor, and call it manually (or automatically with the using statement). You can also control the garbage collector manually, which can sometimes work in your favor, because you can choose to perform collection when few other events are occurring, rather than wasting time freeing memory during busy periods. Admittedly, though, it's less predictable than manually controlling allocation for everything.