1) A transistor does not literally create zeros and ones; those are just symbols for low voltage versus high voltage.
A basic transistor has three terminals: applying a small voltage or current to the first controls whether current can flow between the second and the third.
Two of them plus a few other components allow the construction of a flip-flop, a small circuit capable of storing a state (on or off) pretty much indefinitely, as long as it has power.
Combine a few billion of them and you'll have CPUs and RAM, capable of doing calculations and storing information.
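If you'd rather see the idea in code than in wires, here's a toy C++ sketch of an SR latch, the simplest kind of flip-flop, modelled as two cross-coupled NOR gates. The names and structure are mine, purely for illustration; real hardware does this with a handful of transistors per gate.

```cpp
#include <iostream>

// A toy model of an SR (set/reset) latch built from two cross-coupled
// NOR gates -- the simplest kind of flip-flop.
struct SRLatch {
    bool q = false;      // the stored bit
    bool q_bar = true;   // its complement

    void update(bool set, bool reset) {
        // Feed the outputs back into the inputs a few times until the
        // circuit settles, which is what the real wires do continuously.
        for (int i = 0; i < 4; ++i) {
            bool new_q     = !(reset || q_bar); // NOR gate 1
            bool new_q_bar = !(set   || q);     // NOR gate 2
            q = new_q;
            q_bar = new_q_bar;
        }
    }
};

int main() {
    SRLatch latch;
    latch.update(true, false);    // "set": store a 1
    latch.update(false, false);   // inputs released...
    std::cout << latch.q << '\n'; // ...but the 1 is still there: prints 1
    latch.update(false, true);    // "reset": store a 0
    latch.update(false, false);
    std::cout << latch.q << '\n'; // prints 0
    return 0;
}
```

The key trick is the feedback loop: each gate's output is one of the other gate's inputs, so the circuit keeps "reminding itself" of the stored bit even after the set/reset inputs go away.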
2) A simple analogy is how you don't need to move each individual finger to grab something off a table. You just think "I'm going to grab that pencil", and the brain does all the tedious little steps, like "tighten muscle #47 by 4%".
That's pretty much what the compiler does when it turns a single C++ command into 35 machine code operations. Everything can be broken down further and further, until it's just lots of turning 0s to 1s and back.
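To make that concrete, here's what such a breakdown might look like. The assembly in the comment is hand-sketched pseudo-output to show the flavor, not what any particular compiler actually emits:

```cpp
#include <iostream>

// One short C++ function:
int sum(const int* values, int count) {
    int total = 0;
    for (int i = 0; i < count; ++i) {
        total += values[i];
    }
    return total;
}

// A compiler breaks that down into individual machine instructions.
// Hand-sketched pseudo-assembly, just to show the flavor:
//
//   sum:
//       xor  eax, eax            ; total = 0
//       xor  ecx, ecx            ; i = 0
//   loop:
//       cmp  ecx, esi            ; is i < count ?
//       jge  done                ; if not, jump past the loop
//       add  eax, [rdi + rcx*4]  ; total += values[i]
//       inc  ecx                 ; ++i
//       jmp  loop                ; back to the top
//   done:
//       ret                      ; hand the result back to the caller

int main() {
    int numbers[] = {1, 2, 3, 4, 5};
    std::cout << sum(numbers, 5) << '\n';  // prints 15
    return 0;
}
```

Each of those machine instructions is in turn just a pattern of bits that flips switches inside the CPU, which is where the transistors from question 1 come back in.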
I'll keep it short, too, because it doesn't really matter. To learn how to drive a car, you don't need to know the exact details of how and why its engine does what it does.
You can also watch Minecraft videos on YouTube where people build graphing calculators out of redstone and pistons to get an idea.
3) This question is much too vague. What you see on the screen is stored in the computer's RAM; it's a digital image just like one from your phone's camera, except that it is sent to the monitor 60 times per second and updated just as fast when something changes. How typing text in Word translates into changes to the pixel data sent to the monitor might sound interesting at first, but it really isn't.
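If a rough picture helps anyway, here's a toy C++ sketch of a framebuffer, which is really just an array of pixel values sitting in RAM. The class and names are made up for illustration; real graphics stacks are far more involved.

```cpp
#include <cstdint>
#include <vector>

// A "framebuffer" is a block of RAM holding one color value per pixel.
// The display hardware reads this memory ~60 times per second and sends
// it to the monitor; programs change the picture by writing into it.
struct Framebuffer {
    int width;
    int height;
    std::vector<uint32_t> pixels;  // one 32-bit RGBA value per pixel

    Framebuffer(int w, int h) : width(w), height(h), pixels(w * h, 0) {}

    void set_pixel(int x, int y, uint32_t color) {
        pixels[y * width + x] = color;  // row-major layout: row after row
    }
};

int main() {
    Framebuffer screen(1920, 1080);           // ~2 million pixels, ~8 MB of RAM
    screen.set_pixel(100, 200, 0xFF0000FFu);  // paint one pixel red (RGBA)
    // Drawing a letter in Word ultimately boils down to many calls like
    // that, done for you by layers of libraries, the OS and the driver.
    return 0;
}
```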
4) This question seems ridiculous if you've actually "learned a lot of programming languages".
C++ is closer to machine code than, say, Python, so if you want code that reads a bit closer to English, look at Python instead.
The thing is that programming isn't like talking at all. Talking is imprecise and arbitrary, while programming has to be exact down to the letter. Which is probably why people tend to find it difficult.
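A classic one-character example of that exactness (my own toy C++ snippet, not from any real codebase): `=` assigns while `==` compares, and mixing them up silently changes what the program does.

```cpp
#include <iostream>

int main() {
    int answer = 41;

    // Comparison: asks "is answer equal to 42?" -- false here, so nothing prints.
    if (answer == 42) {
        std::cout << "comparison says: correct\n";
    }

    // One missing character: this *assigns* 42 to answer, and the result of
    // the assignment (42, which is non-zero) counts as "true", so this branch
    // always runs -- a real and very common bug.
    if (answer = 42) {
        std::cout << "assignment says: correct (and answer is now "
                  << answer << ")\n";
    }
    return 0;
}
```

A human listener would guess what you meant; the computer does exactly what you wrote, which is why that kind of precision takes getting used to.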