Question:
How can I view hex source code?
2014-03-05 07:17:45 UTC
Hi, I'm a novice programmer. I know a little HTML/CSS/JavaScript, but I've been trying to learn some C (C++, to be exact). When I look at the source code of some programs on my computer, all I see is a bunch of: êO€¥dkmč9@@@@68&5-[£=]. A friend said that happens when you're looking at binary or hex. So I have two questions:
1. Why hex, why not base ten or binary?
2. How can I view this source code? Has it been decompiled to hex? If so, how can I turn it into a higher-level language that I can understand?
Four answers:
Jeroonk
2014-03-05 08:41:26 UTC
That's not "source code" but "machine code". What you write in C or C++ is what we call "source code", but the computer doesn't need code that is human-readable (to programmers, at least).



To a computer, all instructions are just numbers. Each operation it must perform is another number. Each variable is a number, each address in memory is a number. Even text is just a series of numbers. When you compile a C or C++ program, the compiler will translate your code into the corresponding instructions (called "machine code"). All these numbers are then stored in one big binary file: the executable program.
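To make "even text is just a series of numbers" concrete, here is a small C sketch that writes out the numeric code behind each character of a string. The function name `codes_to_string` is made up for this example, and the values shown assume an ASCII-compatible character set.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Writes the numeric code of every character in `text` into `out`,
   separated by spaces. On an ASCII system "Hi" becomes "72 105".
   If `cap` is too small the output is cut short but stays NUL-terminated. */
void codes_to_string(const char *text, char *out, size_t cap)
{
    assert(text != NULL && out != NULL && cap > 0);
    size_t len = strlen(text);
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < len; i++) {
        /* cast through unsigned char so bytes >= 128 come out as 128..255 */
        unsigned code = (unsigned char)text[i];
        int n = snprintf(out + used, cap - used, i == 0 ? "%u" : " %u", code);
        if (n < 0 || (size_t)n >= cap - used)
            break; /* out of room */
        used += (size_t)n;
    }
}
```

Calling `codes_to_string("Hi", buf, sizeof buf)` from your own `main` leaves "72 105" in `buf`: the letters are stored as those two numbers.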



When you open that file in a text editor, it tries to interpret those numbers as if they were ordinary text. That is why you see garbage: it isn't ordinary text. Binary files are usually inspected with hex editors, because hexadecimal is a nice compromise between readability and size:



- In binary, each byte would be 8 bits (1s and 0s), and that takes up too much space on the screen.

- In decimal, each byte would be a number between 0 and 255, which has no obvious relation to the underlying binary value.

- In hexadecimal, each byte would be a number between 00 and FF, with the lower digit corresponding to the lower four bits and the upper digit to the upper four bits. So it doesn't take up much screen space, but the underlying binary is still easy to see (you just have to memorize the 4 bits corresponding to each hexadecimal digit).
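Here is a minimal C sketch of the three notations side by side, plus one line of the kind of output a hex editor shows. `format_byte` and `hex_dump_line` are hypothetical helper names, not part of any library; the text column mimics what a text editor would display, with unprintable bytes replaced by dots.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Formats one byte three ways, e.g. 0xA5 -> "bin 10100101 dec 165 hex A5". */
void format_byte(unsigned char b, char *out, size_t cap)
{
    char bits[9];
    memset(bits, '0', 8);
    bits[8] = '\0';
    for (int i = 0; i < 8; i++)
        if (b & (0x80u >> i))   /* test bits from most to least significant */
            bits[i] = '1';
    snprintf(out, cap, "bin %s dec %u hex %02X", bits, (unsigned)b, (unsigned)b);
}

/* Writes a hex-editor-style line: each byte as two hex digits, then the
   same bytes as text, with non-printable bytes shown as '.'.
   {0x48, 0x69, 0x0A} -> "48 69 0A |Hi.|". `cap` must be >= 4*len + 3. */
void hex_dump_line(const unsigned char *data, size_t len, char *out, size_t cap)
{
    assert(out != NULL && cap >= 4 * len + 3);
    char *p = out;
    for (size_t i = 0; i < len; i++)
        p += sprintf(p, "%02X ", (unsigned)data[i]);
    *p++ = '|';
    for (size_t i = 0; i < len; i++)
        *p++ = isprint(data[i]) ? (char)data[i] : '.';
    *p++ = '|';
    *p = '\0';
}
```

Real hex editors (and command-line tools such as `xxd` or `hexdump`) also show an offset column and usually 16 bytes per line, but the idea is the same: the hex column is the exact data, the text column is the "garbage" a text editor would show.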



Still, opening the file in hex wouldn't give you any more information. You're still looking at a series of numbers only the computer can understand, but now they're in hex.



You could try to decompile it in order to recover some C or C++ source code, but it won't look like anything you've written. During compilation, all variable names, function names and comments, and a lot of the structure of the program, are lost. The decompiler can't magically recover this, so you'll end up with a largely unintelligible version of the source code, unless you know what you're looking for.
Sadsongs
2014-03-05 07:48:25 UTC
C source code is compiled and linked with the appropriate libraries to produce a binary executable; I suspect that binary is what you're looking at. Even if you use a decompiler (the one Paultech links to is good), it won't help you learn C.



You'd be better off starting with the basics; a site like http://www.learn-c.org/ should help.
husoski
2014-03-05 09:36:31 UTC
It's been pointed out that you are looking at machine code, not source code. Source code is meant for humans to read and write. In a compiled language like C or C++, the source program is converted once to a binary form (machine code) that the computer can directly execute.


"Hex" (hexadecimal) and octal are notations used primarily as shorthand for binary numbers. Each digit represents a group of bits (4 for hex, 3 for octal). The digital electronic circuitry inside the computer, though, is all binary.

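The "group of bits" idea can be sketched in C: each hex digit is the value of 4 bits (mask 0xF), each octal digit the value of 3 bits (mask 07). `to_pow2_base` is a hypothetical helper written only for illustration; in real code, `printf`'s `%X` and `%o` conversions already do this for you.

```c
#include <assert.h>
#include <limits.h>
#include <string.h>

/* Writes `value` in base 2^bits_per_digit: 1 -> binary, 3 -> octal,
   4 -> hex. Each output digit is just the next group of bits,
   taken from the low end with a mask and a shift. */
void to_pow2_base(unsigned value, unsigned bits_per_digit, char *out, size_t cap)
{
    static const char digits[] = "0123456789ABCDEF";
    char tmp[sizeof(unsigned) * CHAR_BIT];  /* enough for the binary case */
    char *p = tmp + sizeof tmp;             /* fill from the right */
    unsigned mask;
    size_t n;

    assert(bits_per_digit >= 1 && bits_per_digit <= 4);
    mask = (1u << bits_per_digit) - 1;      /* e.g. 07 for octal, 0xF for hex */
    do {
        *--p = digits[value & mask];        /* lowest group of bits -> one digit */
        value >>= bits_per_digit;           /* discard the bits just used */
    } while (value != 0);

    n = (size_t)(tmp + sizeof tmp - p);
    assert(cap > n);
    memcpy(out, p, n);
    out[n] = '\0';
}
```

For the byte 0xA5 this produces "A5" with 4 bits per digit, "245" with 3, and "10100101" with 1: the same bits, just grouped differently.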

The answer to "Why binary?" lies in that electronic circuitry. The simplest circuits or components that can change state at all have just two states: on or off for a switch, current or no current for a wire, charged or not charged for a capacitor.



Simplicity tends to mean fewer components, reduced cost, reduced power, greater reliability and usually faster operation. There's no reason to sacrifice all that so a human can read code that humans rarely need to read.



Interpreted languages like JavaScript, PHP or Python, and markup and style languages like HTML, XML or CSS, tend to have source code that is frequently modified or even generated by another program (as with PHP), so the expensive step of converting whole programs to binary machine code is skipped.



Maybe the source code is executed directly, or maybe it's converted to a tokenized "byte code" version of the source program for execution; either way, something very close to the source code is executed. (In most BASIC interpreters, even the "comments" are executed, for example.)
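To give a feel for what "byte code" means, here is a toy stack machine in C. The opcodes and their encoding are invented for this sketch and do not match any real interpreter; CPython, for instance, uses its own, far richer format. The point is only that the program is a list of small numbers that a loop reads and acts on, one instruction at a time.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical opcodes for a toy stack machine. */
enum { OP_HALT = 0, OP_PUSH = 1, OP_ADD = 2, OP_MUL = 3 };
enum { STACK_MAX = 64 };

/* Runs `code` and returns the value on top of the stack at OP_HALT. */
int run_bytecode(const int *code)
{
    int stack[STACK_MAX];
    size_t sp = 0;                      /* number of values on the stack */
    for (size_t pc = 0;;) {             /* pc: index of the next instruction */
        switch (code[pc++]) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = code[pc++];   /* the operand follows the opcode */
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp]; /* pop two, push their sum */
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp]; /* pop two, push their product */
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        default:
            assert(!"unknown opcode");
            return 0;
        }
    }
}
```

The array `{OP_PUSH, 2, OP_PUSH, 3, OP_ADD, OP_PUSH, 4, OP_MUL, OP_HALT}` computes (2 + 3) * 4, so `run_bytecode` returns 20.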



In many cases there are "reverse engineering" tools for compiled binary programs, but they typically produce assembly source code and can't supply meaningful names for most variables and functions. The result is only slightly more readable than the binary-treated-as-ASCII output you see on the screen. They certainly can't provide meaningful comments.



Commercial code is often obfuscated even further, using various "copy protection" methods to discourage unauthorized copying of the software, to make it more difficult to inject malware into downloaded code or to introduce "cheats" into an online game client, and so on. There may be multiple levels of encryption, with part of the key information retrieved from a server on the Net.



For almost all legitimate purposes, it's easier (and more rewarding, I think) to write code that does what you want than to reverse-engineer someone else's binary code that does the same thing.
Paultech
2014-03-05 07:25:22 UTC
You would need to use a decompiler, e.g. http://boomerang.sourceforge.net/. Anyway, I don't know anybody who writes code in binary or hex, since base 10 is just a way of doing calculations in binary a lot faster.


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.