Question:
How do I tell if a piece of code is malicious?
anonymous
2017-08-16 17:47:34 UTC
Hi there, I program C, C++ and VB.net

If I had the source code of a program right in front of me, how would I be able to tell if it is malicious or not?
Five answers:
oyubir
2017-08-16 20:20:05 UTC
You need to trust something anyway.



Read "Reflections of trusting trust", by Ken Thomson. A must for any computer scientist (although I know a few computer scientist who wowed to abandon computer and enter a monastery after I made them read it^^)



If you don't want to read it, here is a summary (but really, it's worth reading).



The article is the text of the lecture Thompson gave when he received the Turing Award.

He started by saying that he did not want to talk about why he received the award, but instead about three anecdotes he found funny.



1) In college, one of his pastimes was writing a program whose output is its own source code.

It is not as easy as it seems. Try it.

Obviously, if you write

#include <stdio.h>
int main(){
printf("#include <stdio.h>\nint main(){\nprintf...

you will never finish typing the program.

There are some solutions, though. For example, some variation of

char s[]="\";\nprintf("char s[]=\"%s\",s);\nprintf(\"%s\\n\",s);\n";
printf("char s[]=\"%s", s);
printf("%s\n", s);

would almost work (not exactly, because of the \" escaping, but I wanted to keep it short here; the article has a working version).



There are also some easier ways.
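For comparison, here is a compact variant of the trick that does work. It is not the version from Thompson's paper; it is a sketch that assumes an ASCII character set (10 is '\n', 34 is '"') and wraps the classic two-line quine in a function so its output can be inspected. The names QUINE_TEMPLATE and quine_source are hypothetical. The trick is that the template is used twice: once as the format string and once as the data printed inside the quotes.

```c
#include <stdio.h>
#include <string.h>

/* The template is both code and data. Printed with itself as an argument,
   it yields the complete text of a two-line program. 10 and 34 are the
   ASCII codes of '\n' and '"'; passing them through %c avoids having to
   escape those characters inside the template. */
static const char QUINE_TEMPLATE[] =
    "#include <stdio.h>%cint main(){char*s=%c%s%c;"
    "printf(s,10,34,s,34,10);return 0;}%c";

/* Writes the quine's source text into out (truncated if size is too small)
   and returns the full length of that text, as snprintf does. */
int quine_source(char *out, size_t size) {
    return snprintf(out, size, QUINE_TEMPLATE,
                    10, 34, QUINE_TEMPLATE, 34, 10);
}
```

The text written into the buffer is a two-line C program; saved to a file ending with a single newline, compiled, and run on an ASCII system, that program prints exactly its own source, because it performs the same printf call on the same template.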





2) You know that in C, '\n' is just another way of writing the value 10 (in ASCII).

How come? Because when the compiler parses a string or a character constant, somewhere it has code that says: "if the current character is '\', do not interpret it literally; look at the next character. If that character is 'n', then the character read is 10."



Something like:

char parse(char *s){
    if(s[0]=='\\'){
        if(s[1]=='n') return 10;
        if(s[1]=='r') return 13;
        ...
    }else{
        return s[0];
    }
}





Well, in fact, not quite. If you look at the source code of a C compiler, it really looks like this:

char parse(char *s){
    if(s[0]=='\\'){
        if(s[1]=='n') return '\n';
        if(s[1]=='r') return '\r';
        ...
    }else{
        return s[0];
    }
}



This means exactly the same thing, since the C compiler is itself written in C, and in C '\n' and 10 are the same value.
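To make the second parse function concrete, here is a compilable sketch of it. The `used` out-parameter and the exact set of escapes handled are assumptions added for this sketch (it is not real compiler source), and it assumes an ASCII character set, where '\n' is 10 and '\r' is 13.

```c
/* Decodes one character starting at s, which may be an escape sequence
   such as the two characters '\\' and 'n'. Stores in *used how many
   source characters were consumed. Only a handful of escapes are handled;
   this is an illustration, not a complete C lexer. */
char parse(const char *s, int *used) {
    if (s[0] == '\\') {
        *used = 2;
        switch (s[1]) {
        case 'n':  return '\n';  /* never says what '\n' actually is */
        case 'r':  return '\r';
        case 't':  return '\t';
        case '\\': return '\\';
        case '\'': return '\'';
        default:   return s[1];  /* unknown escape: this sketch keeps the character */
        }
    }
    *used = 1;
    return s[0];
}
```

Called on the two characters `\` and `n`, it returns 10, yet the number 10 appears nowhere in the function: the value comes from whichever compiler compiled it.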

But then, where is it written that '\n' is 10?



To explain that, Thompson imagines that he wants to add a new escape sequence to the language, \v, meaning 11 (the ASCII vertical tab).

The first thing for him to do (Thompson, as you may know, co-created Unix and B, the predecessor of C) would be to alter the code of the C compiler, like this:

char parse(char *s){
    if(s[0]=='\\'){
        if(s[1]=='n') return '\n';
        if(s[1]=='r') return '\r';
        if(s[1]=='v') return 11;
        ...
    }else{
        return s[0];
    }
}

Then compile this code (the source of a new C compiler, let's call it Cv2) with an older version of the C compiler (Cv1). Obviously there is no compilation problem, since the source of Cv2 only uses features that Cv1 already understands.

You obtain a compiled version of the Cv2 compiler.

Now that you have a Cv2 compiler, you can modify the code again (Cv3.c):

char parse(char *s){
    if(s[0]=='\\'){
        if(s[1]=='n') return '\n';
        if(s[1]=='r') return '\r';
        if(s[1]=='v') return '\v';
        ...
    }else{
        return s[0];
    }
}

This means exactly the same thing, except that this code can only be compiled by Cv2.

If you compile Cv3.c with the Cv2 compiler (Cv1 would not compile it), you obtain a compiled version of Cv3, which behaves exactly like Cv2, except that its source no longer contains any trace of the fact that '\v' is 11.
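Cut down to the one new escape, the Cv2.c and Cv3.c versions can be compared side by side. In real C, '\v' is the vertical tab, 11 in ASCII. A modern compiler already knows '\v', so both functions compile today and return the same value; the names parse_v2 and parse_v3 are hypothetical.

```c
/* Cv2.c style: the value is spelled as a number, because the compiler
   building it (Cv1) does not know the \v escape yet. */
char parse_v2(const char *s) {
    if (s[0] == '\\' && s[1] == 'v') return 11;
    return s[0];
}

/* Cv3.c style: the value is spelled as '\v'. Only a compiler that already
   knows \v can compile this, and it returns the same value as parse_v2. */
char parse_v3(const char *s) {
    if (s[0] == '\\' && s[1] == 'v') return '\v';
    return s[0];
}
```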



That is exactly what happened to '\n'. You can look at the code of your C compiler: nowhere is it written that '\n' is 10. The same goes for the code of the compiler that compiled your compiler, and for the code of the compiler that compiled the compiler that compiled the compiler that compiled your compiler. Etc.



So in which code is it written that '\n' is 10?



Nowhere! That's where!



Or, to be more precise: in the source code of a compiler that ceased to exist decades ago. That source code may even have been lost; it would change nothing. The fact that '\n' is 10 appears in no source code available to you. It has been inherited from compiler to compiler for decades!

(Each time an old compiler is used to compile a new version of the compiler, it passes this '\n' = 10 heritage on to the compiler it compiles.)









3)

And the third point is where things start to get really funny:



Now, says Thompson (co-creator of Unix), what if I had added a trojan horse to the code that authenticates users? (The paper uses the Unix login command as the example; the same goes for an authentication system such as kerberos.)

Like this:

if(!strcmp(login, "thomson")) authok=TRUE;

Do you think you would have found it, because you read the source code of kerberos before compiling and installing it yourself?



Not if I had also added a trojan to your compiler! Like this:

if(strstr(code, "kerberoscodepattern")){

      compile("kerberoscodepattern");

      compile("if(!strcmp(login, \"thomson\")) authok=TRUE");

}
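The kerberos branch of this trojan can be mimicked with plain string processing. In this toy, nothing is actually compiled: the "compiler" copies its input, except that when it recognizes the login function it splices in a backdoor line. The names LOGIN_PATTERN, BACKDOOR, toy_compile and check_login are hypothetical, and the self-reproducing half (the trick from point 1 that keeps the trojan out of the compiler's own source) is not shown.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical marker the trojan looks for. A real attack would match
   code that only appears in the targeted login program. */
#define LOGIN_PATTERN "int check_login("

/* Hypothetical backdoor line spliced into the login program. */
#define BACKDOOR "\n    if (!strcmp(login, \"thomson\")) return 1;"

/* A toy "compiler": it copies src to out. When it recognizes the login
   function, it splices BACKDOOR right after that function's opening
   brace, so the output contains code the source never had. */
void toy_compile(const char *src, char *out, size_t size) {
    const char *hit = strstr(src, LOGIN_PATTERN);
    const char *brace = hit ? strchr(hit, '{') : NULL;
    if (brace == NULL) {
        snprintf(out, size, "%s", src);  /* honest path: output == input */
        return;
    }
    int head = (int)(brace - src) + 1;   /* everything up to and including '{' */
    snprintf(out, size, "%.*s%s%s", head, src, BACKDOOR, brace + 1);
}
```

Reading the login source shows no backdoor; only the output of the toy compiler contains it, which is exactly why reading kerberos was not enough.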





Then do you think you would have seen it when reading the source code of the C compiler?



Not if I had used my ability (demonstrated in item 1 of the paper) to write code that reproduces itself, and my ability (demonstrated in item 2 of the paper) to remove code from the source code while keeping it working.



Like this:

if(strstr(code, "kerberoscodepattern")){

      compile("kerberoscodepattern");

      compile("if(!strcmp(login, \"thomson\")) authok=TRUE");

}

if(strstr(code, "compilercodepattern")){

      compile(...); // Do whatever it takes (like in point 1) to replace ... by those 7 lines

}





Now I have a compiler that adds a trojan to kerberos each time it recognizes that it is compiling kerberos code, and that adds those 7 lines to the compiler each time it recognizes that it is compiling itself (or a future version of itself), even if those 7 lines are not in the source being compiled.



And once I have it, I can remove those 7 lines from the source code of the compiler (just as the '\n' => 10 code was removed in point 2).



And for decades, C compilers will inherit this trojan from the compilers that compiled them.

For decades, C compilers will have a vulnerability that appears in no source code: neither in the compiler's, nor in that of the compiler that compiled it, etc.









In short: if you are really paranoid, looking at the source code is not enough. Compiling everything yourself, including the C compiler, is not enough either.

You have to start from scratch, bootstrapping from machine code you type in yourself. For example, you write a small compiler directly in machine code. You can trust this small compiler, so you use it to build a real C compiler, which you can also trust. With that one you can compile Thompson's (or anyone else's) C compiler, after checking its code, and with that, any program you find on the internet, after checking its code.





(Well, now my answer is almost as long as Thompson's paper. So I encourage you not to read it, if it's not too late :D, and to read Thompson's paper instead.)
?
2017-08-18 05:12:14 UTC
You would have to know, given the execution context, what exactly the piece of code is aiming to do, and whether flaws in it could lead to exploitation. Malware is a broad subject, and sometimes malicious code can be subtle. No code will stand out in red in your IDE with "warning! this code is malicious!". If you are reading the code on a website and don't trust its author, then you need to understand fully what the code does. Do not copy and paste code from untrusted authors.
keerok
2017-08-16 22:54:00 UTC
Taste it!
chrisjbsc
2017-08-16 18:03:48 UTC
By looking at the code and working out what it does.
anonymous
2017-08-16 17:58:36 UTC
You work through each and every function the code performs and try to decide, in your own mind, whether it is something a user would want to do or something intended to cause harm.



Disabling the network adapter every 5 minutes is not something a user 'usually' wants to do; adding a function to a media player seems like something a user might want.



There are some grey areas (and this is where antivirus software gets false positives): actions that make sense when put together but look suspicious when looked at individually. For example, a program that disables Windows Update and deletes the Windows temp folders might seem like something a user wouldn't want, but when you read that the application is being used to stop the latest Creators Update from coming down because it breaks company software, it makes sense.



On the other hand, changing the homepage and the omnibox search settings can seem like legitimate things to do, until you realize they have been switched to malware sites and keylogging.


This content was originally posted on Y! Answers, a Q&A website that shut down in 2021.