This is an Android app for anomaly-based malware detection using dynamic analysis. It builds on earlier approaches to dynamic analysis of application behavior as a means of detecting malware on the Android platform. The detector is embedded in an overall framework that collects traces from an unlimited number of real users through crowdsourcing. The framework has been demonstrated by analyzing the data collected at the central server using two types of data sets: those from artificial malware created for test purposes, and those from real malware found in the wild.
For a fuller description, read the paper at http://www.ida.liu.se/labs/rtslab/publications/2011/spsm11-burguera.pdf
The implementation consists of two main programs: client and server.
**Working of client
The client program is a daemon process that runs in the background on the Android platform. It is responsible for collecting system call data and communicating with the server. It is designed to be lightweight, with a small code base, to conserve battery on smartphones.
**Steps followed by client:
*System call tracing using strace
At its core, Android has a process called “Zygote”, which starts up at init. It gets its name from the dictionary definition: "the initial cell formed when a new organism is produced". Zygote is a “warmed-up” process, meaning it has already been initialized and has all the core libraries linked in. Its only mission is to launch applications, which makes Zygote the parent of all application processes. Whenever Zygote receives a request to launch a new application, it forks a new process (replicates itself). The application code then replaces the Zygote code in the child and starts executing. The client needs to capture the system calls made by every application. One way to do that is to attach strace to the Zygote process and have it follow forked children (strace's -f option). Because Zygote is the parent of all other applications, this captures the system calls of every application process. At this stage, the output of strace is stored in a single file.
*Parsing data according to application
Strace logs the system calls of a process and all of its descendants in one file. The following example shows strace output for multiple (parent and child) processes:

    [pid 28772] select(4, [3], NULL, NULL, NULL <unfinished ...>
    [pid 28779] clock_gettime(CLOCK_REALTIME, {1130322148, 939977000}) = 0
    [pid 28772] <... select resumed> ) = 1 (in [3])

Each process is distinguished from the others by the pid in the log. Using the pid, we can trace back to the application that owns the process and so obtain the system call data for that application. In this stage, we write the system call data for each application of interest to a separate file. Each file also contains additional information about the application, which the server uses to identify it.
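The per-pid split described above can be sketched as follows. This is a minimal illustration in Python, not the client's actual code: the function name `split_by_pid` and the regular expressions are assumptions, and the sketch only handles lines carrying the `[pid N]` prefix shown in the sample. A call that strace splits into an "unfinished" line and a "resumed" line is counted once, on the line where it starts.

```python
import re
from collections import defaultdict

# Matches the "[pid N] " prefix shown in the sample log.
PID_PREFIX = re.compile(r"^\[pid\s+(\d+)\]\s+(.*)$")
# Matches a system call name at the start of a call line, e.g. "select(".
CALL_NAME = re.compile(r"^([A-Za-z_][A-Za-z0-9_]*)\(")

def split_by_pid(lines):
    """Group system call names by pid (hypothetical helper)."""
    calls = defaultdict(list)
    for line in lines:
        prefixed = PID_PREFIX.match(line.strip())
        if not prefixed:
            continue  # lines without a pid prefix are ignored in this sketch
        pid, rest = int(prefixed.group(1)), prefixed.group(2)
        name = CALL_NAME.match(rest)
        if name:  # "<... select resumed>" lines do not match, so no double count
            calls[pid].append(name.group(1))
    return dict(calls)

sample = [
    "[pid 28772] select(4, [3], NULL, NULL, NULL <unfinished ...>",
    "[pid 28779] clock_gettime(CLOCK_REALTIME, {1130322148, 939977000}) = 0",
    "[pid 28772] <... select resumed> ) = 1 (in [3])",
]
print(split_by_pid(sample))
# {28772: ['select'], 28779: ['clock_gettime']}
```

Mapping each pid back to an application name is a separate step that this sketch leaves out.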
*Establish connection with server
Once sufficient data has been collected and the client has networking turned on, the client establishes a connection with the server process to transfer the data. Transfers can be scheduled to happen at specific intervals to reduce power consumption.
*Transfer data to server
In the next step, the client sends the system call data to the server. The data is sent in a fixed format that lets the server identify which application the system call data belongs to.
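The actual wire format is not described here, so the following is only a hypothetical illustration of how a message could carry the application identity alongside its trace. The header layout (a 2-byte name length followed by a 4-byte payload length) and the names `encode_message` and `decode_message` are assumptions made for this sketch.

```python
import struct

# Hypothetical header: 2-byte name length, 4-byte payload length, network byte order.
HEADER = struct.Struct("!HI")

def encode_message(app_name, trace_text):
    """Pack an application name and its trace into one byte string."""
    name = app_name.encode("utf-8")
    payload = trace_text.encode("utf-8")
    return HEADER.pack(len(name), len(payload)) + name + payload

def decode_message(data):
    """Recover (app_name, trace_text) from a message built by encode_message."""
    name_len, payload_len = HEADER.unpack_from(data, 0)
    start = HEADER.size
    name = data[start:start + name_len].decode("utf-8")
    body_start = start + name_len
    payload = data[body_start:body_start + payload_len].decode("utf-8")
    return name, payload

message = encode_message("com.example.app", "select(4, [3], NULL, NULL, NULL) = 1\n")
print(decode_message(message)[0])
# com.example.app
```

Length prefixes let the receiver know how many bytes to read for each field before it starts parsing, which suits a stream socket.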
*React to the threat
The server then processes the data, determines whether the application is benign or malicious, and returns the result to the client. The client then warns the user about the threat and/or prompts the user to uninstall the malicious application.
**Working of server
The server process is responsible for detecting malicious applications based on the data sent by the client and for informing the client of the results. The main design goals are performance and concurrency (the ability to handle multiple requests), so the server process is implemented in C for faster execution.
*Listen for client request
The client initiates the connection with the server. The server can handle multiple clients at the same time.
*Process client request
The server then forks a new process to handle the incoming client request, and the system call data is transferred from the client.
*Identify application
Next, the server identifies the application using the additional information stored along with the system call data. If the application is not yet in the server's application database, a new entry is created.
*Parse system call data to vector form
The system call data is then converted into vector form: an array of numbers, each giving the frequency of a particular system call, in a predefined order of system calls. A simple bash shell script is used for parsing. This vector is then passed to the subsequent malware detection analysis.
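The conversion to a frequency vector can be illustrated as follows. The project uses a bash script for this step; the Python below is only a sketch of the same idea, and the list `SYSCALL_ORDER` is a made-up ordering since the real predefined order is not given here. Calls that are not in the ordering are ignored in this sketch.

```python
from collections import Counter

# Hypothetical fixed ordering; the project's real list of system calls is not shown here.
SYSCALL_ORDER = ["open", "read", "write", "close", "select", "clock_gettime"]

def to_vector(call_names, order=SYSCALL_ORDER):
    """Return the frequency of each system call, in the fixed order."""
    counts = Counter(call_names)
    # Counter returns 0 for names that never occurred.
    return [counts[name] for name in order]

print(to_vector(["read", "read", "select", "write", "ioctl"]))
# [0, 2, 1, 0, 1, 0]
```

Keeping the order fixed means every application's vector has the same length and the same meaning per position, so vectors can be compared directly.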
*Clustering
The vector obtained in the previous step is used as input to a simple clustering algorithm (K-means). The algorithm assigns the application data to either the benign or the malicious cluster. The exact steps are:
i. Obtain the centroid values from the database.
ii. Find the cluster that is closest to the input vector.
iii. Calculate the new centroid obtained after insertion and store the result in the database.
iv. Save the cluster number of the input vector for future use.
v. Decide whether the application is malicious by comparing the number of data vectors in the current cluster with the number in the alternate cluster. The cluster with more vectors is assumed to be the non-malicious one.
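The steps above can be sketched for the two-cluster case as follows. This is an illustration under stated assumptions, not the server's C implementation: the database is replaced by an in-memory list, the distance is assumed to be Euclidean, the centroid is updated as a running mean, and the counts are compared after the new vector has been added. The function name `classify` and the data layout are hypothetical.

```python
def classify(vector, clusters):
    """clusters: list of two dicts {"centroid": [...], "count": int}, updated in place.

    Returns (cluster_index, is_malicious).
    """
    # ii. Find the closest centroid (squared Euclidean distance is enough to compare).
    dists = [sum((x - m) ** 2 for x, m in zip(vector, c["centroid"])) for c in clusters]
    idx = dists.index(min(dists))
    chosen = clusters[idx]

    # iii. Recompute the centroid as the mean including the new vector.
    n = chosen["count"]
    chosen["centroid"] = [(m * n + x) / (n + 1) for m, x in zip(chosen["centroid"], vector)]
    chosen["count"] = n + 1

    # v. The cluster holding fewer vectors is assumed to be the malicious one.
    other = clusters[1 - idx]
    return idx, chosen["count"] < other["count"]

# i. Stand-in for the centroid values read from the database.
clusters = [
    {"centroid": [10.0, 2.0], "count": 9},
    {"centroid": [1.0, 40.0], "count": 2},
]
result = classify([2.0, 38.0], clusters)  # iv. the returned index would be saved
print(result)
# (1, True)
```

With equal counts, this sketch reports the application as not malicious; how ties are handled is not specified above.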
*Deliver the results to the client
The result of the clustering algorithm is then communicated to the client. Appropriate action is then taken by the client.