Introduction
Malware is a constant threat to the Internet, and malware detection remains an active research topic. Recently, researchers1,2 have noted and analyzed the activity of malware in wireless sensor networks (WSNs). Malware poses a security risk to WSNs because of their role in real-time monitoring and reporting of sensor data.
To defend against malware, several surveys3–5 have summarized existing detection methods. The data sources used by detection systems are generally NetFlow, honeypots, domain name system (DNS) traffic, and address assignment information (border gateway protocol, autonomous system, dynamic host configuration protocol (DHCP), etc.); some detection systems also require deep packet inspection to identify application-layer characteristics.
At present, most malware detection research is based on the network communication characteristics of malware, especially C&C and attack traffic. In this article, we study malware whose hard-coded C&C domains have expired. When these C&C domain names cannot be resolved, the malware typically retries repeatedly, attempting to use the same domain name to establish communication with the C&C; this differs from the exploratory lookups produced by a domain generation algorithm (DGA).6,7
Beyond repeated attempts, requests to expired C&C domain names usually exhibit a certain periodicity, since much malware does not randomize the interval between retries. The literature 8 likewise shows that malware C&C connections generate DNS requests and TCP connections at fixed intervals.
To illustrate these characteristics of repeatability and periodicity, we select two representative domain names from DNS traffic, one queried by a program and one entered manually, and plot them in Figure 1. The horizontal axis in Figure 1 is time; the vertical axis is the client IP address, sorted by time of first appearance. Each point represents a client that initiates a DNS query at the corresponding time.

DNS querying pattern: (a) tracker.sjtu.edu.cn and (b) www.cnbeta.com.
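The scatter data described above can be derived from raw (timestamp, client IP) query records. A minimal sketch, assuming the records come from a parsed DNS log; client IPs are mapped to vertical positions in order of first appearance, as in Figure 1:

```python
def query_scatter_points(queries):
    """Map (timestamp, client_ip) DNS query records to scatter-plot
    points: x = timestamp, y = client index ordered by first appearance."""
    index = {}                                # client_ip -> y position
    points = []
    for ts, ip in sorted(queries):            # iterate in time order
        if ip not in index:
            index[ip] = len(index)            # assign next row on first sight
        points.append((ts, index[ip]))
    return points

# Two clients; the first one queries repeatedly over time.
pts = query_scatter_points([(0, "10.0.0.1"), (60, "10.0.0.1"),
                            (30, "10.0.0.2"), (120, "10.0.0.1")])
# → [(0, 0), (30, 1), (60, 0), (120, 0)]
```

A periodic program domain then shows up as evenly spaced points along a horizontal line, while a manually queried domain scatters irregularly.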
Figure 1(a) shows the query pattern of the program-queried domain name tracker.sjtu.edu.cn, while Figure 1(b) shows the manually queried domain name www.cnbeta.com. The program domain clearly exhibits a periodic requesting pattern. In fact, it is common for malware to request its C&C domain periodically even after the domain has been taken down, following the pattern shown in Figure 1(a).
In this article, we propose a method to detect such malware based on this traffic pattern. Our approach consists of four steps, as shown in Figure 2. The first step is collecting DNS failure traffic; the dataset we focus on is collected from real-time campus network traffic. The second step is filtering out irrelevant traffic to reduce false positives and speed up detection. The third step is extracting the time sequences of domain requests and characterizing the DNS clients' retrying behavior. Finally, we compute sequence scores and detect C&C domains with a threshold.

Steps to perform detecting.
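The four-step pipeline can be sketched as chained functions. This is an illustrative skeleton only; the record fields ("client", "qname", "rcode", "ts") and the function names are assumptions, and the scoring function is left as a parameter:

```python
# Hypothetical record format: dicts with "client", "qname", "rcode", "ts".
def collect_failures(dns_records):
    """Step 1: keep only failed (NXDOMAIN) queries."""
    return [r for r in dns_records if r["rcode"] == "NXDOMAIN"]

def filter_irrelevant(failures, irrelevant_domains):
    """Step 2: drop failures known to be unrelated to malware
    (failed trackers, expired software domains, etc.)."""
    return [r for r in failures if r["qname"] not in irrelevant_domains]

def extract_sequences(failures):
    """Step 3: group query timestamps per (client, domain) pair."""
    seqs = {}
    for r in failures:
        seqs.setdefault((r["client"], r["qname"]), []).append(r["ts"])
    return seqs

def detect_cc(seqs, score, threshold):
    """Step 4: score each request sequence; flag domains over threshold."""
    return {dom for (cli, dom), ts in seqs.items() if score(ts) >= threshold}
```

The later sections refine step 2 (filtering rules) and step 4 (the actual scoring and behavioral conditions); a trivial `score=len` already captures the repeated-retry intuition.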
The contributions of this article are as follows: (1) we analyze the malware C&C failure problem and find that it gives rise to repeated and periodic request behavior in DNS traffic, which we verify on a convincing real-world dataset. (2) We analyze the characteristics of expired malware C&C domains, identify properties that distinguish them from legitimate domains, and propose an effective method to detect them. (3) On the campus network, the algorithm successfully detected 3027 malware C&C domains on 249 affected client hosts with a precision of 92.0%.
In the past, most research has focused on detecting active malware. This article focuses on detecting abandoned malware that is still running on users' machines. Although C&C failure has received little prior research attention, it remains a potentially large risk to network security (see section “Evaluation”).
This article is organized as follows. In section “DNS traffic filtering,” we first apply a pre-filtering step to failed queries to reduce the amount of data needed for sequence analysis. Section “C&C detection” describes our method of detecting C&C domains based on query sequence analysis and the similarity of client behavior. In section “Evaluation,” we experimentally assess and analyze the effectiveness of the detection. Finally, we conclude the article.
DNS traffic filtering
Request sequence analysis examines the timestamps of all query messages, grouping requests by domain and query type within the DNS traffic of each client. Beforehand, to reduce the amount of data entering sequence analysis, we filter out the bulk of failed traffic as follows.
Failed BitTorrent tracker
By counting the repeated failed requests a single client generates toward a single domain name, we discovered that failed BitTorrent trackers cause certain clients to send DNS queries for those domain names frequently.
Since there is no way to accurately identify all BitTorrent tracker domain names from DNS traffic alone, our study focuses on repeated attempts from a single client and on domain names with a particularly large number of requesting clients. For domain names that account for a large share of the failed DNS traffic, we use a search engine to verify whether each is a BitTorrent tracker domain name.
When judging with a search engine, a domain name is classified as a BitTorrent tracker only if it meets both of the following conditions:
Have no known websites built on that domain name directly.
Have been used by a BitTorrent tracker as their address.
Using search engine results is an easy way to judge whether a domain name is used as a BitTorrent tracker, because most BitTorrent tracker URLs follow a similar (or fixed) format, such as
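The "similar (or fixed) format" can be checked mechanically. A hedged sketch, using the BitTorrent convention (an assumption drawn from the protocol, not stated in the article) that HTTP tracker URLs end in "/announce", "/announce.php", or "/scrape":

```python
import re
from urllib.parse import urlparse

# Assumed heuristic: BitTorrent tracker URLs conventionally end in
# "/announce" (sometimes "/announce.php" or "/scrape").
TRACKER_PATH = re.compile(r"/(announce(\.php)?|scrape)$")

def looks_like_tracker_url(url):
    """Rough check of whether a URL found via the search engine follows
    the typical BitTorrent tracker address format."""
    return bool(TRACKER_PATH.search(urlparse(url).path))
```

Such a check would complement, not replace, the two manual search-engine conditions above.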
In the experiment on campus network traffic, we identified 342 failed BitTorrent tracker domain names in total. Failed trackers are widespread in BitTorrent seed files created years earlier. Some failed trackers are even copied and reused in newly created seed files, because BitTorrent users often add more trackers in the hope of finding as many other downloaders (peers) holding the same resource as possible. Failed tracker addresses are also carried in magnet URIs (magnet links).
The volume of failed DNS queries caused by failed BitTorrent trackers is astonishing. The 342 failed tracker domain names alone account for 42.2% of all DNS failures in the campus network; nearly half of the DNS failures thus result from invalid BitTorrent tracker addresses. We also found a client that attempted to resolve a failed tracker domain name more than 100,000 times in 1 day. Such frequent retries may arise because the client has downloaded many seed files containing the failed tracker, but a more important reason is a probable defect in the implementation of the BitTorrent client. We identify the failed BitTorrent tracker domain name queries and summarize them in Table 1.
Top 20 failed BitTorrent trackers.
Expired domain names of legitimate software
In this section, we inspect expired domain names that were once used by legitimate software. Such domains expire when the software is no longer maintained or when an updated version no longer uses the previous domain name.
Among the DNS requests sent by a client machine, a large share is not initiated by user operations (e.g. web browsing, email checking). A variety of software installed on the client system accesses the network automatically and generates DNS queries, usually for automatic updates or in-software advertising. The failure of a domain used by software has caused severe incidents in the past: on 19 May 2009, after Storm Player's authoritative DNS provider DNSPod went offline, a massive number of Storm video clients frequently retried to reach the service in the background, overloading recursive DNS servers at several major ISPs. 10
During the analysis of the campus network traffic, we located a lot of expired software domain names, many of which are still under frequent attempts to be resolved by the clients. For example, within 7 days, there are 68,936 clients attempting to resolve “stun01.sipphone.com,” which is an expired Session Traversal Utilities for NAT (STUN) server name. The total number of requests is 1,268,955. “nccpr.p2p.baofeng.net,” once used as Storm video advertisement server domain name, has 1,203,016 requests from 15,154 clients, showing a huge installation base and a widespread impact. The download software, Thunder, is also an important source of failed DNS requests. Once used by Thunder Assistant, “bibei.sandai.net” receives 1,181,038 attempts from 428 clients, while the expired domain names, “btrouter.sandai.net” and “ui.pmap.sandai.net,” have 9307 and 11,134 requesting clients, respectively, and 332,341 and 303,380 failed requests, respectively.
We filter out the domain names of common software that fail to resolve. In total, we identified 333 expired domain names of common software, with 10,706,250 requests, accounting for 3.05% of all DNS failures.
Other irrelevant traffic
Besides the traffic discussed in sections “Failed BitTorrent tracker” and “Expired domain names of legitimate software,” some other failed traffic that is irrelevant to malware behavior should also be filtered out, as follows.
(1) Invalid top-level domains (TLDs). An invalid TLD is one that is not registered (e.g. localdomain, home), so every query for it fails.
(2) Reverse DNS resolution. A global survey in the literature 11 showed that the success rate of PTR queries is only 30.4%, while 44.5% receive a negative answer and 25.1% receive no response. Reverse lookups thus contribute a large proportion of failure traffic that we do not care about.
(3) DNS-based blacklists (DNSBL). A DNSBL serves a blacklist over DNS; for names not on the blacklist, the server returns NXDomain. DNSBL traffic therefore makes up a certain share of the failure traffic and should be filtered out.
(4) Intermittent failures. Owing to authoritative server instability, configuration errors, network failures, or other causes, some domains occasionally fail to resolve. Filtering intermittently failing domains effectively reduces the interference caused by DNS system instability.
(5) Internationalized domain names (IDN). IDN domains account for only 0.012% of the failed traffic, mainly from typing errors or broken hyperlinks during web browsing.
(6) Campus network domains. In the failed traffic, the school's domain sjtu.edu.cn accounts for 5.33%, mainly for two reasons. First, the DNS servers of Shanghai Jiao Tong University serve not only as recursive resolvers for campus users but also as authoritative servers for sjtu.edu.cn, so some failed queries come from external recursive resolution. Second, campus wireless users receive a default domain suffix of sjtu.edu.cn from the DHCP service.
(7) Special symbols. DNS labels may normally contain only letters, digits, and hyphens (“-”), but domains with unsupported special symbols also appear in the traffic, contributing 0.027% of failures.
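Most of these rules are simple string checks on the queried name. A minimal sketch, covering the rules that need no query history (the DNSBL and intermittent-failure rules require state over time and are omitted); the exact TLD list is an assumption based on the examples in the text:

```python
import re

# RFC-style label: letters/digits, optional inner hyphens.
VALID_LABEL = re.compile(r"^[A-Za-z0-9]([A-Za-z0-9-]*[A-Za-z0-9])?$")
INVALID_TLDS = {"localdomain", "home"}   # examples named in the text

def is_irrelevant_failure(qname, qtype="A"):
    """Implements rules (1), (2), (5), (6), and (7) from the text."""
    labels = qname.rstrip(".").split(".")
    if labels[-1].lower() in INVALID_TLDS:                 # (1) invalid TLD
        return True
    if qtype == "PTR" or qname.endswith(".in-addr.arpa"):  # (2) reverse DNS
        return True
    if any(l.lower().startswith("xn--") for l in labels):  # (5) IDN
        return True
    if qname == "sjtu.edu.cn" or qname.endswith(".sjtu.edu.cn"):
        return True                                        # (6) campus domain
    if not all(VALID_LABEL.match(l) for l in labels):      # (7) bad symbols
        return True
    return False
```

Anything surviving these checks (plus the tracker, software-domain, DNSBL, and intermittent-failure filters) proceeds to request sequence analysis.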
We summarize the proportion of each type of DNS failure in Figure 3. After this preliminary filtering, the DNS failures entering request sequence analysis account for only 4.01% of all DNS failures, greatly reducing the data required for sequence analysis while also reducing interference from legitimate applications' requests in C&C domain detection.

Classification of DNS failures.
C&C detection
Request sequences analysis
Request sequence analysis studies the time sequence of repeated requests a single client sends to an expired domain. Since domain requests issued by programs are characterized by repeated attempts at fixed intervals, we discard domains requested fewer than a certain number of times per day; in our implementation, we only consider domain names requested more than eight times a day (explained in section “C&C domain detection”). Client request time sequences are split by day rather than by week, because clients shut down at night, causing large pauses in the request sequence. We use 1 week of campus traffic for our study. After discarding requests below the minimum amount, a total of 361,281 valid request sequences entered the following analysis.
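The grouping and daily thresholding can be sketched as follows. The record tuple layout is an assumption; days are bucketed by epoch day for simplicity rather than by local calendar date:

```python
from collections import defaultdict

MIN_DAILY_REQUESTS = 8        # per-day threshold from the text
SECONDS_PER_DAY = 86400

def daily_sequences(records, min_daily=MIN_DAILY_REQUESTS):
    """Group query timestamps by (client, domain, qtype) and by day,
    keeping only day-sequences with at least `min_daily` requests.
    Each record is assumed to be (client, domain, qtype, unix_ts)."""
    seqs = defaultdict(list)
    for client, domain, qtype, ts in records:
        day = int(ts // SECONDS_PER_DAY)      # daily split (epoch days)
        seqs[(client, domain, qtype, day)].append(ts)
    return {k: sorted(v) for k, v in seqs.items() if len(v) >= min_daily}
```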
The first step of request sequence analysis is to account for the operating system's DNS resolution timeout and retry behavior. Because DNS resolution usually runs over the unreliable UDP protocol, packet loss can occur. If no server response is received within a certain period after the client sends a query (timeout), the same query is sent again (retry). To reduce resolution delay for applications, the timeout in a client's DNS library is usually small, far less than the DNS server's timeout, so while the server is still performing iterative resolution, the client may already consider the query timed out and resend it. Most client applications resolve domain names through the operating system's API; therefore, to study the timeout retry behavior of clients, we need to examine different operating systems.
The documentation12,13 explains the behavior of the Windows DNS client. Timeouts and retries in Windows DNS clients are determined by the registry value

Timeout and retry behavior of DNS clients: (a) Windows server 2008 and (b) Ubuntu Linux.
The literature 15 experimentally analyzed the failed DNS request behavior of clients under Windows, Linux, and Mac OS X with multiple browsers. DNS timeout retries matter greatly for our time sequence analysis: in a series of DNS queries sent by a client, we must first identify the repeated requests sent by the DNS client due to short-term timeouts and merge them into one logical request; only then can periodic requests be judged accurately. After analyzing the timeout retry features of multiple operating systems' DNS clients, we set a safe threshold: only repeated requests for the same domain name within 18 s of the first query are treated as timeout retries of that first query.
It is worth noting that when Linux is configured with multiple DNS servers, the 18 s threshold may not suffice, because several non-responding servers can stretch the retry period beyond half a minute. We ignore this situation, since the threshold we set is close to the DNS server timeout and Linux is not the target of most malware attacks; this simplification therefore does not affect the final detection results.
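The 18 s merging rule can be sketched directly: queries arriving within the window after the first query of a burst are collapsed into that query.

```python
MERGE_WINDOW = 18.0   # seconds, the safe threshold chosen in the text

def merge_timeout_retries(timestamps, window=MERGE_WINDOW):
    """Collapse queries sent within `window` seconds of the first query
    of a burst into one logical request (OS resolver timeout retries)."""
    merged = []
    for t in sorted(timestamps):
        # A timestamp starts a new logical request only if it falls
        # outside the window anchored at the previous burst's first query.
        if not merged or t - merged[-1] > window:
            merged.append(t)
    return merged

merge_timeout_retries([0, 5, 10, 40, 41, 100])   # → [0, 40, 100]
```

Only the merged timestamps enter the periodicity analysis below.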
To determine whether a client's requests to a domain are periodic, we calculate the sequence of request time intervals between consecutive (merged) requests and measure its regularity with the variation coefficient, the ratio of the standard deviation of the intervals to their mean.
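Assuming the variation coefficient is the standard coefficient of variation (population standard deviation over mean) of the inter-request intervals, a sketch of the computation:

```python
import statistics

def variation_coefficient(timestamps):
    """Coefficient of variation (population std / mean) of the intervals
    between consecutive requests; values near 0 indicate periodicity."""
    ts = sorted(timestamps)
    intervals = [b - a for a, b in zip(ts, ts[1:])]
    if len(intervals) < 2:
        return float("inf")        # too few requests to judge
    mean = statistics.mean(intervals)
    if mean == 0:
        return float("inf")
    return statistics.pstdev(intervals) / mean
```

A perfectly periodic retry sequence (e.g. every 60 s) yields 0.0; irregular, human-driven queries yield much larger values.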
C&C domain detection
The algorithm for detecting C&C domains can be split into two parts: first, detect which domains are program domains; second, check whether a program domain is a true C&C domain. The procedure is presented in Figure 5. We highlight the key points of the algorithm as follows.

C&C domain detection algorithm.
Detect_ProgramDomain
In the program domain detection stage, we extract the domain request sequences from each host and analyze whether each sequence meets the programmatic requesting features. The variation coefficient describes the periodicity of a query sequence, and the repetitive nature of a program domain is reflected by the number of retries for the same failed domain. Thus, to determine whether a request sequence matches the characteristics of a program domain, an evaluation function is defined on the basis of the variation coefficient of the request intervals and the retry count.

CDFs of program/non-program domain score.
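The article's exact evaluation function is not reproduced in this excerpt; for illustration only, here is one plausible shape that rewards many retries and a low variation coefficient, consistent with the two quantities the text names. The formula itself is an assumption, not the authors' definition:

```python
import math

def program_domain_score(retry_count, cv):
    """Illustrative score only (NOT the article's exact function):
    grows with the retry count and shrinks as the intervals become
    irregular (large coefficient of variation `cv`)."""
    return math.log1p(retry_count) / (1.0 + cv)
```

Any monotone function with these properties would produce the kind of separation between program and non-program domains shown in Figure 6, with a threshold chosen from the CDFs.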
C&C_RequestCond
1. Most of a domain's requesters exhibit programmatic request behavior.
When a domain is used only as a malware C&C, all requests for it are produced by malicious programs on the hosts, so almost all requesters should show program-like request characteristics. If only a small proportion of requesters do, the domain is more likely a normal one. For example, for the domain of a news site, users browsing the Web show no repeated periodic request behavior, while users running an RSS client do, because the RSS client fetches content on a schedule. In other words, occasional repeated periodic requests are not sufficient evidence of a C&C domain. We define the IPRate of a domain as the number of periodically requesting client IPs divided by the total number of client IPs requesting that domain. We manually selected 100 non-C&C domains and 50 C&C domains at random as a test dataset, computed the IPRate distribution, and plotted the CDF curves in Figure 7. The figure shows an obvious separation between C&C and non-C&C domains at IPRate = 0.5, so in this article we require IPRate > 0.5 as one necessary condition for C&C domain requesting behavior.

CDFs of C&C/non-C&C domain IPRate.
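The IPRate condition is a simple ratio over the sets of client IPs. A minimal sketch, assuming the periodic clients have already been identified by the sequence analysis above:

```python
IPRATE_THRESHOLD = 0.5   # separation point read off Figure 7

def ip_rate(periodic_clients, all_clients):
    """IPRate: periodically requesting client IPs over all client IPs
    that queried the domain."""
    return len(periodic_clients) / len(all_clients)

def meets_request_cond(periodic_clients, all_clients):
    """C&C_RequestCond: more than half of the requesters are periodic."""
    return ip_rate(periodic_clients, all_clients) > IPRATE_THRESHOLD
```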
C&C_PeriodCond
2. All requesters share the same request interval.
The previous condition already requires that most requesters make repeated, periodic requests to the domain. Moreover, each malware domain is typically controlled by a single attacker, and if many hosts are infected by the same malware (or the same family), they communicate with the C&C at an identical time interval. For legitimate domain names, by contrast, such as a POP3 mail server's domain, the requests all come from email clients, but different users configure different fetch intervals. We therefore use the average variation coefficient to evaluate whether the average retry intervals of different requesters are consistent. We manually selected 100 non-C&C domains and 50 C&C domains at random as a test dataset, computed the average variation coefficient, and plotted the CDF curves in Figure 8. Figure 8 shows a clear separation between C&C and non-C&C domains at an average variation coefficient of 0.2; thus, we require the average variation coefficient to be at most 0.2 as another necessary condition for C&C domain requesting behavior.

CDFs of C&C/non-C&C average variation coefficient.
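A sketch of this condition, under the assumption (consistent with the text's description) that the "average variation coefficient" is the coefficient of variation computed across the per-client average retry intervals:

```python
import statistics

AVG_CV_THRESHOLD = 0.2   # separation point read off Figure 8

def average_variation_coefficient(per_client_mean_intervals):
    """Variation coefficient over the per-client mean retry intervals.
    A small value means different hosts retry at (nearly) the same
    interval, as expected when one malware family drives the requests."""
    mean = statistics.mean(per_client_mean_intervals)
    if mean == 0:
        return float("inf")
    return statistics.pstdev(per_client_mean_intervals) / mean
```

Hosts all retrying at, say, 900 s give a value of 0.0, well under the 0.2 threshold; mail clients fetching at user-chosen intervals give a much larger value.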
Filtering with KnownBenignDomainSet
Besides these two characteristics, we also use the Alexa ranking 17 as a whitelist. The top 1 million domains are considered benign, except domains providing dynamic DNS service such as DynDNS and 3322.org; security agencies have reported that these dynamic domains are abused and widely exploited by malware. In addition, we define extra rules to remove the domain names of Intra-Site Automatic Tunnel Addressing Protocol (ISATAP) and Web Proxy Auto-Discovery (WPAD) servers.
During the experiment, we also found some request sequences with very short periods, for example, under 30 s. After checking these domains manually, we found them to be benign, produced by faulty or abnormal programs. In fact, when a repeated request interval is less than 18 s, the measured interval would lie in the range between 18 and 36 s, because of the 18 s merging window set in section “Request sequences analysis.” Moreover, it would be unwise for malware to use a very short retry interval, since it would put excessive pressure on the C&C server, possibly even causing a denial of service. Considering that malware with such a short request interval is very rare, we ignore sequences whose average interval is shorter than 30 s to avoid misjudgment.
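The whitelist and short-interval filters can be sketched together. The Alexa membership flag is assumed to come from a lookup against the top-1M list, and the dynamic-DNS suffix list here is a partial example from the text:

```python
MIN_AVG_INTERVAL = 30.0                # seconds, per the text
DYNAMIC_DNS_SUFFIXES = (".3322.org",)  # example from the text; extend as needed

def passes_final_filters(domain, in_alexa_top1m, avg_interval):
    """Filters applied after the two behavioral conditions:
    drop ISATAP/WPAD auto-discovery names, implausibly fast retry
    sequences, and Alexa top-1M domains unless they are dynamic DNS."""
    first_label = domain.split(".")[0].lower()
    if first_label in ("isatap", "wpad"):
        return False
    if avg_interval < MIN_AVG_INTERVAL:
        return False
    if in_alexa_top1m and not domain.lower().endswith(DYNAMIC_DNS_SUFFIXES):
        return False                   # popular domain, considered benign
    return True
```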
Evaluation
In section “Request sequences analysis,” we obtained 361,281 domain request sequences in which each domain was requested at least eight times a day. After merging the DNS clients' timeout retries, 56,133 request sequences were discarded because their daily request counts fell below eight after merging, leaving 305,148 valid request sequences. Our detection algorithm identified 12,757 client request behaviors exhibiting the repetitive, periodic nature of program domain requests, involving 6079 domain names. Finally, after filtering on behavioral characteristics and the domain whitelist, 3290 domain names were judged to be malware C&C domains.
Among the 3290 detected domain names, we found that 2200 have the DGA characteristic, with the structure <
For the other 1090 domain names detected by the algorithm, we need an efficient way to judge whether they belong to malware. In this article, we use Google to help recognize the malware nature of a domain name. Using Google as a security tool is a common method; past studies20,21 have also used Google to detect phishing websites.
We use each detected domain name as a keyword to search on Google and take the first 100 results. Notably, we wrap the domain name in quotation marks so that Google returns only results containing an exact match of the whole string. We then record the title, linked URL, and content snippet of each of the first 100 results.
Among the first 100 results returned by the search engine, if any URLs point to anti-virus vendors, malware analysis sites, or blacklist websites, the domain is likely a known malware domain. Figure 9 illustrates a typical example. The domain name

Google validation result.
The matching list in this article collects domains from 17 anti-virus vendors, 8 virus analysis tools (such as VirusTotal16 and ThreatExpert18), 19 malware blacklist and monitoring sites, and 4 anti-phishing blacklists. In addition, we match the titles and content fragments of the search results against keywords such as Trojan, Worm, Backdoor, Rootkit, and Virus to find connections between domain names and malware.
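The result-matching step can be sketched as a classifier over the recorded (title, URL, snippet) tuples. The site list below is a tiny illustrative subset (the article's full matching list covers dozens of vendors and blacklists), and the three labels mirror the categories used in the evaluation:

```python
MALWARE_KEYWORDS = ("trojan", "worm", "backdoor", "rootkit", "virus")
SECURITY_SITES = ("virustotal.com", "threatexpert.com")  # partial list

def classify_from_search(results):
    """Label a flagged domain from its search results, given as
    (title, url, snippet) tuples: 'known_malware' if a result links to
    a security site or mentions a malware keyword, 'unknown_malware'
    if there are no results at all, otherwise 'legal'."""
    if not results:
        return "unknown_malware"
    for title, url, snippet in results:
        text = (title + " " + snippet).lower()
        if any(site in url.lower() for site in SECURITY_SITES) or \
           any(word in text for word in MALWARE_KEYWORDS):
            return "known_malware"
    return "legal"
```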
Many domain names flagged by our algorithm return no results from Google. We also consider these to be possible malware domain names, since they have no connection with publicly known software; flagged domains whose Google results associate them only with legitimate software are considered legal. Note that Google serves as a tool to verify and assess the output of our algorithm, not as part of the detection algorithm itself.
Figure 10 shows the Google search results: 191 of the 1090 flagged domain names have connections with known malware, 636 have no search results and are thus considered domain names of possible unknown malware, and 263 are considered legitimate domain names flagged by mistake. Counting the 2200 DGA-group domain names as malicious, 3027 of the 3290 flagged domain names are C&C domains of known or suspected malware, giving a detection precision of 92.0%.

Google search result of a known malware domain.
We counted the clients that made requests to the 3027 detected malware domain names. Since the algorithm is designed specifically for malware C&C domains, these clients can be considered infected by malware. In our analysis of the campus network DNS traffic, 249 client IP addresses were affected by malware, of which nearly 110 clients on average were online each day (Figure 11).

Number of infected clients.
Although the detected malware all have expired C&C domains, the malicious code still active on client machines remains seriously harmful: (1) although command and control may have failed, worm components may continue to spread between hosts; (2) the malware may switch to another channel through a fail-over mechanism when the C&C fails; 22 (3) a failed C&C domain may be reactivated, or taken over23,24 by other attackers. Using our C&C failure detection algorithm to find infected hosts that have temporarily lost their controller is therefore of great significance for network security, and it effectively complements C&C protocol analysis and signature-based detection, which have difficulty finding expired malware.
To further analyze the request features of expired C&C domains, Figure 12 shows the distribution of average request intervals to the detected C&C domains, avoiding recounting the huge number of similar domains in the DGA group. About 50% of the malware has a request interval of 900 s; fewer than 20% have an interval longer than 15 min, and more than 10% have an interval longer than 1 h.

Cumulative distribution of query interval.
It is not surprising that a large share of requests concentrates around 900 s; this is related to the operating system. The default negative-answer cache time of the Windows DNSCache service is 900 s, controlled by the registry value HKLM\SYSTEM\CurrentControlSet\Services\Dnscache\Parameters\MaxNegativeCacheTtl. 25 Therefore, even if a malicious program's retry timeout is less than 900 s, the failed C&C domain resolutions are answered by Windows from the cache, so new queries reach the network only about every 900 s.
Conclusion
This article presents an approach to detecting malware C&C from DNS failures. The method focuses on the periodic failure-retry property of program-initiated domain name queries. Our approach first filters irrelevant queries out of the DNS traffic, then extracts client-domain request sequences while merging duplicate packets caused by the operating system's DNS library. After that, a series of procedures identifies the request sequences belonging to malware C&C. Finally, Google search results are used to verify whether a domain name is malicious. In the campus network traffic, the algorithm successfully detected 3027 malware C&C domains affecting 249 client hosts, with a detection precision of 92.0%. Taken-down malware has not been extensively researched in the past, but it is of great significance to network security.
