
All Your iFRAMEs Point to Us

Niels Provos, Panayiotis Mavrommatis
Google Inc.
{niels, panayiotis}@google.com

Moheeb Abu Rajab, Fabian Monrose
Johns Hopkins University
{moheeb, fabian}@cs.jhu.edu

17th USENIX Security Symposium, USENIX Association

Abstract

As the web continues to play an ever increasing role in information exchange, so too is it becoming the prevailing platform for infecting vulnerable hosts. In this paper, we provide a detailed study of the pervasiveness of so-called drive-by downloads on the Internet. Drive-by downloads are caused by URLs that attempt to exploit their visitors and cause malware to be installed and run automatically. Over a period of 10 months we processed billions of URLs, and our results show that a non-trivial amount, of over 3 million malicious URLs, initiate drive-by downloads. An even more troubling finding is that approximately 1.3% of the incoming search queries to Google's search engine returned at least one URL labeled as malicious in the results page. We also explore several aspects of the drive-by downloads problem. Specifically, we study the relationship between the user browsing habits and exposure to malware, the techniques used to lure the user into the malware distribution networks, and the different properties of these networks.

1 Introduction

It should come as no surprise that our increasing reliance on the Internet for many facets of our daily lives (e.g., commerce, communication, entertainment, etc.) makes the Internet an attractive target for a host of illicit activities. Indeed, over the past several years, Internet services have witnessed major disruptions from attacks, and the network itself is continually plagued with malfeasance [1]. While the monetary gains from the myriad of illicit behaviors being perpetrated today (e.g., phishing, spam) are just barely being understood [11], it is clear that there is a general shift in tactics: wide-scale attacks aimed at overwhelming computing resources are becoming less prevalent, and instead, traditional scanning attacks are being replaced by other mechanisms. Chief among these is the exploitation of the web, and the services built upon it, to distribute malware.

This change in the playing field is particularly alarming, because unlike traditional scanning attacks that use push-based infection to increase their population, web-based malware infection follows a pull-based model. For the most part, the techniques in use today for delivering web-malware can be divided into two main categories. In the first case, attackers use various social engineering techniques to entice the visitors of a website to download and run malware. The second, more devious case, involves the underhanded tactic of targeting various browser vulnerabilities to automatically download and run (i.e., unknowingly to the visitor) the binary upon visiting a website. When popular websites are exploited, the potential victim base from these so-called drive-by downloads can be far greater than other forms of exploitation because traditional defenses (e.g., firewalls, dynamic addressing, proxies) pose no barrier to infection. While social engineering may, in general, be an important malware spreading vector, in this work we restrict our focus and analysis to malware delivered via drive-by downloads.

Recently, Provos et al. [0] provided insights on this new phenomenon, and presented a cursory overview of web-based malware. Specifically, they described a number of server- and client-side exploitation techniques that are used to spread malware, and elucidated the mechanisms by which a successful exploitation chain can start and continue to the automatic installation of malware. In this paper, we present a detailed analysis of the malware serving infrastructure on the web using a large corpus of malicious URLs collected over a period of ten months. Using this data, we estimate the global prevalence of drive-by downloads, and identify several trends for different aspects of the web malware problem. Our results reveal an alarming contribution of Chinese-based web sites to the web malware problem: overall, 67% of the malware distribution servers and 64% of the web sites that link to them are located in China. These results raise serious questions about the security practices employed by web site administrators.

Additionally, we study several properties of the malware serving infrastructure, and show that (for the most part) the malware serving networks are composed of tree-like structures with strong fan-in edges leading to the main malware distribution sites. These distribution sites normally deliver the malware to the victim after a number of indirection steps traversing a path on the distribution network tree. More interestingly, we show that several malware distribution networks have linkages that can be attributed to various relationships.

In general, the edges of these malware distribution networks represent the hop-points used to lure users to the malware distribution site. By investigating these edges, we reveal a number of causal relationships that eventually lead to browser exploitation. More troubling, we show that drive-by downloads are being induced by mechanisms beyond the conventional techniques of controlling the content of compromised websites. In particular, our results reveal that Ad serving networks are increasingly being used as hops in the malware serving chain. We attribute this increase to syndication, a common practice which allows advertisers to rent out part of their advertising space to other parties. These findings are problematic as they show that even protected web-servers can be used as vehicles for transferring malware. Additionally, we show that contrary to common wisdom, the practice of following "safe browsing" habits (i.e., avoiding gray content) by itself is not an effective safeguard against exploitation.

The remainder of this paper is organized as follows. In Section 2, we provide background information on how vulnerable computer systems can be compromised solely by visiting a malicious web page. Section 3 gives an overview of our data collection infrastructure and in Section 4 we discuss the prevalence of malicious web sites on the Internet. In Section 5, we explore the mechanisms used to inject malicious content into web pages. We analyze several aspects of the web malware distribution networks in Section 6. In Section 7 we provide an overview of the impact of the installed malware on the infected system. Section 8 discusses implications of our results and Section 9 presents related work. Finally, we conclude in Section 10.

2 Background

Unfortunately, there are a number of existing exploitation strategies for installing malware on a user's computer. One common technique for doing so is by remotely exploiting vulnerable network services. However, lately, this attack strategy has become less successful (and presumably, less profitable). Arguably, the proliferation of technologies such as Network Address Translators (NATs) and firewalls makes it difficult to remotely connect and exploit services running on users' computers. This, in turn, has led attackers to seek other avenues of exploitation. An equally potent alternative is to simply lure web users to connect to (compromised) malicious servers that subsequently deliver exploits targeting vulnerabilities of web browsers or their plugins.

Adversaries use a number of techniques to inject content under their control into benign websites. In many cases, adversaries exploit web servers via vulnerable scripting applications. Typically, these vulnerabilities (e.g., in phpBB or InvisionBoard) allow an adversary to gain direct access to the underlying operating system. That access can often be escalated to super-user privileges which in turn can be used to compromise any web server running on the compromised host. In general, upon successful exploitation of a web server the adversary injects new content to the compromised website. In most cases, the injected content is a link that redirects the visitors of these websites to a URL that hosts a script crafted to exploit the browser. To avoid visual detection by website owners, adversaries normally use invisible HTML components (e.g., zero pixel IFRAMEs) to hide the injected content.

Another common content injection technique is to use websites that allow users to contribute their own content, for example, via postings to forums or blogs. Depending on the site's configuration, user contributed content may be restricted to text but often can also contain HTML such as links to images or other external content. This is particularly dangerous, as without proper filtering in place, the adversary can simply inject the exploit URL without the need to compromise the web server.

Figure 1 illustrates the main phases in a typical interaction that takes place when a user visits a website with injected malicious content. Upon visiting this website, the browser downloads the initial exploit script (e.g., via an IFRAME). The exploit script (in most cases, javascript) targets a vulnerability in the browser or one of its plugins. Interested readers are referred to Provos et al. [0] for a number of vulnerabilities that are commonly used to gain control of the infected system. Successful exploitation of one of these vulnerabilities results in the automatic execution of the exploit code, thereby triggering a drive-by download. Drive-by downloads start when the exploit instructs the browser to connect to a malware distribution site to retrieve malware executable(s). The downloaded executable is then automatically installed and started on the infected system^1.

Figure 1: A typical interaction of a drive-by download victim with a landing URL.

Finally, attackers use a number of techniques to evade detection and complicate forensic analysis. For example, the use of randomly seeded obfuscated javascript in their exploit code is not uncommon. Moreover, to complicate network based detection attackers use a number of redirection steps before the browser eventually contacts the malware distribution site.

3 Infrastructure and Methodology

Our primary objective is to identify malicious web sites (i.e., URLs that trigger drive-by downloads) and help improve the safety of the Internet. Before proceeding further with the details of our data collection methodology, we first define some terms we use throughout this paper. We use the terms landing pages and malicious URLs interchangeably to denote the URLs that initiate drive-by downloads when users visit them. In our subsequent analysis, we group these URLs according to their top level domain names and we refer to the resulting set as the landing sites. In many cases, the malicious payload is not hosted on the landing site, but instead loaded via an IFRAME or a SCRIPT from a remote site. We call the remote site that hosts malicious payloads a distribution site. In what follows, we detail the different components of our data collection infrastructure.

Pre-processing Phase. As Figure 2 illustrates, the data processing starts from a large web repository maintained by Google. Our goal is to inspect URLs from this repository and identify the ones that trigger drive-by downloads. However, exhaustive inspection of each URL in the repository is prohibitively expensive due to the large number of URLs in the repository (on the order of billions). Therefore, we first use light-weight techniques to extract URLs that are likely malicious, then subject them to a more detailed analysis and verification phase.

Figure 2: URL selection and verification workflow.

We employ the MapReduce [9] framework to process billions of web pages in parallel. For each web page, we extract several features, some of which take advantage of the fact that many landing URLs are hijacked to include malicious payload(s) or to point to malicious payload(s) from a distribution site. For example, we use "out of place" IFRAMEs, obfuscated JavaScript, or IFRAMEs to known distribution sites as features. Using a specialized machine-learning framework [7], we translate these features into a likelihood score. We employ five-fold cross-validation to measure the quality of the machine-learning framework. The cross-validation operates by splitting the data set into 5 randomly chosen partitions and then training on four partitions while using the remaining partition for validation. This process is repeated five times. For each trained model, we create an ROC curve and use the average ROC curve to estimate the overall accuracy. Using this ROC curve, we estimate the false positive and detection rate for different thresholds. Our infrastructure pre-processes roughly one billion pages daily. In order to fully utilize the capacity of the subsequent detailed verification phase, we choose a threshold score that results in an outcome false positive rate of about 10^-3 with a corresponding detection rate of approximately 0.9. This amounts to about one million URLs that we subject to the computationally more expensive verification phase.
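The partitioning and threshold selection described for the pre-processing phase can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`k_fold_partitions`, `rates_at_threshold`, `pick_threshold`) and the toy scores are hypothetical, and the threshold is picked from a single labeled validation set rather than from an averaged ROC curve.

```python
import random
from typing import List, Optional, Tuple


def k_fold_partitions(n: int, k: int = 5, seed: int = 0) -> List[List[int]]:
    """Split indices 0..n-1 into k randomly chosen, disjoint partitions."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def rates_at_threshold(scores: List[float], labels: List[int],
                       threshold: float) -> Tuple[float, float]:
    """Return (false_positive_rate, detection_rate) when flagging score >= threshold."""
    tp = fp = pos = neg = 0
    for s, y in zip(scores, labels):
        if y == 1:
            pos += 1
            tp += s >= threshold
        else:
            neg += 1
            fp += s >= threshold
    return (fp / neg if neg else 0.0, tp / pos if pos else 0.0)


def pick_threshold(scores: List[float], labels: List[int],
                   max_fpr: float) -> Optional[Tuple[float, float, float]]:
    """Lowest threshold whose false positive rate stays within max_fpr.

    Returns (threshold, fpr, detection_rate), or None if no threshold qualifies.
    """
    best = None
    # Lowering the threshold can only increase the false positive rate,
    # so we can stop at the first threshold that exceeds the budget.
    for t in sorted(set(scores), reverse=True):
        fpr, tpr = rates_at_threshold(scores, labels, t)
        if fpr > max_fpr:
            break
        best = (t, fpr, tpr)
    return best


# Toy validation data (hypothetical): label 1 = malicious, 0 = benign.
scores = [0.95, 0.9, 0.8, 0.7, 0.4, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 1, 0, 0, 0, 0]
print(pick_threshold(scores, labels, max_fpr=0.2))
```

In a setting like the one described above, `max_fpr` would be set near 10^-3 and the returned detection rate read off alongside it; a lower threshold sends more URLs to the expensive verification phase.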

In addition to analyzing web pages in the crawled web repository, we also regularly select several hundred thousand URLs for in-depth verification. These URLs are randomly sampled from popular URLs as well as from the global index. We also process URLs reported by users.

Verification Phase. This phase aims to verify whether a candidate URL from the pre-processing phase is malicious (i.e., initiates a drive-by download). To do that, we developed a large scale web-honeynet that simultaneously runs a large number of Microsoft Windows images in virtual machines. Our system design draws on the experience from earlier work [], and includes unique features that are specific to our goals. In what follows we discuss the details of the URL verification process.

Each honeypot instance runs an unpatched version of Internet Explorer. To inspect a candidate URL, the system first loads a clean Windows image then automatically starts the browser and instructs it to visit the candidate URL. We detect malicious URLs using a combination of execution based heuristics and results from anti-virus engines. Specifically, for each visited URL we run the virtual machine for approximately two minutes and monitor the system behavior for abnormal state changes including file system changes, newly created processes and changes to the system's registry. Additionally, we subject the HTTP responses to virus scans using multiple anti-virus engines. To detect malicious URLs, we develop scoring heuristics used to determine the likelihood that a URL is malicious. We determine a URL score based on a combined measure of the different state changes resulting from visiting the URL. Our heuristics score URLs based on the number of created processes, the number of observed registry changes and the number of file system changes resulting from visiting the URL.

To limit false positives, we choose a conservative decision criterion that uses an empirically derived threshold to mark a URL as malicious. This threshold is set such that it will be met if we detect changes in the system state, including the file system as well as creation of new processes. A visited URL is marked as malicious if it meets the threshold and one of the incoming HTTP responses is marked as malicious by at least one anti-virus scanner. Our extensive evaluation shows that this criterion introduces negligible false positives. Finally, a URL that meets the threshold requirement but has no incoming payload flagged by any of the anti-virus engines is marked as suspicious.

On average, the detailed verification stage processes about one million URLs daily, of which roughly  000 new URLs are flagged as malicious. The verification system records all the network interactions as well as the state changes. In what follows, we describe how we process the network traces associated with the detected malicious URLs to shed light on the malware distribution infrastructure.

Constructing the Malware Distribution Networks. To understand the properties of the web malware serving infrastructure on the Internet, we analyze the recorded network traces associated with the detected malicious URLs to construct the malware distribution networks. We define a distribution network as the set of malware delivery trees from all the landing sites that lead to a particular malware distribution site. A malware delivery tree consists of the landing site, as the leaf node, and all nodes (i.e., web sites) that the browser visits until it contacts the malware distribution site (the root of the tree). To construct the delivery trees we extract the edges connecting these nodes by inspecting the Referer header from the recorded successive HTTP requests the browser makes after visiting the landing page. However, in many cases the Referer headers are not sufficient to extract the full chain. For example, when the browser redirection results from an external script, the Referer points to the base page and not the external script file. Additionally, in many cases the Referer header is not set (e.g., because the requests are made from within a browser plugin or newly-downloaded malware).

To connect the missing causality links, we interpret the HTML and JavaScript content of the pages fetched by the browser and extract all the URLs from the fetched pages. Then, to identify causal edges we look for any URLs that match any of the HTTP fetches that were subsequently visited by the browser. In some cases, URLs contain randomly generated strings, so some requests cannot be matched exactly. In these cases, we apply heuristics based on edit distance to identify the most probable parent of the URL. Finally, for each malware distribution site, we construct its associated distribution network by combining the different malware delivery trees from all landing pages that lead to that site.

Our infrastructure has been live for more than one year, continuously monitoring the web and detecting malicious URLs. In what follows, we report our findings based on analyzing data collected during that time period. Again, recall that we focus here on the pervasiveness of malicious activity (perpetrated by drive-by downloads) that is induced simply by visiting a landing page, thereafter requiring no additional interaction on the client's part (e.g., clicking on embedded links). Finally, we note that due to the large scale of our data collection and some infrastructural constraints, a number of longitudinal aspects of the web malware problem (e.g., the lifetime of the different malware distribution networks) are beyond the scope of this paper and are a subject of our future investigation.

4 Prevalence of Drive-by Downloads

We provide an estimate of the prevalence of web-malware based on data collected over a period of ten months (Jan 2007 - Oct 2007). During that period, we subjected over 60 million URLs for in-depth processing through our verification system. Overall, we detected more than 3 million malicious URLs hosted on more than 180 thousand landing sites. Overall, we observed more than 9 thousand different distribution sites. The findings are summarized in Table 1. Overall, these results show the scope of the problem, but do not necessarily reflect the exposure of end-users to drive-by downloads. In what follows, we attempt to address this question by estimating the overall impact of the malicious web sites.

Data collection period              Jan - Oct 2007
Total URLs checked in-depth         66,534,330
Unique suspicious landing URLs      3,385,889
Unique malicious landing URLs       3,417,590
Unique malicious landing sites      181,699
Unique distribution sites           9,340

Table 1: Summary of collected data.

To study the potential impact of malicious web sites on the end-users, we first examine the fraction of incoming search queries to Google's search engine that return at least one URL labeled as malicious in the results page. Figure 3 provides a running average of this fraction. The graph shows an increasing trend in the search queries that return at least one malicious result, with an average approaching 1.3% of the overall incoming search queries. This finding is troubling as it shows that a significant fraction of search queries return results that may expose the end-user to exploitation attempts.

Figure 3: Percentage of search queries that resulted in at least one URL labeled as malicious; 7-day running avg. (x-axis: date, Apr-2007 to Jan-2008; y-axis: search queries with results labeled as malicious, in %.)

To further understand the importance of this finding, we inspect the prevalence of malicious sites among the links that appear most often in Google search results. From the top one million URLs appearing in the search engine results, about 6,000 belong to sites that have been verified as malicious at some point during our data collection. Upon closer inspection, we found that these sites appear at uniformly distributed ranks within the top million web sites, with the most popular landing page having a rank of 1,588. These results further highlight the significance of the web malware threat as they show the extent of the malware problem; in essence, about 0.6% of the top million URLs that appeared most frequently in Google's search results led to exposure to malicious activity at some point.

An additional interesting result is the geographic locality of web based malware. Table 2 shows the geographic breakdown of IP addresses of the top 5 malware distribution sites and the landing sites. The results show that a significant number of Chinese-based sites contribute to the drive-by problem. Overall, 67% of the malware distribution sites and 64.6% of the landing sites are hosted in China. These findings provide more evidence [1] of poor security practices by web site administrators, e.g., running out-dated and unpatched versions of the web server software.

Table 2: Top 5 hosting countries. (Columns: distribution site hosting country, % of all distribution sites, landing site hosting country, % of all landing sites. China, the United States and Russia lead both rankings; Korea, Malaysia and Germany fill the remaining positions.)

Upon closer inspection of the geographic locality of the web-malware distribution networks as a whole (i.e., the correlation between the location of a distribution site and the landing sites pointing to it), we see that the malware distribution networks are highly localized within common geographical boundaries. This locality varies across different countries, and is most evident in China, with 96% of the landing sites in China pointing to malware distribution servers hosted in that country.

4.1 Impact of browsing habits

In order to examine the impact of users' browsing habits on their exposure to exploitation via drive-by downloads, we measure the prevalence of malicious websites across the different website functional categories based on the DMOZ classification [1]. Using a large random sample of about 7.2 million URLs, we first map each URL to its corresponding DMOZ category. We were able to find the corresponding DMOZ categories for about 0% of these URLs. We further inspect each URL through our in-depth verification system then measure the percentage of malicious URLs in each functional category. Figure 4 shows the prevalence of detected malicious and suspicious websites in each top level DMOZ category.

Figure 4: Prevalence of suspicious and malicious pages. (x-axis: top level DMOZ category; y-axis: prevalence of malicious URLs in category, 0.0% to 0.8%; series: Suspicious, Malware.)

As the graph illustrates, website categories associated with "gray content" (e.g., adult websites) show a stronger connection to malicious content. For instance, about 0.% of the URLs in the Adult category exhibited drive-by download activity upon visiting these websites. These results suggest that users who browse such websites will likely be more exposed to exploitation compared to users who browse websites from the other functional categories. However, an important observation from the same figure is that the distribution of malicious websites is not significantly skewed toward pages that serve gray content. In fact, the distribution shows that malicious websites are generally present in all website categories we observed. Overall, these results show that while "safe browsing" habits may limit users' exposure to drive-by downloads, they do not provide an effective safeguard against exploitation.

5 Malicious Content Injection

In Section 4, we showed that exposure to web-malware is not strongly tied to a particular browsing habit. Our assertion is that this is due, in part, to the fact that drive-by downloads are triggered by visiting staging sites that are not necessarily of malicious intent but have content that lures the visitor into the malware distribution network.

In this section, we validate this conjecture by studying the properties of the web sites that participate in the malware delivery trees. As discussed in Section 2, attackers use a number of techniques to control the content of benign web sites and turn them into nodes in the malware distribution networks. These techniques can be divided into two categories: web server compromise and third party contributed content (e.g., blog posts). Unfortunately, it is generally difficult to determine the exact contribution of either category. In fact, in some cases even manual inspection of the content of each web site may not lead to conclusive evidence regarding the manner in which the malicious content was injected into the web site. Therefore, in this section we provide insights into some features of these web sites that may explain their presence in the malware delivery trees. We only focus on the features that we can determine in an automated fashion. Specifically, where possible, we first inspect the version of the software running on the web server for each landing site. Additionally, we explore one important angle that we discovered which contributes significantly to the distribution of web malware, namely, drive-by downloads via Ads.

5.1 Web Server Software

We begin by examining (where possible) the software running on the web-servers for all the landing sites that lead to the malware distribution sites. Specifically, we collected all the "Server" and "X-Powered-By" header tokens from each landing page (see Table 3). Not surprisingly, of those servers that reported this information, a significant fraction were running outdated versions of software with well known vulnerabilities. For example, .1% of the Apache servers and .% of servers with PHP scripting support reported a version with security vulnerabilities. Overall, these results reflect the weak security practices applied by the web site administrators. Clearly, running unpatched software with known vulnerabilities increases the risk of content control via server exploitation.
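As a rough illustration of how "Server" and "X-Powered-By" header tokens could be turned into an outdated or up-to-date label, consider the sketch below. It is an assumption-laden example rather than the authors' tooling: the function names and the `MIN_PATCHED` table are hypothetical, and the version numbers in that table are placeholders chosen only to exercise the comparison.

```python
import re
from typing import Dict, Tuple

# Matches product/version tokens such as "Apache/2.0.52" or "PHP/4.4.1".
TOKEN_RE = re.compile(r"([A-Za-z][\w.\-]*)/(\d+(?:\.\d+)*)")

# Hypothetical minimum patched versions, for illustration only.
MIN_PATCHED: Dict[str, Tuple[int, ...]] = {
    "apache": (2, 2, 8),
    "php": (5, 2, 5),
}


def parse_tokens(header_value: str) -> Dict[str, Tuple[int, ...]]:
    """Extract {product: version tuple} from a Server or X-Powered-By value."""
    return {
        name.lower(): tuple(int(part) for part in version.split("."))
        for name, version in TOKEN_RE.findall(header_value)
    }


def classify(header_value: str,
             min_patched: Dict[str, Tuple[int, ...]] = MIN_PATCHED) -> Dict[str, str]:
    """Label each reported product as 'old', 'up-to-date' or 'unknown'.

    Products without a reported version number yield no entry at all.
    """
    result = {}
    for product, version in parse_tokens(header_value).items():
        if product not in min_patched:
            result[product] = "unknown"
        elif version < min_patched[product]:
            result[product] = "old"
        else:
            result[product] = "up-to-date"
    return result


print(classify("Apache/2.0.52 (Unix) PHP/4.4.1"))
print(classify("Microsoft-IIS/6.0"))
```

Servers that report only a product name, or no header at all, cannot be classified this way, which is consistent with the caveat above that the analysis covers only servers that reported this information.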

Table 3: Server version for landing sites. In the case of Microsoft IIS, we could not verify their version. (Columns: Srv. Software, count, Unknown, Up-to-date, Old. Rows: Apache, Microsoft IIS, Unknown, and PHP under Scripting; the version columns are n/a for Microsoft IIS and Unknown.)

5.2 Drive-by Downloads via Ads

Today, the majority of Web advertisements are distributed in the form of third party content to the advertising web site. This practice is somewhat worrisome, as a web page is only as secure as its weakest component. In particular, even if the web page itself does not contain any exploits, insecure Ad content poses a risk to advertising web sites. With the increasing use of Ad syndication (which allows an advertiser to sell advertising space to other advertising companies that in turn can yet again syndicate their content to other parties), the chances that insecure content gets inserted somewhere along the chain quickly escalate. Far too often, this can lead to web pages running advertisements that point to untrusted content. This, in itself, represents an attractive avenue for distributing malware, as it provides the adversary with a way to inject content to web sites with a large visitor base without having to compromise any web server.

To assess the extent of this behavior, we estimate the overall contribution of Ads to drive-by downloads. To do so, we construct the malware delivery trees from all detected malicious URLs following the methodology described in Section 3. For each tree, we examine every intermediary node for membership in a set of 2,000 well known advertising networks. If any of the nodes qualify, we count the landing site as being infectious via Ads. Moreover, to highlight the impact of the malware delivered via Ads relative to the other mechanisms, we weight the landing sites associated with Ads based on the frequency of their appearance in Google search results compared to that of all landing sites. Figure 5 shows the percentage of landing sites belonging to Ad networks. On average, 2% of the landing sites were delivering malware via advertisements. More importantly, the overall weighted share for those sites was substantial: on average, 12% of the overall search results that returned landing pages were associated with malicious content due to unsafe Ads. This result can be explained by the fact that Ads normally target popular web sites, and so have a much wider reach. Consequently, even a small fraction of malicious Ads can have a major impact (compared to the other delivery mechanisms).

Figure 5: Percentage of landing sites potentially infecting visitors via malicious advertisements, and their relative share in the search results. (x-axis: week analyzed, 03-2007 to 11-2007; y-axis: percentage of malware infections via advertising; series: weighted by frequency of appearance, weighted by unique landing sites.)

Another interesting aspect of the results shown in Figure 5 is that Ad-delivered drive-by downloads seem to appear in sudden short-lived spikes. This is likely due to the fact that Ads appearing on several advertising web sites are centrally controlled, and therefore allow the malicious content to appear on thousands of web sites almost instantaneously. Similarly, once detected, these Ads are removed simultaneously, and so disappear as quickly as they appeared. For this reason, we notice that drive-by downloads delivered by other content injection techniques (e.g., individual web servers compromise) have a more lasting effect compared to Ad delivered malware, as each web site must be secured independently.

The general practice of Ad syndication contributes significantly to the rise of Ad delivered malware. Our results show that overall 7% of the landing sites that delivered malware via Ads use multiple levels of Ad syndication. To understand how far trust would have to extend in order to limit the Ad delivered drive-by downloads, we plot the distribution of the path length from the landing site leading to the malware distribution sites for each delivery tree. The edges connecting the nodes in these paths reflect the number of redirects a browser has to follow before receiving the final payload. Hence, for syndicated Ads that delivered malware the path length is indicative of the number of syndication steps before reaching the final Ad; in our case, the malware payload. Figure 6 shows the distribution of the number of redirects for syndicated Ads that delivered malware relative to the other malicious landing URLs. The results are quite telling: malware delivered via Ads exhibits longer delivery chains; in 50% of all cases, more than 6 redirection steps were required before receiving the malware payload. Clearly, it is increasingly difficult to maintain trust along such long delivery chains.

Figure 6: CDF of the number of redirection steps for Ads that successfully delivered malware. (x-axis: number of redirection steps, 0 to 20; y-axis: CDF; series: No Ads, Ads.)

Inspecting the delivery trees that featured syndication reveals a total of  unique Ad networks participating in these trees. We further studied the relative role of the different networks by evaluating the frequency of appearance of each Ad network in the malware delivery trees. Interestingly, our results show that five advertising networks appear in approximately 75% of all malware delivery trees. Figure 7 shows the distribution of the relative position of each network in the malware delivery chains it participated in. The normalized position is calculated by dividing the index of the Ad network in each chain by the length of the chain. The graph shows that these advertising networks split into three different categories: In the first category, which includes network I, the advertising network appears at the beginning of the delivery chain.

Figure 7: CDF of the normalized position of the top five Ad networks most frequently participating in malware delivery chains. (x-axis: normalized Ad network position in the chain, 0 to 1; y-axis: CDF; series: Network I, II, III, IV, V.)

egories advertising networks do not participate directly in delivering malware. However, the relative position of networks in the delivery chain may be used as an indication of their relationship with the malware distribution sites: the deeper a network's relative position, the closer it is related to the malware distribution site. Finally, in the third category, indicated by network V, our analysis revealed that in almost 50% of all incidents, the advertising network is directly delivering malware. For example, advertising network V pushes Ads that install malware in the form of a browser toolbar.

Finally we further elucidate this problem via an interesting example from our data corpus. The landing page in our example refers to a Dutch radio station's web site. The radio station in question was showing a banner advertisement from a German advertising site. Using JavaScript, that advertiser redirected to a prominent advertiser in the US, which in turn redirected to yet another advertiser in the Netherlands. That advertiser redirected to another advertisement (also in the Netherlands) that contained obfuscated JavaScript, which when unobfuscated, pointed to yet another JavaScript hosted in Austria. The final JavaScript was encrypted and redirected the browser via multiple IFRAMEs to adxtnet.net, an exploit site hosted in Austria. This resulted in the automatic installation of multiple Trojan Downloaders. While it is unlikely that the initial advertising companies were aware of the malware installations, each redirection
In the second category, which includes net- gave another party control over the content on the origi- , advertising networks appear frequently II-IV works nal web page—with predictable consequences. in the middle of the delivery chains. In both these cat- 17th USENIX Security Symposium USENIX Association 
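The normalized position plotted in Figure 7 can be illustrated with a short sketch. This is only an illustration under stated assumptions, not the tooling used for the study: the chain representation (an ordered list of hostnames from landing site to distribution site), the 1-based indexing, and the function names are all assumptions introduced here. The text itself specifies only that the index of the Ad network is divided by the length of the chain.

```python
from typing import Dict, List


def normalized_positions(chains: List[List[str]], network: str) -> List[float]:
    """Return the normalized position of `network` in every chain it appears in.

    Each chain is an ordered list of hosts, from the landing site (first)
    to the malware distribution site (last). Indexing is 1-based here,
    which is an assumption: the text only says "index divided by length".
    """
    positions = []
    for chain in chains:
        if network in chain:
            index = chain.index(network) + 1  # first occurrence, 1-based
            positions.append(index / len(chain))
    return positions


def empirical_cdf(values: List[float]) -> Dict[float, float]:
    """Map each distinct value v to the fraction of samples <= v."""
    ordered = sorted(values)
    total = len(ordered)
    # Later duplicates overwrite earlier ones, so each key ends up with
    # the cumulative fraction at its last occurrence.
    return {v: (i + 1) / total for i, v in enumerate(ordered)}
```

Under these assumptions, a network that sits third in a four-hop chain gets position 0.75. Values near 1 correspond to networks that sit close to the malware distribution site, which is the reading the text gives for network V.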

6 Malware Distribution Infrastructure

In this section, we explore various properties of the hosting infrastructure for web malware. In particular, we explore the size of the malware distribution networks, and examine the distribution of binaries hosted across sites. We argue that such analysis is important, as it sheds light on the sophistication of the hosting infrastructures and the level of malfeasance we see today. As is the case with other recent malware studies (e.g., [, , 1]) we hope that this analysis will be of benefit to researchers and practitioners alike.

Figure : CDF of the number of landing sites pointing to a particular malware distribution site. (x-axis: Number of landing sites; y-axis: CDF.)

For the remaining discussion, recall that a malware distribution network constitutes all the landing sites that point to a single distribution site. Using the methodology described in Section , we identified the distribution networks associated with each malware distribution site. We first evaluate their size in terms of the total number of landing sites that point to them. Figure  shows the distribution of sizes for the different distribution networks. The graph reveals two main types of malware distribution networks: (1) networks that use only one landing site, and (2) networks that have multiple landing sites. As the graph shows, distribution networks can grow to have well over 1,000 landing sites pointing to them. That said, roughly % of the detected malware distribution sites used only a single landing site at a time. We manually inspected some of these distribution sites and found that the vast majority were either subdomains on free hosting services, or short-lived domains that were created in large numbers. It is likely, though not confirmed, that each of these sites used only a single landing site as a way to slip under the radar and avoid detection.

Next, we examine the network location of the malware distribution servers and the landing sites linking to them. Figure  shows that the malware distribution sites are concentrated in a limited number of /8 prefixes. About 70% of the malware distribution sites have IP addresses within the 58.* -- 61.* and 209.* -- 221.* network ranges. Interestingly, Anderson et al. [5] observed comparable IP space concentrations for the scam hosting infrastructure. The landing sites, however, exhibit relatively more IP space diversity; roughly 0% of the landing sites fell in the above ranges.

Figure : The cumulative fraction of malware distribution sites over the /8 IP prefix space. (Series: Landing Sites, Distribution Sites; x-axis: /8 Prefix; y-axis: Cumulative Fraction of Sites.)

Figure 10: The cumulative fraction of the malware distribution sites across the different ASes. (x-axis: AS rank; y-axis: Cumulative Fraction of Distribution Sites.)

We further investigated the Autonomous System (AS) locality of the malware distribution sites by mapping their IP addresses to the AS responsible for the longest matching prefixes for these IP addresses. We use the latest BGP snapshot from Routeviews [23] to do the IP to AS mapping. Our results show that all the malware distribution sites' IP addresses fall into a relatively small set of ASes (only 00 as of this writing). Figure 10 shows the cumulative fraction of these sites across the ASes hosting them (sorted in descending order by the number of sites in each AS). The graph further shows the highly nonuniform concentration of the malware distribution sites: 95% of these sites map to only 210 ASes. Finally, the results of mapping the landing sites (not shown) produced 2,517 ASes with 95% of the sites falling in these 500 ASes.

Lastly, the distribution of malware across domains also gives rise to some interesting insights. Figure 11 shows the distribution of the number of unique malware binaries (as inferred from MD hashes) downloaded from each malware distribution site. As the graph shows, approximately % of the distribution sites delivered a single malware binary. The remaining distribution sites hosted multiple distinct binaries over their observation period in our data, with % of the servers hosting more than 100 binaries. In many cases, we observed that the multiple payloads reflect deliberate obfuscation attempts to evade detection. In what follows, we take a more in-depth look by studying the different forms of relationships among the various distribution networks.

Figure 11: CDF of the number of unique binaries downloaded from each malware distribution site. (x-axis: Number of Unique Malware Binary Hashes; y-axis: CDF.)

6.1 Relationships Among Networks

To gain a better perspective on the degree of connectivity between the distribution networks, we investigate the common properties of the hosting infrastructure across the malware distribution sites. We also evaluate the degree of overlap among the landing sites linking to the different malware distribution sites.

Malware hosting infrastructure. Throughout our measurement period we detected 9,430 malware distribution sites. In 90% of the cases each site is hosted on a single IP address. The remaining 10% of sites are hosted on IP addresses that host multiple malware distribution sites. Our results show IP addresses that hosted up to 210 malware distribution sites. Closer inspection revealed that these addresses refer to public hosting servers that allow users to create their own accounts. These accounts appear as sub-folders of the virtual hosting server DNS name (e.g., 512j.com/akgy, 512j.com/alavin, 512j.com/anti) or in many cases as separate DNS aliases that resolve to the IP address of the hosting server. We also observed several cases where the hosting server is a public blog that allows users to have their own pages (e.g., mihanblog.com/abadan2, mihanblog.com/askbox).

Figure 1: CDF of the normalized pairwise intersection between landing sites across distribution networks. (x-axis: Normalized Pairwise Intersection; y-axis: CDF.)

Overlapping landing sites. We further evaluate the overlap between the landing sites that point to the different malware distribution sites. To do so, we calculate the pairwise intersection between the sets of the landing sites pointing to each of the distribution sites in our data set. For a distribution network $i$ with a set of landing sites $X_i$ and a network $j$ with the set of landing sites $X_j$, the normalized pairwise intersection of the two networks, $C_{i,j}$, is calculated as

$$C_{i,j} = \frac{|X_i \cap X_j|}{|X_i|} \qquad (1)$$

where $|X|$ is the number of elements in the set $X$. Interestingly, our results showed that 80% of the distribution networks share at least one landing page. Figure 1 shows the normalized pair-wise landing sets intersection across these distribution networks. The graph reveals a strong overlap among the landing sites for the related network pairs. These results suggest that many landing sites are shared among multiple distribution networks. For example, in several cases we observed landing pages with multiple IFRAMEs linking to different malware distribution sites. Finally, we note that the sudden jump to a pair-wise score of one is mostly due to network pairs in which the landing sites for one network are a subset of those for the other network.

Figure 1: CDF of the normalized pairwise intersection between malware hashes across distribution networks. (x-axis: Normalized Pairwise Intersection; y-axis: CDF.)

Content replication across malware distribution sites. We finally evaluate the extent to which malware is replicated across the different distribution sites. To do so, we use the same metric in Equation 1 to calculate the normalized pairwise intersection of the set of malware hashes served by each pair of distribution sites. Our results show that in 25% of the malware distribution sites, at least one binary is shared between a pair of sites. While malware hashes exhibit frequent changes as a result of obfuscation, our results suggest that there is still a level of content replication across the different sites. Figure 1 shows the normalized pair-wise intersection of the malware sets across these distribution networks. As the graph shows, binaries are less frequently shared between distribution sites compared to landing sites, but taken as a whole, there is still a non-trivial degree of similarity among these networks.

7 Post Infection Impact

Recall that upon visiting a malicious URL, the browser downloads the initial exploit. The exploit (in most cases, javascript) targets a vulnerability in the browser or one of its plugins and takes control of the infected system, after which it retrieves and runs the malware executable(s) downloaded from the malware distribution site. Rather than inspecting the behavior of each phase in isolation, our goal is to give an overview of the collective changes that happen to the system state after visiting a malicious URL. Figure 1 shows the distribution of the number of Windows executables downloaded after visiting a malicious URL as observed from monitoring the interaction between the browser and the malware distribution site. As the graph shows, visiting malicious URLs can lead to a large number of downloads ( on average, but as large as 0 in the extreme case).

Figure 1: CDF of the number of downloaded executables as a result of visiting a malicious URL. (x-axis: Number of Downloaded Executables; y-axis: CDF.)

Another noticeable outcome is the increase in the number of running processes on the virtual machine. This increase is associated with the automatic execution of binaries. For each landing URL, we collected the number of processes that were started on the guest operating system after being infected with malware. Figure 1 shows the CDF of the number of processes launched after the system is infected. As the graph shows, visiting malicious URLs produces a noticeable increase in the number of processes, in some cases inducing so much overhead that they "crashed" the virtual machine.

Figure 1: CDF of the number of processes started after visiting a malicious URL. (x-axis: Number of processes launched after infection; y-axis: CDF.)

Additionally, we examine the type of registry changes that occur when the malware executes. Overall, we detected registry changes after visiting 57.5% of the landing pages. We divide these changes into the following categories: BHO indicates that the malware installed a Browser Helper Object that can access privileged state in the browser; Preferences means that the browser home page, default search engine or name server were changed by the malware; Security indicates that malware changed firewall settings or even disabled automatic software updates; Startup indicates that the malware is trying to persist across reboots. Notice that these categories are not mutually exclusive (i.e., a single malicious URL may cause changes in multiple categories). Table  summarizes the percentage of registry changes per category. Notice that "Startup" changes are more prevalent, indicating that malware tries to persist even after the machine is rebooted.

Table : Registry changes from drive-by downloads.

Category   Security   Preferences   BHO   Startup
.%   %   URLs   1.7%   .1%   .%

In addition to the registry changes, we analyzed the network activity of the virtual machine post infection. In our system, the virtual machines are allowed to perform only DNS and HTTP connections. Table  shows the percentage of connection attempts per destination port. Even though we omit the HTTP connections originating from the browser, HTTP is still the most prevalent port for malicious activity post-infection. This is due to "downloader" binaries that fetch, in some cases, up to 60 binaries over HTTP. We also observe a significant percentage of connection attempts to typical IRC ports, accounting for more than 50% of all non-HTTP connections. As a number of earlier studies have already shown (e.g., [, 1, , 1, , 1]), the IRC connection attempts are most likely intended to add the compromised machine, without its owner's consent, to an IRC botnet, confirming the earlier conjecture by Provos et al. [20] regarding the connection between web malware and botnets. More detailed examples of malware's behavior can be found in Polychronakis et al. [18].

Table : Most frequently contacted ports directly by the downloaded malware.

Protocol/Port     Connections %
HTTP (0, 00)      7%
IRC (0-7001)      .%
FTP (1)           0.%
UPnP (100)        0.%
Mail ()           0.7%
Other             .%

7.1 Anti-virus engine detection rates

As we discussed earlier, web based malware uses a pull-based delivery mechanism in which a victim is required to visit the malware hosting server or any URL linking to it in order to download the malware. This behavior puts forward a number of challenges to defense mechanisms (e.g., malware signature generation schemes) mainly due to the inadequate coverage of the malware collection system. For example, unlike active scanning malware which uses a push-based delivery mechanism (and so sufficient placement of honeypot sensors can provide good coverage), the web is significantly more sparse and, therefore, more difficult to cover.

In what follows, we evaluate the potential implications of the web malware delivery mechanism by measuring the detection rates of several well known anti-virus engines. Specifically, we evaluate the detection rate of each anti-virus engine against the set of suspected malware samples collected by our infrastructure. Since we cannot rely on anti-virus engines, we developed a heuristic to detect these suspected binaries before subjecting them to the anti-virus scanners. For each URL inspected by our in-depth verification system, we test whether visiting the URL caused the creation of at least one new process on the virtual machine. For the URLs that satisfy this condition, we simply extract any binary download(s) from the recorded HTTP response and "flag" them as suspicious.

We applied the above methodology to identify suspicious binaries on a daily basis over a one month period of April, 2007. We subject each binary to each of the anti-virus scanners using the latest virus definitions on that day. Then, for an anti-virus engine, the detection rate is simply the number of detected (flagged) samples divided by the total number of suspicious malware instances inspected on that day. Figure 1 illustrates the individual detection rates of each of the anti-virus engines. The graph reveals that the detection capability of the anti-virus engines is lacking, with an average detection rate of 70% for the best engine. These results are disturbing as they show that even the best anti-virus engines in the market (armed with their latest definitions) fail to cover a significant fraction of web malware.

Figure 1: Detection rates of  anti-virus engines. (Series: AV I, AV II, AV III; x-axis: Days since April, 1st; y-axis: Detection Rate (%).)

False Positives. Notice that the above strategy may falsely classify benign binaries as malicious. To evaluate the false positives, we use the following heuristic: we optimistically assume that all suspicious binaries will eventually be discovered by the anti-virus vendors. Using the set of suspicious binaries collected over a month historic period, we re-scan all undetected binaries two months later (in July, 2007) using the latest virus definitions. Then, all undetected binaries from the rescanning step are considered false positives. Overall, our results show that the earlier analysis is fairly accurate with false positive rates of less than 10%. We further investigated a number of binaries identified as false positives and found that a number of popular installers exhibit a behavior similar to that of drive-by downloads, where the installer process first runs and then downloads the associated software package. To minimize the impact of false positives, we created a white-list of all known benign downloads, and all binaries in the white-list are exempted from the analysis in this paper.

Of course, we are being overly conservative here as our heuristic does not account for binaries that are never detected by any anti-virus engine. However, for our goals, this method produces an upper bound for the resulting false positives. As an additional benchmark we asked for direct feedback from anti-virus vendors about the accuracy of the undetected binaries that we (now) share with them. On average, they reported about 6% false positives in the shared binaries, which is within the bounds of our prediction.

8 Discussion

Undoubtedly, the level of malfeasance on the Internet is a cause for concern. That said, while our work to date has shown that the prevalence of web-malware is indeed a serious threat, the analysis herein says nothing about the number of visitors that become infected as a result of visiting a malicious page. In particular, we note that since our goal is to survey the landscape, our infrastructure is intentionally configured to be vulnerable to a wide range of attacks; hopefully, savvy computer users who diligently apply software updates would be far less vulnerable to infection. To be clear, while our analysis unequivocally shows that millions of users are exposed to malicious content every day, without a wide-scale browser vulnerability study, the actual number of compromises remains unknown. Nonetheless, we believe the pervasive nature of the results in this study elucidates the state of the malware problem today, and hopefully, serves to educate users, web masters and other researchers about the security challenges ahead.

Lastly, we note that several outlets exist for taking advantage of the results of our infrastructure. For instance, the data that Google uses to flag search results is freely available through the Safe Browsing API [2], as well as via the Safe Browsing diagnostic page [3]. We hope these services prove to be of benefit to the community at large.

9 Related Work

Virtual machines have been used as honeypots for detecting unknown attacks by several researchers [, 1, 17, , ]. Although honeypots have traditionally been used mostly for detecting attacks against servers, the same principles also apply to client honeypots (e.g., an instrumented browser running on a virtual machine). For example, Moshchuk et al. used client-side techniques to study spyware on the web (by crawling 1 million URLs in May 00 [17]). Their primary focus was not on detecting drive-by downloads, but on finding links to executables labeled spyware by an adware scanner. Additionally, they sampled 45,000 URLs for drive-by downloads and showed a decrease over time. However, the fundamental limitation of analyzing the malicious nature of URLs discovered by "spidering" is that a crawl can only follow content links, whereas the malicious nature of a page is often determined by the web hosting infrastructure. As such, while the study of Moshchuk et al. provides valuable insights, a truly comprehensive analysis of this problem requires a much more in-depth crawl of the web. As we were able to analyze many billions of URLs, we believe our findings are more representative of the state of the overall problem.

More closely related is the work of Provos et al. [20] and Seifert et al. [24] which raised awareness of the threat posed by drive-by downloads. These works are aimed at explaining how different web page components are used to exploit web browsers, and provide an overview of the different exploitation techniques in use today. Wang et al. proposed an approach for detecting exploits against Windows XP when visiting webpages in Internet Explorer []. Their approach is capable of detecting zero-day exploits against Windows and can determine which vulnerability is being exploited by exposing Windows systems with different patch levels to dangerous URLs. Their results, on roughly 17,000 URLs, showed that about 200 of these were dangerous to users.

This paper differs from all of these works in that it offers a far more comprehensive analysis of the different aspects of the problem posed by web-based malware, including an examination of its prevalence, the structure of the distribution networks, and the major driving forces.

Lastly, malware detection via dynamic tainting analysis may provide deeper insight into the mechanisms by which malware installs itself and how it operates [10, 1, 7]. In this work, we are more interested in structural properties of the distribution sites themselves, and how malware behaves once it has been implanted. Therefore, we do not employ tainting because of its computational expense, and instead, simply collect changes made by the malware that do not require having the ability to trace the information flow in detail.

10 Conclusion

The fact that malicious URLs that initiate drive-by downloads are spread far and wide raises concerns regarding the safety of browsing the Web. However, to date, little is known about the specifics of this increasingly common malware distribution technique. In this work, we attempt to fill in the gaps about this growing phenomenon by providing a comprehensive look at the problem from several perspectives. Our study uses a large scale data collection infrastructure that continuously detects and monitors the behavior of websites that perpetrate drive-by downloads. Our in-depth analysis of over 66 million URLs (spanning a 10 month period) reveals that the scope of the problem is significant. For instance, we find that 1.3% of the incoming search queries to Google's search engine return at least one link to a malicious site.

Moreover, our analysis reveals several forms of relations between some distribution sites and networks. A more troubling concern is the extent to which users may be lured into the malware distribution networks by content served through online Ads. For the most part, the syndication relations that implicitly exist in advertising networks are being abused to deliver malware through Ads. Lastly, we show that merely avoiding the dark corners of the Internet does not limit exposure to malware. Unfortunately, we also find that even state-of-the-art anti-virus engines are lacking in their ability to protect against drive-by downloads. While this is to be expected, it does call for more elaborate defense mechanisms to curtail this rapidly increasing threat.

Acknowledgments

We would like to thank Oliver Fisher, Dean McNamee, Mark Palatucci and Ke Wang for their help with Google's malware detection infrastructure. This work was funded in part by NSF grants CNS-0711 and CNS-00.

References

[1] The open directory project. See http://www.news.com/2100-1023-877568.html.

[2] Safe Browsing API, June 2007. See http://code.google.com/apis/safebrowsing/.

[3] Safe Browsing diagnostic page, May 00. See http://www.google.com/safebrowsing/diagnostic?site=yoursite.com.

[4] ANAGNOSTAKIS, K. G., SIDIROGLOU, S., AKRITIDIS, P., XINIDIS, K., MARKATOS, E., AND KEROMYTIS, A. D. Detecting Targeted Attacks Using Shadow Honeypots.

[5] ANDERSON, D. S., FLEIZACH, C., SAVAGE, S., AND VOELKER, G. M. Spamscatter: Characterizing Internet Scam Hosting Infrastructure. In Proceedings of the USENIX Security Symposium (August 2007).

[6] BARFORD, P., AND YAGNESWARAN, V. An Inside Look at Botnets. Advances in Information Security. Springer, 2007.

[7] BEM, J., HARIK, G., LEVENBERG, J., SHAZEER, N., AND TONG, S. Large scale machine learning and methods. US Patent: 717.

[8] COOKE, E., JAHANIAN, F., AND MCPHERSON, D. The Zombie Roundup: Understanding, Detecting, and Disturbing Botnets. In Proceedings of the first Workshop on Steps to Reducing Unwanted Traffic on the Internet (July 00).

[9] DEAN, J., AND GHEMAWAT, S. Mapreduce: Simplified data processing on large clusters. In Proceedings of the Sixth Symposium on Operating System Design and Implementation (Dec 00), pp. 17–10.

[10] EGELE, M., KRUEGEL, C., KIRDA, E., YIN, H., AND SONG, D. Dynamic Spyware Analysis. In Proceedings of the USENIX Annual Technical Conference (June 2007).

[11] FRANKLIN, J., PAXSON, V., PERRIG, A., AND SAVAGE, S. An Inquiry into the Nature and Causes of the Wealth of Internet Miscreants. In Proceedings of the ACM Conference on Computer and Communications Security (CCS) (October 2007).

[12] GU, G., PORRAS, P., YEGNESWARAN, V., FONG, M., AND LEE, W. BotHunter: Detecting Malware Infection through IDS-driven Dialog Correlation. In Proceedings of the 16th USENIX Security Symposium (2007), pp. 17–1.

[13] MODADUGU, N. Web Server Software and Malware, June 2007. See http://googleonlinesecurity.blogspot.com/2007/06/web-server-software-and-malware.html.

[14] MOORE, D., VOELKER, G. M., AND SAVAGE, S. Inferring Internet Denial of Service Activity. In Proceedings of 10th USENIX Security Symposium (Aug. 2001).

[15] MOSER, A., KRUEGEL, C., AND KIRDA, E. Exploring Multiple Execution Paths for Malware Analysis. In Proceedings of the 2007 IEEE Symposium on Security and Privacy (May 2007).

[16] MOSHCHUK, A., BRAGIN, T., DEVILLE, D., GRIBBLE, S., AND LEVY, H. SpyProxy: Execution-based Detection of Malicious Web Content.

[17] MOSHCHUK, A., BRAGIN, T., GRIBBLE, S., AND LEVY, H. A crawler-based study of spyware in the web. In Proceedings of Network and Distributed Systems Security Symposium (00).

[18] POLYCHRONAKIS, M., MAVROMMATIS, P., AND PROVOS, N. Ghost Turns Zombie: Exploring the Life Cycle of Web-based Malware. In Proceedings of the 1st USENIX Workshop on Large-Scale Exploits and Emergent Threats (LEET) (April 00).

[19] PROJECT, H., AND ALLIANCE, R. Know your enemy: Tracking Botnets, March 00. See http://www.honeynet.org/papers/bots/.

[20] PROVOS, N., MCNAMEE, D., MAVROMMATIS, P., WANG, K., AND MODADUGU, N. The Ghost in the Browser: Analysis of Web-based Malware. In Proceedings of the first USENIX workshop on hot topics in Botnets (HotBots'07) (April 2007).

[21] RAJAB, M. A., ZARFOSS, J., MONROSE, F., AND TERZIS, A. A Multifaceted Approach to Understanding the Botnet Phenomenon. In Proceedings of ACM SIGCOMM/USENIX Internet Measurement Conference (IMC) (Oct., 00), pp. 1–.

[22] RAMACHANDRAN, A., FEAMSTER, N., AND DAGON, D. Revealing Botnet Membership using DNSBL Counter-Intelligence. In Proceedings of the 2nd Workshop on Steps to Reducing Unwanted Traffic on the Internet (SRUTI) (July 00).

[23] The Route Views Project. http://www.antc.uoregon.edu/route-views/.

[24] SEIFERT, C., STEENSON, R., HOLZ, T., BING, Y., AND DAVIS, M. A. Know Your Enemy: Malicious Web Servers. http://www.honeynet.org/papers/mws/, August 2007.

[25] WANG, Y.-M., BECK, D., JIANG, X., ROUSSEV, R., VERBOWSKI, C., CHEN, S., AND KING, S. Automated web patrol with strider honeymonkeys. In Proceedings of Network and Distributed Systems Security Symposium (00), pp. –.

[26] WANG, Y.-M., NIU, Y., CHEN, H., BECK, D., JIANG, X., ROUSSEV, R., VERBOWSKI, C., CHEN, S., AND KING, S. Strider honeymonkeys: Active, client-side honeypots for finding malicious websites. See http://research.microsoft.com/users/shuochen/HM.PDF.

[27] YIN, H., SONG, D., EGELE, M., KRUEGEL, C., AND KIRDA, E. Panorama: Capturing System-wide Information Flow for Malware Detection and Analysis. In Proceedings of the 14th ACM Conference of Computer and Communication Security (October 2007).

Notes

1 Some compromised web servers also trigger dialog windows asking users to manually download and run malware. However, this analysis considers only malware installs that require no user interaction.

This mapping is readily available at Google.

We consider a version as outdated if it is older than the latest corresponding version released by January, 2007 (the start date for our data collection).

We restrict our analysis to Windows executables identified by searching for PE headers in each payload.
