Online Tracking: A 1-million-site Measurement and Analysis

Steven Englehardt, Princeton University, [email protected]
Arvind Narayanan, Princeton University, [email protected]

This is an extended version of our paper that appeared at ACM CCS 2016.

ABSTRACT

We present the largest and most detailed measurement of online tracking conducted to date, based on a crawl of the top 1 million websites. We make 15 types of measurements on each site, including stateful (cookie-based) and stateless (fingerprinting-based) tracking, the effect of browser privacy tools, and the exchange of tracking data between different sites ("cookie syncing"). Our findings include multiple sophisticated fingerprinting techniques never before measured in the wild.

This measurement is made possible by our open-source web privacy measurement tool, OpenWPM (https://github.com/citp/OpenWPM), which uses an automated version of a full-fledged consumer browser. It supports parallelism for speed and scale, automatic recovery from failures of the underlying browser, and comprehensive browser instrumentation. We demonstrate our platform's strength in enabling researchers to rapidly detect, quantify, and characterize emerging online tracking behaviors.

1. INTRODUCTION

Web privacy measurement (observing websites and services to detect, characterize and quantify privacy-impacting behaviors) has repeatedly forced companies to improve their privacy practices due to public pressure, press coverage, and regulatory action [5, 15]. On the other hand, web privacy measurement presents formidable engineering and methodological challenges. In the absence of a generic tool, it has been largely confined to a niche community of researchers.

We seek to transform web privacy measurement into a widespread practice by creating a tool that is useful not just to our colleagues but also to regulators, self-regulators, the press, activists, and website operators, who are often in the dark about third-party tracking on their own domains. We also seek to lessen the burden of continual oversight of web tracking and privacy, by developing a robust and modular platform for repeated studies.

OpenWPM (Section 3) solves three key systems challenges faced by the web privacy measurement community. It does so by building on the strengths of past work, while avoiding the pitfalls made apparent in previous engineering efforts. (1) We achieve scale through parallelism and robustness by utilizing isolated measurement processes similar to FPDetective's platform, while still supporting stateful measurements. We're able to scale to 1 million sites, without having to resort to a stripped-down browser (a limitation we explore in detail in Section 3.3). (2) We provide comprehensive instrumentation by expanding on the rich browser extension instrumentation of FourthParty, without requiring the researcher to write their own automation code. (3) We reduce duplication of work by providing a modular architecture to enable code re-use between studies.

Solving these problems is hard because the web is not designed for automation or instrumentation. Selenium (http://www.seleniumhq.org/), the main tool for automated browsing through a full-fledged browser, is intended for developers to test their own websites. As a result it performs poorly on websites not controlled by the user and breaks frequently if used for large-scale measurements. Browsers themselves tend to suffer memory leaks over long sessions. In addition, instrumenting the browser to collect a variety of data for later analysis presents formidable challenges. For full coverage, we've found it necessary to have three separate measurement points: a network proxy, a browser extension, and a disk state monitor. Further, we must link data collected from these disparate points into a uniform schema, duplicating much of the browser's own internal logic in parsing traffic.

A large-scale view of web tracking and privacy. In this paper we report results from a January 2016 measurement of the top 1 million sites (Section 4). Our scale enables a variety of new insights. We observe for the first time that online tracking has a "long tail", but we find a surprisingly quick drop-off in the scale of individual trackers: trackers in the tail are found on very few sites (Section 5.1). Using a new metric for quantifying tracking (Section 5.2), we find that the tracking-protection tool Ghostery (https://www.ghostery.com/) is effective, with some caveats (Section 5.5). We quantify the impact of trackers and third parties on HTTPS deployment (Section 5.3) and show that cookie syncing is pervasive (Section 5.6).

Turning to browser fingerprinting, we revisit an influential 2014 study on canvas fingerprinting with updated and improved methodology (Section 6.1). Next, we report on several types of fingerprinting never before measured at scale: font fingerprinting using canvas (which is distinct from canvas fingerprinting; Section 6.2), and fingerprinting by abusing the WebRTC API (Section 6.3), the Audio API (Section 6.4), and the Battery Status API (Section 6.5). Finally, we show that in contrast to our results in Section 5.5, existing privacy tools are not effective at detecting these newer and more obscure fingerprinting techniques.
5.2 Prominence: a third party ranking metric

In Section 5.1 we ranked third parties by the number of first party sites they appear on. This simple count is a good first approximation, but it has two related drawbacks. A major third party that's present on (say) 90 of the top 100 sites would have a low score if its prevalence drops off outside the top 100 sites. A related problem is that the rank can be sensitive to the number of websites visited in the measurement. Thus different studies may rank third parties differently.

We also lack a good way to compare third parties (and especially trackers) over time, both individually and in aggregate. Some studies have measured the total number of cookies, but we argue that this is a misleading metric, since cookies may not have anything to do with tracking.

To avoid these problems, we propose a principled metric. We start from a model of aggregate browsing behavior. There is some research suggesting that website traffic follows a power law distribution, with the frequency of visits to the N-th ranked website being proportional to 1/N [3, 22]. The exact relationship is not important to us; any formula for traffic can be plugged into our prominence metric below.

Definition:

$$\mathrm{Prominence}(t) = \sum_{\mathrm{edge}(s,t)=1} \frac{1}{\mathrm{rank}(s)}$$

where edge(s, t) indicates whether third party t is present on site s. This simple formula measures the frequency with which an "average" user browsing according to the power-law model will encounter any given third party.

The most important property of prominence is that it de-emphasizes obscure sites, and hence can be adequately approximated by relatively small-scale measurements, as shown in Figure 4. We propose that prominence is the right metric for:

1. Comparing third parties and identifying the top third parties. We present the list of top third parties by prominence in Table 14 in the Appendix. Prominence ranking produces interesting differences compared to ranking by a simple prevalence count. For example, Content-Distribution Networks become less prominent compared to other types of third parties.
2. Measuring the effect of tracking-protection tools, as we do in Section 5.5.
3. Analyzing the evolution of the tracking ecosystem over time and comparing between studies. The robustness of the rank-prominence curve (Figure 4) makes it ideally suited for these purposes.

Figure 4: Prominence of third party as a function of prominence rank. We posit that the curve for the 1M-site measurement (which can be approximated by a 50k-site measurement) presents a useful aggregate picture of tracking. (Plot of prominence, log scale, against rank of third party, with one curve each for the 1K-site, 50K-site, and 1M-site measurements.)

Figure 5: Secure connection UI for Firefox Nightly 47 and Chrome 47. Clicking on the lock icon in Firefox reveals the text "Connection is not secure" when mixed content is present. (The figure shows the UI of each browser for HTTP, HTTPS, and HTTPS with passive mixed content.)

              55K Sites   1M Sites
HTTP Only     82.9%       X
HTTPS Only    14.2%       8.6%
HTTPS Opt.    2.9%        X

Table 3: First party HTTPS support on the top 55K and top 1M sites. "HTTP Only" is defined as sites which fail to upgrade when HTTPS Everywhere is enabled. "HTTPS Only" are sites which always redirect to HTTPS. "HTTPS Optional" are sites which provide an option to upgrade, but only do so when HTTPS Everywhere is enabled. We carried out the HTTPS-Everywhere-enabled measurement for only 55,000 sites, hence the X's.

5.3 Third parties impede HTTPS adoption

Table 3 shows the number of first-party sites that support HTTPS and the number that are HTTPS-only. Our results reveal that HTTPS adoption remains rather low despite well-publicized efforts. Publishers have claimed that a major roadblock to adoption is the need to move all embedded third parties and trackers to HTTPS to avoid mixed-content errors [57, 64].

Mixed-content errors occur when HTTP sub-resources are loaded on a secure site. This poses a security problem, leading browsers to block the resource load or warn the user depending on the content loaded. Passive mixed content, that is, non-executable resources loaded over HTTP, causes the browser to display an insecure warning to the user but still load the content. Active mixed content is a far more serious security vulnerability and is blocked outright by modern browsers; it is not reflected in our measurements.

Third-party support for HTTPS. To test the hypothesis that third parties impede HTTPS adoption, we first characterize the HTTPS support of each third party. If a third party appears on at least 10 sites and is loaded over HTTPS on all of them, we say that it is HTTPS-only. If it is loaded over HTTPS on some but not all of the sites, we say that it supports HTTPS. If it is loaded over HTTP on all of them, we say that it is HTTP-only. If it appears on fewer than 10 sites, we do not have enough confidence to make a determination.

Table 4 summarizes the HTTPS support of third party domains. A large number of third-party domains are HTTP-only (54%). However, when we weight third parties by prominence, only 5% are HTTP-only. In contrast, 94% of prominence-weighted third parties support both HTTP and HTTPS. This supports our thesis that consolidation of the third-party ecosystem is a plus for security and privacy.

HTTPS Support   Percent   Prominence weighted %
HTTP Only       54%       5%
HTTPS Only      5%        1%
Both            41%       94%

Table 4: Third party HTTPS support. "HTTP Only" is defined as domains from which resources are only requested over HTTP across all sites on our 1M site measurement. "HTTPS Only" are domains from which resources are only requested over HTTPS. "Both" are domains which have resources requested over both HTTP and HTTPS. Results are limited to third parties embedded on at least 10 first-party sites.

Impact of third-parties. We find that a significant fraction of HTTP-default sites (26%) embed resources from third-parties which do not support HTTPS. These sites would be unable to upgrade to HTTPS without browsers displaying mixed content errors to their users, the majority of which (92%) would contain active content which would be blocked.

Similarly, of the approximately 78,000 first-party sites that are HTTPS-only, around 6,000 (7.75%) load with mixed passive content warnings. However, only 11% of these warnings (around 650) are caused by HTTP-only third parties, suggesting that many domains may be able to mitigate these warnings by ensuring all resources are being loaded over HTTPS when available. We examined the causes of mixed content on these sites, summarized in Table 5. The majority are caused by third parties, rather than the site's own content, with a surprising 27% caused solely by trackers.

                  Top 1M   Top 55k
Class             % FP     % FP
Own               25.4%    24.9%
Favicon           2.1%     2.6%
Tracking          10.4%    20.1%
CDN               1.6%     2.6%
Non-tracking      44.9%    35.4%
Multiple causes   15.6%    6.3%

Table 5: A breakdown of causes of passive mixed-content warnings on the top 1M sites and on the top 55k sites. "Non-tracking" represents third-party content not classified as a tracker or a CDN.

5.4 News sites have the most trackers

The level of tracking on different categories of websites varies considerably, by almost an order of magnitude. To measure variation across categories, we used Alexa's lists of top 500 sites in each of 16 categories. From each list we sampled 100 sites (the lists contain some URLs that are not home pages, and we excluded those before sampling).

In Figure 6 we show the average number of third parties loaded across 100 of the top sites in each Alexa category. Third parties are classified as trackers if they would have been blocked by one of the tracking protection lists (Section 4).

Figure 6: Average # of third parties in each Alexa category. (Bar chart split into Tracker and Non-Tracker; categories shown: news, arts, sports, home, games, shopping, average, recreation, regional, kids and teens, society, business, computers, health, science, reference, adult.)

Why is there so much variation? With the exception of the adult category, the sites on the low end of the spectrum are mostly sites which belong to government organizations, universities, and non-profit entities. This suggests that websites may be able to forgo advertising and tracking due to the presence of funding sources external to the web. Sites on the high end of the spectrum are largely those which provide editorial content. Since many of these sites provide articles for free, and lack an external funding source, they are pressured to monetize page views with significantly more advertising.

5.5 Does tracking protection work?

Users have two main ways to reduce their exposure to tracking: the browser's built-in privacy features and extensions such as Ghostery or uBlock Origin.

Contrary to previous work questioning the effectiveness of Firefox's third-party cookie blocking, we do find the feature to be effective. Specifically, only 237 sites (0.4%) have any third-party cookies set during our measurement configured to block all third-party cookies ("Block TP Cookies" in Table 2). Most of these are for benign reasons, such as redirecting to the U.S. version of a non-U.S. site. We did find exceptions, including 32 that contained ID cookies. For example, there are six Australian news sites that first redirect to news.com.au before redirecting back to the initial domain, which seems to be for tracking purposes. While this type of workaround to third-party cookie blocking is not rampant, we suggest that browser vendors should closely monitor it and make changes to the blocking heuristic if necessary.

Another interesting finding is that when third-party cookie blocking was enabled, the average number of third parties per site dropped from 17.7 to 12.6. Our working hypothesis for this drop is that, deprived of ID cookies, third parties curtail certain tracking-related requests such as cookie syncing (which we examine in Section 5.6).

Figure 7: Fraction of third parties blocked by Ghostery as a function of the prominence of the third party. As defined earlier, a third party's prominence is the sum of the inverse ranks of the sites it appears on. (Plot of fraction of third parties blocked against prominence of third party, log scale.)

We also tested Ghostery, and found that it is effective at reducing the number of third parties and ID cookies (Figure 11 in the Appendix). The average number of third-party includes went down from 17.7 to 3.3, of which just 0.3 had third-party cookies (0.1 with IDs). We examined the prominent third parties that are not blocked and found almost all of these to be content-delivery networks like cloudflare.com or widgets like maps.google.com, which Ghostery does not try to block. So Ghostery works well at achieving its stated objectives.

However, the tool is less effective for obscure trackers (prominence < 0.1). In Section 6.6, we show that less prominent fingerprinting scripts are blocked much less frequently than more prominent ones.
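As a rough illustration of Sections 5.2 and 5.3, the following Python sketch computes prominence from a set of (site, third party) edges, applies the HTTPS-support rules with the 10-site threshold, and derives a prominence-weighted share of the kind reported in Table 4. The function names, the input layout (one observed scheme per first-party site), and the toy data are assumptions made for this example; they are not taken from OpenWPM or from the study's analysis code.

```python
from collections import defaultdict


def prominence(site_rank, edges):
    """Prominence(t) = sum of 1 / rank(s) over sites s on which t is present.

    site_rank: dict mapping site -> rank (1 = most popular).
    edges: iterable of (site, third_party) pairs; duplicates are ignored
           because edge(s, t) is a 0/1 indicator.
    """
    scores = defaultdict(float)
    for site, third_party in set(edges):
        scores[third_party] += 1.0 / site_rank[site]
    return dict(scores)


def https_support(schemes, min_sites=10):
    """Classify a third party from one observed scheme per first-party site.

    Assumption: each site is summarised as either "http" or "https".
    """
    if len(schemes) < min_sites:
        return "undetermined"  # too few sites to make a determination
    n_https = sum(1 for s in schemes if s == "https")
    if n_https == len(schemes):
        return "HTTPS-only"
    if n_https == 0:
        return "HTTP-only"
    return "both"


def weighted_share(labels, scores, label):
    """Fraction of total prominence held by third parties carrying `label`."""
    total = sum(scores[t] for t in labels)
    matching = sum(scores[t] for t, l in labels.items() if l == label)
    return matching / total


# Toy data (hypothetical): three sites and two third parties.
site_rank = {"a.example": 1, "b.example": 2, "c.example": 4}
edges = [("a.example", "t1"), ("b.example", "t1"), ("c.example", "t2")]
scores = prominence(site_rank, edges)  # t1: 1/1 + 1/2, t2: 1/4

labels = {
    "t1": https_support(["https"] * 9 + ["http"]),
    "t2": https_support(["http"] * 10),
}
http_only_share = weighted_share(labels, scores, "HTTP-only")
```

In this toy example t2 is one of two third parties (50% by count) but holds only about 14% of the total prominence, which mirrors the way HTTP-only domains shrink from 54% to 5% in Table 4 once they are weighted by prominence.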
top 10 scripts accounted for 83% of usage, in line with our other observations about the small number of third parties responsible for most tracking. We provide a list of scripts in Table 13 in the Appendix.

The number of confirmed non-tracking uses of unsolicited IP candidate discovery is small, and based on our analysis, none of them is critical to the application. These results have implications for the ongoing debate on whether or not unsolicited WebRTC IP discovery should be private by default [59, 8, 58].

Classification   # Scripts   # First-parties
Tracking         57          625 (88.7%)
Non-Tracking     10          40 (5.7%)
Unknown          32          40 (5.7%)

Table 8: Summary of WebRTC local IP discovery on the top 1 million Alexa sites.

6.4 AudioContext Fingerprinting

The scale of our data gives us a new way to systematically identify new types of fingerprinting not previously reported in the literature. The key insight is that fingerprinting techniques typically aren't used in isolation but rather in conjunction with each other. So we monitor known tracking scripts and look for unusual behavior (e.g., use of new APIs) in a semi-automated fashion. Using this approach we found several fingerprinting scripts utilizing AudioContext and related interfaces.

In the simplest case, a script from the company Liverail (https://www.liverail.com/) checks for the existence of an AudioContext and OscillatorNode to add a single bit of information to a broader fingerprint. More sophisticated scripts process an audio signal generated with an OscillatorNode to fingerprint the device. This is conceptually similar to canvas fingerprinting: audio signals processed on different machines or browsers may have slight differences due to hardware or software differences between the machines, while the same combination of machine and browser will produce the same output.

Figure 8 shows two audio fingerprinting configurations found in three scripts. The top configuration utilizes an AnalyserNode to extract an FFT to build the fingerprint. Both configurations process an audio signal from an OscillatorNode before reading the resulting signal and hashing it to create a device audio fingerprint. Full configuration details are in Appendix Section 12.

Figure 8: AudioContext node configuration used to generate a fingerprint. Top: Used by www.cdn-net.com/cc.js in an AudioContext. Bottom: Used by client.a.pxi.pub/*/main.min.js and js.ad-score.com/score.min.js in an OfflineAudioContext. Full details in Appendix 12. (Diagram labels. Top: Oscillator (triangle wave), Analyser (FFT), Gain (=0), Destination; the FFT output is hashed with SHA1. Bottom: Oscillator (sine wave), Dynamics Compressor, Destination (buffer); the output is hashed with MD5.)

Figure 9: Visualization of processed OscillatorNode output from the fingerprinting script https://www.cdn-net.com/cc.js for three different browsers on the same machine. We found these values to remain constant for each browser after several checks. (Plot of dB against frequency bin number for Chrome Linux 47.0.2526.106, Firefox Linux 41.0.2, and Firefox Linux 44.0b2.)

We created a demonstration page based on the scripts, which attracted visitors with 18,500 distinct cookies as of this submission. These 18,500 devices hashed to a total of 713 different fingerprints. We estimate the entropy of the fingerprint at 5.4 bits based on our sample. We leave a full evaluation of the effectiveness of the technique to future work.

We find that this technique is very infrequently used as of March 2016. The most popular script is from Liverail, present on 512 sites. Other scripts were present on as few as 6 sites. This shows that even with very low usage rates, we can successfully bootstrap off of currently known fingerprinting scripts to discover and measure new techniques.

6.5 Battery API Fingerprinting

As a second example of bootstrapping, we analyze the Battery Status API, which allows a site to query the browser for the current battery level or charging status of a host device. Olejnik et al. provide evidence that the Battery API can be used for tracking. The authors show how the battery charge level and discharge time have a sufficient number of states and lifespan to be used as a short-term identifier. These status readouts can help identify users who take action to protect their privacy while already on a site. For example, the readout may remain constant when a user clears cookies, switches to private browsing mode, or opens a new browser before re-visiting the site. We discovered two fingerprinting scripts utilizing the API during our manual analysis of other fingerprinting techniques.

One script, https://go.lynxbroker.de/eat heartbeat.js, retrieves the current charge level of the host device and combines it with several other identifying features. These features include the canvas fingerprint and the user's local IP address retrieved with WebRTC as described in Section 6.1 and Section 6.3. The second script, http://js.ad-score.com/score.min.js, queries all properties of the BatteryManager interface, retrieving the current charging status, the charge level, and the time remaining to discharge or recharge. As with the previous script, these features are combined with other identifying features used to fingerprint a device.
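The entropy figure in Section 6.4 depends on how the 18,500 observed devices are distributed over the 713 distinct fingerprints. The text does not state which estimator was used, so the Python sketch below assumes the plain Shannon entropy of the empirical distribution, $H = \sum_i p_i \log_2(1/p_i)$. The function name and the toy input are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy_bits(fingerprints):
    """Shannon entropy (in bits) of the empirical fingerprint distribution."""
    counts = Counter(fingerprints)
    total = sum(counts.values())
    # p * log2(1 / p), with p = c / total for each distinct fingerprint
    return sum((c / total) * math.log2(total / c) for c in counts.values())


# Toy sample (hypothetical hashes): four devices, three distinct fingerprints.
sample = ["eb8a", "eb8a", "ad60", "f00d"]
bits = shannon_entropy_bits(sample)  # 0.5 * 1 + 0.25 * 2 + 0.25 * 2 = 1.5
```

Under this assumption, 713 equally likely fingerprints would give log2(713), roughly 9.5 bits, so an estimate of 5.4 bits would suggest that many devices share a small number of common fingerprints.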
6.6 The wild west of fingerprinting scripts

In Section 5.5 we found the various tracking protection measures to be very effective at reducing third-party tracking. In Table 9 we show how blocking tools miss many of the scripts we detected throughout Section 6, particularly those using lesser-known techniques. Although blocking tools detect the majority of instances of well-known techniques, only a fraction of the total number of scripts are detected.

               Disconnect              EL + EP
Technique      % Scripts   % Sites     % Scripts   % Sites
Canvas         17.6%       78.5%       25.1%       88.3%
Canvas Font    10.3%       97.6%       10.3%       90.6%
WebRTC         1.9%        21.3%       4.8%        5.6%
Audio          11.1%       53.1%       5.6%        1.6%

Table 9: Percentage of fingerprinting scripts blocked by Disconnect or the combination of EasyList and EasyPrivacy for all techniques described in Section 6. Included is the percentage of sites with fingerprinting scripts on which scripts are blocked.

Fingerprinting scripts pose a unique challenge for manually curated block lists. They may not change the rendering of a page or be included by an advertising entity. The script content may be obfuscated to the point where manual inspection is difficult and the purpose of the script unclear.

Figure 10: Fraction of fingerprinting scripts with prominence above a given level blocked by Disconnect, EasyList, or EasyPrivacy on the top 1M sites. (Plot of fraction of scripts blocked against prominence of script, log scale.)

OpenWPM's active instrumentation (see Section 3.2) detects a large number of scripts not blocked by the current privacy tools. Disconnect and a combination of EasyList and EasyPrivacy both perform similarly in their block rate. The privacy tools block canvas fingerprinting on over 78% of sites, and block canvas font fingerprinting on over 90%. However, only a fraction of the total number of scripts utilizing the techniques are blocked (between 10% and 25%), showing that less popular third parties are missed. Lesser-known techniques, like WebRTC IP discovery and Audio fingerprinting, have even lower rates of detection.

In fact, fingerprinting scripts with a low prominence are blocked much less frequently than those with high prominence. Figure 10 shows the fraction of scripts which are blocked by Disconnect, EasyList, or EasyPrivacy for all techniques analyzed in this section. 90% of scripts with a prominence above 0.01 are detected and blocked by one of the blocking lists, while only 35% of those with a prominence above 0.0001 are. The long tail of fingerprinting scripts is largely unblocked by current privacy tools.

7. CONCLUSION AND FUTURE WORK

Web privacy measurement has the potential to play a key role in keeping online privacy incursions and power imbalances in check. To achieve this potential, measurement tools must be made available broadly rather than just within the research community. In this work, we've tried to bring this ambitious goal closer to reality.

The analysis presented in this paper represents a snapshot of results from ongoing, monthly measurements. OpenWPM and census measurements are two components of the broader Web Transparency and Accountability Project at Princeton. We are currently working on two directions that build on the work presented here. The first is the use of machine learning to automatically detect and classify trackers. If successful, this will greatly improve the effectiveness of browser privacy tools. Today such tools use tracking-protection lists that need to be created manually and laboriously, and suffer from significant false positives as well as false negatives. Our large-scale data provide the ideal source of ground truth for training classifiers to detect and categorize trackers.

The second line of work is a web-based analysis platform that makes it easy for a minimally technically skilled analyst to investigate online tracking based on the data we make available. In particular, we are aiming to make it possible for an analyst to save their analysis scripts and results to the server, share them, and for others to build on them.

8. ACKNOWLEDGEMENTS

We would like to thank Shivam Agarwal for contributing analysis code used in this study, Christian Eubank and Peter Zimmerman for their work on early versions of OpenWPM, Gunes Acar for his contributions to OpenWPM and helpful discussions during our investigations, and Dillon Reisman for his technical contributions.

We're grateful to numerous researchers for useful feedback: Joseph Bonneau, Edward Felten, Steven Goldfeder, Harry Kalodner, and Matthew Salganik at Princeton, Fernando Diaz and many others at Microsoft Research, Franziska Roesner at UW, Marc Juarez at KU Leuven, Nikolaos Laoutaris at Telefonia Research, Vincent Toubiana at CNIL, France, Lukasz Olejnik at INRIA, France, Nick Nikiforakis at Stony Brook, Tanvi Vyas at Mozilla, Chameleon developer Alexei Miagkov, Joel Reidenberg at Fordham, Andrea Matwyshyn at Northeastern, and the participants of the Princeton Web Privacy and Transparency workshop. Finally, we'd like to thank the anonymous reviewers of this paper.

This work was supported by NSF Grant CNS 1526353, a grant from the Data Transparency Lab, and by Amazon AWS Cloud Credits for Research.

9. REFERENCES

[1] G. Acar, C. Eubank, S. Englehardt, M. Juarez, A. Narayanan, and C. Diaz. The web never forgets: Persistent tracking mechanisms in the wild. In Proceedings of CCS, 2014.
[2] G. Acar, M. Juarez, N. Nikiforakis, C. Diaz, S. Gürses, F. Piessens, and B. Preneel. FPDetective: dusting the web for fingerprinters. In Proceedings of CCS. ACM, 2013.
[3] L. A. Adamic and B. A. Huberman. Zipf's law and the internet. Glottometrics, 3(1):143–150, 2002.
[4] H. C. Altaweel I, Good N. Web privacy census. Technology Science, 2015.
Fingerprinting Script                                First-party Count   Classification
cdn.augur.io/augur.min.js                            147                 Tracking
click.sabavision.com/*/jsEngine.js                   115                 Tracking
static.fraudmetrix.cn/fm.js                          72                  Tracking
*.hwcdn.net/fp/Scripts/PixelBundle.js                72                  Tracking
www.cdn-net.com/cc.js                                45                  Tracking
scripts.poll-maker.com/3012/scpolls.js               45                  Tracking
static-hw.xvideos.com/vote/displayFlash.js           31                  Non-Tracking
g.alicdn.com/security/umscript/3.0.11/um.js          27                  Tracking
load.instinctiveads.com/s/js/afp.js                  16                  Tracking
cdn4.forter.com/script.js                            15                  Tracking
socauth.privatbank.ua/cp/handler.html                14                  Tracking
retailautomata.com/ralib/magento/raa.js              6                   Unknown
live.activeconversion.com/ac.js                      6                   Tracking
olui2.fs.ml.com/publish/ClientLoginUI/HTML/cc.js     3                   Tracking
cdn.geocomply.com/101/gc-html5.js                    3                   Tracking
retailautomata.com/ralib/shopifynew/raa.js           2                   Unknown
2nyan.org/animal/                                    2                   Unknown
pixel.infernotions.com/pixel/                        2                   Tracking
22.214.171.124/ralib/magento/raa.js                  2                   Unknown
80 others present on a single first-party            80                  -
TOTAL                                                705                 -

Table 13: WebRTC Local IP discovery on the Top Alexa 1 Million sites. **: Some URLs are truncated for brevity.

Site                      Prominence   # of FP    Rank Change
doubleclick.net           6.72         447,963    +2
google-analytics.com      6.20         609,640    -1
gstatic.com               5.70         461,215    -1
google.com                5.57         397,246    0
facebook.com              4.20         309,159    +1
googlesyndication.com     3.27         176,604    +3
facebook.net              3.02         233,435    0
googleadservices.com      2.76         133,391    +4
fonts.googleapis.com      2.68         370,385    -4
scorecardresearch.com     2.37         59,723     +13
adnxs.com                 2.37         94,281     +2
twitter.com               2.11         143,095    -1
fbcdn.net                 2.00         172,234    -3
ajax.googleapis.com       1.84         210,354    -6
yahoo.com                 1.83         71,725     +5
rubiconproject.com        1.63         45,333     +17
openx.net                 1.60         59,613     +7
googletagservices.com     1.52         39,673     +24
mathtag.com               1.45         81,118     -3
advertising.com           1.45         49,080     +9

Table 14: Top 20 third-parties on the Alexa top 1 million, sorted by prominence. The number of first-party sites each third-party is embedded on is included. Rank change denotes the change in rank between third-parties ordered by first-party count and third-parties ordered by prominence.
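The rank change column can be reproduced by ranking third parties twice, once by first-party count and once by prominence, and subtracting the two ranks. A minimal Python sketch using the first four rows of Table 14 follows. The function name is hypothetical, and the example relies on the observation that, according to the table's own rank changes, these four domains also occupy the top four positions by first-party count, so ranks computed within this subset match the published values.

```python
def rank_change(rows):
    """Return {third_party: count_rank - prominence_rank}.

    rows: list of (third_party, prominence, first_party_count) tuples.
    A positive value means the third party ranks higher by prominence
    than it does by raw first-party count.
    """
    by_count = sorted(rows, key=lambda r: r[2], reverse=True)
    by_prominence = sorted(rows, key=lambda r: r[1], reverse=True)
    count_rank = {r[0]: i for i, r in enumerate(by_count, start=1)}
    prom_rank = {r[0]: i for i, r in enumerate(by_prominence, start=1)}
    return {name: count_rank[name] - prom_rank[name] for name in count_rank}


# First four rows of Table 14: (site, prominence, # of first parties).
rows = [
    ("doubleclick.net", 6.72, 447963),
    ("google-analytics.com", 6.20, 609640),
    ("gstatic.com", 5.70, 461215),
    ("google.com", 5.57, 397246),
]
changes = rank_change(rows)
```

Here doubleclick.net is third by first-party count but first by prominence, giving +2, while google-analytics.com and gstatic.com each drop one place.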