
Transcript

Real-Time Screen-Camera Communication Behind Any Scene

Tianxing Li, Chuankai An, Xinran Xiao, Andrew T. Campbell, and Xia Zhou
Department of Computer Science, Dartmouth College, Hanover, NH
{tianxing, chuankai, campbell, xia}@cs.dartmouth.edu, [email protected]

MobiSys'15, May 18-22, 2015, Florence, Italy.
Copyright 2015 ACM 978-1-4503-3494-5/15/05. http://dx.doi.org/10.1145/2742647.2742667

ABSTRACT

We present HiLight, a new form of real-time screen-camera communication without showing any coded images (e.g., barcodes) for off-the-shelf smart devices. HiLight encodes data into pixel translucency change atop any screen content, so that camera-equipped devices can fetch the data by turning their cameras to the screen. HiLight leverages the alpha channel, a well-known concept in computer graphics, to encode bits into the pixel translucency change. By removing the need to directly modify pixel RGB values, HiLight overcomes the key bottleneck of existing designs and enables real-time unobtrusive communication while supporting any screen content. We build a HiLight prototype using off-the-shelf smart devices and demonstrate its efficacy and robustness in practical settings. By offering an unobtrusive, flexible, and lightweight communication channel between screens and cameras, HiLight opens up opportunities for new HCI and context-aware applications, e.g., smart glasses communicating with screens to realize augmented reality.

Categories and Subject Descriptors

C.2.1 [Network Architecture and Design]: Wireless communication

Keywords

Screen-camera communication; visible light communication; alpha channel

1. INTRODUCTION

In a world of ever-increasing smart devices equipped with screens and cameras, enabling screens and cameras to communicate has been attracting growing interest. The idea is simple: information is encoded into a visual frame shown on a screen, and any camera-equipped device can turn to the screen and immediately fetch the information. Operating on the visible light spectrum band, screen-camera communication is free of electromagnetic interference, offering a promising out-of-band communication alternative for short-range information acquisition. The most popular example is the QR code [6], where information (typically a URL) is encoded into a 2D barcode. Recent research endeavors have led to innovative barcode designs that boost the data rate [14, 26] or enhance the transmission reliability [15, 16, 28, 36].

These efforts are exciting; however, they commonly require displaying visible coded images, which interfere with the content the screen is playing and create unpleasant viewing experiences. In this paper, we seek approaches to enable unobtrusive screen-camera communication, which allows the screen to concurrently fulfill a dual role: displaying content and communication. Ultimately, we envision a screen-camera communication system that transmits and receives dynamic data in real time, while ensuring communication occurs unobtrusively regardless of the content the screen is displaying, be it an image, a movie, a video clip, a web page, a game interface, or any other application window. As the user interacts with the screen and switches the content, the communication sustains. Hence, communication is truly realized as an additional functionality for the screen, without placing any constraints on the screen's original functionality (displaying content).

We are not alone in working towards this vision. Recent efforts have made valuable progress on designing unobtrusive screen-camera communication [10, 37, 39, 41, 42]. However, a fundamental gap to the vision remains, mainly because existing designs all require direct modifications of the pixel color (RGB) values of the screen content. This methodology cannot enable real-time unobtrusive communication atop arbitrary screen content that can be generated on the fly with user interactions. The reason is two-fold. First, modifying pixel RGB values in real time has to rely on the Graphics Processing Unit (GPU) by directly leveraging related GPU libraries. However, the operating system typically does not allow a third-party application to pre-fetch or modify the screen content of other applications or system interfaces (e.g., the home screen of a smartphone or tablet). Thus, to achieve real-time communication, existing designs are limited to screen content within a single application or standalone files (e.g., an image or video). Second, although the main processor (CPU) can pre-fetch and modify arbitrary pixel RGB values, modifying RGB values at the CPU incurs a significant delay (hundreds of milliseconds), which makes these designs unable to support real-time communication atop dynamic content such as video.

To overcome this barrier, we propose a new design paradigm for unobtrusive screen-camera communication, which decouples the communication and screen content image layers. Our key idea is to create a separate image layer (a black matte, fully transparent by default) dedicated to communication atop all existing content image layers. We encode data into the pixel translucency change at this separate image layer, so that the receiver (camera) can perceive the color intensity change in the composite image and decode data.

To control pixel translucency, we leverage the alpha channel [27], a well-known concept in computer graphics. We control the level of pixel translucency change so that it is perceivable only to cameras but not human eyes. Furthermore, by controlling the translucency of each pixel independently, we enable multiple simultaneous transmitter elements on the screen. This creates a MIMO communication channel between screens and cameras [7], which can further boost the data rate or improve transmission reliability.

Our methodology has two key benefits. First, operating on the alpha channel, the system no longer needs to directly modify pixel RGB values while achieving the same effect. Since alpha values are blended by the GPU, the encoding is almost instantaneous, which is critical to support real-time communication atop arbitrary dynamic content. More importantly, since alpha blending is a mature feature offered by the GPU, existing platforms (e.g., Android, iOS) provide application-layer functions that use related GPU APIs to modify pixel alpha values. Thus, the CPU can call these functions at the application layer without directly dealing with GPU APIs or modifying the OS or low-level drivers. Second, by realizing communication at a separate image layer, the system makes unobtrusive communication universally available and truly parallel to the content playing on the screen, no matter whether it is static, dynamic, or multi-layer content, and regardless of its frame rate and frame resolution. Users can use the screen as it is, while the communication between the screen and the camera occurs behind the scene in real time, unobtrusively.

Following our methodology, we design and build HiLight (demo video clips are available at http://dartnets.cs.dartmouth.edu/hilight), the first system that supports real-time, unobtrusive screen-camera communication atop arbitrary screen content using off-the-shelf smart devices. We address the following specific design challenges. First, when determining the level of pixel translucency change, we face a tradeoff between user viewing experience and transmission reliability. HiLight seeks the best tradeoff by adapting the translucency change based on the color distribution of the screen content and the frame transition. It smooths the pixel translucency change across the screen while ensuring the change is reliably detectable by the receiver. Second, the whole encoding has to finish within a tight time limit (16 ms) to support video's normal playing speed and avoid the flicker effect [9]. We design lightweight encoding algorithms that require only sampled color information of sampled content frames. This reduces the overhead of pre-fetching content frames and ensures that data encoding and decoding can be finished within 8 ms. Third, the receiver perceives a mixture of color intensity change caused not only by the encoded translucency change, but also by other interfering sources such as screen content change, ambient light, and camera noise. We design efficient algorithms to extract the desired color intensity change, and adapt our strategy to the scene type. This design ensures the system's robustness under diverse practical settings.

We evaluate the HiLight prototype with a Samsung Tab S tablet as the transmitter and an iPhone 5s as the receiver. We test HiLight using 112 images, 60 video clips, and various web browsing and gaming scenarios. Our key findings are as follows:

• HiLight supports unobtrusive communication atop arbitrary screen content as the user is interacting with the screen, achieving 1.1 Kbps throughput with 84-91%+ accuracy for all scene types.

• HiLight realizes real-time data processing using off-the-shelf smart devices. Its encoding and decoding delays are both within 8.3 ms even for high-resolution video frames (1080p), sufficient to support video playing at 60 FPS and 120 FPS frame rates.

• HiLight is robust across diverse practical settings with hand motion and perspective distortion, supporting up to a 1-meter viewing distance and a 60° viewing angle for the 10.5-inch transmitter screen, and maintaining stable performance once the ambient light is above 40 lux in indoor environments.

Contributions. We summarize our key contributions as follows:

• We analyze existing unobtrusive screen-camera communication designs using detailed measurements and identify their limitations in enabling real-time unobtrusive communication atop any screen content.

• We propose a new design paradigm for unobtrusive screen-camera communication, which decouples communication from the screen content and uses the alpha channel to encode data into pixel translucency change [21]. Since alpha values are blended by the GPU, data encoding is almost instantaneous, which is the key to enabling real-time communication atop arbitrary content.

• We design and build HiLight using off-the-shelf smart devices, the first system that realizes on-demand data transmissions in real time unobtrusively atop arbitrary screen content. We extensively evaluate our prototype under diverse practical settings, examine its potential performance on a wide range of devices (e.g., smartphones, tablets, laptops, and high-end cameras), and assess its user perception.

By offering an unobtrusive, flexible, and lightweight communication channel between screens and cameras, HiLight presents opportunities for new HCI and context-aware applications for smart devices. For example, a user wearing smart glasses can acquire additional personalized information from the screen (e.g., a TV, smartphone, or tablet screen) without affecting the content users are currently viewing. HiLight also provides far-reaching implications for facilitating new security and graphics applications.

2. SCREEN-CAMERA COMMUNICATION: LIMITATIONS AND SOLUTION

In this section, we first present the design goals for a universal screen-camera communication system. We then analyze the efficacy of existing designs in achieving these goals. Finally, we introduce our new methodology.

2.1 Design Goals

We bear in mind the following goals when designing a universal screen-camera communication system.

Staying Unobtrusive to Users. The screen-camera data communication is completely hidden from users and does not require showing any visible coded images on the screen. Hence, data communication does not interfere with any content that the user is viewing at the screen, and can be easily embedded into any existing screens (e.g., smartphone screens, laptop screens) without sacrificing their original functionality (i.e., displaying content).

Supporting Any Scene. The data communication can occur regardless of the screen content, be it an image, a video clip, a movie, a gaming scene, a multi-layer application window, or the home screen of a smartphone or tablet. Furthermore, the communication continues as the screen content changes on the fly (e.g., as the user browses a web page, plays games, and watches a movie).

To support all types of screen content, the communication needs to be independent of the screen content, which can be generated on the fly and vary over time as the user interacts with the screen.

Processing Dynamic Data in Real Time. The system can transmit and receive dynamic data on the fly, rather than pre-processing an image or a video file to embed data. This is necessary when the data comes on the fly or when the screen content is not known in advance (e.g., an interactive gaming scene). The challenge is that when the screen is displaying video (e.g., movies, video clips), the frame rate is at least 24 frames per second (FPS). Thus, the system has to finish encoding data into a frame within 42 ms before the screen displays the next frame. When the video is played at 60 FPS or 120 FPS, the encoding has to finish within 16 ms or 8 ms.

Operating on Off-the-Shelf Smart Devices. The system can transmit and receive data in real time using existing smart devices (e.g., smartphones, tablets). These devices have limited computational power, and their cameras are not as sophisticated as high-end single-lens reflex (SLR) cameras. Hence the encoding and decoding designs have to be lightweight and robust, so that smart devices acting as transmitters and receivers can process data in time while supporting dynamic content such as video.

Next, we analyze existing proposals on unobtrusive screen-camera communication and examine whether they can achieve the above goals.

2.2 Current Limitations

Researchers have proposed designs [10, 37, 39, 41, 42] to remove the need of showing visible coded images while enabling screen-camera communication. Despite the tangible benefits of existing designs, they are unable to achieve all the design goals described in § 2.1, mainly because they directly deal with the color values of each pixel, specified as the color intensity values in red, green, and blue (RGB) diodes [34]. To ensure that RGB value changes are unnoticeable to users, they have to pre-fetch the RGB values of all content pixels to subtly modify the RGB values and encode data. As a result, existing designs cannot serve as a unified solution that achieves real-time data communication decoupled from the screen content (e.g., the home screen of a smartphone, the interface windows of other independent applications). Next, we analyze the two possible approaches to modifying pixel RGB values using existing designs and identify their limitations.

GPU-Based RGB Modifications. To enable real-time modifications of pixel RGB values, one has to rely on the GPU's parallel processing power and leverage GPU-related libraries (e.g., OpenGL [1] on Android and iOS). The key idea is to supply the GPU with programs, known as kernels or shaders, which are executed independently per pixel. Modern shaders are fast: based on our experiments on a Google Nexus 6 (with Android 5.0) phone, it takes only 0.13 ms to modify the RGB values of all pixels in a single video frame (1920 x 1080 pixels). This approach, however, has two limitations. First, it is limited to modifying screen content within a single application or a standalone file (e.g., a video or image file). Because of the system security model of existing platforms (e.g., Android), third-party applications cannot pre-fetch screen content from other applications or system interfaces (e.g., the home screen). Second, by replacing the original screen content with an opaque, modified new content layer, the modifications can nullify the existing rendering optimization. For example, consider a user browsing a web page in a browser. To enable data communication, we have to run an OpenGL application and display the modified page through this OpenGL app. This additional step can remove the rendering optimization made by the web browser. Therefore, although existing designs can leverage GPU programming to enable real-time unobtrusive communication, they are limited to the screen content within a single application or static standalone files and are unable to support all dynamic screen content.
CPU-Based RGB Modifications. Another approach to modifying pixel RGB values is to use the CPU to pre-fetch the pixel RGB values of the current screen content. Although this approach allows the system to enable unobtrusive communication atop any scene, it entails a long encoding delay and fails to support real-time data communication. We quantify the processing time using off-the-shelf smart devices (Nexus 4, Nexus 5, Samsung Note 3, Samsung S5, Samsung Tab S, all with Android 4.4). To capture the screen content on an Android device, we read the frame buffer under the /dev/graphics/ directory. We leverage the OpenCV library [2], a popular computer vision library, to calculate the complementary RGB values for each pixel. This calculation is representative of the RGB modification step in all existing designs [10, 37, 39, 41, 42]. We then use the Simple DirectMedia Layer (SDL) library to render an image on the screen; we choose the SDL library because it is much more flexible than the rendering library [3] in the Android framework. For all smart devices we tested, the processing time of data encoding far exceeds 42 ms, the timing threshold (§ 2.1) required to support real-time screen-camera data transmission. Especially for high-resolution (1080p) video, CPU-based RGB modifications need 1-2 s to embed data into a frame. While the encoding time slightly drops as the frame resolution decreases, it never drops below 300-600 ms. (We also tested devices such as the Samsung S5 and Nexus 5 with the recent Android 5.0, which provides a new screen capture API to speed up pre-fetching of screen content; CPU-based RGB modifications still take more than 200 ms.) Note that these steps are the minimum for embedding data into the RGB values of a screen content frame. Some designs require additional steps (e.g., copying frames [37]), leading to even longer delays. To speed up encoding, one can modify only a subset of pixels. Yet this reduces the number of pixels used to transmit data, sacrificing either the data rate or the transmission reliability. More importantly, so far we have assumed a single image layer. For content with multiple image layers, the cost of pre-fetching and modifying pixel RGB values grows linearly with the number of image layers. Thus, CPU-based RGB modifications cannot achieve real-time data communication.
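For concreteness, the sketch below is an illustrative desktop analogue in Python/NumPy (not the measurement code above, which ran OpenCV per pixel on Android): it shows what the benchmarked "complementary RGB" step computes and the per-frame time budget implied by each playback frame rate.

    import numpy as np

    def complementary_rgb(frame):
        # The representative RGB-modification step: replace every channel value v with 255 - v.
        return 255 - frame

    def frame_budget_ms(fps):
        # Encoding must finish before the screen displays the next frame.
        return 1000.0 / fps

    frame = np.zeros((1080, 1920, 3), dtype=np.uint8)               # one 1080p frame
    _ = complementary_rgb(frame)
    print([round(frame_budget_ms(f), 1) for f in (24, 60, 120)])    # [41.7, 16.7, 8.3]

A vectorized desktop run says nothing about the hundreds of milliseconds measured on the mobile CPUs above; the point is only the shape of the computation and the 42/16/8 ms budgets it must fit into.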
Summary of Observations. To sum up, existing unobtrusive screen-camera communication designs require direct modifications of content pixel color (RGB) values and fail to transmit dynamic data atop arbitrary dynamic content, where either the data or the screen content comes on the fly. Furthermore, by creating an opaque content layer with modified RGB values, these designs can nullify the existing rendering optimization. To mitigate the problem, current designs have to give up whole-screen communication and sacrifice either the data rate or the transmission reliability. In addition, integrating communication with the screen content greatly limits their applicable scenarios.

2.3 Solution: Decoupled Communication Using Alpha Channel

To overcome the above limitations, we propose a new design paradigm for unobtrusive screen-camera communication. Motivated by the fact that the tight integration of communication and screen content is problematic, we propose to decouple screen-camera communication from the image layers of the screen content. Furthermore, we design lightweight data encoding that leverages the power of the GPU and does not require direct modifications of content RGB values.

Figure 1: Decoupling screen-camera communication from the content image layer on the screen. (a) Atop the content image layer, we create an additional image layer (a black matte, fully transparent by default) dedicated to data communication, referred to as the communication layer. To transmit data, we divide the communication layer into grids, and encode data into the pixel translucency change of each grid without affecting users' viewing experiences. (b) The resulting composite image after modifying the pixel translucency of the communication layer (α = 0, 0.01, and 0.5). (c) Camera-equipped devices perceive a color intensity change as the α value increases.

More importantly, since alpha blending is a mature feature offered by the GPU, alpha values can be modified by calling a system function at the application layer without any OS-level modifications (see more in § 5). At the high level, our design principle is two-fold:

• We create a separate image layer (a black matte, fully transparent by default) dedicated to communication, referred to as the communication layer, atop all content image layers (Figure 1(a)). By decoupling communication from screen content, we allow the screen to better fulfill its dual role (communication and displaying content).

• We encode data by changing the pixel transparency of the communication layer. This removes the need to directly modify pixel RGB values while achieving the same effect. In particular, we leverage the alpha channel, a well-known concept in computer graphics, to control pixel transparency and encode data [21]. Since alpha values are blended by the GPU (< 1 ms), our design significantly drives down the encoding delay, which is the key to supporting real-time, unobtrusive communication atop any screen content.

Next, we begin with background on the alpha channel and how we utilize it to transmit data. We then elaborate the key benefits of our methodology and the design challenges.
Communication Using Alpha Channel. As a well-known concept in computer graphics, the alpha channel is widely used to form a composite image with partial or full transparency [27]. It stores a value α between 0 and 1 to indicate the pixel translucency: 0 means that the pixel is fully transparent, and 1 means that it is fully opaque. By default, we set α = 0 for all pixels in the communication layer to make it fully transparent and not interfere with the screen content. As we increase the α values of the communication layer (Figure 1(a)), we change how users perceive the color of the content layer. Specifically, when α = 1, the pixel of the top communication layer is opaque, hence users perceive this pixel as black, the color of the communication layer. Thus we can essentially dim an area by increasing the α values of this area on the communication layer. As an example, Figure 1(b) shows the composite image on the screen, where we set the pixel α values of the communication layer to 0 (left), 0.01 (middle), and 0.5 (right), as illustrated in Figure 1(a). A higher α leads to a darker appearance in the composite image.

This dimming effect is perceived as a color intensity (summation of RGB channel values) change by a camera. We further examine how cameras on existing smart devices perceive the color intensity change. We set up a Samsung Note 3 phone as the transmitter, and a Samsung S5 phone 15 cm away as the receiver. We create a top black image layer on the Note 3, adjust the α values of this top image layer uniformly, and measure the color intensity of each pixel captured by the S5's camera. Figure 1(c) shows the color intensity averaged over all pixels as α increases. It confirms that increasing the α value on the top image layer linearly decreases the perceived color intensity. We also observe that even for an α change of 0.01, still imperceptible to human eyes (Figure 1(b)), the S5's camera can detect the difference.

Motivated by the above observations, we encode data into the pixel translucency (α) change on the top communication layer. For today's smart devices, the screen refresh rate is typically 60 Hz. Therefore, to ensure the change is imperceptible to human eyes, we change α values by only 1%-8%, sufficient to avoid the flicker effect [9] (the occurrence of the flicker effect depends on the screen refresh rate and the degree of change across frames). Since the change can be perceived by cameras, camera-equipped devices can examine the perceived color intensity change in captured frames and decode data. Furthermore, we can divide the communication layer into grids and change the α values of the pixels in each grid independently. Hence each grid functions as an independent transmitter element, resulting in a MIMO channel between screens and cameras.

Note that the same principle holds when using a white matte as the communication layer, where increasing the α values of an area in the communication layer brightens this area in the composite image. However, black and white are the only two colors feasible for realizing alpha-channel-based communication. The reason is as follows. In general, assume the communication layer's color intensity in a color channel (i.e., the red, green, or blue channel) is C_1 with an alpha value of α. Based on the alpha blending principle, for a pixel in the content layer with color intensity C_2 in this color channel, this pixel's color intensity C in this color channel in the final composite image is calculated as C = α · C_1 + (1 − α) · C_2, where α ∈ [0, 1] and C_1, C_2 ∈ [0, 255]. Hence setting C_1 to the minimum (0) or the maximum (255) in all color channels, which corresponds to the black or the white color, always dims or brightens a pixel in the composite image regardless of the pixel's original color (C_2), the key for encoding data into the pixel translucency change atop any screen content. In contrast, setting C_1 to any value within (0, 255) in a color channel does not uniformly dim or brighten pixels of all colors and thus is unable to enable communication atop any pixel color of the content layer.
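As a quick illustration of the blending formula above, the following sketch (ours, for illustration only; HiLight never computes this on the CPU, since the GPU blends the layers) composites a black communication layer over two example content pixels and shows that a small α change scales the perceived color intensity by the same fraction regardless of the content color.

    import numpy as np

    def composite(content_rgb, alpha, layer_value=0):
        # C = alpha * C1 + (1 - alpha) * C2, with a uniform communication layer (black: C1 = 0).
        content = np.asarray(content_rgb, dtype=np.float64)
        return alpha * layer_value + (1.0 - alpha) * content

    content = np.array([[200, 120, 60],     # a bright content pixel
                        [30, 30, 30]],      # a dark content pixel
                       dtype=np.float64)
    for a in (0.0, 0.01, 0.05):
        out = composite(content, a)
        # Color intensity = summation of the RGB channel values, as perceived by a camera.
        print(a, out.sum(axis=1))

The absolute intensity drop is proportional to the content brightness, which is exactly why dark areas later receive a larger Δα (§ 3.3).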

Figure 2: Flowchart of modifying alpha values. The CPU passes the specified α values to the GPU using a system function at the application layer. The GPU then applies the alpha blending technique to combine the α values with an image frame.

Figure 3: The encoding procedure of the HiLight transmitter (fetch sampled frames, scene detection, alpha matrix generation, BFSK data modulation, and rendering; cut scenes defer transmission).

Benefits. Our methodology offers three benefits. First, operating on the alpha channel significantly reduces the encoding time. Figure 2 shows the procedure for the system to process a frame using alpha-channel communication. The main processor passes the specified α values of the communication layer to the GPU, which blends them with the content layer using the alpha blending technique [27]. The blending can be finished in 1 ms, several orders of magnitude faster than modifying pixel RGB values on the CPU. Second, since we enable communication using a separate image layer (the communication layer), communication can occur regardless of the screen content, its frame rate, and the number of content image layers. Thus our design allows the screen-camera communication channel to support arbitrary scenes, including a multi-layer scene (e.g., Angry Birds) that needs to modify the α values of the content layer. Also, since data encoding is in the separate image layer, we can parallelize data encoding and content playing and further reduce the encoding delay. Third, by adding a transparent communication layer and slightly adjusting its translucency, our design does not interfere with the GPU rendering optimization for the underlying image layers, while achieving the same effect as directly modifying pixel RGB values.

Challenges. To implement our methodology, we face three key challenges.
First, determining the level of translucency change is nontrivial. The change has to be unobtrusive to human eyes while reliably detectable by cameras. Also, the α value changes RGB values by a percentage; thus for darker colors, the absolute color intensity change is smaller and harder to detect. Second, when transmitting dynamic data atop a video, all encoding steps need to finish within 42 ms to support the video's normal playing speed. Furthermore, to reduce the flicker effect [9], the frequency of the translucency change at the communication layer has to match the screen's refresh rate (60 Hz). Hence the encoding needs to be lightweight enough to finish within 16 ms, and must operate on smart devices. Third, for a dynamic scene, the screen content changes over time. The resulting color intensity change interferes with our encoded color intensity change. Even for static content, environmental factors such as ambient light and camera noise can affect decoding. All these interfering sources make it challenging to extract data from the perceived color intensity change.

We address the above challenges and design HiLight. In the next two sections, we describe in detail our design of HiLight on both the transmitter and the receiver.

3. HiLight TRANSMITTER

The HiLight transmitter encodes data into the pixel translucency (α value) change at the communication layer. The translucency change is fixed at the screen's refresh rate (60 Hz), independent of the content frame rate, which can change at any time. We face two design challenges. First, it is nontrivial to determine the α value change (Δα) for each pixel. On one hand, Δα should stay minimal to be imperceptible to users. On the other hand, a larger Δα means a more detectable change in the perceived color intensity at the receiver. Hence we need to seek the best tradeoff between user experience and communication reliability. Second, the per-frame encoding needs to finish within 16 ms to support the video's normal playing speed and avoid the flicker effect. While we remove the need to directly modify RGB values, the remaining steps for data encoding (fetching and rendering in Figure 2) can still take more than 16 ms. We need to further reduce the encoding delay.

We address the challenges as follows. First, we keep Δα small (1-8%) so that the translucency change stays unobtrusive to users. To ensure that the change can still be reliably detected, we configure the Δα of each pixel based on the color distribution of the content frame and the transition across frames. The configuration helps receivers deal with dark areas or frames with drastic content change. Second, to further reduce the data encoding delay, we fetch only one content frame per sampling window (0.1 s in our implementation) and sample a subset of pixels for their color values. This is not doable for existing designs aiming at whole-screen communication: since they rely on directly modifying content color values, they have to fetch every frame and obtain the color values of all pixels in a frame.

Next, we first overview the overall encoding procedure, and then focus on two main design components in detail.

3.1 Overview

Figure 3 overviews the encoding procedure. We sample content image frames to configure Δα for each pixel at the communication layer. Note that the content frame information only helps optimize the Δα configuration and is not an enabler for the communication. Unlike current designs that require modifying content image frames, we only need to read (not write) coarse-grained information of content image frames. Given two adjacent sampled frames, the transmitter examines their color transition to adapt the encoding strategy. If the frame content changes drastically, the resulting color intensity change can overwhelm the intensity change encoded with data. We refer to these frames as cut scene frames, which typically occur rarely (1.3% in a test video stream in Figure 5(d)). We skip the cut-scene frames and defer encoding. Once a non-cut-scene frame is detected, we determine the Δα based on the color brightness of the area in the content layer and the frame transition.

Given the Δα for each pixel, the transmitter modulates bits using Binary Frequency Shift Keying (BFSK). We choose BFSK because of its simplicity and its robustness to interference (e.g., ambient light). In our implementation, bits 0 and 1 are represented by translucency changes at 20 Hz and 30 Hz respectively, over six frames (Figure 4). We set six frames as a frame window because it is the minimum needed to achieve 20 Hz and 30 Hz changes under the screen refresh rate of 60 Hz. By removing the low-frequency components (< 15 Hz), the frequency power of the data bit is less affected by environmental noise (e.g., ambient light). In comparison, modulation schemes (e.g., Manchester coding) that rely on phase transitions in the time domain to extract data can suffer performance degradation caused by environmental noise. In addition, since FSK encodes data as relative color intensity changes, HiLight is robust against image quality degradation (e.g., block boundary effects caused by video compression).
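The sketch below illustrates this BFSK mapping under the stated parameters (60 Hz refresh, six-frame window, 20/30 Hz symbols). The exact per-frame α waveform HiLight uses is not spelled out here, so a raised-cosine toggle is assumed purely for illustration; the FFT step also previews the receiver-side filtering of components below 15 Hz.

    import numpy as np

    REFRESH_HZ = 60
    FRAMES_PER_SYMBOL = 6            # minimum window that resolves 20 Hz vs 30 Hz at 60 Hz
    BIT_FREQ = {0: 20.0, 1: 30.0}    # Hz

    def modulate(bit, delta_alpha=0.02):
        # Oscillate alpha between 0 and delta_alpha at the bit's frequency (illustrative waveform).
        k = np.arange(FRAMES_PER_SYMBOL)
        return delta_alpha * 0.5 * (1 + np.cos(2 * np.pi * BIT_FREQ[bit] * k / REFRESH_HZ))

    def demodulate(samples):
        spectrum = np.abs(np.fft.rfft(samples - samples.mean()))
        freqs = np.fft.rfftfreq(len(samples), d=1.0 / REFRESH_HZ)   # 0, 10, 20, 30 Hz
        spectrum[freqs < 15] = 0                                     # drop the ambient-light band
        peak = freqs[np.argmax(spectrum)]
        return 0 if abs(peak - 20) < abs(peak - 30) else 1

    for bit in (0, 1):
        assert demodulate(modulate(bit)) == bit

With six samples at 60 Hz, the FFT bins fall exactly at 0, 10, 20, and 30 Hz, which is why six frames is the minimum window that separates the two symbol frequencies.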

Figure 4: Encoding bits into pixel translucency (α) change. (a) The sequence of α values for the symbols '01' in the time domain. (b) The translucency change in the frequency domain for bits 0 and 1 after applying the FFT.

After generating the sequence of α values for each pixel, the main processor passes these values to the GPU. The GPU then applies the alpha blending technique [27, 34] to generate the image layer (communication layer) with the specified translucency, combines it with the content image frames, forms composite images, and outputs them to the image buffer.

Next we describe the two main design components (scene detection and determining the α value change) in detail.

3.2 Scene Detection

Scene detection is the key component for the system to support arbitrary screen scenes. It runs continuously in the background, samples content frames, and identifies the screen scene type based on the frame transition. The scene type is crucial for deciding whether encoding should be performed, and for determining the degree of pixel translucency change at the communication layer (§ 3.3).

Specifically, the transmitter periodically samples content frames every 0.1 s. We choose 0.1 s as the sampling window because it is sufficient for detecting frame transitions accurately while missing only a small number of frames. For each sampled frame, the thread further uniformly samples one-sixteenth of the pixels and obtains their color information.

Given the sampled pixels of two sampled frames, we aim to categorize the current frame into one of the following scene types: the static, the gradual, and the cut scene. These categories capture the degree of frame transition, and are widely used in computer vision [17]. Figure 5 shows example frame pairs for each scene type. While screen content can change drastically over a long time, the change between two adjacent frames is usually gradual and minor. Therefore, static and gradual scenes are the most common, while cut scenes occur rarely (Figure 5(d)). Because of the drastic color intensity change of the cut scene and its rare occurrence, we encode data only atop the other two types of scenes.

To determine the scene type, we measure the dissimilarity of two adjacent sampled frames using two existing metrics. The first metric [44], referred to as the pixel-based metric, calculates the difference of the pixel color intensity values of two frames. For frames F and F', each with X × Y pixels, the pixel-based metric d_p(F, F') is calculated as:

    d_p(F, F') = ( Σ_{1 ≤ i ≤ X, 1 ≤ j ≤ Y} |C(p_ij) − C(p'_ij)| ) / (X × Y),    (1)

where C(p_ij) and C(p'_ij) are the color intensity values of pixels p_ij and p'_ij in frames F and F', respectively. The second metric [25, 31], referred to as the histogram-based metric, examines the difference in the color distribution of the two frames. It partitions the whole color intensity range into bins, counts the number of pixels with color values within each bin, and calculates the difference as:

    d_h(F, F') = Σ_{i ∈ [1, K], max(H(i), H'(i)) ≠ 0} (H(i) − H'(i))² / max(H(i), H'(i)),    (2)

where H(i) and H'(i) are the numbers of pixels in the i-th bin for frames F and F' respectively, and K is the number of bins. We choose these two metrics because they are easy to compute and complement each other: the pixel-based metric captures small local changes (e.g., a small moving object), while the histogram-based metric captures the global change.

We combine these two metrics to classify a frame. For both metrics, a higher value indicates a more drastic frame transition. Hence we define two thresholds for each metric to signal a static and a cut scene. We denote the thresholds for the metric d_p as d_p^static and d_p^cut, and those for the metric d_h as d_h^static and d_h^cut. We classify a frame as a cut scene if the values of both metrics are above their cut scene thresholds, i.e., d_p > d_p^cut and d_h > d_h^cut. We classify a frame as a static scene if both metric values are below the static scene thresholds, i.e., d_p < d_p^static and d_h < d_h^static. A frame is a gradual scene if neither condition is satisfied. In our implementation, we empirically set d_p^static, d_p^cut, d_h^static, and d_h^cut to 10, 100, 0.1, and 1, respectively. These values also align with the computer vision literature [29, 30, 32].
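A compact Python sketch of the two metrics and the threshold rule follows (our illustration, not the authors' OpenCV implementation). One assumption is flagged in the comments: the bin counts are normalized to fractions so the result sits roughly on the scale of the quoted d_h thresholds, which the text above does not fully pin down.

    import numpy as np

    D_P_STATIC, D_P_CUT = 10, 100      # pixel-based thresholds from the text
    D_H_STATIC, D_H_CUT = 0.1, 1.0     # histogram-based thresholds from the text

    def pixel_metric(f, f2):
        # Eq. (1): mean absolute per-pixel color intensity difference.
        return np.abs(f.astype(float) - f2.astype(float)).mean()

    def histogram_metric(f, f2, k=16, max_intensity=255):
        # Eq. (2): chi-square-style distance between the frames' intensity histograms.
        # Assumption: bins are normalized to fractions of pixels (the text defines raw counts).
        h1, _ = np.histogram(f, bins=k, range=(0, max_intensity))
        h2, _ = np.histogram(f2, bins=k, range=(0, max_intensity))
        h1, h2 = h1 / h1.sum(), h2 / h2.sum()
        denom = np.maximum(h1, h2)
        mask = denom > 0
        return (((h1 - h2)[mask]) ** 2 / denom[mask]).sum()

    def classify_scene(f, f2):
        dp, dh = pixel_metric(f, f2), histogram_metric(f, f2)
        if dp > D_P_CUT and dh > D_H_CUT:
            return "cut"
        if dp < D_P_STATIC and dh < D_H_STATIC:
            return "static"
        return "gradual"

    prev = np.random.randint(0, 256, (90, 160))                        # sampled pixels of the last frame
    curr = np.clip(prev + np.random.randint(-5, 6, prev.shape), 0, 255)
    print(classify_scene(prev, curr))                                  # small change -> typically "static"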
3.3 Determining Alpha Value Change (Δα)

Armed with the content frame information (scene type and frame color distribution), the second key component determines the degree of pixel translucency change at the communication layer. The goal is to keep the resulting color intensity change unobtrusive while ensuring it is still reliably detectable by receivers.

To achieve this goal, we configure the Δα of each pixel based on the color distribution of the content frame and the frame transition. Our configuration is driven by two observations. First, Δα changes the pixel color intensity by a percentage. Thus, for a fixed Δα, the resulting absolute color intensity change of dark pixels is smaller and less detectable than that of bright pixels [21]. Second, frame content change also leads to color intensity change, which interferes with the intensity change encoded with data. Motivated by these observations, we increase the Δα for pixels atop dark content areas or upon a gradual scene frame. This enhances the encoded intensity change in these two challenging scenarios, allowing the receiver to better extract data.

Specifically, we determine the α value change Δα for each pixel in two steps (Algorithm 1). The first step is to divide the screen into grids and decide the Δα at the grid level. For each grid G_k, we calculate its average color intensity value C(G_k) over all sampled pixels in G_k. We configure the grid-level α change Δα_G within [1%, 8%] based on the grid color intensity C(G_k) and the scene type. A side effect of this step is that for adjacent grids with different α values, the grid border appears as a noticeable edge. To minimize this side effect, the second step is to fine-tune the Δα for each pixel within a grid and diminish the value difference among pixels on the grid border.

Figure 5: (a)-(c) Example image pairs of the three scene types (the static, the gradual, and the cut scene), with the pixel-based metric d_p and the histogram-based metric d_h evaluated for each pair. (d) The scene type of each frame in a video stream.

Algorithm 1: Determine alpha value change (Δα)
    input: 1) F, sampled content frame; 2) S_T, scene type.
    output: Δα value change for each pixel.
    Divide F into X grids: G_1, ..., G_X
    for k ← 1 to X do
        // Computing grid-level Δα
        C(G_k) = Mean{ C(p_ij) | p_ij ∈ G_k }
        if C(G_k) ≤ 50 then Δα_G = 4%
        else if C(G_k) ≤ 100 then Δα_G = 3%
        else if C(G_k) ≤ 150 then Δα_G = 2%
        else Δα_G = 1%
        if S_T = gradual then Δα_G = 2 · Δα_G
        // Fine-tuning within a grid
        (x_c, y_c) ← FetchCenter(G_k)
        for p_ij ∈ G_k do
            Δα_ij = (Δα_G / sqrt(2π)) · e^(−((i − x_c)² + (j − y_c)²) / 2)
        end
    end

Figure 6: Configuring Δα for each pixel within a grid, assuming the grid-level Δα change is 1%. Here the darkness indicates the Δα value, not the actual pixel appearance. We gradually decrease Δα as pixels become further away from the grid center. This diminishes the appearance of the grid border and optimizes the viewing experience.

In particular, for pixels in the center of the grid G_k, their Δα is the same as the grid-level change Δα_G. This ensures that the receiver can still differentiate the translucency changes of adjacent grids. For pixels further away from the grid center, we gradually decrease their Δα following a standard normal distribution N(0, 1) (Figure 6). The fine-tuning smooths the translucency change across grids. Finally, the Δα values for all pixels are fed into the modulator, which modulates each bit into a translucency (α) change over time. Note that we integrate these scene-dependent strategies into a single framework, which allows the system to support arbitrary scenes in a unified solution.
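A runnable Python rendering of Algorithm 1 is sketched below for clarity. The brightness thresholds, the [1%, 8%] range, and the doubling for gradual scenes follow the pseudocode; the Gaussian fall-off is simplified (distances are normalized to the grid size, and the center keeps the full grid-level Δα), so it is an interpretation rather than the exact per-pixel formula.

    import numpy as np

    def grid_delta_alpha(mean_intensity, scene_type):
        # Grid-level delta-alpha in [1%, 8%], doubled for gradual scenes.
        if mean_intensity <= 50:
            d = 0.04
        elif mean_intensity <= 100:
            d = 0.03
        elif mean_intensity <= 150:
            d = 0.02
        else:
            d = 0.01
        return 2 * d if scene_type == "gradual" else d

    def per_pixel_delta_alpha(grid_intensity, scene_type):
        # Fine-tune delta-alpha inside one grid with a Gaussian centered at the grid center.
        h, w = grid_intensity.shape
        base = grid_delta_alpha(grid_intensity.mean(), scene_type)
        yc, xc = (h - 1) / 2.0, (w - 1) / 2.0
        y, x = np.mgrid[0:h, 0:w]
        # Distances normalized to the grid size so the fall-off spans the grid (our simplification).
        dist2 = ((y - yc) / (h / 2.0)) ** 2 + ((x - xc) / (w / 2.0)) ** 2
        return base * np.exp(-dist2 / 2.0)

    grid = np.full((40, 40), 120.0)                      # a mid-dark grid of sampled content pixels
    da = per_pixel_delta_alpha(grid, scene_type="static")
    print(da.max(), da.min())                            # 2% at the center, smaller toward the border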
Figure 7: An overview of the decoding procedure in HiLight (scene detection on captured frames, discarding cut-scene windows, extracting the desired illumination change, and FFT-based demodulation).

4. HiLight RECEIVER

The HiLight receiver captures incoming frames, examines the perceived color intensity changes, and decodes data. The main design challenge is to extract the desired color intensity change out of the perceived change. This is challenging because the color intensity change perceived by the receiver comes not only from encoded α changes, but also from changes in the screen content, the ambient light condition, and camera noise. The receiver needs to extract data from the perceived color intensity change given the presence of all these interfering factors.

To address the challenge, we design strategies to filter out the color intensity change associated with interfering factors, and adapt our strategy to the current scene type. In particular, for the static scene, we leverage an audio beamforming algorithm [33, 35] to minimize the impact of interfering factors and extract the desired color intensity change encoded with data; for the dynamic scene, we identify patterns caused by the encoded change. These patterns help recover the bits hidden in the mixed intensity change. Next we first overview the decoding procedure, followed by a detailed description of extracting the desired color intensity change.

4.1 Overview

Figure 7 summarizes the main steps of decoding. The receiver keeps monitoring the incoming frames, and buffers frames in a frame window (six frames). It then samples a frame in this frame window, compares it to the last sampled frame, and applies the same scene detection algorithm as that in the transmitter (§ 3.2) to identify the scene type. If the scene type is the cut scene, the receiver discards this frame window since the transmitter does not encode data on these frames. For frames of other scene types (i.e., the gradual or static scene), the receiver divides the captured transmitter screen into grids, and identifies and enhances the desired color intensity change in each grid (§ 4.2).
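Putting the overview together, the skeleton below shows one way to structure the sliding-window loop with the candidate-pool majority vote described in the demodulation discussion that follows (structure only: scene_type, extract_intensity_change, and demodulate are placeholders for the components of §§ 3.2 and 4.2 and the BFSK demodulator, and the window bookkeeping is our simplification).

    from collections import Counter, deque

    WINDOW = 6  # frames per frame window

    def decode_stream(frames, scene_type, extract_intensity_change, demodulate):
        # Slide a six-frame window over captured frames; each slide yields one candidate bit,
        # and the majority over a window's worth of candidates is emitted as the output bit.
        buf = deque(maxlen=WINDOW)
        pool, bits = [], []
        for frame in frames:
            buf.append(frame)
            if len(buf) < WINDOW:
                continue
            window = list(buf)
            if scene_type(window) == "cut":
                continue                                    # no data is encoded on cut scenes
            change = extract_intensity_change(window)       # weighted per-grid change (Sec. 4.2)
            pool.append(demodulate(change))                 # candidate bit for this slide
            if len(pool) == WINDOW:
                bits.append(Counter(pool).most_common(1)[0][0])
                pool.clear()
        return bits

    # Tiny smoke test with stub components (hypothetical stand-ins, not HiLight's modules).
    print(decode_stream(range(12), lambda w: "static", lambda w: w, lambda c: 1))   # [1]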

Figure 8: Identifying patterns caused by encoded α values when examining the perceived color intensity changes over a frame window (0.1 s), assuming bit 1 is encoded: (a) without encoding, (b) with encoding.

The extracted color intensity change is then passed to a BFSK demodulator, which applies the Fast Fourier Transform (FFT) to project the translucency change into the frequency domain. To reduce the impact of ambient light noise, whose frequency components are close to 0, the receiver filters out frequency components below 15 Hz. Among the frequency components above 15 Hz, the receiver identifies the component with the highest power, maps it to the corresponding bit, and outputs the bit into a candidate pool. The receiver then slides the frame window by one frame, and repeats the above procedure to generate another candidate bit. The final bit for this frame window is the bit with the most occurrences in the pool.

4.2 Extracting Desired Color Intensity Change

The receiver perceives a mixed color intensity change caused by the encoded α values, screen content, ambient light, and camera noise. Among them, the color intensity change associated with the encoded α values is the desired change that the receiver aims to extract, and the rest are interfering sources. To minimize the impact of these interfering sources, our design is driven by the fact that they have a nonuniform impact across the screen, depending on the brightness of the area. Therefore, we can further divide a grid into smaller regions, evaluate the impact of interfering sources on each region, and assign higher weights to regions less affected by interfering sources. This allows us to reduce the impact of the interfering factors and enhance the desired change associated with data. The key question then is how to determine the weight of each region within a grid. An effective weight assignment should leverage the scene type of the screen content. Next we describe our strategy for each scene type (the static and the gradual scene) in detail.

Static Scene. For the static scene, we are inspired by an audio beamforming algorithm called Minimum Variance Distortionless Response (MVDR) [33, 35]. This algorithm was originally designed to enhance audio signals in a desired direction while canceling out noise signals. The key idea is to leverage the correlation (e.g., delay spread and energy distribution in the frequency domain) across noise signals and cancel the noise energy. We develop a variant of this algorithm in our context, where the goal is to enhance the contribution of regions less affected by interfering sources. Specifically, for a grid with N regions, we calculate the weight matrix for its regions as:

    W = R⁻¹ · C / R_sum,    (3)

where R is an N × N noise correlation matrix capturing the correlation across the regions of this grid, C is an N × 1 constant matrix with all elements equal to 1, and R_sum is the summation of all elements in R⁻¹.

We obtain the noise correlation matrix R via a training process for a static scene. In particular, once the receiver detects a static scene, it buffers frames for M frame windows. It then applies the FFT to compute the frequency components of each frame window, and estimates the correlation R_ij between regions i and j as:

    R_ij = (1/M) · Σ_{1 ≤ k ≤ M} X_ik · X_jk^T,  i, j ∈ [1, N],    (4)

where X_ik and X_jk are the vectors of frequency components of the k-th frame window for regions i and j, respectively. From our experiments, we find that M = 5 is sufficient to train the correlation matrix and cancel out the contributions of regions more affected by interfering sources. Thus the training takes 0.5 s overall.
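The numerical core of Eqs. (3)-(4) can be sketched in a few lines of Python (ours, with synthetic region traces; the small diagonal ridge added before inversion is our numerical-safety assumption, not part of the formulation above):

    import numpy as np

    def region_spectra(intensity_traces):
        # FFT magnitude of each region's intensity over one six-frame window.
        # intensity_traces: (n_regions, 6) array.
        x = intensity_traces - intensity_traces.mean(axis=1, keepdims=True)
        return np.abs(np.fft.rfft(x, axis=1))            # (n_regions, 4) for six samples

    def mvdr_weights(training_windows):
        # training_windows: list of (n_regions, 6) intensity traces from a static scene.
        spectra = [region_spectra(w) for w in training_windows]
        n = spectra[0].shape[0]
        r = sum(s @ s.T for s in spectra) / len(spectra)   # Eq. (4): R_ij = (1/M) sum_k X_ik . X_jk
        r_inv = np.linalg.inv(r + 1e-6 * np.eye(n))        # ridge for numerical safety (assumption)
        ones = np.ones((n, 1))
        return (r_inv @ ones / r_inv.sum()).ravel()        # Eq. (3); weights sum to 1

    rng = np.random.default_rng(0)
    windows = [rng.normal(0, [[0.5], [0.5], [5.0]], (3, 6)) for _ in range(5)]  # region 3 is noisy
    print(mvdr_weights(windows))   # the noisiest region receives the smallest weight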
Gradual Scene. When the screen content is dynamic, it is hard for training-based methods to keep up with the content change in real time. Thus we design a lightweight scheme without training to assign weights to regions within a grid. Our design is based on a simple observation: while frames can change drastically over a long time, when examining adjacent frames within a frame window (i.e., 0.1 s) and focusing on a tiny region, the color intensity change is often monotonic at a roughly constant speed (Figure 8(a)). As a result, when it is mixed with the encoded change, the α change creates variations in the speed of the perceived color intensity change. As an example, Figure 8(b) shows the perceived color intensity of a region when mixed with the α change encoding bit 1. When the α value decreases (no dimming effect) at the second and fourth frames, it slows down the perceived color intensity change at both points. These speed variations are reflected by the sign of the second derivative of the color intensity change over time. Similarly, the α value change for encoding bit 0 also leads to a pattern in the second derivative of the color intensity change in a frame window. We leverage these patterns to infer the encoded bit.

Clearly, these patterns resulting from encoded bits are less noticeable in regions more affected by interfering sources. Therefore, we can determine the weight of each region by examining whether these patterns in the second derivatives exist. In our implementation, for regions where these patterns do not exist, we set their weights to zero to minimize their impact. By focusing on regions that exhibit detectable patterns, we reduce the impact of interfering sources and make the system robust in diverse settings. Furthermore, the pattern detection only requires computing the second derivatives. Hence we can update the weight matrix on the fly based on the current screen content.
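The sketch below is one way to read the second-derivative test (our interpretation with an illustrative threshold, not the paper's exact rule): a region whose six-frame intensity trace is a straight line yields near-zero second differences, while the superimposed α toggling leaves alternating signs.

    import numpy as np

    def region_weight(intensity_trace):
        # intensity_trace: perceived color intensity of one small region over six frames.
        second_diff = np.diff(intensity_trace, n=2)      # discrete second derivative
        # A monotonic, constant-speed content change gives second differences near zero;
        # the encoded alpha toggling produces alternating signs. Threshold is illustrative.
        has_pattern = np.any(second_diff > 0.5) and np.any(second_diff < -0.5)
        return 1.0 if has_pattern else 0.0

    steady = np.array([200, 202, 204, 206, 208, 210], dtype=float)   # content-only change
    encoded = steady + np.array([2, 0, 2, 0, 2, 0], dtype=float)     # plus a 30 Hz alpha toggle
    print(region_weight(steady), region_weight(encoded))             # 0.0 1.0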

5. SYSTEM IMPLEMENTATION

We implement HiLight at the application layer using off-the-shelf smart devices. We implement the HiLight transmitter on the Android platform, and the receiver on the iOS framework using an iPhone 5s. We choose the iPhone 5s as the receiver because, for the receiver to decode data, its camera needs to capture at least 60 frames per second, and the iOS AVFoundation framework supports 120 FPS video recording on the iPhone 5s. In contrast, the current Android framework (Android 4.4 or lower) does not support video recording at 60 FPS, and phone manufacturers (e.g., Samsung, HTC) do not make their phone camera APIs/libraries public. (The recent Android 5 framework [4] includes a new camera API, which supports 4K-resolution video recording with 65 Mb/s bandwidth at 60 FPS or higher frame rates. We plan to implement HiLight on the new platform once the hardware support is available.)

To implement the HiLight transmitter, we use an ImageView (an Android UI container) to create the communication layer (Figure 1(a)) atop all image layers. We use the setImageAlpha method to specify α values at the application layer. This method passes the given α values down to the SurfaceFlinger layer of the Android framework, which further leverages the glBlendFunc API in the OpenGL ES library [24] to run the alpha blending on the GPU. Given the ubiquity of hardware-accelerated compositing, a similar implementation can be realized on other mobile platforms (e.g., iOS, Windows). We implement scene detection and α value generation as two separate threads. In the first thread, we sample the frame buffer under /dev/graphics/ to fetch the screen content every 0.1 s. We leverage the OpenCV library to implement the scene detection algorithm, which runs continuously to detect the scene type of the incoming sampled frames. The first thread passes the detection result and frame information to the second thread, which then generates α values and modulates each bit using BFSK. The transmitter signals the start of a transmission by sending 1 in all grids for 0.1 s. Finally, to avoid touch events being blocked by the communication layer, which sits atop all content image layers, we register a View.OnTouchListener that returns false, so that the system ignores the communication layer and passes the touch event to the applications below it.

To implement the HiLight receiver, we use the AVCaptureSession in the iOS AVFoundation framework to capture frames. To reduce camera noise in dark environments, we leverage the built-in light sensor to detect a dark environment and increase the camera ISO value to raise its sensitivity accordingly. We apply an existing screen detector [42] to locate the transmitter screen, extract the screen area from each captured frame, and divide the captured screen area into grids. To speed up frame decoding, we fetch only a quarter of the pixels in the screen area to decode bits. We implement the main components (scene detection, extracting the desired color intensity change, and FFT decoding) in three separate threads. We display the decoding results in a pop-up window on the receiver's screen.
Figure 9: Experimental setup with a Samsung Tab S (left) as the transmitter and an iPhone 5s as the receiver (right).

Figure 10: HiLight's performance atop arbitrary scenes, assuming 120 grids on the communication layer: (a) static scene, by average color intensity; (b) dynamic scene, by scene type (scenery, sport, drama, web browsing, game). We also plot the performance of HiLight basic [21], which excludes two design enhancements (§ 3.3 and § 4.2), to examine their contribution. Since the throughput is equal to the accuracy multiplied by the number of transmitted bits, we plot a single bar reflecting both accuracy and throughput.

Table 1: Statistics of test images.
    Avg. color intensity:  < 150 | [150, 250) | [250, 350) | [350, 450) | [450, 550) | [550, 650) | [650, 765)
    # of images:               5 |         10 |         31 |         23 |         23 |         14 |          6

6. HiLight EXPERIMENTS

We perform detailed experiments to evaluate the HiLight prototype, focusing on its ability to support unobtrusive transmissions atop arbitrary scenes in real time and its robustness in practical settings. We also seek to understand how the hardware choice of screen and camera affects its performance, as well as users' perception of the unobtrusiveness of our system.

Experimental Setup. We use the Samsung Tab S as the default transmitter (we test other types of screens in § 6.4). We set up the transmitter and the receiver (iPhone 5s) at their normal viewing distance (30 cm for the transmitter's 10.5-inch screen, see Figure 9). We increase the viewing distance when testing HiLight on transmitters with larger screens in § 6.4. While we fix them on two phone holders for most experiments, we also test the hand-held scenario in § 6.3. By default, we divide the transmitter screen into 120 grids, where each grid is 2.7 cm² in size (Table 2). We repeat each experiment for five rounds, and perform all experiments under a fluorescent lamp (100 lux) in an office environment. We focus on two performance metrics: 1) accuracy, the percentage of bits successfully received over all bits transmitted; and 2) throughput, the number of bits successfully received per second.

6.1 Supporting Arbitrary Scene

To evaluate HiLight's ability to support arbitrary scenes, we randomly select 112 images with all levels of average color intensity (Table 1), 60 video clips, a 30-min web browsing scene, and 6 gaming apps (all test images and video clips are available at http://dartnets.cs.dartmouth.edu/hilight-test). We run HiLight to transmit random bits atop each screen content, and measure the throughput and accuracy at the receiver. In all of our graphs, we plot error bars covering the 90% confidence interval.

Static Scene. Figure 10(a) plots the average accuracy and throughput as the average pixel color intensity value varies, where a higher color intensity value indicates a brighter image. We divide the whole color intensity range into seven intervals, and average the results of images with color intensity values in the same interval. Intuition says that dark images lead to less detectable translucency changes and thus much lower throughput. Surprisingly, we observe that HiLight is not sensitive to image color intensity, maintaining 91%+ accuracy and 1.1 Kbps throughput for any images with average color intensity above 150. Even for dark images (color intensity values below 150), HiLight still achieves 85% accuracy and 1 Kbps throughput. This marks a notable improvement over our prior design, referred to as HiLight basic [21]. The improvement can be attributed to the effectiveness of two design components: the adaptation of the α value change (§ 3.3) and the strategy of extracting the encoded color intensity change (§ 4.2). Our hypothesis is confirmed by the results of HiLight basic [21], which fixes Δα to 1% and does not use the weight matrix to extract the desired color intensity change.

6. HiLight EXPERIMENTS

We perform detailed experiments to evaluate the HiLight prototype, focusing on its ability to support unobtrusive transmissions atop arbitrary scene in real time and its robustness in practical settings. We also seek to understand how the hardware choices of screen and camera affect its performance, as well as users' perception of the unobtrusiveness of our system.

Experimental Setup. We use the Samsung Tab S as the default transmitter (we test other types of screens in § 6.4). We set up the transmitter and receiver (iPhone 5s) at their normal viewing distance (30 cm for the transmitter's 10.5-inch screen; see Figure 9). We increase the viewing distance when testing HiLight on transmitters with larger screens in § 6.4. While we fix both devices on phone holders for most experiments, we also test the hand-held scenario in § 6.3. By default, we divide the transmitter screen into 120 grids, where each grid is 2.7 cm² in size (Table 2). We repeat each experiment for five rounds, and perform all experiments under a fluorescent lamp (100 lux) in an office environment. We focus on two performance metrics: 1) accuracy, the percentage of bits successfully received over all bits transmitted; and 2) throughput, the number of bits successfully received per second.

6.1 Supporting Arbitrary Scene

To evaluate HiLight's ability to support arbitrary scene, we randomly select 112 images covering all levels of average color intensity (Table 1), 60 video clips, a 30-min web browsing scene, and 6 gaming apps (all test images and video clips are available at http://dartnets.cs.dartmouth.edu/hilight-test). We run HiLight to transmit random bits atop each screen content, and measure the throughput and accuracy at the receiver. In all of our graphs, we plot error bars covering the 90% confidence interval.

Static Scene. Figure 10(a) plots the average accuracy and throughput as the average pixel color intensity value varies, where a higher color intensity value indicates a brighter image. We divide the whole color intensity range into seven intervals, and average the results of images with color intensity values in the same interval. Intuition says that dark images lead to less detectable translucency changes and thus much lower throughput. Surprisingly, we observe that HiLight is not sensitive to image color intensity, maintaining 91%+ accuracy and 1.1 Kbps throughput for any images with average color intensity above 150. Even for dark images (color intensity values below 150), HiLight still achieves 85% accuracy and 1 Kbps throughput. This marks a notable improvement over our prior design, referred to as HiLight basic [21]. The improvement can be attributed to the effectiveness of two design components: the adaptation of the α value change (§ 3.3) and the strategy of extracting the encoded color intensity change (§ 4.2). Our hypothesis is confirmed by the results of HiLight basic [21], which fixes Δα to 1% and does not use the weight matrix to extract the desired color intensity changes. To further improve HiLight's performance on very dark scenes (color intensity below 150), we can set white as the grid color of the communication layer atop dark areas. We plan this color adaptation for future work.

Figure 11: HiLight's encoding time per frame under different frame resolutions: (a) high (1080p), (b) medium (720p), (c) low (480p), broken down into sampling, frame decoding, scene detection, α generation, and rendering. Across all test devices, HiLight encodes a single frame within 8 ms, sufficient to support both 60 FPS and 120 FPS frame rates.

Table 2: Impact of grid size on HiLight performance.
# of grids               6      30     120    180    240    360    600
Grid size (cm²)          54.2   10.8   2.7    1.8    1.4    0.9    0.5
# of pixels per grid     683K   137K   34K    23K    17K    11K    6.8K
Static scene
  Accuracy (%)           99.4   93.9   89.4   86.8   82.9   78.7   75.2
  Throughput (Kbps)      0.06   0.3    1      1.6    2      2.8    4.5
Dynamic scene
  Accuracy (%)           91.0   87.1   85.4   80.3   78.3   73.2   67.2
  Throughput (Kbps)      0.05   0.3    1      1.4    1.9    2.6    4

Dynamic Scene. For the dynamic scene, we test three types of video clips: scenery videos (e.g., campus videos), sports, and drama (e.g., TV shows, movie trailers). They represent video with minor, medium, and drastic frame transitions, respectively. For each type, we test 20 video clips, and each clip lasts 1–2 mins. We also test two example scenarios where the screen content is generated on the fly: 1) web browsing, where the user is reading and scrolling down a web page, refreshing the page, and switching to other web pages; and 2) gaming, where the user is playing games (e.g., Angry Birds, Super Mario).

Figure 10(b) plots the average throughput and accuracy under all types of dynamic scene. Our key observations are as follows. First, overall HiLight maintains 84% accuracy with 1 Kbps throughput across these scene types, even for drama video clips that contain drastic screen content changes. By examining the performance of HiLight basic, we validate the efficacy of our two key design components, which lead to an 18% improvement in general and 20% for drama videos in particular. Second, dynamic scene leads to slightly lower throughput and accuracy than the static scene, mainly for two reasons. The first reason is that, to configure α changes, the system uses sampled color information of sampled frames to estimate the actual frames. The sampling leads to estimation errors as frames vary over time, making the adaptation of Δα sub-optimal. One solution is to sample more frequently upon more drastic frame transitions; we plan this for future work. Another reason is that we defer transmissions upon cut scenes. Yet since cut scenes occur rarely (only in sports and drama), the resulting throughput loss is less than 0.6% and 2.4% for sports and drama, respectively. Third, among the different dynamic scenes, the drama scene is the most challenging, as it contains more drastic content changes that greatly interfere with the encoded color intensity change. The web browsing scene achieves the best performance among all because it has a fair portion of static scene while the user reads web pages. Most games we test often change background images drastically, and thus the gaming scene behaves similarly to drama.

Varying the Grid Size. We also examine how different grid sizes on the communication layer affect HiLight under the static or dynamic scene. We test seven grid sizes (Table 2) using 10 images and 10 video clips, and list HiLight's average throughput and accuracy in Table 2. Overall, as the number of grids increases, HiLight's throughput grows rapidly while its accuracy drops slowly. This is because more grids mean more concurrent transmitter elements, leading to more transmitted bits. Yet as each grid has fewer pixels, inter-symbol interference [26] kicks in, making it more challenging to differentiate pixel translucency changes across adjacent grids and causing more bit errors. However, the increase in the transmitted bits outweighs the accuracy drop, leading to the throughput growth. In particular, for the grid size of 0.5 cm² (600 grids), HiLight achieves 4 Kbps or more under both static and dynamic scenes. The minimal grid size is ultimately limited by the resolution of the camera sensor and the distance between the transmitter and the receiver.
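As a back-of-the-envelope consistency check (our own arithmetic, under the assumption that each grid carries one bit per six-frame window at the 60 Hz refresh rate; the actual symbol timing is defined earlier in the paper):

\[
R_{\mathrm{raw}} \;=\; N_{\mathrm{grids}} \times \frac{60~\mathrm{frames/s}}{6~\mathrm{frames\ per\ bit}},
\qquad
\mathrm{throughput} \;\approx\; \mathrm{accuracy} \times R_{\mathrm{raw}}.
\]

With the default 120 grids this gives a raw rate of 120 x 10 = 1200 bps and 0.91 x 1200 ≈ 1.1 Kbps, which lines up with Figure 10(a); with 600 grids it gives 6000 bps and 0.752 x 6000 ≈ 4.5 Kbps, matching the static-scene row of Table 2.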
6.2 Processing Time

To evaluate HiLight's ability to support real-time communication, we measure its processing time for per-frame encoding and decoding. We measure the data encoding time as the duration from fetching a frame to the end of rendering. The decoding time is the time taken to process the frames in a frame window (six frames) upon the arrival of a new frame.

Encoding Time. We measure HiLight's encoding time on five Android devices. Since the encoding time is not affected by the number of grids on the screen and heavily depends on the frame resolution, we test three resolution levels, each with 10 video clips. Figure 11 plots the total per-frame encoding time as well as its breakdown. We make three key observations. First, across all devices and frame resolution levels, HiLight's encoding delay is consistently below 8.3 ms. This is mainly because HiLight leverages the GPU to process α values and removes the need to directly modify pixel RGB values. With the per-frame encoding time below 8.3 ms, HiLight can further support high-resolution video streaming at 120 FPS while transmitting data unobtrusively. Second, among all encoding steps, scene detection occupies 78–87% of the encoding time and ranges from 2.5 ms to 7 ms. It is longer for higher frame resolutions, because there are more pixels for the algorithm to sample and process. Third, our sampling strategy significantly reduces the overhead of fetching and decoding a frame. Since we sample one frame per frame window (six frames) and sample one-sixteenth of the pixels in the sampled frame, fetching pixel color values entails 0.35 ms, which is 1/96 of fetching every frame and obtaining the color values of all pixels.
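A minimal sketch of what such a sampled scene test could look like is below. The 1/16 sub-sampling stride follows the text (stride 4 in each dimension); the two thresholds and the three-way static/dynamic/cut-scene labels are illustrative placeholders rather than the paper's detector.

// Illustrative scene-type test on sub-sampled luminance frames; thresholds are invented.
public final class SceneDetector {
    enum SceneType { STATIC, DYNAMIC, CUT_SCENE }

    private static final double DYNAMIC_THRESHOLD = 3.0;    // hypothetical
    private static final double CUT_SCENE_THRESHOLD = 40.0; // hypothetical

    /** Mean absolute luminance difference between two frames, visiting 1/16 of the pixels. */
    static double sampledFrameDiff(int[][] prev, int[][] curr) {
        long acc = 0; int n = 0;
        for (int y = 0; y < curr.length; y += 4) {
            for (int x = 0; x < curr[y].length; x += 4) {
                acc += Math.abs(curr[y][x] - prev[y][x]);
                n++;
            }
        }
        return (double) acc / n;
    }

    static SceneType classify(int[][] prev, int[][] curr) {
        double diff = sampledFrameDiff(prev, curr);
        if (diff >= CUT_SCENE_THRESHOLD) return SceneType.CUT_SCENE; // defer transmission
        if (diff >= DYNAMIC_THRESHOLD) return SceneType.DYNAMIC;
        return SceneType.STATIC;
    }
}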

Figure 12: Impact of practical factors on HiLight's performance: (a) supporting distance, (b) ambient light, (c) hand motion.

Table 3: HiLight's decoding time (ms) over a frame window under different numbers of grids, using iPhone 5s.
# of grids            6      30     120    180    240    360    600
Sampling              2.95 (independent of the number of grids)
Scene detection       1.22 (independent of the number of grids)
Extracting changes    0.02 (independent of the number of grids)
Demodulation          0.07   0.21   0.81   1.10   1.73   2.47   3.31
Overall               4.26   4.40   5.00   5.29   5.92   6.66   7.50

Decoding Time. We further examine HiLight's data decoding time as the number of grids on the communication layer varies. Table 3 lists the time taken by each decoding step when processing the frames in a frame window on iPhone 5s. Aiming to capture the decoding time upon each incoming frame on the fly, we do not include the MVDR training delay (§ 4.2) in the table, because we do not need to perform the training for each frame, but only at the beginning of the whole decoding process for a static scene (our measurements show that MVDR training takes 1 ms to 27 ms as the number of grids increases from 6 to 600). From Table 3, our key observations are as follows. First, under all the grid sizes, HiLight is able to decode the frames in a frame window within 8 ms. This demonstrates that even when the transmitter is sending data at 120 Hz (supported by a higher screen refresh rate), the receiver is able to process all incoming data in real time. Second, for all decoding steps except BFSK demodulation, the running time is independent of the number of grids. This is because these steps are performed upon sampled pixels of each frame, so only the number of sampled pixels affects their running time. In comparison, BFSK demodulation performs an FFT for each grid independently to decode each bit. More grids lead to more FFT computations and thus a longer processing time.

6.3 Practical Considerations

We now evaluate the impact of practical factors on HiLight's performance. We test 10 images and 10 video clips, and repeat all experiments for five rounds.

Supporting Distance. We start by examining the transmission distance HiLight supports. We increase the distance between the screen and the camera from 30 cm (the normal viewing distance) to 150 cm, and plot the resulting throughput and accuracy under both static and dynamic scenes in Figure 12(a). Overall, HiLight's performance is relatively stable when the distance is within 1 m, maintaining 90%+ accuracy for the static scene and 85%+ for the dynamic scene. This demonstrates the efficacy of the HiLight design. As the distance further increases, throughput and accuracy start to drop quickly. This is expected, because light attenuates over distance, leading to weaker illumination changes that are harder to detect. In addition, since the receiver camera does not have an optical zoom, as the receiver moves further away, the transmitter screen becomes smaller in the captured frame. This reduces the captured pixels available for decoding and lowers the decoding accuracy.

Ambient Light. Next, we examine HiLight's sensitivity to ambient light. We test four indoor ambient light conditions (1 lux, 40 lux, 100 lux, and 300 lux), and plot HiLight's performance in Figure 12(b). Intuitively, dark settings are challenging for HiLight, where pixel translucency changes are harder to detect and camera noise is also higher. Yet we observe that HiLight's performance is fairly stable once the light illuminance is above 40 lux (far below the normal ambient light, which is around 100 lux). HiLight achieves this robustness because the system leverages the built-in light sensor on iPhone 5s to sense the ambient light and raises the camera sensitivity in dark settings. This allows the receiver to better sense the pixel translucency change. While increasing camera sensitivity also introduces camera noise, most camera noise is filtered out by the weight matrix used to extract the desired color intensity change (§ 4.2).
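The prototype implements this adaptation on iOS with AVFoundation and the iPhone's light sensor. Purely to make the mechanism concrete, the fragment below shows a hypothetical Android (camera2) analogue of the same idea; it is not the paper's implementation, and the 40-lux cutoff and the two ISO values are invented placeholders.

// Hypothetical Android analogue of the receiver's ambient-light adaptation.
import android.hardware.Sensor;
import android.hardware.SensorEvent;
import android.hardware.SensorEventListener;
import android.hardware.SensorManager;
import android.hardware.camera2.CameraMetadata;
import android.hardware.camera2.CaptureRequest;

public class LightAdaptiveIso implements SensorEventListener {
    private static final float DARK_LUX = 40f;            // illustrative cutoff
    private final CaptureRequest.Builder requestBuilder;

    public LightAdaptiveIso(SensorManager sm, CaptureRequest.Builder builder) {
        this.requestBuilder = builder;
        sm.registerListener(this, sm.getDefaultSensor(Sensor.TYPE_LIGHT),
                SensorManager.SENSOR_DELAY_NORMAL);
    }

    @Override public void onSensorChanged(SensorEvent event) {
        float lux = event.values[0];
        // Disable auto exposure and raise sensor sensitivity in dim settings so the
        // small translucency changes remain detectable; values are placeholders.
        requestBuilder.set(CaptureRequest.CONTROL_AE_MODE, CameraMetadata.CONTROL_AE_MODE_OFF);
        requestBuilder.set(CaptureRequest.SENSOR_SENSITIVITY, lux < DARK_LUX ? 1600 : 400);
        // The caller re-submits the repeating request on its CameraCaptureSession.
    }

    @Override public void onAccuracyChanged(Sensor sensor, int accuracy) { }
}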
Hand Motion. We now move on to examining the impact of hand motion on HiLight. We hold the receiver in the air facing a fixed transmitter, and vary the distance between the transmitter and the receiver. Figure 12(c) plots the results when the transmitter screen is displaying static or dynamic content. As expected, hand motion causes misalignment between the two devices, leading to image blur and thus lowering the accuracy. Yet at the normal viewing distance (30 cm), the accuracy drop is minor (< 3–4% compared to the perfectly aligned scenario in Figure 12(a)). As the distance increases, the performance gap to the perfectly aligned scenario grows. This is because at longer distances, hand motion causes more significant screen misalignment, and the receiver is more likely to misidentify the grids on the screen. Comparing the two types of scene, we observe that the accuracy of the static scene drops more quickly. This is because the weight matrix for the static scene is calculated once during decoding, while for the dynamic scene, the weight matrix is updated for every frame window and thus can better deal with screen misalignment. To minimize the impact of hand motion, one can leverage image tracking and alignment algorithms in computer vision to track the transmitter screen. The challenge is to ensure that these algorithms can run in real time. We plan to study this as part of our future work.
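The weight matrix mentioned here and in § 4.2 is described as MVDR-based. For readers unfamiliar with MVDR, the sketch below shows the textbook weight computation w = R^{-1}a / (a^T R^{-1} a) using Apache Commons Math, under our own assumptions that R is a sample covariance of per-pixel intensity traces within a grid and that the steering vector a is all ones. It illustrates the classical formula only, not the paper's exact construction.

// Textbook MVDR weights: pass the common (desired) component while minimizing noise power.
import org.apache.commons.math3.linear.Array2DRowRealMatrix;
import org.apache.commons.math3.linear.ArrayRealVector;
import org.apache.commons.math3.linear.DecompositionSolver;
import org.apache.commons.math3.linear.LUDecomposition;
import org.apache.commons.math3.linear.RealMatrix;
import org.apache.commons.math3.linear.RealVector;

public final class MvdrWeights {
    static double[] compute(double[][] covariance) {
        RealMatrix r = new Array2DRowRealMatrix(covariance);
        int n = covariance.length;
        double[] ones = new double[n];
        java.util.Arrays.fill(ones, 1.0);
        RealVector a = new ArrayRealVector(ones);   // assumed all-ones steering vector

        DecompositionSolver solver = new LUDecomposition(r).getSolver();
        RealVector rInvA = solver.solve(a);          // R^{-1} a
        double denom = a.dotProduct(rInvA);          // a^T R^{-1} a
        return rInvA.mapDivide(denom).toArray();     // normalize so that w^T a = 1
    }
}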

Figure 13: Impact of the horizontal and vertical viewing angles on HiLight's performance: (a) horizontal viewing angle θ, (b) top-down viewing angle β, (c) TX rotation angle γ.

Table 4: Impact of screens on HiLight's performance, using iPhone 5s as the receiver.
TX screen                                 Scene     # of grids   Accuracy (%)   Throughput (Kbps)
Samsung Tab S (OLED 10.5", 2560 x 1600)   Static    120          91             1.1
                                          Dynamic   30           90             0.3
Nexus 5 (LCD 5.0", 1920 x 1080)           Static    60           91             0.6
                                          Dynamic   20           90             0.2
Mac Air (LED 13.3", 1440 x 900)           Static    120          91             1.1
                                          Dynamic   30           90             0.3
iMac (LED 27", 2560 x 1440)               Static    180          91             1.6
                                          Dynamic   60           91             0.6

Table 5: Impact of cameras on HiLight's performance, using Samsung Tab S as the transmitter.
Camera                     Scene     # of grids   Accuracy (%)   Throughput (Kbps)
iPhone 5s (8 MP)           Static    120          91             1.1
                           Dynamic   30           90             0.3
Note 3 (13 MP)             Static    540          91             4.9
                           Dynamic   540          90             4.9
Samsung S5 (16 MP)         Static    720          93             6.7
                           Dynamic   540          90             4.9
Canon 60D (18 MP, SLR)     Static    720          92             6.6
                           Dynamic   540          92             5.0

Viewing Angle. Finally, we examine HiLight under varying horizontal and vertical viewing angles to assess the impact of misalignment between the transmitter and the receiver. In the first two experiments, we fix the transmitter screen and rotate the receiver either horizontally or vertically from 0° to 60° while keeping their distance the same (30 cm). In the third experiment, we fix the receiver and make the transmitter screen self-rotate from 0° to 60°. Figure 13 shows the three experimental setups, the observed perspective distortion in each setup, and HiLight's throughput and accuracy (using the 10-inch OLED screen, we observe only minor pixel brightness degradation caused by the viewing angle change, below 10% at a 60° viewing angle; its impact on HiLight's performance is negligible). Overall, HiLight supports up to a 60° viewing angle in all directions for both types of scenes. Its performance gracefully degrades as the viewing angle increases, because the transmitter screen captured by the receiver distorts into a trapezoid, making it challenging for the receiver to identify grids. In addition, since different pixels on the screen have different depths to the receiver, some portion of the captured screen can be out of focus. However, the performance degradation is less than 7–8% for both scene types. This is because HiLight relies on the relative color intensity change to extract data and is insensitive to static noise such as the image quality degradation caused by being out of focus. To further enhance HiLight's robustness against perspective distortion, one can leverage image projection algorithms (e.g., affine transformation [8]) to avoid grid misidentification. Our experiments show that the current implementation of these algorithms requires 100 ms to process a frame, so we need an optimized implementation to support real-time communication.
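The projection fix suggested here can be sketched with OpenCV's Java bindings (OpenCV is already used on the transmitter side). We use a perspective (homography) warp, a close relative of the affine transformation the text cites; the corner ordering and output size are our own placeholders. Once rectified, the grid lookup could stay a regular lattice.

// Sketch of rectifying the captured (trapezoid-distorted) screen before grid division.
import org.opencv.core.Mat;
import org.opencv.core.MatOfPoint2f;
import org.opencv.core.Point;
import org.opencv.core.Size;
import org.opencv.imgproc.Imgproc;

public final class ScreenRectifier {
    /** Warps the four detected screen corners onto a fronto-parallel W x H rectangle. */
    static Mat rectify(Mat cameraFrame, Point[] detectedCorners, int outWidth, int outHeight) {
        // Assumed corner order: top-left, top-right, bottom-right, bottom-left.
        MatOfPoint2f src = new MatOfPoint2f(detectedCorners);
        MatOfPoint2f dst = new MatOfPoint2f(
                new Point(0, 0),
                new Point(outWidth - 1, 0),
                new Point(outWidth - 1, outHeight - 1),
                new Point(0, outHeight - 1));
        Mat homography = Imgproc.getPerspectiveTransform(src, dst);
        Mat rectified = new Mat();
        Imgproc.warpPerspective(cameraFrame, rectified, homography,
                new Size(outWidth, outHeight));
        return rectified;  // grid boundaries now fall on a regular lattice
    }
}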
6.4 Impact of Screen/Camera Hardware

We next examine how the physical capabilities of screens and cameras affect HiLight's performance. We test HiLight on different screens and cameras and examine its highest throughput while maintaining accuracy above 90%. We adjust the distance for each pair of devices to keep the captured screen size in the camera preview the same.

Varying the Screen. We first fix iPhone 5s as the receiver and test different types (LCD, LED, OLED) of screens in varying sizes (5–27 in) as transmitters. Since our current implementation of the HiLight transmitter is based on the Android framework, to test HiLight on devices such as the Mac Air and iMac, we make video clips with data embedded into a static or dynamic scene, play these video clips on each screen, and measure the real-time accuracy and throughput at the receiver. Table 4 lists the results for 10 static and 10 dynamic scenes. Clearly, screen resolution and size are the two key factors affecting the performance. The iMac's screen works the best, with the largest size and the second-highest resolution. Among the different screen types, OLED screens are the most preferable, because OLED screens do not have a backlight and each pixel emits light independently. Therefore, colors on OLED screens are brighter with higher contrast, making the color intensity change easier to detect. In comparison, both LCD and LED screens have a backlight that reduces the color contrast. LED screens outperform LCD screens because the LED backlight renders color change more precisely.

Varying the Camera. Next, we fix the Samsung Tab S as the transmitter, use different cameras to capture frames, and process the captured frames offline to examine the accuracy and throughput. We test four types of cameras, including a high-end SLR camera with 18 MP resolution. The results in Table 5 demonstrate the significant performance gain brought by high-resolution cameras for both types of scene. Higher-resolution cameras capture more pixels on the transmitter screen, and thus they support smaller grids on the transmitter screen and achieve higher throughput. In particular, the high-end SLR camera supports 720 grids on the 10-in screen, enabling 6.6 Kbps throughput, six times higher than that achieved by the iPhone 5s in our prototype. Even the cameras on some existing smart devices (Note 3 and S5) are sufficient for HiLight to reach 5–6 Kbps, similar to that of the high-end camera. This demonstrates HiLight's robustness on diverse hardware platforms and its great potential on future smart devices with more sophisticated cameras.

Figure 14: User perception of HiLight's unobtrusive transmissions: (a) static scene, (b) dynamic scene, (c) static scene with fixed Δα, (d) dynamic scene with fixed Δα. The perception scores are explained in Table 7; achieving a score of 2 or below is acceptable. Compared to changing pixel translucency uniformly by a fixed amount, HiLight adapts the pixel translucency changes (Δα) non-uniformly across pixels, and thus diminishes the visual effects caused by data communication.

Table 6: Statistics of the 20 video clips used in our user study.
Scene type         Static scene               Dynamic scene
                   Bright   Medium   Dark     Drama   Sport   Scenery
# of video clips   3        3        4        3       3       4

Table 7: User perception scores.
Score   User perception
1       Certainly no noticeable visual effects. Good viewing experience.
2       Uncertain about visual effects. Still good viewing experience.
3       Certainly noticeable visual effects.

6.5 User Perception

Finally, we conduct a user study to examine whether HiLight causes any noticeable visual artifacts on the original screen content. Our user study is conducted with 10 participants (5 male and 5 female) in the age range of 18 to 30. We select 20 original video clips (each lasting 10 s) covering diverse scenes (Table 6). For each video clip, we create six modified versions by encoding data into pixel translucency (α) changes. In one version, we use HiLight to encode data, which adapts Δα non-uniformly across pixels (Algorithm 1). In the other five versions, we change the α values of all pixels uniformly by a fixed amount, varying from 1% to 5% for the static scenes and 2% to 10% for the dynamic scenes. Among the total 140 video clips (20 original and 120 modified), we randomly assign 100 of them to each participant. To avoid participant response bias [12], we do not inform participants which video clips are modified and which are original. Before the study, each participant watches the original video clips on a Samsung Tab S screen to gain an initial sense of the video content. During the study, each participant watches the assigned 100 video clips (mixed with the original and modified versions) in a random order on the Tab S screen and rates each from 1 to 3 to indicate the certainty of observing visual effects in the video clip (a participant can replay a video clip multiple times to determine the score). Table 7 explains the perception scores; achieving a score of 1 is ideal and 2 is acceptable.

We plot the average user perception scores in Figure 14(a) and (b). We make two key observations. First, by adapting the amount of translucency change (Δα) non-uniformly across the screen, HiLight significantly reduces the visual effects caused by the Δα in both static and dynamic scenes. Users' perception of HiLight is very close to that of watching the original video clips. In contrast, a fixed Δα across all pixels easily leads to noticeable visual effects even for small Δα values (4% for static scenes and 6% for dynamic scenes). This result demonstrates the effectiveness of HiLight's adaptation of Δα based on sampled screen content pixels and its gradient smoothing of Δα within each grid (Figure 6). Second, the maximal amount of translucency change that users cannot perceive depends on the screen content. Between static and dynamic scenes, dynamic scenes allow larger pixel translucency changes that remain unnoticeable to users. We hypothesize that this is because human eyes are less sensitive to subtle changes in pixel brightness when the screen content itself contains more drastic color intensity changes. We observe a similar pattern as we analyze the different types of dynamic scenes. Figure 14(d) plots the average user perception score for the three types of dynamic scenes as the fixed Δα increases. For drama scenes, which have the most drastic color intensity changes, a fixed Δα of 6% is still almost unnoticeable to users. Among the static scenes, we observe that users can tolerate larger pixel translucency changes (Δα) atop darker screen content (Figure 14(c)). This is because the absolute brightness changes of darker areas are smaller than those of brighter areas. These observations together justify the Δα configuration in HiLight (Algorithm 1), which increases Δα for dark areas and dynamic scenes.
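Algorithm 1 is defined earlier in the paper and is not reproduced here. Purely to make the direction of these two findings concrete, the toy rule below raises Δα for darker pixels and for dynamic scenes; every constant in it is invented for illustration, and none of it should be read as HiLight's actual algorithm.

// Caricature of the direction of the delta-alpha adaptation (NOT Algorithm 1).
public final class DeltaAlphaRule {
    /**
     * @param pixelIntensity average color intensity around the pixel, in [0, 765]
     * @param dynamicScene   true if the scene detector labeled the content dynamic
     * @return delta-alpha as a fraction of full opacity (e.g., 0.02 == 2%)
     */
    static double deltaAlpha(double pixelIntensity, boolean dynamicScene) {
        double base = dynamicScene ? 0.04 : 0.02;                    // users tolerate more change on video
        double darkBoost = 1.0 + (765.0 - pixelIntensity) / 765.0;   // up to 2x for very dark pixels
        return Math.min(base * darkBoost, 0.10);                     // clamp to keep changes unobtrusive
    }
}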
7. POTENTIAL APPLICATIONS

Augmented Reality. HiLight can enable innovative interaction designs for augmented reality (AR) apps. AR glasses with heads-up displays and cameras can leverage HiLight to obtain additional information layers from any surrounding screens. Users wearing smart glasses can acquire personalized information from any screen, including the subtitles of a movie or TV program, a customized coupon or URL link within an advertisement, and hints in gaming apps, without affecting the content shown on the screen.

Continuous Authentication. Operating on the visible light spectrum, HiLight communication occurs only when the camera is physically facing the screen. This allows us to continuously authenticate a user's physical presence, which is highly valuable for many existing apps. Consider a user wearing smart glasses in front of a screen. Online banking or health-related apps can leverage HiLight to continuously authenticate the user's presence and immediately log out the user when the user leaves the screen. In addition, online video streaming providers such as Amazon can leverage HiLight to monitor the exact portion of a video a user has watched, and define an accurate pricing scheme based on that information.

Video Tagging. When capturing a video, it is hard to add object tags on the fly. Using HiLight, we can embed information associated with the local object or location into all public screens. Thus, when a camera captures video, it also simultaneously fetches the additional information from surrounding screens and attaches it to the video. Social apps such as Facebook can then automatically extract the information in the captured video and add tags for the user.

8. RELATED WORK

We categorize existing research on screen-camera communication into two categories.

Obtrusive Screen-Camera Communication. Active research has examined screen-camera communication using visible coded images [6]. One class of work focuses on boosting the data rate, e.g., by leveraging better modulation schemes [26] or innovative barcode designs [14, 43]. Another class of work aims to enhance link reliability, by tackling the frame synchronization problem [15, 22, 28], enabling barcode detection at long distances [16, 23], or designing multi-layer error correction schemes [36]. A recent study [40] generates dynamic QR codes to transmit sensor data on the fly. Our work leverages insights from these studies, but differs in that we aim to enable screen-camera communication without showing visible coded images.

Unobtrusive Screen-Camera Communication. Prior work has studied how to hide information in a given screen content while enabling screen-camera communication. Yuan et al. leverage watermarking to embed messages into an image [41, 42]. [10, 37, 39] enable unobtrusive communication by switching barcodes with complementary hue. PiCode and ViCode [19] integrate barcodes with existing images to enhance viewing experiences. All these methods have greatly reduced the visual artifacts of visible codes. Yet they all require direct modifications of content pixel RGB values, which prevents them from supporting arbitrary scene in real time. The unique contribution of HiLight is to broaden the applicable scenario of unobtrusive screen-camera communication. It removes the need to directly modify RGB values and decouples communication from screen content. HiLight is similar in spirit to prior efforts that modulate bits by changing the screen's backlight [13, 18, 20]. All these designs, however, require special hardware support (e.g., a shutter plane). HiLight differs in that it works on off-the-shelf smart devices.
9. CONCLUSION

We presented HiLight, the first system enabling real-time, unobtrusive screen-camera communication atop any screen scene, including scenes generated on the fly. We implemented HiLight on off-the-shelf smart devices and evaluated it in diverse practical settings. By placing no constraint on the screen's original functionality (displaying content) while enabling communication, HiLight opens up opportunities for new HCI and context-aware applications.

Our current system has several limitations and possible extensions that we plan to address in future work. First, the throughput of our system is currently limited by the screen refresh rate (60 Hz). For OLED screens, the physical response time of a pixel can be less than 0.01 ms [5], translating into a pixel change frequency higher than 100 KHz. We plan to seek methods to better approach this physical limit and boost the system throughput. Second, we plan to explore advanced modulation and coding schemes to improve the transmission reliability. We observe that most bit errors are randomly spread out. Since the current communication channel is one-way without any feedback from the receiver, error handling schemes such as low-density parity-check (LDPC) codes [11] or Reed-Solomon codes [38] are good candidates. Third, our current design sets a uniform color (black) for the communication layer. We plan to vary the color (between black and white) across different grids of the communication layer based on the color of the content area (e.g., use white for dark areas and black for bright areas). This color adaptation can enhance the system reliability atop a wide range of screen content. Fourth, to better support device mobility (e.g., a user wearing smart glasses as the receiver), we plan to integrate low-complexity object tracking algorithms into HiLight, so that the receiver can keep tracking the screen area and decode data even during constant hand or head movement. Finally, our system is still a one-way channel. To realize a two-way channel, we can place two devices with screens facing each other, where each device uses its front-facing camera to receive data. Obtaining receiver feedback allows the transmitter to adapt its configuration (e.g., grid size, frame window). We plan to explore the associated system and design challenges.

10. ACKNOWLEDGMENTS

The authors sincerely thank shepherd Chunyi Peng and the reviewers for their valuable feedback, and Jiawen Chen at Google for his insights on GPU programming. We also thank DartNets Lab members Zhao Tian, Rui Wang, Fanglin Chen, and Xiaole An for their support on our study. This work is supported in part by the Dartmouth Burke Research Initiation Award and the National Science Foundation under grant CNS-1421528. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect those of the funding agencies or others.

11. REFERENCES

[1] https://www.opengl.org/.
[2] http://opencv.org/platforms/android.html.
[3] http://developer.android.com/guide/topics/resources/drawable-resource.html.
[4] https://developer.android.com/about/versions/android-5.0.html.
[5] http://en.wikipedia.org/wiki/OLED.
[6] Automatic identification and data capture techniques - QR code 2005 bar code symbology specification. ISO/IEC 18004:2006.
[7] Ashok, A., et al. Challenge: Mobile optical networks through visual MIMO. In Proc. of MobiCom (2010).
[8] Berger, M. Geometry I. Springer Science & Business, 2009.
[9] Bullough, J., et al. Effects of flicker characteristics from solid-state lighting on detection, acceptability and comfort. Lighting Research and Technology 43, 3 (2011), 337–348.
[10] Carvalho, R., Chu, C.-H., and Chen, L.-J. IVC: Imperceptible video communication. In Proc. of HotMobile (poster) (2014).
[11] Davey, M. C., and MacKay, D. J. Low density parity check codes over GF(q). In Information Theory Workshop, 1998 (1998), IEEE, pp. 70–71.
[12] Dell, N., Vaidyanathan, V., Medhi, I., Cutrell, E., and Thies, W. "Yours is Better!": Participant response bias in HCI. In Proc. of CHI (2012).
[13] Fattal, D., et al. A multi-directional backlight for a wide-angle, glasses-free three-dimensional display. Nature 495, 7441 (2013), 348–351.
[14] Hao, T., Zhou, R., and Xing, G. COBRA: Color barcode streaming for smartphone systems. In Proc. of MobiSys (2012).

[15] Hu, W., Gu, H., and Pu, Q. LightSync: Unsynchronized visual communication over screen-camera links. In Proc. of MobiCom (2013).
[16] Hu, W., Mao, J., Huang, Z., Xue, Y., She, J., Bian, K., and Shen, G. Strata: Layered coding for scalable visual communication. In Proc. of MobiCom (2014).
[17] Huang, C.-L., and Liao, B.-Y. A robust scene-change detection method for video segmentation. IEEE Transactions on Circuits and Systems for Video Technology 11, 12 (2001), 1281–1288.
[18] Huang, S. Backlight modulation circuit having rough and fine illumination signal processing circuit, Mar. 27 2012. US Patent 8,144,112.
[19] Huang, W., and Mow, W. H. PiCode: 2D barcode with embedded picture and ViCode: 3D barcode with embedded video (poster). In Proc. of MobiCom (2013).
[20] Kimura, K., Masuda, S., and Hayashi, M. Display apparatus and method for controlling a backlight with multiple light sources of a display unit, Sept. 11 2012. US Patent 8,264,447.
[21] Li, T., An, C., Campbell, A., and Zhou, X. HiLight: Hiding bits in pixel translucency changes. In Proc. of the 1st ACM MobiCom Workshop on Visible Light Communication Systems (VLCS) (2014).
[22] LiKamWa, R., Ramirez, D., and Holloway, J. Styrofoam: A tightly packed coding scheme for camera-based visible light communication. In Proc. of the 1st ACM MobiCom Workshop on Visible Light Communication Systems (VLCS) (2014).
[23] Mohan, A., Woo, G., Hiura, S., Smithwick, Q., and Raskar, R. Bokode: Imperceptible visual tags for camera based interaction from a distance. In Proc. of SIGGRAPH (2009).
[24] Munshi, A., Ginsburg, D., and Shreiner, D. OpenGL ES 2.0 programming guide. Pearson Education, 2008.
[25] Nagasaka, A., and Tanaka, Y. Automatic video indexing and full-video search for object appearances. In Proc. of the IFIP TC2/WG 2.6 Second Working Conference on Visual Database Systems II (1992).
[26] Perli, S. D., Ahmed, N., and Katabi, D. PixNet: Interference-free wireless links using LCD-camera pairs. In Proc. of MobiCom (2010).
[27] Porter, T., and Duff, T. Compositing digital images. In ACM SIGGRAPH Computer Graphics (1984).
[28] Rajagopal, N., Lazik, P., and Rowe, A. Visual light landmarks for mobile devices. In Proc. of IPSN (2014).
[29] Rosin, P. L. Thresholding for change detection. In Computer Vision, 1998. Sixth International Conference on (1998), IEEE, pp. 274–279.
[30] Rosin, P. L., and Ioannidis, E. Evaluation of global image thresholding for change detection. Pattern Recognition Letters 24, 14 (2003), 2345–2356.
[31] Sethi, I. K., and Patel, N. V. Statistical approach to scene change detection. In IS&T/SPIE's Symposium on Electronic Imaging: Science & Technology (1995), pp. 329–338.
[32] Smits, P. C., and Annoni, A. Toward specification-driven change detection. IEEE Transactions on Geoscience and Remote Sensing 38, 3 (2000), 1484–1488.
[33] Sur, S., Wei, T., and Zhang, X. Autodirective audio capturing through a synchronized smartphone array. In Proc. of MobiSys (2014).
[34] Tan, K. W., et al. FOCUS: A usable & effective approach to OLED display power management. In Proc. of UbiComp (2013).
[35] Van de Sande, J., et al. Real-time beamforming and sound classification parameter generation in public environments. TNO report TNO-DV 2012 S007 (2012).
[36] Wang, A., Ma, S., Hu, C., Huai, J., Peng, C., and Shen, G. Enhancing reliability to boost the throughput over screen-camera links. In Proc. of MobiCom (2014).
[37] Wang, A., Peng, C., Zhang, O., Shen, G., and Zeng, B. InFrame: Multiflexing full-frame visible communication channel for humans and devices. In Proc. of HotNets (2014).
[38] Wicker, S. B., and Bhargava, V. K. Reed-Solomon codes and their applications. John Wiley & Sons, 1999.
[39] Woo, G., Lippman, A., and Raskar, R. VRCodes: Unobtrusive and active visual codes for interaction by exploiting rolling shutter. In Proc. of ISMAR (2012).
[40] Yonezawa, T., Ogawa, M., Kyono, Y., Nozaki, H., Nakazawa, J., Nakamura, O., and Tokuda, H. SENSeTREAM: Enhancing online live experience with sensor-federated video stream using animated two-dimensional code. In Proc. of UbiComp (2014).
[41] Yuan, W., Dana, K., Varga, M., Ashok, A., Gruteser, M., and Mandayam, N. Computer vision methods for visual MIMO optical systems. In Proc. of the IEEE International Workshop on Projector-Camera Systems (held with CVPR) (2011), 37–43.
[42] Yuan, W., et al. Dynamic and invisible messaging for visual MIMO. In IEEE Workshop on Applications of Computer Vision (WACV) (2012).
[43] Zhang, B., et al. SBVLC: Secure barcode-based visible light communication for smartphones. In Proc. of INFOCOM (2014).
[44] Zhang, H., Kankanhalli, A., and Smoliar, S. W. Automatic partitioning of full-motion video. Multimedia Systems 1, 1 (1993), 10–28.
