MobileHCI '20: 22nd International Conference on Human-Computer Interaction with Mobile Devices and Services
SECTION: Mobile Interaction I
Page Load Time (PLT) is a critical metric of Web page load performance. However, existing PLT metrics are designed to measure Web page load performance on desktops and laptops and do not consider user interactions on mobile browsers. As a result, they are ill-suited to measuring mobile page load performance from the perspective of the user. In this work, we present the Mobile User-Centered Page Load Time Estimator (muPLTest), a model that estimates the PLT experienced by users of Web pages on mobile browsers. We show that traditional methods of measuring user PLT for desktops are unsuited to mobiles because they only consider the initial viewport, which is the part of the page that is in the user’s view when it first begins to load. Mobile users, however, view multiple viewports during the page load process, since they start to scroll even before the page is loaded. We thus construct the muPLTest to account for page load activities across viewports. We train our model with crowdsourced scrolling behavior from live users. We show that muPLTest predicts the ground-truth user-centered PLT, or muPLT, obtained from live users with an error of 10-15% across 50 Web pages; traditional PLT metrics, by comparison, deviate from the muPLT by 44-90%. Finally, we show how developers can use the muPLTest to scalably estimate changes in user experience when applying different Web optimizations.
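The viewport-aware intuition can be made concrete with a minimal sketch (the event format, function name, and the 640 px viewport height are illustrative assumptions, not the paper's actual model, which is trained on crowdsourced scroll traces):

```python
def user_centered_plt(paint_events, scroll_trace, viewport_h=640):
    """Naive viewport-aware PLT.

    paint_events: list of (time, y_top, y_bottom) for rendered elements.
    scroll_trace: list of (time, scroll_y) user scroll positions.
    Returns the latest paint time among elements that fell inside a
    viewport the user actually looked at.
    """
    plt = 0.0
    for _view_t, scroll_y in scroll_trace:
        top, bottom = scroll_y, scroll_y + viewport_h
        for paint_t, y0, y1 in paint_events:
            # Element overlaps this viewed viewport?
            if y0 < bottom and y1 > top:
                plt = max(plt, paint_t)
    return plt
```

A classic PLT would instead take the maximum over all paint events, penalizing content far below the fold that the user never scrolls to.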
Wearable devices, such as smartwatches and fitness trackers, are growing in popularity, creating a need for application developers to adapt or extend a UI, typically from a smartphone, onto these devices. Wearables generally have a smaller form factor than a phone; thus, porting an app to the watch necessarily involves reworking the UI. An open problem is identifying best practices for adapting UIs to wearable devices.
This paper contributes a study and data set of the state of practice in UI adaptation for wearables. We automatically extract UI designs from a set of 101 popular Android apps that have both a phone and a watch version, and manually label how each UI element, as well as each screen in the app, is translated from the phone to the wearable. The paper identifies trends in adaptation strategies and presents design guidelines.
We expect that the UI adaptation strategies identified in this paper can have wide-ranging impact on future research and on identifying best practices in this space, such as grounding future user studies that evaluate which strategies improve user satisfaction, or automatically adapting UIs.
For graphical user interface (UI) design, it is important to understand what attracts visual attention. While previous work on saliency has focused on desktop and web-based UIs, mobile app UIs differ from these in several respects. We present findings from a controlled study with 30 participants and 193 mobile UIs. The results speak to a role of expectations in guiding where users look. A strong bias toward the top-left corner of the display, text, and images was evident, while bottom-up features such as color or size affected saliency less. Classic, parameter-free saliency models showed a weak fit with the data, and data-driven models improved significantly when trained specifically on this dataset (e.g., NSS rose from 0.66 to 0.84). We also release the first annotated dataset for investigating visual saliency in mobile UIs.
Non-linear perspectives have the potential to improve 3D scene perception by increasing the information bandwidth of 3D content. As with the Mercator projection of the Earth, they can reduce occlusions by showing more of the shape of an object than classical perspectives. However, an ill-advised construction of such “usually static” perspectives can make the original shape difficult to understand, drastically reducing scene comprehension. Yet, despite their potential, these perspectives are rarely used. In this paper we aim to make non-linear perspectives more widely usable on mobile devices. We propose to address the comprehension issue by allowing the user to control the transition between linear and non-linear perspectives in real time with bending gestures. Using this approach, we present the first user study investigating real-time manipulation of non-linear perspectives in an exploration task. Results show significant benefits of the approach and give insights into the best bending gestures and configurations.
Inertial Measurement Units (IMUs) with gyroscopic sensors are standard in today's mobile devices. We show that these sensors can be co-opted for vibroacoustic data reception. Our approach, called VibroComm, requires direct physical contact with a transmitting (i.e., vibrating) surface. This makes interactions targeted and explicit in nature, well suited for contexts with many targets or those requiring explicit user intent. It also offers an orthogonal dimension of physical security to wireless technologies like Bluetooth and NFC. Using our implementation, we achieve a transfer rate of over 2000 bits/sec with less than 5% packet loss – an order of magnitude faster than prior IMU-based approaches at a quarter of the loss rate – opening new, powerful and practical use cases that could be enabled on mobile devices with a simple software update.
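The abstract does not detail VibroComm's modulation scheme; as a hypothetical illustration of vibroacoustic reception, the sketch below decodes simple on-off keying from a gyroscope sample stream by measuring per-window energy at an assumed carrier frequency with a Goertzel filter:

```python
import math

def goertzel_power(samples, sample_rate, freq):
    """Power of a single frequency bin (Goertzel algorithm)."""
    k = int(0.5 + len(samples) * freq / sample_rate)
    w = 2 * math.pi * k / len(samples)
    coeff = 2 * math.cos(w)
    s_prev = s_prev2 = 0.0
    for x in samples:
        s = x + coeff * s_prev - s_prev2
        s_prev2, s_prev = s_prev, s
    return s_prev2 ** 2 + s_prev ** 2 - coeff * s_prev * s_prev2

def decode_ook(gyro, sample_rate, carrier, bit_len, threshold):
    """Decode on-off-keyed bits: one bit per window of bit_len samples,
    decided by the energy at the carrier frequency."""
    bits = []
    for i in range(0, len(gyro) - bit_len + 1, bit_len):
        p = goertzel_power(gyro[i:i + bit_len], sample_rate, carrier)
        bits.append(1 if p > threshold else 0)
    return bits
```

Real IMU sampling rates, packet framing, and error handling would differ; this only conveys the signal-processing idea.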
SECTION: Health and Wellbeing
Fitness trackers encourage users to set goals to improve personal wellbeing, but these goals sometimes remain unmet. Improved ways of communicating failure to meet fitness goals could help prevent negative thought cycles (rumination) and reduced motivation for physical activity. To address this challenge, we studied how unmet goals can be presented in apps. We designed prototypes that showed unmet fitness goals using radial and regular bar charts, in single-coloured and multicoloured variants. In a survey (n = 165), we compared the four versions and a textual description of the unmet goals. We then conducted follow-up interviews (n = 20) to gain a detailed understanding of the perceptions and feelings evoked by the prototypes. We found that bar graphs offered significantly better potential for reflection and that multicoloured charts triggered significantly more rumination. We contribute in-depth insights into designing systems that use goals while avoiding potential negative effects of personal tracking.
Strength training improves overall health, well-being, physical appearance, and sports performance, with training programs specifying variables such as sets, repetitions, rest time, weight, and tempo. The repetitive nature of strength training, typically performed at a fixed tempo, has made it challenging to develop entertaining exergames for common strength training exercises. We present StrengthGaming, which uses scaling and shuffling techniques to provide tempo variation in strength training while preserving the training volume and training goals. It affords game designers more flexibility in designing strength training-based exergames. We developed a prototype game, inspired by FlappyBird, that uses a wearable orientation sensor to track the repetition tempo and control a flying character, with both a fixed-tempo and a dynamic-tempo design. Results from our 24-person user study showed that dynamic tempo was significantly more entertaining than fixed tempo (p<0.01) and was preferred by participants.
Medical images taken with mobile phones by patients, i.e., medical selfies, allow screening, monitoring and diagnosis of skin lesions. While mobile teledermatology can provide good diagnostic accuracy for skin tumours, there is little research on the emotional and physical aspects of taking medical selfies of body parts. We conducted a survey with 100 participants and a qualitative study with twelve participants, in which they took images of eight body parts including intimate areas. Participants had difficulties taking medical selfies of their shoulder blades and buttocks. For the genitals, they preferred to visit a doctor rather than send images. Taking the images triggered privacy concerns and memories of past experiences with body parts, and raised awareness of their bodily medical state. We present recommendations for the design of mobile apps to address the usability and emotional impacts of taking medical selfies.
Cognitive performance fluctuates throughout the day partly due to our “inner clock”. Systems that are aware of these fluctuations can adapt to their users’ momentary capacity and adjust task difficulty accordingly. Here, we evaluate the use of performance measures obtained from mobile game data to estimate players’ varying alertness. We developed a smartphone game that emulates three validated tasks for cognitive performance and conducted an in-the-wild study over two weeks with 30 participants to collect performance measures and subjective ground truth on alertness levels. Our results show that players’ performance can be explained by a generative model which is based on two established alertness-modulating processes, namely the homeostatic process and circadian alertness fluctuations. Our method can be used to implicitly model players’ alertness levels as a basis for cognition-aware applications.
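The two processes named here correspond to the classic two-process model of alertness; a minimal sketch with hypothetical parameter values (not the paper's fitted generative model) might look like:

```python
import math

def homeostatic_pressure(hours_awake, tau=18.2, upper=1.0):
    """Homeostatic sleep pressure: rises exponentially toward an upper
    asymptote the longer one stays awake (tau is an assumed time constant)."""
    return upper * (1 - math.exp(-hours_awake / tau))

def circadian(time_of_day_h, amplitude=0.3, acrophase_h=16.0):
    """24 h sinusoidal alertness component, peaking at acrophase_h
    (late afternoon here, purely illustrative)."""
    return amplitude * math.cos(2 * math.pi * (time_of_day_h - acrophase_h) / 24)

def alertness(time_of_day_h, hours_awake):
    """Predicted alertness (arbitrary units): circadian drive minus
    accumulated homeostatic pressure."""
    return circadian(time_of_day_h) - homeostatic_pressure(hours_awake)
```

In the study, such a model would be fit per player against game-performance measures rather than using fixed constants.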
Chronic respiratory diseases refer to a group of lung diseases that affect the airways and cause difficulty in breathing. Respiratory diseases are one of the leading causes of death and negatively impact patients’ quality of life. Early detection and regular monitoring of lung function might reduce the risk of death; however, lung function assessment requires the active supervision of a medical professional in a clinical setting. To make lung function tests more accessible and ubiquitous, researchers have started leveraging mobile devices, which still require active supervision and demand extraneous effort from the user. In this work, we propose a convenient mobile-based approach that uses a monosyllabic voice segment, the ‘A-vowel’ or ‘Aaaa...’ sound, to estimate lung function. We conducted two studies (a lab study and an in-clinic study) with 201 participants to develop a detection model that distinguishes the A-vowel sound from other acoustic events and a prediction model that estimates lung function from the detected A-vowel sound. Our study shows that A-vowel sounds can be detected with 93% accuracy and can estimate lung function with 7.4-11.35% mean absolute error. We also conducted a validation study with 10 participants in a noisy environment and were able to detect A-vowel segments with a 71% F1-score. Our results show promising directions for expanding the horizon of mobile-based lung assessment.
SECTION: Mobile Interaction II
Public bookcases offer the opportunity to serendipitously discover books and to anonymously share books with others. The set of available books as well as the sharing patterns are highly dynamic, as anybody can freely take or donate books. This makes it difficult for users to see what is available or of interest to them. To support book sharing via public bookcases, we developed a mobile AR application that highlights relevant books in the camera viewfinder and facilitates searching for specific books. The application recognizes books via text and color features on the spine. In a lab study with 15 participants we evaluated our book recognition algorithm and found that it outperforms unaided visual search. We interviewed users of public bookcases and analyzed the bookcases’ setup and rate of change. A subsequent field evaluation of the AR application on nine public bookcases found a recognition accuracy of 80% for 450 books under different conditions. The proposed approach provides the basis for effectively sharing books via public bookcases.
This article introduces the novel concept of distance-dependent barcodes, which provide users with different data based on their scanning distance. These barcodes employ color blending as the key technique to achieve distance-dependence. A simple yet robust encoding scheme is devised accordingly to distinguish between near and far users. Through several experimental results, the proposed technique is shown to be effective (in terms of clear separation between near and far scanners), reliable (as to successful scans), and practical (it can be used in off-the-shelf smartphones). A few representative use cases are then presented to establish distance-dependent barcodes as an enabling technology for context-aware mobile applications. They include casual interactions with public displays, where the role of users is determined based on their distance from a screen, and augmented reality in retail, where distance-dependent barcodes provide information on available goods with different granularities. Finally, distance-dependent barcodes are shown to be user-friendly and effective through a user study.
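As an illustration of the color-blending principle (the pairing scheme and thresholds below are invented for this sketch, not the paper's actual encoding): a distant camera blurs neighboring modules together, so the pair average can carry one bit, while the within-pair contrast, resolvable only up close, carries another:

```python
def encode(far_bits, near_bits):
    """Each far bit occupies a pair of cells (grayscale values in [0, 1]):
    the pair's mean carries the far bit, the within-pair contrast the near bit."""
    assert len(far_bits) == len(near_bits)
    cells = []
    for fb, nb in zip(far_bits, near_bits):
        base = 0.75 if fb else 0.25        # pair mean -> far bit
        delta = 0.15 if nb else -0.15      # contrast sign -> near bit
        cells += [base + delta, base - delta]
    return cells

def decode_far(cells):
    """A distant scanner sees adjacent cells blended: read pair averages."""
    return [1 if (cells[i] + cells[i + 1]) / 2 > 0.5 else 0
            for i in range(0, len(cells), 2)]

def decode_near(cells):
    """A close scanner resolves individual cells: read the contrast."""
    return [1 if cells[i] > cells[i + 1] else 0
            for i in range(0, len(cells), 2)]
```

The paper's scheme additionally has to survive camera noise, lighting, and color reproduction, which this toy version ignores.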
Adding a mid-air pen to Handheld Augmented Reality creates a new kind of bimanual interaction for which many fundamental interaction design questions have not yet been answered. In particular, menus are an essential component in most visual interfaces, but it is unclear how to best interact with them in this setting: using the pen in mid-air or on a surface, using the touchscreen, or by moving the smartphone itself. We compared basic menus for these methods by analyzing success rates, selection times, device movement, and subjective ratings. Our results indicate that interacting with a mid-air menu using the pen, and operating a menu with the hand holding the smartphone, are sufficiently competitive with the current standard of two-handed touchscreen interaction that interaction designers can freely choose among them based on the interaction context of their application.
Foldable handheld displays have the potential to offer a rich interaction space, particularly as they fold into a convex form factor, for collocated multi-user interactions. In this paper, we explore Tent mode, a convex configuration of a foldable device partitioned into a primary and a secondary display, as well as a tertiary, Edge display that sits at the intersection of the two. We specifically explore the design space for a wide range of scenarios, such as co-browsing a gallery or co-planning a trip. Through a first collection of interviews, end-users identified a suite of apps that could leverage Tent mode for multi-user interactions. Based on these results we propose an interaction design space that builds on unique Tent mode properties, such as folding, flattening or tilting the device, and the interplay between the three sub-displays. We examine how end-users exploit this rich interaction space when presented with a set of collaborative tasks through a user study, and elicit potential interaction techniques. We implemented these interaction techniques and report on the preliminary user feedback we collected. Finally, we discuss the design implications for collocated interaction in Tent mode configurations.
This work introduces the concept of camera handling for a 360 camera and proposes Director-360, a 360 camera enhanced with two novel handling techniques, Pointer and field-of-view (FoV), designed to explicitly and implicitly capture the photographer’s subject of interest within the 360 media at capture time. Pointer lets users specify a subject of interest in the scene by directly pointing the 360 camera, as if using it as a flashlight, while FoV captures the user’s subject of interest within the 360 scene by mapping the user’s face direction to the 360 media. We describe an implementation using deep-learning algorithms. We also present the Director-360 Editor, which incorporates the handling data to streamline the post-editing process. To understand how Director-360 helps compose 360 media, a pilot study was carried out creating video storytelling in three target scenarios; we report the results and user feedback from this study.
SECTION: Touch, Gestures & Tangible Interaction
We present MAGHair, a novel wearable technique that provides subtle haptic sensation by stimulating the body hair without touching the skin. Our approach builds on previous research in magnetic hair stimulation and magnetic locomotion. We use magnetic cosmetics to augment the body hair, which can then be stimulated by a wearable apparatus combining electromagnets and permanent magnets. We provide technical insights on the implementation of a fully functional wrist-worn form factor and early adaptations into other form factors. In addition, we provide a workflow for evaluating and characterizing the magnetic cosmetic recipes. Finally, we evaluate MAGHair, demonstrating that users could detect the sensation of hair movement, which they described as gentle and unique.
We present Headbang, an interaction technique that enriches touch input on handheld devices through slight head movement gestures. This way, users can easily apply shortcuts, like Copy, Paste, or Share, to on-screen targets while touching them. Headbang utilizes the capabilities of commodity smartphones to track the user’s head with the front-facing camera. We evaluated Headbang in two studies and show that the system can be reliably used while sitting and walking, and offers accuracy and speed similar to touch interaction.
Multiple fingers are often used for efficient interaction with handheld computing devices. Currently, any tactile feedback provided is felt on the finger pad or the palm with coarse granularity. In contrast, we present a new tactile feedback technique, Active PinScreen, that applies localised stimuli to multiple fingers with fine spatial and temporal resolution. The tactile screen uses an array of solenoid-actuated magnetic pins with a millimetre-scale form factor which could be deployed for back-of-device handheld use without instrumenting the user. As well as presenting a detailed description of the prototype, we describe potential design configurations and applications of the Active PinScreen and evaluate the human factors of tactile interaction with multiple fingers in a controlled user evaluation. The results of our study show a high recognition rate for directional and patterned stimulation across different grip orientations as well as within and between fingers. We end the paper with a discussion of our main findings, limitations of the current design and directions for future work.
Physio-Stacks: Supporting Communication with Ourselves and Others via Tangible, Modular Physiological Devices
Our physiological activity reflects our inner workings, yet we are not always aware of it in full detail. Physiological devices allow us to monitor this activity, create adaptive systems, and support introspection. Given that these devices have access to sensitive data, it is vital that users have a clear understanding of their internal mechanisms (extrospection); however, the underlying processes are hard to understand and control, resulting in a loss of agency. In this work, we focus on bringing agency back to the user through design guidelines based on principles of honest communication and driven by positive activities. To this end, we conceived a tangible, modular approach to the construction of physiological interfaces that can be used as a prototyping toolkit by designers and researchers, or as a didactic tool by educators and pupils. We show the potential of this approach with a set of examples supporting introspection, dialog, music creation, and play.
Is Implicit Authentication on Smartphones Really Popular? On Android Users’ Perception of “Smart Lock for Android”
Implicit authentication (IA) on smartphones has gained a lot of attention from the research community over the past decade. IA leverages behavioral and contextual data to identify users without requiring explicit input, and thus can alleviate the burden of smartphone unlocking. The reported studies on users’ perception of IA have painted a very positive picture, showing that more than 60% of their respective participants are interested in adopting IA, should it become available on their devices. These studies, however, have all been done either in lab environments or with low- to medium-fidelity prototypes, which limits their generalizability and ecological validity. Therefore, the question of “how would smartphone users perceive a commercialized IA scheme in a realistic setting?” remains unanswered. To bridge this knowledge gap, we report on the findings of our qualitative user study (N = 26) and our online survey (N = 343) to understand how Android users perceive Smart Lock (SL), the first and currently only widely-deployed IA scheme for smartphones. We found that SL is not a widely adopted technology, even among those who have an SL-enabled phone and are aware of the existence of the feature. Instead, we found unclear usefulness and a perceived lack of security, among other factors, to be major adoption barriers that caused the SL adoption rate to be as low as 13%. To provide a theoretical framework for explaining SL adoption, we propose an extended version of the technology acceptance model (TAM), called SL-TAM, which sheds light on the importance of factors such as perceived security and utility on SL adoption.
SECTION: Transport I: Pedestrians & Cyclists
Listening to music while on the move is common in our headphone society. However, if we want navigation assistance from our smartphone, existing approaches either demand exclusive playback through the headphones or impact the listening experience of the music. We present a field evaluation of Attracktion, a spatial audio navigation system that leverages access to individual stems in a multi-track recording to minimize the impact on the listening experience. We compared Attracktion against current turn-by-turn navigation instructions in a field study with 22 users and found that users perceived acoustic overlays with additional navigation information to have no impact on the listening experience. In terms of path efficiency, errors, and mental workload, Attracktion is on par with spoken turn-by-turn navigation instructions, and users liked it for its serendipity.
Using a smartphone while walking in urban traffic is dangerous. Pedestrians may become distracted and have to split their attention between traffic, walking, and using the mobile device. The increasing level of automation in vehicles introduces novel challenges and opportunities for pedestrian-vehicle interaction, e.g., external displays attached to automated vehicles. However, these approaches are hardly scalable and fail to provide clear information in a multi-user environment. We investigate whether a smartphone app could provide individual guidance to enhance pedestrian safety in future traffic. To this end, we tested three app concepts in a user study (N=24) and found that on-screen guidance significantly increases the frequency of successful crossing decisions. In addition, all participants indicated that they would use the proposed system, preferably with an unobtrusive colored bar indicating the safest crossing decision in real time. Integrating smartphones into the interaction between vehicles and pedestrians could increase situational awareness while crossing roads, solve the scalability problem, and thus foster pedestrian safety in future traffic scenarios.
As more people commute by bicycle, a growing number of cyclists use their smartphones while cycling, compromising traffic safety. We designed, implemented and evaluated two prototype smartphone control devices that do not require cyclists to remove their hands from the handlebars—the three-button device Tribike and the rotation-controlled Brotate. The devices are the result of a user-centred design process in which we identified the key features needed for an on-bike smartphone control device. We evaluated the devices in a biking exercise with 19 participants, in which users completed a series of common smartphone tasks. The study showed that Brotate allowed for significantly more lateral control of the bicycle and that both devices reduced the cognitive load required to use the smartphone. Our work contributes insights into designing interfaces for cycling.
Visual as-the-crow-flies (ATCF) navigation methods are an increasingly popular alternative to existing turn-by-turn (TBT) navigation for cyclists. To better understand how people use them in everyday navigation and how they cope with this novel navigation method in challenging situations, we studied two main issues posed by ATCF navigation: knowing whether one is on the right route to the destination, and knowing whether a turn leads into a dead end or detour. To investigate these two problems, we compared visual ATCF navigation against (1) TBT navigation and (2) an improved ATCF+ navigation system in two successive studies. We found that users encountered problems when riding in the opposite direction to the destination and, as a result, often turned around when using the ATCF method. Using colour cues in the ATCF user interface, we were able to reinforce correct route choices. Additionally, we found that unsuccessful route progression correlates negatively with user confidence.
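The arrow an ATCF interface displays reduces to the initial great-circle bearing from the rider to the destination, computed with a standard formula (this is generic geodesy, not code from the paper):

```python
import math

def atcf_bearing(lat1, lon1, lat2, lon2):
    """Initial great-circle bearing (degrees clockwise from north) from the
    rider's position to the destination — the direction an ATCF arrow points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dlon = math.radians(lon2 - lon1)
    x = math.sin(dlon) * math.cos(phi2)
    y = (math.cos(phi1) * math.sin(phi2)
         - math.sin(phi1) * math.cos(phi2) * math.cos(dlon))
    return (math.degrees(math.atan2(x, y)) + 360) % 360
```

A TBT system instead matches the rider to a routed path; the contrast between these two computations is exactly the difference the studies above probe.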
Applications such as Facebook’s “Mark Yourself Safe” or Google Person Finder are quite popular nowadays. Such applications generate crisis maps based on crowdsourced information during or after disasters. Crisis maps are an extremely effective digital dashboard application for rescue and relief. But what if there is even a partial Internet blackout after disaster strikes? This is a common scenario, yet today’s crisis-mapping solutions depend heavily on the Internet. In this paper, we present a thorough background study and the design details of Soteria, an end-to-end solution for smartphone-based opportunistic crisis mapping in the face of Internet blackouts. Soteria uses intelligent, energy-efficient mechanisms for opportunistic ad-hoc information collection and filtering, along with data summarization and a dashboard application for crisis mapping on end-users’ smartphones. The smartphone application intelligently incorporates and tunes existing network systems and services at the backend so that the system works even when conventional network infrastructure fails. We evaluated the performance of Soteria in multiple field trials over five years, and the observed quantitative and qualitative performance is extremely promising for mass-scale adoption in disaster-prone areas.
SECTION: Text & Messaging
The unique limitations of mobile environments make content creation and editing difficult. Microtasking—breaking down complex tasks into subtasks—requires shorter attention spans and quick interactions, making it suitable for mobile usage scenarios. Writing is an ideal process for mobile microtasking because of its many subgoals, but little is known about how writers can use this decomposition through the evolution of a document. In this paper we present findings from a controlled, week-long study characterizing how writers use mobile microtasks while authoring a document. We found that writers created microtasks for editing and inserting information that generally required minimal writing. These tasks were especially well suited to mobile devices, with writers completing tasks on commutes or while waiting for meetings. Writers who microtasked found it easy to interact with their document and complete tasks, writing and editing their document more overall compared to writers who instead edited their document directly on their phone.
In head-mounted display (HMD) interaction, text entry is frequently supported via some form of virtual touch, controller, or ray casting keyboard. While these options effectively support text entry, they often incur costs of additional external hardware, awkward movements, and hand encumbrance. We propose STAT, a low-cost, mobile, touch typing technique that leverages a smartphone screen located at the thigh, to support both tap and word gesture text input for HMDs. Through a controlled laboratory study, we explore the efficacy of our technique – including a comparison of typing in and out of an enclosed pocket – and present design recommendations for the opportunistic use of a personal touchscreen device positioned at a user’s thigh for HMD text entry.
While people primarily communicate with text in mobile chat applications, they are increasingly using visual elements such as images, emojis, and memes. Using such visual elements can help users communicate clearly and make the chatting experience more enjoyable. However, finding and inserting contextually appropriate images during a chat can be both tedious and distracting. We introduce MilliCat, a real-time image suggestion system that recommends images matching the chat content within a mobile chat application (i.e., autocomplete with images). MilliCat combines natural language processing (e.g., keyword extraction, dependency parsing) and mobile computing (e.g., resource and energy efficiency) techniques to autonomously make image suggestions when users might want to use images. Through multiple user studies, we investigated the effectiveness of our design choices, the frequency of and motivation for image usage by participants, and the impact of MilliCat on the mobile chat experience. Our results indicate that MilliCat’s real-time image suggestion enables users to quickly and conveniently select and display images in mobile chat by significantly reducing the latency of the image selection process (a 3.19× improvement), consequently leading to more frequent image usage (1.8×) than existing solutions. Our study participants reported that they used images more often with MilliCat because the images helped them convey information more effectively, emphasize their opinions, express emotions, and have more fun chatting.
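MilliCat's actual pipeline uses keyword extraction and dependency parsing; as a much-simplified hypothetical stand-in, a stopword-filtered keyword match against an image tag index conveys the basic idea of suggesting images from chat text:

```python
# Minimal keyword-matching image suggester (illustrative only; the
# stopword list and tag-index format are invented for this sketch).
STOPWORDS = {"the", "a", "an", "is", "are", "i", "you", "to", "of",
             "and", "on", "in", "it", "for", "so", "that", "much"}

def suggest_images(message, image_index, max_results=3):
    """Rank images by tag overlap with the message's content words.

    image_index: dict mapping image id -> set of descriptive tags.
    """
    words = {w.strip(".,!?").lower() for w in message.split()}
    keywords = words - STOPWORDS
    scored = []
    for img, tags in image_index.items():
        overlap = len(keywords & tags)
        if overlap:
            scored.append((overlap, img))
    scored.sort(reverse=True)
    return [img for _, img in scored[:max_results]]
```

A production system would run this incrementally as the user types and weight matches by syntactic role, which is where the dependency parsing comes in.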
Touch-based devices, despite their mainstream availability, do not support a unified and efficient command selection mechanism available on every platform and application. We advocate that hotkeys, conventionally used as a shortcut mechanism on desktop computers, could be generalized as a command selection mechanism for touch-based devices, even for keyboard-less applications. In this paper, we investigate the performance and usage of soft keyboard shortcuts, or hotkeys (abbreviated SoftCuts), through two studies comparing different input methods across sitting, standing and walking conditions. Our results suggest that SoftCuts are not only appreciated by participants but also support rapid command selection with different devices and hand configurations. We also found no evidence that walking degrades their performance when using the Once input method.
Media (e.g., videos, images, and text) shared on social platforms such as Facebook and WeChat are often visually enriched with digital content (e.g., emojis, stickers, animal faces), increasing joy, personalization, and expressiveness. While voice messages (VMs) are in increasingly frequent use, they currently lack any form of digital augmentation. This work is the first to present and explore the concept of augmented VMs. Inspired by visual augmentations, we designed and implemented an editor allowing users to enhance VMs with background sounds, voice changers, and sound stickers. In a first evaluation (N = 15) we found that participants used augmentations frequently (2.73 per message on average) and rated augmented VMs as expressive, personal, and more fun than ordinary VMs. In a subsequent step, we analyzed the 45 augmented VMs recorded during the study and identified three distinct message types (decoration, composition and integrated) that inform potential usage.
SECTION: Audio Input & Output
We transform traditional experience writing into in-situ voice-based multimedia authoring. Documenting experiences digitally in blogs and journals is a common activity that allows people to socially connect with others by sharing their experiences (e.g. travelogues). However, documenting such experiences can be time-consuming and cognitively demanding, as it is typically done OUT-OF-CONTEXT (after the actual experience). We propose in-situ voice-based multimedia authoring (IVA), an alternative workflow that allows IN-CONTEXT experience documentation. Unlike the traditional approach, IVA encourages in-context content creation using voice-based multimedia input and stores it in multi-modal “snippets”. The snippets can be rearranged to form multimedia articles and published with light copy-editing. To improve the output quality of impromptu speech, Q&A scaffolding was introduced to guide content creation. We implemented the IVA workflow in an Android application, LiveSnippets, and qualitatively evaluated it under three scenarios (travel writing, recipe creation, product review). Results demonstrated that IVA can effectively lower the barrier to writing with acceptable trade-offs in multitasking.
Characterizing the Effect of Audio Degradation on Privacy Perception And Inference Performance in Audio-Based Human Activity Recognition
Audio has been increasingly adopted as a sensing modality in a variety of human-centered mobile applications and in smart assistants in the home. Although acoustic features can capture complex semantic information about human activities and context, continuous audio recording often poses significant privacy concerns. An intuitive way to reduce privacy concerns is to degrade audio quality such that speech and other relevant acoustic markers become unintelligible, but this often comes at the cost of activity recognition performance. In this paper, we employ a mixed-methods approach to characterize this balance. We first conduct an online survey with 266 participants to capture their perception of privacy, qualitatively and quantitatively, under degraded audio. Given our finding that privacy concerns can be significantly reduced at high levels of audio degradation, we then investigate how intentional degradation of audio frames affects recognition of the target classes while maintaining effective privacy mitigation. Our results indicate that degradation of audio frames has minimal effect on audio recognition using frame-level features. Degradation of audio frames can hurt performance to some extent for audio recognition using segment-level features, though such features may still yield superior recognition performance. Given the differing requirements on privacy mitigation and recognition performance for different sensing purposes, such trade-offs need to be balanced in actual implementations.
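To make "degradation of audio frames" concrete, the sketch below zeroes out a random fraction of fixed-length frames in a sample stream. This is only a minimal stand-in under our own assumptions (frame length, zeroing as the degradation strategy, parameter names); the paper's actual degradation methods are not specified in the abstract:

```python
import random

def degrade_frames(samples, frame_len=160, drop_ratio=0.5, seed=0):
    """Zero out a random fraction of fixed-length frames.

    A toy stand-in for intentional frame-level degradation: zeroed
    frames carry no speech content, while untouched frames remain
    available for frame-level feature extraction.
    """
    rng = random.Random(seed)  # seeded for reproducibility
    out = list(samples)        # leave the input untouched
    n_frames = len(out) // frame_len
    victims = rng.sample(range(n_frames), k=int(n_frames * drop_ratio))
    for f in victims:
        out[f * frame_len:(f + 1) * frame_len] = [0] * frame_len
    return out
```

The `drop_ratio` knob corresponds to the privacy/performance trade-off the study characterizes: higher ratios remove more intelligible content but leave fewer intact frames for recognition.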
Conversational agents are rich in content today. However, they are entirely oblivious to users’ situational context, limiting their ability to adapt their responses and interaction style. To this end, we explore the design space for a context-augmented conversational agent, including analysis of input segment dynamics and computational alternatives. Building on these, we propose a solution that intelligently redesigns the input segment for ambient context recognition, achieved in a two-step inference pipeline. We first separate the non-speech segment from the acoustic signal and then use a neural network to infer diverse ambient contexts. To build the network, we curated a public audio dataset through crowdsourcing. Our experimental results demonstrate that the proposed network can distinguish between 9 ambient contexts with an average F1 score of 0.80 and a computational latency of 3 milliseconds. We also build a compressed neural network for on-device processing, optimised for both accuracy and latency. Finally, we present a concrete manifestation of our solution in the design of a context-aware conversational agent and demonstrate its use cases.
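The first step of the two-step pipeline can be sketched as follows. We use a simple frame-energy threshold as a stand-in for non-speech separation; the threshold, frame length, and names are our assumptions, and the second step (the ambient-context network) is only indicated, not implemented:

```python
# Illustrative sketch of step 1: isolate non-speech frames from an
# audio stream, on the assumption that low-energy frames are more
# likely to be dominated by ambient sound than by speech.

def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def select_non_speech(samples, frame_len=160, energy_threshold=0.01):
    """Keep frames whose energy falls below the threshold.

    The returned frames would feed step 2 of the pipeline: a neural
    network that classifies the ambient context (not shown here).
    """
    frames = [samples[i:i + frame_len]
              for i in range(0, len(samples) - frame_len + 1, frame_len)]
    return [f for f in frames if frame_energy(f) < energy_threshold]
```

A production system would likely use a proper voice-activity detector rather than a fixed energy threshold, but the separation-then-inference structure is the same.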
See What I’m Saying? Comparing Intelligent Personal Assistant Use for Native and Non-Native Language Speakers
Limited linguistic coverage for Intelligent Personal Assistants (IPAs) means that many interact in a non-native language. Yet we know little about how IPAs currently support or hinder these users. Through studies of native (L1) and non-native (L2) English speakers interacting with Google Assistant on a smartphone and a smart speaker, we aim to understand this more deeply. Interviews revealed that L2 speakers prioritised planning utterances around perceived linguistic limitations, as opposed to L1 speakers prioritising succinctness because of system limitations. L2 speakers see IPAs as insensitive to their linguistic needs, resulting in failed interactions. L2 speakers clearly preferred using smartphones, as visual feedback supported diagnosis of communication breakdowns whilst allowing time to process query results. Conversely, L1 speakers preferred smart speakers, seeing audio feedback as sufficient. We discuss the need to tailor the IPA experience for L2 users, emphasising visual feedback whilst reducing the burden of language production.
Adoption and use of smartphone-based asynchronous voice messaging have increased substantially in recent years. However, opinions about this communication channel tend to polarize strongly. To provide an understanding of this modality, we started by conducting an online survey (n=1,003) exploring who uses voice messages, their motives, and their utilization. In a subsequent two-week field study (n=6), we analyzed the voice messaging behavior of six avid voice message users, followed by semi-structured interviews further exploring themes uncovered in our survey. Through a thematic analysis, we identified four themes driving voice messaging usage: convenience, para-linguistic features, situational constraints, and the receiver. Voice messaging helps overcome issues of mobile communication through its ease of use, asynchronous nature, and the rich emotional context of voice. It was also perceived as enabling more efficient communication, helping to handle secondary occupations, and better facilitating the maintenance of close relationships. Despite the increased effort required to listen to them, voice messages complement communication with the people we care about.
SECTION: Transport II: Automotive
Autonomous vehicles are complex systems that may behave in unexpected ways. From the drivers’ perspective, this can cause stress and lower trust and acceptance of autonomous driving. Prior work has shown that explanation of system behavior can mitigate these negative effects. Nevertheless, it remains unclear in which situations drivers actually need an explanation and what kind of interaction is relevant to them. Using thematic analysis of real-world experience reports, we first identified 17 situations in which a vehicle behaved unexpectedly. We then conducted a think-aloud study (N = 26) in a driving simulator to validate these situations and enrich them with qualitative insights about drivers’ need for explanation. We identified six categories to describe the main concerns and topics during unexpected driving behavior (emotion and evaluation, interpretation and reason, vehicle capability, interaction, future driving prediction and explanation request times). Based on these categories, we suggest design implications for autonomous vehicles, in particular related to collaboration insights, user mental models and explanation requests.
In this paper we present use cases for affective user interfaces (UIs) in cars and how they are perceived by potential users in China and Germany. Emotion-aware interaction is enabled by the improvement of ubiquitous sensing methods and provides potential benefits for both traffic safety and personal well-being. To promote the adoption of affective interaction at an international scale, we developed 20 mobile in-car use cases through an inter-cultural design approach and evaluated them with 65 drivers in Germany and China. Our data shows perceived benefits in specific areas of pragmatic quality as well as cultural differences, especially for socially interactive use cases. We also discuss general implications for future affective automotive UI. Our results provide a perspective on cultural peculiarities and a concrete starting point for practitioners and researchers working on emotion-aware interfaces.
Today’s developments in the automotive industry are moving towards automated driving. At the highest levels, the driver becomes the passenger, which presents a new challenge for human-computer interaction. People not only have to trust the automated system but are also confronted with increased complexity, as it is often unclear what the automated vehicle is about to do. An ambient light display is one way to give the driver a clearer picture of the car’s intentions while keeping complexity low. We examined the impact on trust and user experience in more detail using two concepts for an ambient light display. One design provides information about detected potential conflicts on the current trajectory. The other design additionally highlights the future trajectory. We implemented both concepts as virtual light bars at the bottom of the screens. We evaluated them in a fixed-base driving simulator with 18 participants against each other and against a baseline condition without additional information. Our scenario was a fully automated journey (SAE Level 5) through a German town. Although the two concepts do not differ much from each other, only the display showing both kinds of information (possible conflicts and the future driving route) provided clear added value for the users.
Personalization of user experience has a long history of success in the HCI community. More recently, the community has focused on adaptive user interfaces, supported by machine learning, that reduce interaction effort and improve user experience by collapsing transactions and pre-filtering results. However, these more recent results have generally only been demonstrated in laboratory environments. In this paper, we share the case of a deployed mobile transit app that adapts based on users’ previous usage. We examine the impact of adaptation, both good and bad, as well as user abandonment rates. We conducted an 18-month assessment in which 2,616 participants (with and without vision impairments) were recruited and took part in an A/B study. Finally, we draw insights from some unusual effects that appeared over the long term.