共同図形配置課題を対象としたモダリティと社会的関係性の共通基盤構築への影響分析, 古谷 優樹・齋藤 光輝・小倉 功裕・緑川 詠介・光田 航・東中 竜一郎・高汐 一紀, 知能と情報, 37 巻, 3 号, p. 662-670, 2025年8月

For robots and virtual agents to interact naturally with users, building common ground is essential. Few prior studies have investigated how common ground is constructed in human-human dialogue, and most of them target text chat. This study therefore examined how dialogue modality (audio and video) and the social relationship between speakers (first meeting vs. acquaintances) affect the construction of common ground. Specifically, we collected and analyzed dialogues in which common ground was built under these conditions. The results show that extending the modalities and an acquaintance relationship both accelerate the construction of common ground. Analysis of verbal behavior further revealed that extended modalities make it easier to convey empathic intent, while an acquaintance relationship increases the utterances required to advance the task. These findings suggest that reproducing non-verbal behavior that conveys empathy and the speech behavior observed between acquaintances is important when designing robots and virtual agents.

Empirical Evaluation of Healthcare Communication Robot Encouraging Self-Disclosure of Chronic Pain, Airi Shimada・Kazunori Takashio, The 2025 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2025年8月

Self-disclosure of pain is essential to communicate pain, which is a subjective sensation, to a third party. However, many elderly people, especially those with chronic pain, are hesitant to communicate their pain. As a result, many patients do not receive appropriate treatment at the right time. It is important to detect small discomforts in daily life and not overlook the occurrence of, or changes in, pain. The ultimate goal of this study is to create a robot for people with chronic pain that notices the user’s discomfort through multiple modalities in daily interactions and conveys the recorded information to a hospital or family if necessary. In this study, we conducted fieldwork at Nichinan Hospital and, based on the findings, we propose and verify a dialogue system that encourages self-disclosure of pain. In this paper, we implemented a system that detects discomfort based on the user’s utterances about pain and the action of rubbing, and asks detailed questions about the pain. We conducted a demonstration experiment with patients of Nichinan Hospital, and the content of the dialogue was evaluated by a physical therapist. The proposed method received significantly higher ratings for the naturalness of the conversation, the ease of use of the system, and the length of the conversation. The physical therapist’s evaluation suggested that the dialogue system’s ability to “notice” the user’s discomfort or unusual state had a positive effect on facilitating pain communication and encouraging self-disclosure. The results suggest that it is possible to realize a dialogue system that facilitates self-disclosure of pain by users who suffer from chronic pain.
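
A minimal sketch of the kind of trigger logic described above; the keyword list, the rubbing-detection input, and the follow-up questions are illustrative assumptions, not the implemented system:

    # Hypothetical sketch of the discomfort-detection trigger (assumed details).
    PAIN_KEYWORDS = ["痛い", "つらい", "ずきずき", "hurts", "ache"]

    FOLLOW_UP_QUESTIONS = [
        "Where does it hurt?",
        "When did the pain start?",
        "How strong is the pain on a scale of 1 to 10?",
    ]

    def detect_discomfort(utterance: str, rubbing_detected: bool) -> bool:
        """Return True when the utterance mentions pain or a rubbing action is observed."""
        mentions_pain = any(k in utterance for k in PAIN_KEYWORDS)
        return mentions_pain or rubbing_detected

    def respond(utterance: str, rubbing_detected: bool) -> list[str]:
        """Ask detailed questions about the pain when discomfort is detected."""
        if detect_discomfort(utterance, rubbing_detected):
            return FOLLOW_UP_QUESTIONS
        return []

    if __name__ == "__main__":
        print(respond("最近、腰が痛いんです", rubbing_detected=False))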

Impact Analysis of Switching Pause Synchronization for Spoken Dialogue Systems, Yosuke Ujigawa・Kazunori Takashio, The 2025 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2025年8月

Each individual has a unique mental tempo (referred to as personal tempo), and the alignment of this tempo plays a crucial role in facilitating smooth interactions with spoken dialogue systems. This study focuses on the “switching pause,” a key component of conversational tempo that is established during interaction. Using a dialogue corpus, we analyzed the impact of switching pauses on dialogue and the process by which they synchronize. Through the analysis of synchronization between pairs, we examined dialogues with high similarity in switching pauses to elucidate the impact of this synchronization on goal achievement and cooperativity in dialogue. Furthermore, we conducted a time-series analysis within pairs to investigate the synchronization process and proposed a method for determining switching pauses for implementation in dialogue systems. These findings contribute to the investigation of individual differences among users and the identification of personal factors that enable effective dialogue with such systems.
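
For illustration, a switching pause can be computed from utterance timing as the gap at each speaker change; the sketch below makes that reading concrete (the definition and the timings are assumptions, not the paper's code or data):

    # Sketch: switching pauses as the gap between one speaker's utterance end
    # and the other speaker's next utterance start (negative values = overlap).
    from dataclasses import dataclass

    @dataclass
    class Utterance:
        speaker: str
        start: float  # seconds
        end: float    # seconds

    def switching_pauses(utterances: list[Utterance]) -> list[float]:
        """Return pause durations at every speaker change."""
        pauses = []
        for prev, curr in zip(utterances, utterances[1:]):
            if prev.speaker != curr.speaker:
                pauses.append(curr.start - prev.end)
        return pauses

    # Example: a short two-speaker exchange with made-up timings.
    dialogue = [
        Utterance("A", 0.0, 1.8),
        Utterance("B", 2.1, 3.5),
        Utterance("A", 3.9, 5.0),
    ]
    print(switching_pauses(dialogue))  # approximately [0.3, 0.4]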

Exploring the Impact of Modalities on Building Common Ground Using the Collaborative Scene Reconstruction Task, Yosuke Ujigawa・Asuka Shiotani・Masato Takizawa・Eisuke Midorikawa・Ryuichiro Higashinaka・Kazunori Takashio, IWSDS2025, 2025年5月

To deepen our understanding of verbal and non-verbal modalities in establishing common ground, this study introduces a novel “collaborative scene reconstruction task.” In this task, pairs of participants, each provided with distinct image sets derived from the same video, work together to reconstruct the sequence of the original video. The level of agreement between the participants on the image order—quantified using Kendall’s rank correlation coefficient—serves as a measure of common ground construction. This approach enables the analysis of how various modalities contribute to the construction of common ground. A corpus comprising 40 dialogues from 20 participants was collected and analyzed. The findings suggest that specific gestures play a significant role in fostering common ground, offering valuable insights for the development of dialogue systems that leverage multimodal information to enhance users’ construction of common ground.
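
The agreement measure can be reproduced directly with Kendall's rank correlation coefficient; the sketch below shows the computation on made-up orderings (not data from the corpus):

    # Sketch: quantifying agreement between two participants' image orderings
    # with Kendall's rank correlation coefficient, as in the task above.
    from scipy.stats import kendalltau

    # Each list gives the position assigned to images 1..8 by one participant.
    participant_a = [1, 2, 3, 4, 5, 6, 7, 8]
    participant_b = [1, 3, 2, 4, 5, 7, 6, 8]

    tau, p_value = kendalltau(participant_a, participant_b)
    # tau close to 1 indicates high agreement on the image order.
    print(f"Kendall's tau = {tau:.3f} (p = {p_value:.3f})")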

Transparent Barriers: Natural Language Access Control Policies for XR-Enhanced Everyday Objects, Kentaro Taninaka・Rahul Jain・Jingyu Shi・Kazunori Takashio・Karthik Ramani, CHI2025, 2025年4月

Extended Reality (XR)-enabled headsets, which overlay digital content onto the physical world, are gradually finding their way into our daily lives. This integration raises significant concerns about privacy and access control, especially in shared spaces where XR applications interact with everyday objects. Such issues remain subtle in the absence of widespread XR applications, and studies in shared spaces are required for smooth progress. This study evaluated a prototype system facilitating natural language policy creation for flexible, context-aware access control of personal objects. We assessed its usability, focusing on balancing precision and user effort in creating access control policies. Qualitative interviews and task-based interactions provided insights into users’ preferences and behaviors, informing future design directions. Findings revealed diverse user needs for controlling access to personal items in various situations, emphasizing the need for flexible, user-friendly access control in XR-enhanced shared spaces that respects boundaries and considers social contexts.

交替潜時の同調が対話に与える影響分析と決定モデルの構築, Yosuke Ujigawa・Kazunori Takashio, Human-Agent Interaction Symposium 2025, 2025年2月

Humans each have their own mental tempo (personal tempo), and matching tempo plays an important role in smooth dialogue with a system. This study focused on tempo synchronization in dialogue, in particular the relationship between switching pauses and speech rate, and analyzed the role they play in effective communication. Analysis using dynamic time warping clarified how synchronization of switching pauses affects dialogue. We further identified the correlation between speech rate and switching pauses and built a model that determines utterance timing for spoken dialogue systems. These results contribute to the development of more natural and adaptive dialogue systems.
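
As a rough sketch of the alignment step, a plain dynamic time warping (DTW) distance between two switching-pause sequences can be computed as below; this is a generic implementation with illustrative values, not the study's analysis code:

    # Sketch: DTW distance between two sequences of switching-pause durations (sec).
    def dtw_distance(seq_a: list[float], seq_b: list[float]) -> float:
        inf = float("inf")
        n, m = len(seq_a), len(seq_b)
        cost = [[inf] * (m + 1) for _ in range(n + 1)]
        cost[0][0] = 0.0
        for i in range(1, n + 1):
            for j in range(1, m + 1):
                d = abs(seq_a[i - 1] - seq_b[j - 1])
                cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
        return cost[n][m]

    # Illustrative pause sequences (seconds) for the two speakers in one dialogue.
    speaker_a = [0.42, 0.35, 0.50, 0.31]
    speaker_b = [0.40, 0.38, 0.47, 0.33, 0.36]
    print(f"DTW distance: {dtw_distance(speaker_a, speaker_b):.2f}")

A smaller distance indicates that the two speakers' switching pauses follow each other more closely over the course of the dialogue.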

Face Robot Performing Interaction with Emphasis on Eye Blink Entrainment, Masato Iimori・Yuki Furuya・Kazunori Takashio (Keio Univ), 2023 32nd IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2023年8月

Eyes play a significant role in human-human interaction, and blinking is particularly important as it can indicate a pause in the conversation and even lead to eye blink entrainment. However, most communication robots cannot reproduce eye blink movements due to cost constraints. Thus, our aim is to create a low-cost robot that can physically reproduce eye blink movements and induce eye blink entrainment. In this paper, we describe the implementation of the robot and evaluate the subjective impression of the robot’s eye blink movements. Our results suggest that the robot’s blinking behavior at pauses in the conversation facilitated the participants’ understanding of the robot’s speech. Our findings also suggest that simulating eye blink entrainment can increase participants’ affinity and acceptance towards the robot in certain cases, and that poorly designed blinking may adversely affect affinity.
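
A hedged sketch of how blink timing of this kind might be driven; the robot interface (`robot.blink()`), the thresholds, and the mirroring behavior are hypothetical, not the paper's implementation:

    # Hypothetical blink controller: blink at pauses in the robot's speech and
    # echo detected user blinks after a short delay as one way to simulate
    # blink entrainment (all constants are assumed values).
    import time

    PAUSE_THRESHOLD = 0.4      # silence (sec) treated as a pause in robot speech
    ENTRAINMENT_DELAY = 0.25   # delay (sec) before mirroring a user's blink

    class BlinkController:
        def __init__(self, robot):
            self.robot = robot  # assumed to expose a blink() method

        def on_speech_pause(self, silence_sec: float) -> None:
            if silence_sec >= PAUSE_THRESHOLD:
                self.robot.blink()

        def on_user_blink(self) -> None:
            time.sleep(ENTRAINMENT_DELAY)
            self.robot.blink()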

Synchronization of Speech Rate to User’s Personal Tempo in Dialogue Systems and Its Effects, Yosuke Ujigawa・Kazunori Takashio (Keio Univ), 2024 IEEE International Conference on Robot and Human Interactive Communication (RO-MAN), 2024年8月

Every individual goes about daily life at their own unique tempo, called personal tempo. Tempo is also highly important in dialogue, and matching tempo with a conversational partner is thought to lead to smoother communication and a higher level of comprehension. Spoken-dialogue systems are used in many situations, and personalizing dialogue on the basis of the user’s tempo is expected to make it easier for users to speak and to make them want to speak. Previous research has focused on methods for encouraging users to change their tempo to match that of their dialogue partner. However, a conversation that differs from the user’s tempo can be stressful and burdensome for the user during the process of tuning in. Therefore, we define personal tempo as speech speed, which is the number of moras divided by the duration of speech, and propose a speech-speed control method for spoken-dialogue systems. We implemented our method in a spoken-dialogue system that synchronizes its speech with the user. We verified the effectiveness of the proposed method by analyzing its impact on the comprehension of speech and user impressions of the spoken-dialogue system. The results indicate that the proposed method produced significant differences in both impressions and comprehension of the speech content.
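
The tempo definition above (moras divided by utterance duration) can be made concrete as follows; the simplified mora counting and the default synthesis rate are assumptions for illustration, not the paper's method:

    # Sketch: personal tempo as speech speed (moras per second) and a derived
    # scale factor for matching the system's TTS rate to the user's tempo.
    SMALL_KANA = set("ゃゅょぁぃぅぇぉャュョァィゥェォ")

    def count_moras(kana_text: str) -> int:
        """Approximate mora count for a kana string (small kana merge with the previous kana)."""
        return sum(1 for ch in kana_text if ch not in SMALL_KANA)

    def speech_speed(kana_text: str, duration_sec: float) -> float:
        """Speech speed in moras per second."""
        return count_moras(kana_text) / duration_sec

    def matched_system_rate(user_speed: float, base_speed: float = 7.0) -> float:
        """Scale factor for system TTS so its rate follows the user's tempo
        (base_speed is an assumed default synthesis rate in moras/sec)."""
        return user_speed / base_speed

    user = speech_speed("きょうはいいてんきですね", duration_sec=1.6)
    print(f"user tempo: {user:.1f} moras/sec, TTS scale: {matched_system_rate(user):.2f}")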

植物×ARエージェントによる一人暮らしの中での発話促進, 戸沢実・高汐一紀 (慶大), 電子情報通信学会技術研究報告, vol. 124, no. 143, 2024年8月

Since the pandemic, the decrease in face-to-face communication and the increase in feelings of loneliness have become significant issues. Houseplants, often displayed as “green amenities” in rooms, are among the most familiar plants to which people can easily form attachments. Research that treats houseplants as conversation partners aims to draw out positive emotions and improve mental health while maintaining privacy. By utilizing Mixed Reality technology, this approach promotes self-care through interaction with plants and contributes to the reduction of loneliness by building trust and controlling negative emotions. The agents, by encouraging anthropomorphism and self-dialogue, are expected to have a positive impact on daily life.

言語モデルを用いた発話内容に基づくFACS生成モデルの提案, 小橋龍人・宇治川遥祐・高汐一紀 (慶大), 電子情報通信学会技術研究報告, vol. 124, no. 143, 2024年8月

This study proposes a model for generating facial expressions from speech text. While previous research has focused on generating facial animation from audio, this study concentrates on directly generating expressions from text. The output utilizes Action Units (AUs) based on the Facial Action Coding System (FACS). To reduce computational complexity and enhance model scalability, the proposed architecture employs only the encoder component of the Transformer, omitting the decoder. The model is trained using a sliding window approach, enabling generation of expressions for each token in temporal sequence. The dataset for training was constructed by collecting publicly available videos from the web, performing facial expression detection, and transcribing the speech content.
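
A minimal sketch of an encoder-only text-to-AU model of the kind described; the dimensions, AU count, and window length are assumptions for illustration, not the proposed model's actual configuration:

    # Sketch: Transformer encoder (no decoder) mapping text tokens to per-token
    # Action Unit (AU) intensities; all hyperparameters are assumed values.
    import torch
    import torch.nn as nn

    class TextToAU(nn.Module):
        def __init__(self, vocab_size: int = 32000, d_model: int = 256,
                     n_heads: int = 4, n_layers: int = 4, n_aus: int = 17):
            super().__init__()
            self.embed = nn.Embedding(vocab_size, d_model)
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)
            self.head = nn.Linear(d_model, n_aus)  # AU intensities per token

        def forward(self, token_ids: torch.Tensor) -> torch.Tensor:
            # token_ids: (batch, window_len) -> (batch, window_len, n_aus)
            return self.head(self.encoder(self.embed(token_ids)))

    model = TextToAU()
    window = torch.randint(0, 32000, (1, 16))   # one sliding window of 16 tokens
    au_pred = model(window)
    print(au_pred.shape)  # torch.Size([1, 16, 17])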