Every individual goes about daily life at their own unique tempo, called personal tempo. Tempo also matters in dialogue: if a speaker's tempo matches that of the conversational partner, communication is thought to become smoother and comprehension higher. Spoken-dialogue systems are now used in many situations, and personalizing dialogue on the basis of the user's tempo is expected to make such systems easier to talk to and more inviting. Previous research has focused on methods that encourage users to change their own tempo to match that of their dialogue partner. However, conversing at a tempo that differs from the user's own can be stressful and burdensome while the user tunes in. We therefore define personal tempo as speech speed, i.e., the number of moras divided by the duration of speech, and propose a speech-speed control method for spoken-dialogue systems. We implemented the method in a spoken-dialogue system that synchronizes its speech speed with the user's, and verified its effectiveness by analyzing its impact on comprehension of the speech content and on user impressions of the system. The results indicate significant differences with the proposed method in both impressions and comprehension of the speech content.
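The definition of personal tempo above (moras divided by speech duration) can be sketched as a small helper; the function name and units (moras per second) are illustrative assumptions, not from the paper:

```python
def speech_speed(num_moras: int, duration_s: float) -> float:
    """Personal tempo as defined above: moras uttered per second of speech.

    `num_moras` is the mora count of the utterance; `duration_s` is its
    spoken duration in seconds (pauses excluded or included per the
    segmentation used upstream).
    """
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    return num_moras / duration_s

# e.g. a 12-mora utterance spoken over 1.5 seconds
tempo = speech_speed(12, 1.5)  # 8.0 moras/s
```

A speech-speed control method would then scale the system's synthesized speech rate toward this measured value.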
Promoting Speech in Single-Person Living with a Plant × AR Agent, 戸沢実・高汐一紀 (Keio Univ.), IEICE Technical Report, vol. 124, no. 143, August 2024
Since the pandemic, the decrease in face-to-face communication and the increase in feelings of loneliness have become significant issues. Houseplants, often displayed as “green amenities” in rooms, are among the most familiar plants to which humans can easily form attachments. Research that treats houseplants as conversation partners aims to draw out positive emotions and improve mental health while preserving privacy. Using Mixed Reality technology, this approach promotes self-care through interaction with plants and contributes to reducing loneliness by building trust and regulating negative emotions. The agents, by encouraging anthropomorphism and self-dialogue, are expected to have a positive impact on daily life.
Proposal of a FACS Generation Model Based on Utterance Content Using a Language Model, 小橋龍人・宇治川遥祐・高汐一紀 (Keio Univ.), IEICE Technical Report, vol. 124, no. 143, August 2024
This study proposes a model for generating facial expressions from speech text. While previous research has focused on generating facial animation from audio, this study concentrates on directly generating expressions from text. The output utilizes Action Units (AUs) based on the Facial Action Coding System (FACS). To reduce computational complexity and enhance model scalability, the proposed architecture employs only the encoder component of the Transformer, omitting the decoder.
The model is trained with a sliding-window approach, enabling generation of an expression for each token in temporal order. The training dataset was constructed by collecting publicly available videos from the web, performing facial-expression detection, and transcribing the speech content.
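The per-token sliding-window scheme described above can be illustrated with a minimal sketch: for each token position, the predictor sees a fixed-length window of preceding tokens (inclusive) and would emit one AU intensity vector per position. The window length and token representation here are illustrative assumptions, not details from the paper:

```python
def sliding_windows(tokens, window=5):
    """For each token position i, return the up-to-`window` most recent
    tokens ending at i. A per-token AU predictor (e.g. an encoder-only
    Transformer head) would map each such window to one vector of
    Action Unit intensities."""
    return [tokens[max(0, i - window + 1): i + 1] for i in range(len(tokens))]

# Each utterance of n tokens yields n training windows:
windows = sliding_windows(["I", "am", "very", "happy", "today"], window=3)
# windows[3] -> ["am", "very", "happy"]
```

This framing keeps the context length bounded, which matches the stated goal of reducing computational complexity relative to a full encoder-decoder model.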