Prof. Sotiris Manitsaris: “AI can redefine how music is performed”

Image generated using Midjourney (AI).

Can machines use AI to understand and predict human behavior? This was one of the big questions Professor Sotiris Manitsaris, Deputy Director of the Robotics Center at MINES ParisTech, tackled during the SKEMA Centre for Artificial Intelligence (SCAI) seminar series. After the event, SCAI Director Margherita Pagani continued the discussion with him.

Human-robot collaboration

You work on machines anticipating human behavior to respond accordingly. How do you address unpredictability in human actions?

Sotiris Manitsaris, at SKEMA Business School

When I talk about anticipation, I am referring primarily to professional contexts rather than everyday life. My work focuses on scenarios like manufacturing, where operators perform industrial tasks through specific actions and gestures in space and time.

In such controlled environments, actions are often predictable to some extent. For example, when assembling a car door in a collaborative workspace with a robot, both the human and the robot need to be perfectly synchronized. Synchronization requires the robot to anticipate the human’s rhythm—how fast or slowly they are moving—and their actions in space, so that it can deliver, for example, a screwdriver at the right place and time.

But even in a controlled environment, unpredictability still exists…

Yes, unpredictability still exists. If a worker changes their movement unexpectedly, the robot activates what we call a “collision avoidance system.” This means that the algorithm will recalculate alternative trajectories to avoid collisions, ensuring safety while maintaining workflow. It is similar to how a GPS recalculates the route when you deviate from the proposed path.

This is called “collaborative robotics”: robots are equipped with sensors that can sense distance and human presence in a room, and with an artificial intelligence (AI) component that constantly calculates optimal pathways. Our robots are reactive, meaning they can adapt both temporally and spatially to the operator’s behavior.
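
To make the “GPS-style” recalculation concrete, here is a minimal Python sketch of a reactive re-planning loop. It assumes a simplified 2D workspace, straight-line waypoints, and an illustrative safety radius; the function names and the threshold are hypothetical and do not describe the actual collision-avoidance system used at the Robotics Center.

```python
import numpy as np

SAFETY_RADIUS = 0.5  # metres; illustrative threshold, not a real system parameter


def plan_path(start, goal, n_points=20):
    """Straight-line path between start and goal, discretised into waypoints."""
    return np.linspace(start, goal, n_points)


def path_is_safe(path, human_pos):
    """True if every waypoint keeps at least SAFETY_RADIUS from the human."""
    distances = np.linalg.norm(path - human_pos, axis=1)
    return bool(np.all(distances > SAFETY_RADIUS))


def replan_around(start, goal, human_pos):
    """Insert a detour waypoint that steers around the human, as a GPS reroutes."""
    midpoint = (start + goal) / 2.0
    away = midpoint - human_pos
    away /= np.linalg.norm(away) + 1e-9            # unit vector pointing away from the human
    detour = midpoint + away * 2 * SAFETY_RADIUS   # detour waypoint clear of the safety zone
    return np.vstack([plan_path(start, detour), plan_path(detour, goal)])


# Example: the operator unexpectedly steps into the planned trajectory.
start, goal = np.array([0.0, 0.0]), np.array([2.0, 0.0])
human = np.array([1.0, 0.1])

path = plan_path(start, goal)
if not path_is_safe(path, human):
    path = replan_around(start, goal, human)       # recalculate an alternative trajectory

print("path is now safe:", path_is_safe(path, human))
```

In a real cell this check would run continuously against live sensor data, which is what makes the robot reactive in both space and time.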

“Variability increases depending on individual characteristics, cultural background, and personal movement styles”

Sotiris Manitsaris

What about variability in gestures and body language, based on individual differences?

This is indeed a significant challenge in human-robot collaboration. If you ask one person to perform the same gesture ten times, you will notice slight variations. With multiple individuals, the variability increases even more due to individual characteristics, cultural background, and personal movement styles. To address this, we need a diverse set of users to collect as much data as possible. This involves recording repetitions of gestures from users with different characteristics to train the AI effectively.

However, relying solely on data collection is not always feasible or sufficient. In these cases, we employ mathematical models to describe the actions in ways that are invariant. This means identifying the core elements of a movement that remain consistent regardless of individual variations.

For example, when an operator reaches for a tool on an assembly line, the fundamental action—reaching for a tool—can be broken down into primitives. Primitives are basic units of movement that have a common basis, like assuming a posture or grabbing a tool. By focusing on these primitives, rather than replicating exact movements, robots can understand and anticipate human actions more accurately, and collaborate effectively with a wide range of users.
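
As a toy illustration of the idea of primitives, the Python sketch below reduces a hand trajectory to features that ignore speed and amplitude (the dominant direction of motion and whether the grip closes) and matches them against a few hypothetical primitive templates. The primitive names and features are invented simplifications, not the mathematical models used in this research.

```python
import numpy as np

# Hypothetical primitive templates described by invariant features:
# the dominant direction of hand motion and whether the grip closes.
PRIMITIVES = {
    "reach":   {"direction": np.array([1.0, 0.0, 0.0]), "hand_closes": False},
    "grab":    {"direction": np.array([0.0, 0.0, 0.0]), "hand_closes": True},
    "retract": {"direction": np.array([-1.0, 0.0, 0.0]), "hand_closes": False},
}


def invariant_features(trajectory, grip_signal):
    """Reduce a raw hand trajectory to features that ignore speed and scale."""
    displacement = trajectory[-1] - trajectory[0]
    norm = np.linalg.norm(displacement)
    direction = displacement / norm if norm > 1e-6 else np.zeros(3)
    hand_closes = bool(grip_signal[-1] > grip_signal[0])  # grip tightened over the segment
    return direction, hand_closes


def classify_primitive(trajectory, grip_signal):
    """Match a movement segment to the closest primitive template."""
    direction, hand_closes = invariant_features(trajectory, grip_signal)
    best_name, best_score = None, -np.inf
    for name, template in PRIMITIVES.items():
        if hand_closes != template["hand_closes"]:
            continue
        score = float(direction @ template["direction"])   # directional similarity
        if score > best_score:
            best_name, best_score = name, score
    return best_name


# A slow reach and a fast reach differ in timing and amplitude
# but reduce to the same primitive.
slow_reach = np.linspace([0.0, 0.0, 0.0], [0.60, 0.05, 0.0], 50)
fast_reach = np.linspace([0.0, 0.0, 0.0], [0.55, -0.03, 0.0], 10)
open_grip = np.zeros(50)

print(classify_primitive(slow_reach, open_grip))       # -> reach
print(classify_primitive(fast_reach, open_grip[:10]))  # -> reach
```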

So how do you ensure the AI system works across different demographics and applications?

Generalization is a complex issue. The performance of an AI system is heavily influenced by the data it is trained on. If the training data predominantly represent one demographic, the AI may not perform well with others.

In industrial applications, where data is often confidential, we prioritize precision over generalization. We aim to create models tailored to specific demographics or tasks. This ensures high accuracy and reliability within that controlled environment.


Watch also: Prof. Christopher Tucci: « Most companies are too slow in adopting AI »


In contrast, for applications in public spaces, like airports or museums, the objective is the opposite. We want the system to understand the gestures of everybody—aiming for broader inclusivity and generalization. This requires collecting a wide variety of training data.

However, balancing precision and generalization is challenging, since it is not always possible to access large amounts of data, and confidentiality often prevents reusing data from different sources. We continuously work on innovative methods to gather diverse data sets and develop algorithms that can adapt to various demographics and applications, ensuring the AI systems are both precise and inclusive.

“AI allows us to expand musical possibilities by using the entire body, not just fingers or hands”

Sotiris Manitsaris

What ethical considerations do you take into account when designing AI systems that analyze and predict human behavior?

There are several important ethical considerations. One key aspect is transparency. Users should have a clear understanding of how their data will be used. When people notice their gestures being recorded, they may worry about being replaced by robots. To address this, it is crucial to explain the purpose of the AI system clearly. In my work, the goal is not to replace human workers but to assist them. For example, by helping workers perform certain tasks more efficiently, we can reduce health risks and ergonomic problems, thereby improving their work experience and well-being.

Another significant challenge is ensuring that the algorithms are fair and not biased, and that they do not function as “black boxes.” It is important to develop systems that can explain their internal processes. This involves creating algorithms that are interpretable and providing clear documentation on how decisions are made.

Privacy and consent are also important considerations. Users should give informed consent for data collection and use. This means providing detailed information about what data will be collected, how it will be used, and who will have access. Ensuring data anonymization and secure storage are also important to protect privacy.

AI and artistic creation

During the seminar, you mentioned using AI to enhance musical instruments. How does this work?

AI allows us to expand musical possibilities by using the whole body, not just fingers or hands. For instance, we can design systems where a dancer’s movements generate sound in real-time, transforming physical gestures into music. Unlike traditional instruments, which primarily rely on finger movements, these systems enable a more immersive and expressive musical experience.


Read also: Is ChatGPT really a threat to the world of education?


Translating movements into sounds is called “movement sonification” and is achieved through a combination of sensors and AI. Sensors capture the performer’s movements to extract meaningful information like speed, direction, and intensity. AI algorithms help filter and interpret this data to identify specific gestures or patterns. The processed data is then mapped to various sound parameters—for example, the speed of a movement might control the pitch. This mapping is often customizable, allowing performers to tailor the sonification to their artistic vision. Finally, the mapped data generates sound in real-time using synthesizers, samplers, or other sound generation techniques.
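
The mapping step of that pipeline can be sketched in a few lines of Python. The example below assumes hypothetical wrist positions sampled at 10 Hz, maps frame-to-frame speed linearly onto pitch, and writes the result as short sine tones to a WAV file; an actual sonification system would use richer features and real-time synthesis, so this is purely an illustration of one possible speed-to-pitch mapping.

```python
import numpy as np
import wave

SAMPLE_RATE = 22050
FRAME_SECONDS = 0.1  # duration of each sonified movement frame (illustrative)


def speeds_from_positions(positions, dt):
    """Frame-to-frame speed of the tracked point (e.g. a wrist marker)."""
    return np.linalg.norm(np.diff(positions, axis=0), axis=1) / dt


def speed_to_frequency(speed, f_min=220.0, f_max=880.0, speed_max=2.0):
    """Map speed (m/s) linearly onto a pitch range; the mapping itself is a design choice."""
    s = np.clip(speed / speed_max, 0.0, 1.0)
    return f_min + s * (f_max - f_min)


def synthesize(frequencies):
    """Concatenate one short sine tone per movement frame."""
    t = np.arange(int(SAMPLE_RATE * FRAME_SECONDS)) / SAMPLE_RATE
    return np.concatenate([0.3 * np.sin(2 * np.pi * f * t) for f in frequencies])


# Hypothetical wrist positions (metres) sampled every 0.1 s:
# the gesture starts slowly and accelerates, so the pitch rises over time.
x = np.cumsum(np.linspace(0.0, 0.2, 30))
positions = np.stack([x, np.zeros_like(x), np.zeros_like(x)], axis=1)

audio = synthesize(speed_to_frequency(speeds_from_positions(positions, dt=FRAME_SECONDS)))

with wave.open("sonification.wav", "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)  # 16-bit samples
    wav.setframerate(SAMPLE_RATE)
    wav.writeframes((audio * 32767).astype(np.int16).tobytes())
```

Performers would then adjust the mapping itself (which movement feature drives which sound parameter) to fit their artistic vision.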

Can you give examples of applications in real life?

AI can be used to develop new instruments and enhance musical creativity. For example, consider a tabletop instrument where you play notes with one hand while using the other to modulate sound in real-time, adding depth and expressivity. This redefines how music is created and performed, introducing 3D gesture-tracking instruments.

Movement sonification can also be used for learning purposes, to help teach skills. For instance, a glass-blowing expert’s gestures can be translated into music. When mimicking them, the learner can hear when their movements deviate, as incorrect motions produce dissonant or disharmonious sounds. This immediate auditory feedback helps refine their technique. The same approach can be applied in other contexts, such as physical therapy or sports training.
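
As a rough illustration of this kind of auditory feedback, the Python sketch below compares a learner’s trajectory with an expert’s and maps the average deviation onto a detuning interval, so that a pair of feedback tones beats more strongly the further the learner drifts from the reference gesture. All names, thresholds, and the mapping are invented for the example.

```python
import numpy as np

def mean_deviation(expert, learner):
    """Average point-wise distance between two equally sampled trajectories (metres)."""
    return float(np.mean(np.linalg.norm(expert - learner, axis=1)))


def detune_cents(deviation, max_deviation=0.10, max_cents=100.0):
    """Map deviation onto a detuning interval: 0 cents = in tune, 100 cents = a semitone off."""
    return float(np.clip(deviation / max_deviation, 0.0, 1.0) * max_cents)


def feedback_frequencies(base_freq=440.0, cents=0.0):
    """A reference tone plus a feedback tone; the beating grows with the detuning."""
    return base_freq, base_freq * 2 ** (cents / 1200.0)


# Hypothetical expert gesture and a learner's slightly noisy imitation.
expert = np.linspace([0.0, 0.0, 0.0], [0.5, 0.1, 0.0], 40)
learner = expert + np.random.default_rng(0).normal(scale=0.02, size=expert.shape)

cents = detune_cents(mean_deviation(expert, learner))
print(feedback_frequencies(cents=cents))  # the larger the deviation, the more dissonant the pair
```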

Can AI create something entirely new, or is it limited by design?

The concept of newness is subjective and difficult to define mathematically. While AI can produce novel combinations of existing data, it cannot create something entirely original. Its outputs are inherently based on the data it has been trained on; in other words, on what it has been taught. For instance, if you train an AI on a specific walking style, it can only imitate or generate variations of that style; it cannot invent something from scratch. A golden rule we always keep in mind is that your data must represent what you want to recognize or generate in the end.

Margherita Pagani, Director of SKEMA Centre for Artificial Intelligence
