3rd Week Report – WEBM&DIA

In the second part of the week, we continued to explore the idea we had at the end of the previous week and decided to go ahead with it, changing some details.

Idea:

Creation of a multi-agent system that manages the camera feeds of a podcast, considering a maximum of 4 participants, each with a microphone, seated around a table (the usual setup of a podcast).

In the center of the table, there will be a camera that detects and records the participant who is speaking. It will be aimed by a pan/tilt servo motor (360-degree rotation and 180-degree tilt). To decide whom this camera should record, it will be connected by USB to the PC together with the 4 microphones. The PC will detect which microphone is in use to determine which participant is speaking. It will then send instructions via HTTP or MQTT to an Arduino microcontroller connected to the servo motor, which performs the first, coarse step of aiming the camera at that person. This step is not precise: it only points the camera at the fixed angle assigned to each microphone, e.g. 90 degrees. The second step receives the images from the camera and detects the person in them using an algorithm such as YOLO running on the PC. The PC will then continuously send finer adjustments to the microcontroller, based on the detections, refining the rotation and tilt angles of the servo motor so that the speaker's face is at the center of the image.
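The two aiming steps can be sketched roughly as follows. This is only an illustration under several assumptions that are not part of the design yet: the microphone-to-angle table, the level threshold, the camera field of view, the gain, the sign conventions of the servo, and the JSON message format are all hypothetical, and the face position is assumed to come from the detector (e.g. the center of a YOLO bounding box).

```python
import json

# Assumed fixed pan angles (degrees) for the 4 microphone positions.
MIC_PAN_ANGLES = {0: 0.0, 1: 90.0, 2: 180.0, 3: 270.0}


def active_microphone(levels, threshold=0.1):
    """Return the index of the loudest microphone, or None if all are quiet.

    `levels` holds one input level per microphone (assumed range 0..1).
    """
    loudest = max(range(len(levels)), key=lambda i: levels[i])
    return loudest if levels[loudest] >= threshold else None


def coarse_pan(mic_index):
    """Step 1: point the camera at the fixed angle of the active microphone."""
    return MIC_PAN_ANGLES[mic_index]


def refine_angles(pan, tilt, face_center, frame_size, fov=(60.0, 40.0), gain=0.5):
    """Step 2: one proportional correction that moves the face toward the image center.

    Assumes positive pan turns the camera to the right and a larger tilt
    value raises it; image y grows downward.
    """
    cx, cy = face_center
    width, height = frame_size
    dx = cx / width - 0.5   # horizontal offset from center, in [-0.5, 0.5]
    dy = cy / height - 0.5  # vertical offset from center, in [-0.5, 0.5]
    new_pan = (pan + gain * dx * fov[0]) % 360.0                 # 360-degree rotation
    new_tilt = min(180.0, max(0.0, tilt - gain * dy * fov[1]))   # 180-degree tilt
    return new_pan, new_tilt


def servo_message(pan, tilt):
    """Hypothetical JSON payload to send to the Arduino over HTTP or MQTT."""
    return json.dumps({"pan": round(pan, 1), "tilt": round(tilt, 1)})


# Example: microphone 1 is active, then the face is detected right of center.
mic = active_microphone([0.02, 0.40, 0.05, 0.01])
pan, tilt = coarse_pan(mic), 90.0
pan, tilt = refine_angles(pan, tilt, face_center=(480, 240), frame_size=(640, 480))
print(servo_message(pan, tilt))
```

Applying only a fraction of the measured offset on each frame (the gain) is one simple way to avoid overshooting while the detector keeps sending new positions; the actual values would have to be tuned on the real hardware.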

In addition to the central camera, there will be an external camera, also connected to the PC, that will be activated when a recurring conversation between 2 or more participants is detected. We can detect a conversation if, for example, 2 or more microphones become active within 10 seconds. In this case, the transmission will be switched to this camera.
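A minimal sketch of this switching rule, assuming the PC records a timestamp each time a microphone becomes active. The class name, the camera labels, and the exact window handling are hypothetical; the 10-second window is the example value mentioned above.

```python
from collections import deque


class CameraSelector:
    """Choose between the central and external camera from recent microphone activity."""

    def __init__(self, window_s=10.0, min_speakers=2):
        self.window_s = window_s          # length of the sliding window, in seconds
        self.min_speakers = min_speakers  # distinct microphones that count as a conversation
        self._events = deque()            # (timestamp, mic_index), oldest first

    def on_mic_active(self, mic_index, now):
        """Record that a microphone became active at time `now` (seconds)."""
        self._events.append((now, mic_index))

    def choose(self, now):
        """Return "external" if enough distinct microphones were active in the window."""
        # Drop activations that fell out of the sliding window.
        while self._events and now - self._events[0][0] > self.window_s:
            self._events.popleft()
        speakers = {mic for _, mic in self._events}
        return "external" if len(speakers) >= self.min_speakers else "central"


# Example: two participants speak 4 seconds apart, then only one stays recent.
selector = CameraSelector()
selector.on_mic_active(0, now=0.0)
print(selector.choose(now=1.0))   # one speaker in the window
selector.on_mic_active(2, now=4.0)
print(selector.choose(now=5.0))   # two speakers within 10 seconds
print(selector.choose(now=12.0))  # the first activation has expired
```

In a real setup the timestamps could come from `time.monotonic()`, and some hold time before switching back would probably be needed so the transmission does not flip between cameras too often.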

The system must decide which camera to use based on what is happening in the podcast and guide the central camera in the best way, without human intervention.