A video conferencing server (100) receives and combines video streams captured by cameras of plural video clients (101) and generates immersive video streams (124, 125) for delivery to and play-out by these video clients (101). A cut-out module (102) in the video conferencing server (100) generates a foreground mask (122) for a video frame (121) received from a conferencing client (101). A camera shake detector (103) determines a displacement vector (123) for a subset of features in the video frame (121). The displacement vector (123) represents a two-dimensional motion of the subset of features between a background mask and a previous background mask for a previous video frame received from the same conferencing client (101). A camera shake correcting module (102, 104) applies a displacement opposite to the displacement vector (123) to the foreground mask (122) before use thereof in the immersive video streams (124, 125) for conferencing clients (101), and a signalling unit (104) inserts a shake indication (311, 312) into the immersive video stream (124) delivered to the conferencing client (101) whose camera is shaking.
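
By way of illustration only, the detection, correction and signalling steps described above might be sketched as follows. This is a minimal sketch under stated assumptions, not the claimed implementation: the function names, the use of a median over per-feature motions, the zero fill at the mask border and the threshold value are all hypothetical and do not appear in the source.

```python
from statistics import median


def estimate_displacement(prev_features, curr_features):
    """Estimate a 2D displacement vector (dx, dy) from matched feature
    positions in the previous and current background masks.
    Assumption: the median per-feature motion is used as the estimate."""
    pairs = list(zip(prev_features, curr_features))
    dxs = [cx - px for (px, _), (cx, _) in pairs]
    dys = [cy - py for (_, py), (_, cy) in pairs]
    return round(median(dxs)), round(median(dys))


def compensate_mask(mask, displacement):
    """Apply the displacement opposite to (dx, dy) to a binary foreground
    mask given as a list of rows. Pixels shifted outside the frame are
    dropped and vacated pixels are set to 0 (an assumption)."""
    dx, dy = displacement
    height, width = len(mask), len(mask[0])
    out = [[0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            ny, nx = y - dy, x - dx
            if mask[y][x] and 0 <= ny < height and 0 <= nx < width:
                out[ny][nx] = mask[y][x]
    return out


def shake_indication(displacement, threshold=1):
    """Return True when the displacement is large enough to signal shake.
    Assumption: a fixed per-axis threshold in pixels."""
    dx, dy = displacement
    return max(abs(dx), abs(dy)) >= threshold


# Hypothetical example: three background features all moved by (+2, +1).
prev_features = [(1, 1), (4, 2), (2, 5)]
curr_features = [(3, 2), (6, 3), (4, 6)]
vector = estimate_displacement(prev_features, curr_features)

# A 4x5 foreground mask with a single foreground pixel at x=3, y=2.
foreground_mask = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0],
]
corrected_mask = compensate_mask(foreground_mask, vector)
shaking = shake_indication(vector)
```

In this example the foreground pixel is moved back by the estimated vector, so the corrected mask lines up with the position the foreground had before the camera moved, and the shake flag would be what the signalling unit conveys to the client whose camera is shaking.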