Users do not have to replace existing conference phones or use a different microphone for video conferences versus audio conferences. For conference calls, it becomes very easy to mix some participants with audio-only and others who have audio and video. Moreover, if the IP network used by the video channel is heavily congested and video quality is affected, communication can continue uninterrupted as audio-only. Unfortunately, using the conventional telephone network for the audio channel of a video conference makes lip-sync much harder to achieve.
It is no longer practical to use the prior art technique of adding timestamps to both channels because the POTS (Plain Old Telephone System) network is extremely bandwidth limited and will only reliably carry signals within the range of human hearing. Thus adding timestamps to the audio channel would substantially interfere with the audio portion of the teleconference, which is undesirable. We will describe techniques suitable for use with embodiments of our new video conferencing system which enable synchronisation of audio and a digital data stream, for example a video data stream, in embodiments without interfering with or modifying the telephone audio in any way.
Embodiments of the system enable automatic establishment of a video conference call over an IP (internet protocol) computer network, automatically detecting establishment of the audio conference call and then setting up the corresponding video call between the computer equipment connected to the respective nodes, where possible. The teleconference may have just two nodes, that is it may be a point-to-point call, or may have more than two connected nodes. In embodiments a node unit includes code to detect connection of the unit to a potential conference call.
The outgoing audio announce message is transmitted following such potential connection detection. More particularly in embodiments the node sends a connection detection message to the server identifying detection of the potential conference call and receives a response to this message from the server for use in transmission of the outgoing audio announce message.
Thus broadly speaking, in embodiments a node unit announces connection of the phone to which it is connected into a potential conference call, by sending an audio message, for example a tone, tone combination or tone sequence. A node unit also detects such audio messages sent from one or more other node units connected to a conference call, the audio messages being sent over the telephone network.
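By way of illustration only, the following Python sketch shows one way an identifier might be turned into a tone sequence for an outgoing audio announce message. The use of standard DTMF tone pairs, the identifier format, and the tone and gap durations are all assumptions made for this example; the text above only requires a tone, tone combination or tone sequence.

```python
import math

# Standard DTMF keypad: each symbol is one low-group and one high-group tone (Hz).
DTMF_LOW = [697, 770, 852, 941]
DTMF_HIGH = [1209, 1336, 1477, 1633]
KEYPAD = ["123A", "456B", "789C", "*0#D"]


def key_frequencies(key):
    """Return the (low, high) frequency pair for one keypad symbol."""
    for row, symbols in enumerate(KEYPAD):
        col = symbols.find(key)
        if col >= 0:
            return DTMF_LOW[row], DTMF_HIGH[col]
    raise ValueError(f"not a DTMF symbol: {key!r}")


def announce_tone_plan(identifier):
    """Map an identifier string (hypothetical format) to a sequence of tone pairs."""
    return [key_frequencies(ch) for ch in identifier]


def synthesise(plan, sample_rate=8000, tone_ms=80, gap_ms=40):
    """Render a tone plan as float samples in [-1, 1], with a silent gap after each tone."""
    tone_n = sample_rate * tone_ms // 1000
    gap_n = sample_rate * gap_ms // 1000
    samples = []
    for low, high in plan:
        for n in range(tone_n):
            t = n / sample_rate
            samples.append(0.5 * math.sin(2 * math.pi * low * t)
                           + 0.5 * math.sin(2 * math.pi * high * t))
        samples.extend([0.0] * gap_n)
    return samples
```

A receiving node unit would need a matching detector (for example a Goertzel filter per frequency), which is not shown here.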
However in embodiments, whilst an incoming audio announce message does identify a remote node, it does not identify it to the local node—instead the local node passes it on and the system control server does the identification. In a similar way, in embodiments the transmitted identifier for a node unit identifies the node unit to the server.
In embodiments a node receiving an incoming audio announce message decodes the node identification data and sends this on to the server via the computer network, but the local node does not need to know which remote node the message came from: in embodiments the directory information is held in the server. When a node unit already on a conference call detects another node unit phone joining, it then sends to the system control server information which tells the server that the two identified node units are on a shared audio call.
The server then enables set-up of a digital connection between the computer equipment connected to the respective node units, for example by providing connection data to the connected computer equipment, in embodiments via the node units, so that they can set up the digital connection. In embodiments the connection data comprises sharing identifier data for connecting the computer equipment of each respective node unit to a digital media sharing service. The respective node units may receive the same or complementary sharing identifier data to enable set up of the digital connection.
In embodiments the computer equipment is connected to the computer IP network via the node units and the node units manage the forwarding of the sharing identifier data to the respective computer equipment.
However, potentially, the computer equipment associated with each node unit may have a separate connection to the computer network and set up of the digital call may nonetheless be managed by the system control server by passing connect messages over the computer network. In embodiments, three or more node units may each be connected into a pair of audio and digital conference calls which are separate in the sense that they operate over separate audio (analogue or VoIP) and digital (computer) networks but which are synchronised so as to give a user the impression of a single combined audio and digital conference call operating seamlessly over a single network.
In some preferred embodiments, for security, when a node detects connection to a potential conference call it connects to the server and receives a temporary identifier which is used in place of the identifier it uses initially to connect to the server and, optionally, this initial connection may be secure.
In this way each node unit may be allocated a temporary identifier, similar to a one time password, which may be employed to conceal the true identities of the node units connecting to the conference call, to restrict the possibility of an eavesdropping attack on the digital call. Scope for such attack is nonetheless limited because of the separate use of the telephone network for signalling to set up the digital call. In embodiments, therefore, a node comprises code to request and receive from the server a temporary identifier, for example a random, unique-in-time value, and this is encoded into the outgoing audio announce message.
When another node unit receives this message and decodes the code, this temporary identifier may be linked with the receiving node's identifier and the two together passed to the control server, which is then able to identify the true identities of the two nodes since the control server maintains a table or other relationship linking the temporary and permanent or true node identities.
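The table linking temporary and true identities might be sketched as follows. This is a minimal illustration under stated assumptions, not the claimed implementation: the class name `ControlServer`, its method names, and the 8-hex-digit identifier format are all hypothetical.

```python
import secrets


class ControlServer:
    """Hypothetical control server holding the temporary-to-true identity table."""

    def __init__(self):
        self._temp_to_true = {}

    def issue_temporary_id(self, true_id):
        """Allocate a random identifier that is unique among those currently issued."""
        while True:
            temp_id = secrets.token_hex(4)  # 8 hex characters, assumed format
            if temp_id not in self._temp_to_true:
                break
        self._temp_to_true[temp_id] = true_id
        return temp_id

    def report_heard(self, receiver_true_id, heard_temp_id):
        """Resolve a decoded announce message into a (sender, receiver) pair.

        Returns None if the temporary identifier is unknown to the server.
        """
        sender_true_id = self._temp_to_true.get(heard_temp_id)
        if sender_true_id is None:
            return None
        return (sender_true_id, receiver_true_id)
```

Only the server can map a temporary identifier back to a node, so a receiving node can forward what it decoded without learning who sent it.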
In embodiments the messages passed between a node and the server include an IP address of the node and an encrypted identifier of the node. In general a node unit will store such a unique identifier internally in non-volatile memory. The transformation may be implemented, using, for example, a look up table or a hash function. Where transformation or encryption of a true node identifier is employed, this may be lossless to reduce or avoid address conflicts.
The code in a node may treat any dialled number as a potential conference call but in some preferred embodiments, to reduce the server load, a node is configured to apply a first pass filter to screen out calls which are determined not to be conference calls. Such a filter may be based on one or more of: whether or not the dialled number is an internal extension, the country code of the dialled number, the area code and so forth. Additionally or alternatively the server may perform similar screening. Thus in embodiments the system includes code to monitor the telephone network audio connection to identify a phone number, which may include a PIN, dialled by a phone connected to the node unit. However it will be appreciated that it is not essential to be able to detect whether or not a number is a conference call as preferably, after filtering, all calls may be treated as potential conference calls so that later joining nodes may be connected.
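A first pass filter of this kind could look something like the sketch below. The function name, the rule that short numbers are internal extensions, and the `blocked_prefixes` parameter (standing in for country or area code screening) are assumptions for illustration only.

```python
def is_potential_conference_call(dialled, max_extension_digits=5, blocked_prefixes=()):
    """Return False for calls that can be screened out locally, True otherwise.

    dialled: the dialled string as typed, e.g. "+44 20 7946 0000".
    max_extension_digits: numbers this short are assumed to be internal extensions.
    blocked_prefixes: digit prefixes (country/area codes) assumed never to be conferences.
    """
    digits = "".join(ch for ch in dialled if ch.isdigit())
    if not digits:
        return False
    if len(digits) <= max_extension_digits:
        return False  # treated as an internal extension
    if any(digits.startswith(prefix) for prefix in blocked_prefixes):
        return False  # screened out by country or area code
    return True  # everything else remains a potential conference call
```

Anything that passes the filter is still only a potential conference call; the decision about which nodes share a call rests with the server.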
In embodiments a node includes code to detect and resolve a conflict between audio announce messages, for example by randomly backing off transmission of an outgoing audio announce message. In embodiments where the digital connection comprises a streamed video connection, preferably the system, more particularly a node, includes code to synchronise the audio and video carried over the two separate networks, for example by controlling delay of the audio data between the phone network and phone.
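The conflict resolution step can be illustrated with a short sketch. The callables `play_tones`, `heard_other_tones` and `sleep` are hypothetical hooks into the node hardware, and the doubling of the back-off window on each attempt is an assumption borrowed from common network practice; the text above only requires that the back-off be random.

```python
import random


def send_announce_with_backoff(play_tones, heard_other_tones, sleep,
                               max_attempts=8, base_delay_s=0.5, rng=random.random):
    """Play the announce message, retrying after a random delay on conflict.

    play_tones(): plays the outgoing announce message once.
    heard_other_tones(): True if another node's tones overlapped the last attempt.
    sleep(seconds): waits for the given time.
    Returns the number of attempts used, or None if every attempt conflicted.
    """
    for attempt in range(max_attempts):
        play_tones()
        if not heard_other_tones():
            return attempt + 1
        # Random wait within a window that doubles on each conflict (assumed policy).
        window = base_delay_s * (2 ** attempt)
        sleep(rng() * window)
    return None
```

Because each node picks its delay independently, two nodes that collide once are unlikely to collide again on the retry.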
Related aspects of the invention provide, independently, a node unit and a system control server as described above, and corresponding processor control code. In preferred embodiments the server is configured to store data linking or mapping an outgoing audio announce message as described above to the node unit transmitting the audio announce message. This may be achieved, for example, by the server sending data for inclusion in the audio announce message to the node in the first place, or the node may employ an announce message comprising data that is permanently unique to the node, thus allowing the node transmitting the message to be identified.
In some preferred embodiments the method further comprises using the audio signalling to announce that a second or subsequent phone has joined the audio conference call, detecting this, via the audio telephone network, at a first phone and then reporting, from a node associated with the first phone, that the first and second phones are on a shared audio conference call so that the system control server can set up a corresponding digital link between the computer equipment associated with the respective nodes.
Thus in embodiments audio signalling over the phone network comprises sending an audio signal for setting up the digital streamed media conference call from a node unit over the phone network, and receiving a response to the audio signal at the node via the computer network. Although the node unit may hear an audio signal from another node unit it does not listen to or act on this signalling, but instead uses the computer network as a return signal path to receive a call setup signal, bearing call setup data, from the server: Whilst a node can receive the incoming announce message, it does not have a concept of who it came from; in preferred embodiments the directory information is held in the server.
The invention further provides processor control code to implement the above-described systems, devices and methods, for example on a general purpose computer system or on a digital signal processor (DSP). The code may be held in programmed memory such as Flash or read-only memory (firmware). The teleconference may have just two nodes, that is, it may be a point-to-point call, or may have more than two connected nodes. In embodiments the first digital data stream may comprise a video data stream from a video camera, captured at the first node.
Depending upon the number of other nodes connected to the teleconference system, synchronisation may either be achieved by locally delaying, for example, the received audio to align this with the received video at a receiving node or, in a system with more than two nodes, by delaying the transmitted audio from each of the nodes so that when the audio is mixed together in a telephone conference bridge, all audio channels are in time synchronisation with each other.
Once this has been established, the combined audio received from the conference bridge and the separate video channels received from the other nodes can be aligned by adding appropriate time delay offsets at the receiving node, noting that in embodiments a node receives the video streams separately from the other nodes.
In some preferred embodiments the time offset at the receiving node is determined by applying to the received audio an audio characterising function corresponding to that used to generate the audio characterising data at the transmitting node; the two sets of audio characterising data can then be aligned to determine a time offset, for example by simple comparison, correlation or other techniques. In embodiments the audio characterising data includes data identifying a pattern of sound level variation, and the comparing or similar disregards sound levels below a threshold level, or at least weights these to have a reduced significance.
This facilitates alignment where multiple audio streams are present simultaneously on a single audio line. The audio provided to the telephone network may be provided via an audio characterising module or it may be provided directly to the phone network and an audio characterising module may listen in on the audio to determine the audio characterising data.
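The comparison of audio characterising data can be sketched as below, assuming the characterising data is simply a list of sound levels sampled once per video frame. The function names, the threshold value and the use of a plain unnormalised correlation are assumptions for this example; the text leaves the comparison technique open.

```python
def characterise(levels, threshold):
    """Disregard sound levels below the threshold by setting them to zero."""
    return [level if level >= threshold else 0.0 for level in levels]


def estimate_offset(sent_levels, received_levels, max_lag, threshold=0.1):
    """Estimate how many frames the received audio lags the characterising data.

    sent_levels: levels carried with the digital data stream from the transmitting node.
    received_levels: levels computed locally from the telephone audio.
    Returns the lag in [-max_lag, max_lag] with the highest correlation score;
    a positive value means the received audio is later than the sent data.
    """
    sent = characterise(sent_levels, threshold)
    received = characterise(received_levels, threshold)
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, value in enumerate(sent):
            j = i + lag
            if 0 <= j < len(received):
                score += value * received[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag
```

Zeroing quiet samples means that background noise and the quieter talkers on a mixed line contribute nothing to the score, so the loudest pattern dominates the alignment.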
In embodiments the telephone system comprises three or more nodes, and corresponding techniques to those described above are employed to determine time offsets between each video stream received at the local node and the combined audio stream received from the audio conference bridge.
Any of a range of techniques may be used to determine the time offset between each video stream and every other video stream, for example by comparing embedded timestamps against a master reference transmitted over a control channel. From these results, the time offset between the multiple audio streams mixed together in the audio received from the conference bridge can be determined.
These calculated time offsets are communicated to each of the remote nodes over the computer network. Each remote node is then able to delay its outgoing audio stream so as to ensure that all audio streams arrive at the conference bridge in sync with each other.
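The delay each remote node applies to its outgoing audio follows directly from the calculated offsets, as the sketch below shows. It assumes each offset is expressed in milliseconds relative to a common reference, with larger values meaning later arrival at the conference bridge; the function name and units are illustrative.

```python
def outgoing_audio_delays(arrival_offsets_ms):
    """Compute the extra delay each node should add to its transmitted audio.

    arrival_offsets_ms: mapping of node name to how late its audio arrives (ms).
    Every node is delayed to match the latest one, so the slowest node adds nothing.
    """
    if not arrival_offsets_ms:
        return {}
    latest = max(arrival_offsets_ms.values())
    return {node: latest - offset for node, offset in arrival_offsets_ms.items()}
```

For example, offsets of 40 ms, 120 ms and 75 ms for nodes A, B and C give delays of 80 ms, 0 ms and 45 ms respectively.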
Where the digital data stream comprises a video data stream, in embodiments the audio characterising data changes no faster than half the frame rate of the video in the video data stream, for example no faster than 15 Hz where the video is at 30 frames per second, to inhibit aliasing. The invention further provides processor control code to implement the above-described systems, nodes, and methods, for example on a general purpose computer system or on a digital signal processor (DSP). In embodiments this controllable delay controls a delay in outgoing or transmitted audio from the phone to the telephone network.
In embodiments the node unit also includes a controllable received audio delay, also coupled between the telephone network interface and the phone interface, and having a delay control input, and, in embodiments, a controllable digital data stream video delay coupled between the computer network connection and a digital data output of the node unit.
Both the received audio, and video controllable delays may be coupled to a skew time offset controller to control one or both of these delays, in particular in response to a time offset determined by comparing audio characterising data extracted from the digital data stream with corresponding audio characterising data generated from the received audio. The skilled person will appreciate that depending upon the implementation, for example, whether there are two or more than two video nodes in the system, not all of these node unit modules may be required.
These and other aspects of the invention will now be further described, by way of example only, with reference to the accompanying figures in which: Broadly speaking we will describe a video conferencing system in which video is added to a pre-existing ordinary telephone call.
Thus embodiments of the video conferencing system add video transmitted over an IP network to a telephone call placed using pre-existing telephone equipment and infrastructure. We will describe procedures for automatically establishing such a video conference, and also techniques for use in such a video conference for ensuring that the video and audio are matched in time, so that participants' lips move with the sound of their voices. The approach conveys a number of benefits: The audio part of the call retains the simplicity, familiarity and reliability of an ordinary phone call.
Teleconference service providers have local phone numbers in many geographies to reduce the cost of participation. Having dialled a normally local number, plus a PIN, the participants are able to talk to each other in a multi-way audio conference. Often, the audio participants will want to share other electronic information with each other, which could be presentations, white-board sketches or streaming video. Currently this is done either by email, using a service such as WebEx, or using a unified communications provider where such services are integrated with the audio.
Whilst many people are familiar with audio conferences, accessing associated digital content to accompany the audio is often a time-consuming process which often delays the start of multi-party meetings. This is especially noticeable in video telephony. At a high level, therefore, we will describe techniques which enable parties to access a hosted audio conference using their existing telephone equipment and conference provider, and having accessed said audio conference will automatically connect an associated piece of IT equipment with similar pieces of IT equipment in the rooms of the other audio conference participants to facilitate the fast and easy sharing of associated digital media.
The phone tap or node unit has two telephone interfaces: one to the local telephone and one to the exchange (PBX). It also has a network interface in order to communicate with the system control server, and an interface specific to the IT equipment being connected to the hosted audio conference.
When a user wishes to join an audio conference, they take the phone off-hook and dial as they would normally do. The Phone Tap acts as a pass-through, enabling normal 2-way audio communication whilst being capable of recognizing tones played on the audio conference, received from the PBX. The control server optionally sends a unique-in-time code back to the Phone Tap, which on receipt the Phone Tap will play as tones out to the PBX.
If the Phone Tap hears any tones being played while it is playing its tones, it will back-off and retry the code until it is certain that any other audio participants can have unambiguously received the code. On detecting a series of tones coming from the PBX, the Phone Tap will encode these and report the code back to the control or rendezvous server. Using this information, the Rendezvous Server is able to determine which Phone Taps are connected to the same audio conference without Phone Taps having to reply to each other's codes over the audio channel.
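One way the Rendezvous Server might work out which Phone Taps share an audio conference is to merge taps into groups as reports arrive. The sketch below uses a union-find structure for this; the class and method names are hypothetical and the grouping technique is an assumption, since the text does not say how the determination is made.

```python
class RendezvousServer:
    """Hypothetical server grouping Phone Taps that have heard each other's codes."""

    def __init__(self):
        self._code_owner = {}  # code -> tap that was issued the code
        self._parent = {}      # union-find parent pointers, keyed by tap

    def register_code(self, tap_id, code):
        """Record which tap a unique-in-time code was sent to."""
        self._code_owner[code] = tap_id
        self._parent.setdefault(tap_id, tap_id)

    def _find(self, tap_id):
        """Return the representative tap of the group containing tap_id."""
        self._parent.setdefault(tap_id, tap_id)
        while self._parent[tap_id] != tap_id:
            self._parent[tap_id] = self._parent[self._parent[tap_id]]
            tap_id = self._parent[tap_id]
        return tap_id

    def report_heard(self, listener_tap, code):
        """Merge the listener's group with the code owner's; False if code unknown."""
        owner = self._code_owner.get(code)
        if owner is None:
            return False
        self._parent[self._find(listener_tap)] = self._find(owner)
        return True

    def conference_of(self, tap_id):
        """Return the sorted list of taps known to share a conference with tap_id."""
        root = self._find(tap_id)
        return sorted(t for t in list(self._parent) if self._find(t) == root)
```

A single report per joining tap is enough to place it in the right group, which matches the point that taps never need to answer one another over the audio channel.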
Having made this determination, the rendezvous server can send a unique-in-time identifier to each Phone Tap, which the Phone Tap passes back to the associated IT equipment.