Abstract
Conference call services are an essential telecommunication tool in today’s business world. Multi-party audio conferencing lets participants reach and communicate with people located anywhere in the world. Conference calling is likely to remain one of the most cost-effective means of communication because it offers real-time interaction, reduces time and travel costs, and accelerates decision making. Conference calls are inherently mobile because anybody can join from a smartphone, either through a dedicated mobile app or by simply dialing the conference line directly. Voice quality is the major issue in conference calls: low volume, audio noise, audio delays, line noise and feedback, and audio echoes all have to be removed to improve the clarity of a speaker’s voice. Voice quality has traditionally been assessed by expensive and time-consuming subjective listening tests. Different environments have different needs, and each conferencing environment has its own acoustical challenges that need attention. This article addresses the issues and challenges of ensuring clear and timely communication within connected phone calls. Multi-party conferencing is now very commonly used regardless of the last mile, which could be a VoIP device, a telephone, a WLL-based phone or a cellular phone (2G, 3G or 4G). In every advanced telecommunication network, problems such as popping, static, echo and tapping commonly occur and disturb the synchronization of speech processing. People find it difficult to process information when it is less credible, and when voice quality deteriorates, the information being communicated naturally becomes less reliable. One of the critical issues in communication is therefore poor voice quality when multiple people interact through a conference call.
Especially in a corporate environment where people use heterogeneous communication devices, poor voice quality often creates a poor experience for those on the call. The situation worsens as the number of participants increases and as the range of connectivity widens because of the varied last-mile devices and networks involved. Typically, voice signals are represented as continuous signals, which can be further decomposed into scaled and shifted delta functions. Existing methods from the literature for improving multi-user voice conference signals, especially the quality of speech in a conference call, are identified and discussed. Audio quality is essential to both phone and web conference calls, yet it is the element that seems to cause the most trouble and complaints. The work addresses the problem of voice quality in conference calls using DWT (Discrete Wavelet Transform) techniques, and uses the Lynn filter technique for reverberation and noise reduction. The implementation has been simulated in MATLAB R2013a, and simulation results are presented and discussed.
Introduction
Voice quality is one of the foremost problems in conference calls on all communication devices. Speech is flexible in the way language and other semiotic means are used for communication between the speaker and the listener [1]. Voice problems can have a huge effect on people’s ability to work and on their quality of life. They can also lead to depression and low self-confidence.
The greatest risks are considered to be vocal loading factors in the environment, such as loud background noise, poor room acoustics and poor air quality [2]. Some of these issues can be mitigated by using VoIP.
Speech is a sequence of ever-changing sounds, and it depends on those sounds to encode the content of the message. The speech signal is created at the vocal cords and shaped as it passes through the speaker’s vocal tract. Typically, the characteristics of speech depend highly on the vocal cords, their position and the configuration of the articulators [1]. Speech processing can be done with the help of digital signal processing, in which the speech signal is processed and analyzed for better communication. The speech signal is fundamentally an analog acoustic waveform that varies with respect to time or space. Such signals can be processed as digital signals [1, 2].
Speech Processing can be classified as follows.
Speech processing applications
If speech signals are regarded as messages sent from one end and received at the other, the sequence of sounds in each message can be treated as a physical realization of discrete symbols, and its transmission rate can be expressed in bits per second [2].
The roots of VoIP go back to 1928, when the first electronic voice synthesizer, known as the Vocoder, was created. In 1969, small computers were interconnected by modems over ARPANET. In 1973, the first voice data packet was transmitted, using a linear predictive coder (LPC) to encode the voice. In 1975, CompuServe was born, the first instance of everyday users communicating with each other by computer. In 1988, the first wideband audio codec was introduced.
In 1991, the first VoIP application was released into the public domain. Voice applications at the time required a rate of 64 kb/s; this application was able to reduce the necessary bandwidth to just 32 kb/s. In 1993, the first video telepresence system was invented, and in 1994 Free World Dialup (FWD) was introduced. FWD was free and may have been just a proof of concept.
In 1995, the first for-profit VoIP application, VocalTec Internet Phone, appeared; it is considered the first viable VoIP application. In 1996, the first hosted PBX solution was invented. Also in 1996, SIP was initially developed to invite people to multi-point conferences on the Internet Multicast Backbone; at this point SIP had nothing to do with VoIP. In 1999, Mark Spencer created his own IP-PBX, called Asterisk. This open-source program has been developed over the years by thousands of contributors.
In 2003, Skype peer-to-peer Internet calling was introduced. Skype permits free communication within its network and charges for calls made to the PSTN. Skype is a hybrid P2P and client-server system. The year 2004 was marked by the growth of Vonage, the involvement of the FCC and a sharp rise in the number of VoIP users. In 2005, the first dual-mode Wi-Fi cellular phone was introduced. In 2006, the first mobile VoIP app was released by a company set up to make mobile VoIP apps for smartphones. The app was first available for Nokia phones and was soon released for the iPhone, Android and BlackBerry platforms.
By 2012, VoIP had become mainstream, and development continued through 2013–2014. As of 2018, the trends driving innovation in VoIP are mobile unified communications (UC) with expanded mobile integration and UC functionality, advances in artificial intelligence (AI) for VoIP operations, the retirement of POTS (Plain Old Telephone Service), the continuing priority of security, and the combination of VoIP with the Internet of Things (IoT).
Shaw et al. [11] introduced new phone services based on the communication of voice over packet-switched IP networks. VoIP can be realized on any data network that uses IP, such as the Internet, intranets and LANs. Voice over IP (VoIP) can provide interactive communication services such as video and voice conferencing. VoIP also helps to transmit data that are difficult to transfer over circuit-switched wired and wireless networks. Their work is a clear study of the protocols used to support VoIP technology, the threats that may arise in VoIP communication and the protective measures taken to avoid these threats.
Zigunovsa et al. [16] reported on unresolved audio playback latency problems in mobile devices running the Android operating system. The work defines the problem and applies latency measurement and latency reduction methods to achieve the smallest possible audio playback delay. Measurement errors can be reduced by using a factory-reset Android system that is free of additionally installed and running software, with communication modules turned off and a fully charged battery connected to the charger.
Park et al. [14] present a low-noise output stage for an oversampling audio digital-to-analog converter (DAC) that uses a deglitch method. The design approach makes it attractive for an output stage with an SC-DAC to achieve low-noise performance. The proposed techniques have been rigorously validated in a 0.13 µm CMOS technology.
In Athina et al. (2014), the assessment of perceived VoIP quality is based on delay and loss measurements taken over wide-area backbone networks and uses subjective voice quality measures that capture the various impairments incurred. These impairments include network loss and delay variability, which should be handled appropriately by playout scheduling at the receiver. The findings indicate that although some ISPs can provide voice services adequately, a significant number of Internet backbone paths lead to poor performance. The authors compared results from various subjective testing studies and developed a methodology for assessing the perceived quality of a telephone call.
Beritelli and Rametta [4] state that perceived speech quality, which is the crucial aspect in a forensic scenario, depends on the QoS guaranteed by the wireless data service of a specific telephone operator, and that it is determined mainly by radio link quality, packet delay and loss, and guaranteed bandwidth. Their proposed approach aims to reduce the packet loss rate and the number of cuts affecting the speech signal and, last but not least, to improve the PESQ index.
The voice signal in a Voice over Internet Protocol (VoIP) system is handled under the best-effort policy of IP networks, which leads to network degradations including delay, packet loss and jitter [1]. The distributed arithmetic algorithm has been used in FIR filters to enhance the quality of the VoIP speech signal, and the speech signal was reportedly improved to a nearly equivalent quality level.
H. Singh and S. Singh [5] implement a finite impulse response (FIR) filter for voice quality enhancement in a VoIP system using the distributed arithmetic (DA) algorithm. The VoIP signal is evaluated using the PESQ measure for narrowband signals. They report a reduction in the computational complexity of the system and a significant improvement in the quality of the VoIP voice signal. In the proposed system, the IFIR filter is applied as a post-processor after decoding of the speech signal. The processing not only improves the speech quality but also tries to retain the spectral shape of the original signal.
The general methods adopted for speech enhancement are removal of background noise, echo suppression and artificially introducing certain frequencies into the speech. Echo suppression is especially important in a conference room. With the progress toward 3G/4G networks, wideband speech has been widely adopted to improve quality and intelligibility [22].
Atmaja et al. (2016) focus on speech enhancement for voice recordings by extending Non-Negative Matrix Factorization (NMF); the method was measured and compared against existing results using spectrogram and PESQ evaluation and showed better performance. For future work they recommend analyzing computation time and evaluating a potential real-time implementation.
A speech enhancement method based on time and scale adaptation of the wavelet threshold was proposed, using a priori knowledge of the SNR. Speech recognition based on a Hidden Markov Model was used, and experiments on the AURORA2 database showed that the proposed method improves speech recognition rates at low SNRs. It was also concluded that performance varies with the type of noise in the application.
Review of speech processing in conference call
Today’s conference call technology generally offers functions such as muting participants, recording conversations, adding further participants and assigning a moderator role to one of the participants, which makes the conference itself more flexible. These conferences can be held through heterogeneous end-point devices such as laptops, cell phones, VoIP devices, telephones and Internet-connected speakerphones [8, 9].
Algorithms used for speech quality enhancement
There are a number of algorithms that are put forward by researchers for enhancing quality of speech. A brief discussion of each of these is presented below.
Play late algorithm
For speech transmission, bandwidth sharing is important because in a typical conversation active speech occurs only about half the time, so significantly less data needs to be sent during silence. With the help of new codecs, jitter buffers and other mechanisms, the user-perceived voice quality of a VoIP call can potentially exceed that of ordinary GSM phones or be comparable to that of the PSTN.
The jitter buffer plays an important part in Voice over IP (VoIP) applications because it provides a key mechanism for achieving excellent speech quality that meets technical and commercial requirements for signal processing features [3]. The algorithm used here, called the Play Late algorithm, alters the playout delay inside a speech talk spurt without introducing unnecessary extra end-to-end delay. The gist of the algorithm is sketched in the steps below.
Experimental results obtained in a live environment under different network conditions show that the algorithm performs better than conventional jitter buffer algorithms, both static and adaptive.
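For orientation, the kind of conventional adaptive jitter buffer used as a baseline in such comparisons can be sketched as follows. This is not the Play Late algorithm, whose steps are not reproduced here. The running jitter estimate follows the form used for RTP in RFC 3550; the playout rule, the smoothing constant `alpha` and the `safety` multiplier are illustrative assumptions.

```python
# Sketch of a conventional adaptive playout-delay estimator.
# NOT the Play Late algorithm; alpha and safety are assumed, illustrative values.


def rfc3550_jitter(transit_times_ms):
    """Running interarrival jitter, J += (|D| - J) / 16, as defined for RTP."""
    jitter = 0.0
    estimates = []
    for prev, cur in zip(transit_times_ms, transit_times_ms[1:]):
        d = abs(cur - prev)          # change in transit time between packets
        jitter += (d - jitter) / 16.0
        estimates.append(jitter)
    return estimates


def adaptive_playout_delay(transit_times_ms, alpha=0.9, safety=4.0):
    """Playout delay = smoothed delay + safety * smoothed delay variation."""
    mean = float(transit_times_ms[0])
    variation = 0.0
    delays = []
    for t in transit_times_ms:
        mean = alpha * mean + (1.0 - alpha) * t
        variation = alpha * variation + (1.0 - alpha) * abs(t - mean)
        delays.append(mean + safety * variation)
    return delays


# Hypothetical one-way transit times in milliseconds, with two delay spikes.
transit_ms = [40, 42, 41, 60, 43, 40, 75, 44, 41, 40]
print("final jitter estimate (ms):", rfc3550_jitter(transit_ms)[-1])
print("final playout delay (ms):", adaptive_playout_delay(transit_ms)[-1])
```

A static jitter buffer would instead use one fixed playout delay for the whole call, trading extra latency against late-packet loss.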
Regression based speech enhancement using deep neural network
Xu et al. proposed a novel regression-based technique for enhancing speech using a Deep Neural Network (DNN) [27]. The DNN is trained on a large set of samples to learn a non-linear mapping from noisy speech to clean speech. This learning resulted in improved speech continuity, with the speech clearly separated from the background noise.
The results of DNN algorithm were evaluated both subjectively and objectively. Subjective assessment revealed that out of a population of 10 listeners, 76
Non-negative matrix factorization (NMF) with Kullback-Leibler divergence algorithm
Sun et al. [28] proposed a novel method of noise and speech estimation in which noise can be removed, thereby enhancing the quality of speech. The method is a non-negative matrix factorization (NMF) technique with Kullback-Leibler divergence. It estimates noise and speech in an unsupervised fashion by decomposing the input noisy magnitude spectrogram into two parts: a low-rank noise part and a sparse speech-like part.
This makes it possible to formulate a regularized version of NMF, which reconstructs the noise and speech spectrograms while estimating a speech dictionary on the fly. The unsupervised technique was compared with conventional supervised techniques, and the results were evaluated on five metrics: Perceptual Evaluation of Speech Quality (PESQ), Signal to Distortion Ratio (SDR), Signal to Noise Ratio (SNR), Short Time Objective Intelligibility (STOI) and Overall Quality of Speech (OVR). The unsupervised algorithm was found to outperform the traditional supervised algorithms.
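The core building block of such methods, plain NMF under the generalized Kullback-Leibler divergence, can be sketched with the standard multiplicative updates. This is only the unregularized baseline, not the low-rank plus sparse formulation of [28]; the toy "spectrogram", the rank and the iteration count are illustrative assumptions.

```python
import math
import random

EPS = 1e-12  # guards against division by zero and log(0)


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def kl_divergence(v, approx):
    """Generalized KL divergence D(V || WH) summed over all entries."""
    total = 0.0
    for v_row, a_row in zip(v, approx):
        for x, y in zip(v_row, a_row):
            total += x * math.log((x + EPS) / (y + EPS)) - x + y
    return total


def ratio_matrix(v, wh):
    """Element-wise V / (WH)."""
    return [[x / (y + EPS) for x, y in zip(v_row, wh_row)]
            for v_row, wh_row in zip(v, wh)]


def nmf_kl(v, rank, iterations=100, seed=0):
    """Factor V (freq x frames) into W (freq x rank) and H (rank x frames)."""
    rng = random.Random(seed)
    n_freq, n_frames = len(v), len(v[0])
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n_freq)]
    h = [[rng.random() + 0.1 for _ in range(n_frames)] for _ in range(rank)]
    history = []
    for _ in range(iterations):
        # H <- H * (W^T (V / WH)) / (W^T 1)
        wt = [list(col) for col in zip(*w)]
        num = matmul(wt, ratio_matrix(v, matmul(w, h)))
        w_col_sums = [sum(col) for col in wt]
        h = [[h[k][t] * num[k][t] / (w_col_sums[k] + EPS)
              for t in range(n_frames)] for k in range(rank)]
        # W <- W * ((V / WH) H^T) / (1 H^T)
        ht = [list(col) for col in zip(*h)]
        num = matmul(ratio_matrix(v, matmul(w, h)), ht)
        h_row_sums = [sum(row) for row in h]
        w = [[w[f][k] * num[f][k] / (h_row_sums[k] + EPS)
              for k in range(rank)] for f in range(n_freq)]
        history.append(kl_divergence(v, matmul(w, h)))
    return w, h, history


# Hypothetical 4x4 magnitude "spectrogram" built from two spectral patterns.
toy_spectrogram = [
    [1.0, 0.5, 0.0, 1.0],
    [2.0, 1.0, 0.0, 2.0],
    [0.0, 0.2, 1.0, 0.0],
    [0.0, 0.4, 2.0, 0.0],
]
w, h, history = nmf_kl(toy_spectrogram, rank=2, iterations=50)
print("divergence after first and last iteration:", history[0], history[-1])
```

In a speech enhancement setting, some columns of W would model noise and others speech, and the enhanced spectrogram would be rebuilt from the speech part only.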
STFT phase reconstruction for improved single channel speech enhancement
Krawczyk and Gerkmann proposed a novel method [29] to reconstruct the spectral phase of voiced speech from the noisy observation and the fundamental frequency. According to their study, most speech enhancement algorithms modify the spectral amplitude and not the spectral phase. Their spectral phase algorithm therefore offers an improved approach to speech enhancement, with the additional advantage that it can be combined with short-time Fourier transform (STFT) amplitude estimators. The evaluation showed that the algorithm improved the overall quality of speech, with SNR and PESQ used as metrics in the experiments.
Tapping noise suppression with magnitude weighted phase based detection
Sugiyama and Miyahara [30] proposed this algorithm, which enhances speech quality through a new phase-based detection and suppression of noise. The phase slope of the noisy input signal is compared with an ideal phase slope derived from the average of intra-frame slopes along the frequency axis. At each frequency point, the phase values are weighted by magnitude. This is done to overcome the problems caused by the heavily low-pass characteristic of the tapping-noise spectrum. The phase unwrapping problem is then limited by using a rotation vector of the frequency-domain components. The result of the algorithm is enhanced speech quality.
Intelligent and machine learning algorithms
Intelligent and machine learning algorithms such as the following can estimate the quality of a Voice over IP communication from network parameters measured over a specific period of time:
Decision tree algorithm. Neural network algorithm. Sequential minimal optimization algorithm. Bayesian algorithm.
An initial database of network parameters and the quality score for each scenario is defined first. The scenarios are generated by keeping the default parameters fixed and varying the following parameters:
The mean one-way delay. The packet-loss probability. The type of codec, which determines the encoding rate, Ie (equipment impairment factor) and Bpl (packet-loss robustness factor).
The values of Ie and Bpl depend on the vocoder used.
Table 1 presents the values of these codec parameters given in ITU-T Recommendation G.107. In the test scenarios, the G.711 and G.729 codecs were employed.
Table 1. Values of Ie and Bpl for voice codecs
The quality of a VoIP communication is measured from network parameters over a specific period of time. The quality score is computed for different scenarios following ITU-T Recommendation G.107, with the codec characterized by Ie (equipment impairment factor) and Bpl (packet-loss robustness factor), two important metrics that depend on the vocoder. The mathematical model yields a rating R that ranges from 0 to 93.2, where 93.2 is the highest value and indicates the best signal quality. Ie has a default value of 0 and a permitted range of 0 to 56, and Bpl has a default value of 4.3 and a permitted range of 4.3 to 7.3. Table 1 lists the values of Ie and Bpl used to determine signal quality under ITU-T Recommendation G.107 for the G.711 and G.729 codecs.
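How the rating R responds to delay, packet loss and codec can be illustrated with a simplified E-model calculation. The effective equipment impairment and the R-to-MOS mapping follow the forms given in ITU-T G.107. The delay impairment is a widely used piecewise-linear approximation rather than the full G.107 expression, and the codec values in `CODECS` are assumed for illustration; the values in Table 1 should be used in practice.

```python
# Simplified E-model sketch. Codec values and the delay term are assumptions.

CODECS = {
    # name: (Ie, Bpl), assumed illustrative values
    "G.711": (0.0, 4.3),
    "G.729A+VAD": (11.0, 19.0),
}


def effective_equipment_impairment(ie, bpl, loss_percent, burst_ratio=1.0):
    """Ie-eff = Ie + (95 - Ie) * Ppl / (Ppl / BurstR + Bpl)."""
    return ie + (95.0 - ie) * loss_percent / (loss_percent / burst_ratio + bpl)


def delay_impairment(one_way_delay_ms):
    """Approximate delay impairment (simplified, not the full G.107 term)."""
    d = one_way_delay_ms
    penalty = 0.11 * (d - 177.3) if d > 177.3 else 0.0
    return 0.024 * d + penalty


def r_factor(ie, bpl, loss_percent, one_way_delay_ms, r_max=93.2):
    """Rating R, clamped to the range 0..93.2 used in the text."""
    r = (r_max
         - delay_impairment(one_way_delay_ms)
         - effective_equipment_impairment(ie, bpl, loss_percent))
    return max(0.0, min(r_max, r))


def r_to_mos(r):
    """Map the rating R to an estimated mean opinion score."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + r * (r - 60.0) * (100.0 - r) * 7e-6


# Vary loss and delay per codec, as in the scenario generation described above.
for name, (ie, bpl) in CODECS.items():
    for loss in (0.0, 1.0, 3.0):
        for delay in (50.0, 150.0, 300.0):
            r = r_factor(ie, bpl, loss, delay)
            print(name, loss, delay, round(r, 1), round(r_to_mos(r), 2))
```

Each printed row (codec, loss, delay, R, MOS) corresponds to one labelled scenario of the kind that could be stored for training a quality classifier.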
This classification yields the file that serves as the training file for the algorithms that determine the quality of the communication.
Channelization takes place over the network according to the access scheme provisioned for the subscriber, such as CDMA, TDMA or FDMA. Since the medium can be wired or wireless, point-to-point communication is handled in signal enhancement. The CCITT framing format is used in the channelization process. Spectral bandwidth, SNR and bit error rate are among the most widely used performance metrics for the channels.
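Two of these channel metrics, SNR and bit error rate, can be computed directly from their definitions. The sketch below assumes that a clean reference signal and the transmitted bit sequence are available for comparison; the sample values are hypothetical.

```python
import math


def snr_db(clean, noisy):
    """SNR in dB: signal power over the power of (noisy - clean)."""
    signal_power = sum(s * s for s in clean)
    noise_power = sum((n - s) ** 2 for s, n in zip(clean, noisy))
    if noise_power == 0:
        return math.inf
    return 10.0 * math.log10(signal_power / noise_power)


def bit_error_rate(sent_bits, received_bits):
    """Fraction of bit positions that differ between sent and received."""
    errors = sum(a != b for a, b in zip(sent_bits, received_bits))
    return errors / len(sent_bits)


# Hypothetical samples and bits, for illustration only.
clean = [1.0, 1.0, 1.0, 1.0]
noisy = [1.1, 0.9, 1.1, 0.9]
print("SNR (dB):", snr_db(clean, noisy))
print("BER:", bit_error_rate([1, 0, 1, 1], [1, 1, 1, 0]))
```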
Error detection is handled according to the chosen communication channel or the appropriate layer. Its key role is to track and detect errors at the bit level and the packet level during communication from one device to another. Voice transmission can suffer from various impairments such as jitter, crosstalk, thermal noise and impulse noise; adding redundant bits to the signal helps to detect the resulting errors. If necessary, this also allows the signal to be retransmitted.
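The idea of appending redundant bits for error detection can be illustrated with a CRC-32 checksum from the Python standard library. This is a generic sketch; the source does not say which error-detecting code is used, and the frame layout (payload followed by a 4-byte checksum) is an assumption.

```python
import zlib


def add_crc(payload: bytes) -> bytes:
    """Append a 4-byte CRC-32 checksum to the payload (assumed frame layout)."""
    return payload + zlib.crc32(payload).to_bytes(4, "big")


def check_crc(frame: bytes) -> bool:
    """Recompute the checksum and compare it with the received one."""
    payload, received = frame[:-4], int.from_bytes(frame[-4:], "big")
    return zlib.crc32(payload) == received


frame = add_crc(b"voice packet 001")
print("intact frame accepted:", check_crc(frame))

# Flip one bit to simulate a transmission error.
corrupted = bytes([frame[0] ^ 0x01]) + frame[1:]
print("corrupted frame accepted:", check_crc(corrupted))
```

A receiver that sees a failed check can discard the frame or request retransmission; for real-time voice, discarding is often preferred because a late packet is of little use.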
After error correction the corrected signals are transmitted, and they need to be transmitted without loss of quality or degradation. Repeaters in the telecommunication network amplify the signal to keep it in good condition and forward it in sequence to the destination.
Speech enhancement
Speech enhancement here centers on vector-space decomposition, given that the telephony channel has a frequency response of roughly 300 Hz to 3 kHz. The noise component is filtered out by converting the covariance matrix of the noisy speech into a subspace of vectors. Using a perceptual scale rather than a mathematically linear filter can yield better speech enhancement results.
VoIP (Voice over Internet Protocol)
Voice over IP is a technology that allows users to transmit voice over the Internet or a local area network using the Internet Protocol (IP). It offers benefits such as low cost compared with the traditional Public Switched Telephone Network (PSTN) [3, 21].
Voice over IP (VoIP) is becoming a widely deployed service in data networks, and it will spread from the fixed network domain into the wireless network domain [4, 22]. The characteristics of fixed and wireless networks are fundamentally different, which affects the performance of services. The main benefits of VoIP are low calling costs, low setup cost, easy scalability and good voice quality [5].
Voice communication quality
Voice communication quality is a main component of the overall communication quality perceived by a user and concerns the transmission of voice from a speaker to a listener [18, 22].
It is the most important aspect of a call. Voice quality refers to the clarity of the participants’ voices. It has traditionally been assessed by expensive and time-consuming subjective listening tests [19, 20].
Conference bridge
Single port
Participants can usually join a conference call themselves by dialing a telephone number that is attached to a “conference bridge”. Conference bridges permit a number of remotely located telephone subscribers to hold a conference [6]. A new bridge using solid-state circuitry and a plug-in module-type equipment arrangement has been designed. A conference bridge allows a group of people to share a phone call. The most common form of bridge lets participants dial into a virtual meeting room from their own handsets [7]. The number of participants in a conference call varies. Capacity and flexibility have been achieved by designing a 30-port bridge that can be subdivided into five separate 6-port groups. These groups can then be combined to serve large or small groups of subscribers [8].
Multiport
The multiport conference bridge was designed in 1967, with ease of operation and economical equipment arrangements in mind. Its basic transmission requirements are zero transmission loss, high loss against internal echo, and protection against noise and external echoes [8]. The following figure is an example of multiport.
Structure needed for conference call
IP Gateway
An IP gateway is a node (router) in a network. Gateways communicate with each other and send data back and forth. Without gateways, the Internet (and much other hardware and software) would be of little use. A gateway connects networks, including networks that use different protocols and technologies [9].
Wi-Fi
Wireless Fidelity (WiFi) is a wireless local area network covering a particular area. It aims to connect devices such as personal computers, PDAs, laptops and printers. WiFi permits connectivity in infrastructure mode, in which the WiFi stations communicate through an access point (AP) [10].
WiMAX (Worldwide interoperability for microwave access)
Worldwide Interoperability for Microwave Access (WiMAX) is a Wireless Metropolitan Area Network (WMAN) technology [10]. The term is widely used for wireless systems based on IEEE 802.16. WiMAX is also commonly labeled a 4G network. It offers high data rates and large area coverage, and it supports fixed and mobile broadband access [18].
Mobile network
A mobile network is a communication network in which the last link is wireless [10]. It makes it possible for users to place telephone calls. VoIP is the real-time transfer of voice signals using the Internet Protocol (IP) over the Internet or a private network.
PSTN (public switched telephone network)
At present, most voice traffic is handled by the PSTN, which is also known as the Plain Old Telephone Service (POTS) [11]. Some operators, however, have started replacing the PSTN with Access Gateways (AG) for VoIP applications [12].
Softphone
A softphone (software telephone) is an application program that allows Voice over Internet Protocol (VoIP) telephone calls to be made from computing devices such as smartphones, laptops, palmtops and PCs [11]. A softphone works like a normal phone, except that calls are placed directly from a PC or other device with an Internet connection [13].
Challenges in conference call while using mobile phone
Low volume
Modern digital devices such as smartphones and laptops need adequate volume. When the transmitted volume is very low, communication suffers in several ways, and participants cannot communicate with each other properly [14].
Audio noise
The most common teleconferencing problem is background noise, also called audio noise. Most of the time, conferencing systems do not automatically block background noise. Background noise may distract listeners so that they fail to follow the conversation [15, 23].
Audio delays
During a conference call, the audio is delayed by a few seconds between the speaker and the receiver. This delay is partly by design, as it helps to prevent audio problems. Malfunctions, however, create additional audio delays, which lead to long pauses between the speaker and the receiver [15, 23].
Line noise and feedback
Line noise is similar to background noise: it is interference on the conference call line. These problems sometimes occur because participants do not mute the devices that are causing them [15, 22]. When participants call into the audio conference from the same room, and a participant’s device is too close to a headphone or microphone, the audio-conferencing software does not block the resulting noise [17].
Audio echoes
Audio echoes are among the most common problems on conference calls and usually occur because two people are taking the call from the same room. If two or more people in the same room join a conference call, the software does not cancel the echo from one participant’s phone to another’s. Echoes can disturb both the speaker and the listener. It is sometimes very hard for participants to continue a call when they hear themselves echoing [17, 21].
Result and conclusion
Simulation results
In this section, validation is performed by simulation over a wide range of signal gain factors, file sizes, and average carrier availability and availability ratios, which are stored in the MAT file. Five call-node scenarios, related to user mobility and the voice signal, are considered, and efficiency is increased by varying the signal gain in the MAT file used for sampling the call node. The approach combines a noise removal technique, the Lynn filter, with a carrier-signal gain factor that applies the DWT frequency amplification method.
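The general shape of such a pipeline can be sketched in a few lines. The source does not give the filter order, the wavelet, the threshold or the gain, so everything below is an assumption made for illustration: a Lynn-type recursive running-sum low-pass filter, a single-level Haar DWT, soft thresholding of the detail coefficients and a gain applied to the approximation coefficients. The original implementation was done in MATLAB; this Python sketch is not a port of it.

```python
import math


def lynn_lowpass(x, m=4):
    """Lynn-type recursive low-pass: y[n] = y[n-1] + x[n] - x[n-m], scaled by 1/m."""
    y = []
    acc = 0.0
    for n, sample in enumerate(x):
        acc += sample - (x[n - m] if n >= m else 0.0)
        y.append(acc / m)
    return y


def haar_dwt(x):
    """Single-level Haar DWT of an even-length signal."""
    s = math.sqrt(2.0)
    approx = [(x[i] + x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    detail = [(x[i] - x[i + 1]) / s for i in range(0, len(x) - 1, 2)]
    return approx, detail


def haar_idwt(approx, detail):
    """Inverse of haar_dwt."""
    s = math.sqrt(2.0)
    x = []
    for a, d in zip(approx, detail):
        x.extend([(a + d) / s, (a - d) / s])
    return x


def soft_threshold(coeffs, t):
    """Shrink coefficients toward zero by t, zeroing those below t in magnitude."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]


def enhance(x, gain=1.5, threshold=0.05, m=4):
    """Smooth, then amplify the low band and suppress small high-band detail."""
    smoothed = lynn_lowpass(x, m)
    approx, detail = haar_dwt(smoothed)
    approx = [gain * a for a in approx]          # assumed gain factor
    detail = soft_threshold(detail, threshold)   # assumed noise threshold
    return haar_idwt(approx, detail)


# Hypothetical noisy tone: 16 samples of a sine with alternating-sign noise.
signal = [math.sin(2 * math.pi * n / 16) + 0.05 * (-1) ** n for n in range(16)]
print([round(v, 3) for v in enhance(signal)])
```

The running-sum form needs only additions and one delay line, which is why this filter family is attractive for real-time use; in long-running floating-point code the accumulator may need periodic resetting to avoid drift.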
Call 2 disconnected signal. It shows the disconnected signal of call 2.
Call 3 disconnected signal. It shows the disconnected signal of call 3.
This paper discussed how frequently voice problems arise in conference calls, where speech quality is the most important aspect. It described the various internationally standardized quality measures based on human ratings as they apply to conference calls, and it examined the impact of delay on VoIP calls. Voice quality refers to the clarity of a speaker’s voice as perceived by a listener. Measuring it adds the human end user’s perception to traditional network management assessments of voice telephony services. The individual character of a participant’s voice is conveyed through the choices they make about which signs highlight and portray the aspects of themselves that they wish to express. This paper presented the issues affecting voice quality during communication through conference calls. It indicates the prevalence of voice problems and the occupational groups known to have a high prevalence of them. The research will be extended with further analysis aimed at improving the quality of conference calls regardless of the total number of speakers participating.
