Maps of received signal strength (RSS) from a wireless transmitter can be used for positioning or for planning wireless infrastructure. The RSS values measured at a single point are not always the same, but follow some distribution, which varies from point to point. Existing approaches in the literature either neglect this variation or require many measurements at every point to map it, which makes measurement collection very laborious. We propose to use Gaussian mixtures (GMs) for modeling joint distributions of the position and the RSS value. The proposed model is more versatile than methods found in the literature, as it models the joint distribution of RSS measurements and the location space. This allows us to model the distribution of RSS values at every point of space without making many measurements at every point. In addition, GMs allow us to compute conditional probabilities and posteriors of position in closed form. The proposed models can represent any RSS attenuation pattern, which is useful for positioning in multifloor buildings. Our tests with WLAN signals show that positioning with the proposed algorithm provides accurate position estimates. We conclude that the proposed algorithm can provide useful information about distributions of RSS values for different applications.
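As a rough illustration of the idea (not the authors' implementation), the sketch below fits a joint Gaussian mixture over (x, y, RSS) with scikit-learn and evaluates the closed-form conditional distribution of RSS given a position; the toy fingerprint data, component count, and transmitter location are assumptions made for the example.

```python
# Hypothetical sketch: joint Gaussian-mixture model of (position, RSS) and the
# closed-form conditional p(RSS | position) via standard Gaussian conditioning.
import numpy as np
from sklearn.mixture import GaussianMixture
from scipy.stats import multivariate_normal

# toy fingerprint data: rows are [x, y, rss_dBm] (illustrative log-distance model)
xy = np.random.uniform([0, 0], [50, 30], size=(500, 2))
d = np.linalg.norm(xy - np.array([10.0, 15.0]), axis=1)        # distance to a transmitter at (10, 15)
rss = -40 - 20 * np.log10(1 + d) + 3 * np.random.randn(500)
data = np.column_stack([xy, rss])

gm = GaussianMixture(n_components=8, covariance_type='full').fit(data)

def rss_conditional(pos):
    """Return the 1-D Gaussian mixture describing p(rss | x, y) at a given position."""
    means, covs, w = gm.means_, gm.covariances_, gm.weights_
    # responsibilities of each component given the position (marginalized over RSS)
    r = np.array([w[k] * multivariate_normal.pdf(pos, means[k][:2], covs[k][:2, :2])
                  for k in range(gm.n_components)])
    r /= r.sum()
    # per-component conditional mean and variance of RSS given the position
    cond_mean = np.array([means[k][2] + covs[k][2, :2] @ np.linalg.solve(covs[k][:2, :2], pos - means[k][:2])
                          for k in range(gm.n_components)])
    cond_var = np.array([covs[k][2, 2] - covs[k][2, :2] @ np.linalg.solve(covs[k][:2, :2], covs[k][:2, 2])
                         for k in range(gm.n_components)])
    return r, cond_mean, cond_var

weights, mu, var = rss_conditional(np.array([20.0, 10.0]))
print("expected RSS at (20, 10):", weights @ mu)
```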
Research output: Contribution to journal › Article › Scientific › peer-review
A decline in respiratory regulation is often the primary forewarning of the onset of physiological aberrations. In clinical environments, the obtrusive nature and cost of instrumentation have slowed the integration of continuous respiration monitoring into standard practice. Photoplethysmography (PPG) is a non-invasive, optical method for assessing blood flow dynamics in the peripheral vasculature. Incidentally, respiration couples into the PPG signal as a surrogate constituent, enabling respiratory rate (RR) estimation. The physiological processes of respiration emerge as distinctive oscillations in various parameters extracted from the PPG signal. We propose a novel algorithm designed to account for intermittent diminishment of the respiration-induced variabilities (RIV) by a fusion-based enhancement of wavelet synchrosqueezed spectra. We combine the information in the intrinsic mode functions (IMF) of five RIVs to enhance mutually occurring, instantaneous frequencies of the spectra. The respiration rate estimate is obtained by tracking the spectral ridges with a particle filter. We evaluated the method with a dataset recorded from 29 young adult subjects (mean: 24.17 y, SD: 4.19 y) containing diverse, voluntary, and periodically metronome-assisted respiratory patterns. Bayesian inference on the fusion-enhanced Respiration-Induced Frequency Variability (RIFV) indicated an MAE and RMSE of 1.764 and 3.996 BPM, respectively. The fusion approach was found to improve the MAE and RMSE of RIFV by 0.185 BPM (95% HDI: 0.0285-0.3488, effect size: 0.548) and 0.250 BPM (95% HDI: 0.0733-0.431, effect size: 0.653), respectively, with more pronounced improvements for the other RIVs. We conclude that the fusion of variability signals is important for IMF localization in the spectral estimation of RR.
Research output: Contribution to journal › Article › Scientific › peer-review
Audio source separation is usually achieved by estimating the short-time Fourier transform (STFT) magnitude of each source, and then applying a spectrogram inversion algorithm to retrieve time-domain signals. In particular, the multiple input spectrogram inversion (MISI) algorithm has been exploited successfully in several recent works. However, this algorithm suffers from two drawbacks, which we address in this letter. First, it was originally introduced in a heuristic fashion: we propose here a rigorous optimization framework in which MISI is derived, thus proving the convergence of this algorithm. Second, while MISI operates offline, we propose here an online version of MISI called oMISI, which is suitable for low-latency source separation, an important requirement for, e.g., hearing aid applications. oMISI also allows one to use alternative phase initialization schemes exploiting the temporal structure of audio signals. Experiments conducted on a speech separation task show that oMISI performs as well as its offline counterpart, thus demonstrating its potential for real-time source separation.
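A minimal sketch of the offline MISI iteration as commonly described (mixture-phase initialization, error redistribution, re-projection onto the target magnitudes) is given below; the SciPy STFT settings, iteration count, and the assumption that the target magnitudes share the mixture's STFT grid are illustrative, and the online oMISI variant proposed in the letter differs from this.

```python
# Sketch of offline MISI, assuming target_mags were computed with the same STFT parameters.
import numpy as np
from scipy.signal import stft, istft

def _fix_len(s, L):
    """Trim or zero-pad a time-domain signal to length L."""
    return s[:L] if len(s) >= L else np.pad(s, (0, L - len(s)))

def misi(mixture, target_mags, n_iter=50, nperseg=1024):
    L = len(mixture)
    _, _, X = stft(mixture, nperseg=nperseg)
    phases = [np.angle(X)] * len(target_mags)           # initialize every source with the mixture phase
    for _ in range(n_iter):
        sigs = [_fix_len(istft(m * np.exp(1j * p), nperseg=nperseg)[1], L)
                for m, p in zip(target_mags, phases)]
        err = mixture - sum(sigs)                        # mixing error in the time domain
        # redistribute the error equally, then keep only the phase of the corrected signals
        phases = [np.angle(stft(s + err / len(sigs), nperseg=nperseg)[2]) for s in sigs]
    return [_fix_len(istft(m * np.exp(1j * p), nperseg=nperseg)[1], L)
            for m, p in zip(target_mags, phases)]
```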
EXT="Magron, Paul"
Research output: Contribution to journal › Article › Scientific › peer-review
Facial pacing systems aim to reanimate paralyzed facial muscles with electrical stimulation. To aid the development of such systems, the frontalis muscle responsible for eyebrow raising was transcutaneously stimulated in 12 healthy participants using four waveforms: square wave, square wavelet, sine wave, and sinusoidal wavelet. The aim was to investigate the effects of the waveform on muscle activation magnitude, perceived discomfort, and the relationship between the stimulus signal amplitude and the magnitude of evoked movement. The magnitude of movement was measured offline using video recordings and compared to the magnitude of maximum voluntary movement (MVM) of eyebrows. Results showed that stimulations evoked forehead movement at a magnitude comparable to the MVM in 67% of the participants and close to comparable (80% of the MVM) in 92%. All the waveforms were equally successful in evoking movements. Perceived discomfort did not differ between the waveforms in relation to the movement magnitude, but some individual preferences did exist. Further, regression analysis showed a statistically significant linear relation between stimulation amplitudes and the evoked movement in 98% of the cases. As the waveforms performed equally well in evoking muscle activity, the waveform in pacing systems could be selected by emphasizing technical aspects such as the possibility to suppress stimulation artifacts from simultaneous electromyography measurement.
Research output: Contribution to journal › Article › Scientific › peer-review
The authors consider the problem of compressive sensed video recovery via an iterative thresholding algorithm. Traditionally, it is assumed that some fixed sparsifying transform is applied at each iteration of the algorithm. In order to improve the recovery performance, the thresholding could be applied at each iteration with several different transforms in order to obtain several estimates for each pixel; the resulting pixel value is then computed from these estimates by simple averaging. However, calculating the estimates leads to a significant increase in reconstruction complexity. Therefore, the authors propose a heuristic approach, where at each iteration only one transform is randomly selected from some set of transforms. First, they present simple examples, where block-based 2D discrete cosine transform is used as the sparsifying transform, and show that random selection of the block size at each iteration significantly outperforms the case where a fixed block size is used. Second, building on these simple examples, they apply the proposed approach when video block-matching and 3D filtering (VBM3D) is used for the thresholding, and show that random transform selection within VBM3D improves the recovery performance compared with recovery based on VBM3D with a fixed transform.
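The following sketch (not the authors' code) illustrates the random-transform idea on the simple block-DCT example: an iterative thresholding recovery in which the DCT block size is drawn at random at each iteration. The measurement matrix, unit step size, threshold, and candidate block sizes are placeholder assumptions.

```python
# Iterative thresholding with a randomly selected block-DCT size per iteration (illustrative).
import numpy as np
from scipy.fft import dctn, idctn

def blockwise_threshold(img, block, thr):
    out = np.zeros_like(img)
    for i in range(0, img.shape[0], block):
        for j in range(0, img.shape[1], block):
            c = dctn(img[i:i+block, j:j+block], norm='ortho')
            c[np.abs(c) < thr] = 0.0                      # thresholding in the DCT domain
            out[i:i+block, j:j+block] = idctn(c, norm='ortho')
    return out

def recover(y, Phi, shape, n_iter=100, thr=0.05, sizes=(8, 16, 32)):
    x = np.zeros(np.prod(shape))
    for _ in range(n_iter):
        x = x + Phi.T @ (y - Phi @ x)                     # gradient step (unit step; assumes Phi suitably normalized)
        block = np.random.choice(sizes)                   # random transform selection
        x = blockwise_threshold(x.reshape(shape), block, thr).ravel()
    return x.reshape(shape)
```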
EXT="Belyaev, Evgeny"
Research output: Contribution to journal › Article › Scientific › peer-review
This paper studies vehicle attribute recognition by appearance. In the literature, image-based target recognition has been extensively investigated in many use cases, such as facial recognition, but less so in the field of vehicle attribute recognition. We survey a number of algorithms that identify vehicle properties ranging from coarse-grained level (vehicle type) to fine-grained level (vehicle make and model). Moreover, we discuss two alternative approaches for these tasks, including straightforward classification and a more flexible metric learning method. Furthermore, we design a simulated real-world scenario for vehicle attribute recognition and present an experimental comparison of the two approaches.
Research output: Contribution to journal › Article › Scientific › peer-review
Efficient mitigation of power amplifier (PA) nonlinear distortion in multi-user hybrid precoding based broadband mmWave systems is an open research problem. In this article, we carry out detailed signal and distortion modeling in broadband multi-user hybrid MIMO systems, with a bank of nonlinear PAs in each subarray, while also taking the inevitable crosstalk between the antenna/PA branches into account. Building on the derived models, we adopt and describe an efficient closed-loop (CL) digital predistortion (DPD) solution that utilizes only a single-input DPD unit per transmit chain or subarray, despite crosstalk, thus providing a substantial complexity benefit compared to state-of-the-art multi-dimensional DPD solutions. We show that under spatially correlated multipath propagation, each single-input DPD unit can provide linearization towards every intended user, or more generally, towards all spatial directions where coherent propagation takes place, and that the adopted CL DPD system is robust against crosstalk. Extensive numerical results building on practical measurement-based mmWave PA models are provided, demonstrating and verifying the excellent linearization performance of the overall DPD system in different evaluation scenarios.
EXT="Abdelaziz, Mahmoud"
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, we propose a novel method for projecting data from multiple modalities to a new subspace optimized for one-class classification. The proposed method iteratively transforms the data from the original feature space of each modality to a new common feature space along with finding a joint compact description of data coming from all the modalities. For data in each modality, we define a separate transformation to map the data from the corresponding feature space to the new optimized subspace by exploiting the available information from the class of interest only. We also propose different regularization strategies for the proposed method and provide both linear and non-linear formulations. The proposed Multimodal Subspace Support Vector Data Description outperforms all the competing methods using data from a single modality or fusing data from all modalities in four out of five datasets.
EXT="Iosifidis, Alexandros"
Research output: Contribution to journal › Article › Scientific › peer-review
Shearlet Transform (ST) has been instrumental for Densely-Sampled Light Field (DSLF) reconstruction, as it sparsifies the underlying Epipolar-Plane Images (EPIs). The sought sparsification is implemented through an iterative regularization, which tends to be slow because of the time spent on domain transformations for dozens of iterations. To overcome this limitation, this letter proposes a novel self-supervised DSLF reconstruction method, CycleST, which employs ST and cycle consistency. Specifically, CycleST is composed of an encoder-decoder network and a residual learning strategy that restore the shearlet coefficients of densely-sampled EPIs using EPI-reconstruction and cycle-consistency losses. CycleST is a self-supervised approach that can be trained solely on Sparsely-Sampled Light Fields (SSLFs) with small disparity ranges (⩽ 8 pixels). Experimental results of DSLF reconstruction on SSLFs with large disparity ranges (16-32 pixels) demonstrate the effectiveness and efficiency of the proposed CycleST method. Furthermore, CycleST achieves at least a ∼9x speedup over ST.
Research output: Contribution to journal › Article › Scientific › peer-review
We propose a novel classifier accuracy metric: the Bayesian Area Under the Receiver Operating Characteristic Curve (CBAUC). The method estimates the area under the ROC curve and is related to the recently proposed Bayesian Error Estimator. The metric can assess the quality of a classifier using only the training dataset without the need for computationally expensive cross-validation. We derive a closed-form solution of the proposed accuracy metric for any linear binary classifier under the Gaussianity assumption, and study the accuracy of the proposed estimator using simulated and real-world data. These experiments confirm that the closed-form CBAUC is both faster and more accurate than conventional AUC estimators.
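For context, under the Gaussianity assumption the AUC of a linear scorer s = w·x has a well-known closed form, which is the kind of quantity CBAUC estimates while additionally accounting for parameter uncertainty in a Bayesian way; the plug-in sketch below is illustrative and is not the paper's estimator.

```python
# Closed-form AUC of a linear classifier under Gaussian class-conditional models (illustrative).
import numpy as np
from scipy.stats import norm

def gaussian_auc(w, mu0, cov0, mu1, cov1):
    """AUC = P(w·x1 > w·x0) when x_c ~ N(mu_c, cov_c); class 1 assumed to have higher scores."""
    delta = w @ (mu1 - mu0)                     # difference of score means
    spread = np.sqrt(w @ (cov0 + cov1) @ w)     # std of the score difference
    return norm.cdf(delta / spread)
```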
EXT="Tohka, Jussi"
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, we propose, describe, and test a modification of the K-SVD algorithm. Given a set of training data, the proposed algorithm computes an overcomplete dictionary by minimizing the β-divergence (β ≥ 1) between the data and its representation as linear combinations of atoms of the dictionary, under strict sparsity restrictions. For the special case β=2, the proposed algorithm minimizes the Frobenius norm and, therefore, for β=2 the proposed algorithm is equivalent to the original K-SVD algorithm. We describe the modifications needed and discuss the possible shortcomings of the new algorithm. The algorithm is tested with random matrices and with an example based on speech separation.
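For reference, the sketch below shows the standard K-SVD alternation (the β = 2 / Frobenius special case, using OMP for sparse coding); the paper's modification would replace the rank-1 SVD update marked below with a β-divergence minimization. Dimensions and parameters are illustrative.

```python
# K-SVD skeleton (β = 2 case): alternate sparse coding and atom-by-atom dictionary updates.
import numpy as np
from sklearn.linear_model import orthogonal_mp

def ksvd(Y, n_atoms=64, sparsity=5, n_iter=20):
    D = np.random.randn(Y.shape[0], n_atoms)
    D /= np.linalg.norm(D, axis=0)
    for _ in range(n_iter):
        X = orthogonal_mp(D, Y, n_nonzero_coefs=sparsity)          # sparse coding stage
        for k in range(n_atoms):
            used = np.nonzero(X[k])[0]
            if used.size == 0:
                continue
            # residual with atom k removed, restricted to the samples that use it
            E = Y[:, used] - D @ X[:, used] + np.outer(D[:, k], X[k, used])
            U, s, Vt = np.linalg.svd(E, full_matrices=False)       # rank-1 Frobenius fit (β = 2);
            D[:, k], X[k, used] = U[:, 0], s[0] * Vt[0]            # β ≠ 1,2 would minimize β-divergence here
    return D, X
```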
Research output: Contribution to journal › Article › Scientific › peer-review
Partial shading conditions significantly affect the functionality of solar power plants despite the presence of multiple maximum power point tracking systems. The primary cause of this problem is the presence of local maxima in the power–current and/or power–voltage characteristic curves, which restrict the functionality of conventional maximum power point tracking systems. This article proposes a modified algorithm, based on the simplified equivalent circuit of solar cells, to improve the functionality of traditional maximum power point tracking systems. The algorithm provides a method for regularly monitoring the photo-current of each solar module. The upper and lower boundaries of the regulating parameter, such as current or voltage, are determined precisely, which helps to locate the global maximum. During a sequential search, the control system accurately determines the lower and upper boundaries of the global maximum. Simultaneously, the maximum power point tracking system increases the photovoltaic current up to one of these boundaries and applies one of the conventional algorithms. Additionally, the control system regularly monitors the photovoltaic characteristics and changes the limits of the regulating parameter in response to any change in the global maximum location. The proposed method locates the global maximum boundaries quickly and precisely and tracks the global maximum even under fast-changing partial shading conditions. The improved performance and overall efficiency are validated by a simulation study with variable solar irradiance.
Research output: Contribution to journal › Article › Scientific › peer-review
In this work, we consider the problem of single-query 6-DoF camera pose estimation, i.e. estimating the position and orientation of a camera by using reference images and a point cloud. We perform a systematic comparison of three state-of-the-art strategies for 6-DoF camera pose estimation: feature-based, photometric-based and mutual-information-based approaches. Two standard datasets with self-driving setups are used for experiments, and the performance of the studied methods is evaluated in terms of success rate, translation error and maximum orientation error. Building on the analysis of the results, we evaluate a hybrid approach that combines feature-based and mutual-information-based pose estimation methods to benefit from their complementary properties for pose estimation. Experiments show that (1) in cases with large appearance change between query and reference, the hybrid approach outperforms feature-based and mutual-information-based approaches by an average increment of 9.4% and 8.7% in the success rate, respectively; (2) in cases where query and reference images are captured at similar imaging conditions, the hybrid approach performs similarly as the feature-based approach, but outperforms both photometric-based and mutual-information-based approaches with a clear margin; (3) the feature-based approach is consistently more accurate than mutual-information-based and photometric-based approaches when at least 4 consistent matching points are found between the query and reference images.
EXT="Matas, Jiri"
Research output: Contribution to journal › Article › Scientific › peer-review
This paper presents a novel multi-sensor non-stationary EEG model; it is obtained by combining state-of-the-art mono-sensor newborn EEG simulators, a multilayer newborn head model comprised of four homogeneous concentric spheres, a multi-sensor propagation scheme based on array processing and optical dispersion to calculate inter-channel attenuation and delay, and lastly, a multi-variable optimization paradigm using particle swarm optimization and Monte-Carlo simulations to validate the model for optimal conditions. Multi-sensor EEG of 7 newborns, comprised of seizure and background epochs, are analyzed using time-space, time-frequency, power maps and multi-sensor causality techniques. The outcomes of these methods are validated by medical insights and serve as a backbone for any assumptions and as performance benchmarks for the model to be evaluated against. The results obtained with the developed model show 85.7% averaged time-frequency correlation (the selected measure of similarity with real EEG) with 5.9% standard deviation, and an averaged error of 34.6% with 8% standard deviation. The resulting performance indicates that the proposed model provides a suitable fit with real EEG in terms of probability density function, inter-sensor attenuation and translation, and multi-sensor causality. It also demonstrates the model's flexibility to generate new unseen samples using user-defined parameters, making it suitable for other relevant applications.
Research output: Contribution to journal › Article › Scientific › peer-review
As the Internet of Vehicles matures and acquires its social flavor, novel wireless connectivity enablers are being demanded for reliable data transfer in high-rate applications. The recently ratified New Radio communications technology operates in millimeter-wave (mmWave) spectrum bands and offers sufficient capacity for bandwidth-hungry services. However, seamless operation over mmWave is difficult to maintain on the move, since such extremely high frequency radio links are susceptible to unexpected blockage by various obstacles, including vehicle bodies. As a result, proactive mode selection, that is, migration from infrastructure- to vehicle-based connections and back, is becoming vital to avoid blockage situations. Fortunately, the very social structure of interactions between the neighboring smart cars and their passengers may be leveraged to improve session continuity by relaying data via proximate vehicles. This paper conceptualizes the socially inspired relaying scenarios, conducts underlying mathematical analysis, continues with a detailed 3-D modeling to facilitate proactive mode selection, and concludes by discussing a practical prototype of a vehicular mmWave platform.
Research output: Contribution to journal › Article › Scientific › peer-review
Given the recent surge in developments of deep learning, this paper provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.
Research output: Contribution to journal › Article › Scientific › peer-review
Successful fine-grained image classification methods learn subtle details between visually similar (sub-)classes, but the problem becomes significantly more challenging if the details are missing due to low resolution. Encouraged by the recent success of Convolutional Neural Network (CNN) architectures in image classification, we propose a novel resolution-aware deep model which combines convolutional image super-resolution and convolutional fine-grained classification into a single model in an end-to-end manner. Extensive experiments on multiple benchmarks demonstrate that the proposed model consistently performs better than conventional convolutional networks on classifying fine-grained object classes in low-resolution images.
Research output: Contribution to journal › Article › Scientific › peer-review
ALMARVI is a collaborative European research project funded by Artemis, involving 16 industrial and academic partners across 4 countries, working together to address various computational challenges in image and video processing in 3 application domains: healthcare, surveillance, and mobile. This paper is an editorial for a special issue discussing the integrated system created by the partners to serve as a cross-domain solution for the project. The paper also introduces the partner articles published in this special issue, which discuss the various technological developments achieved within ALMARVI spanning all system layers, from hardware to applications. We illustrate the challenges faced within the project based on use cases from the three targeted application domains, and how these relate to the project's 4 main objectives, which address 4 challenges faced by high-performance image and video processing systems: massive data rates, low power consumption, composability, and robustness. We present a system stack composed of algorithms, design frameworks, and platforms as a solution to these challenges. Finally, the use cases from the three application domains are mapped onto the system stack solution and evaluated based on their performance against each of the 4 ALMARVI objectives.
Research output: Contribution to journal › Article › Scientific › peer-review
Due to the increased popularity of augmented (AR) and virtual (VR) reality experiences, the interest in representing the real world in an immersive fashion has never been higher. Distributing such representations enables users all over the world to freely navigate in never-before-seen media experiences. Unfortunately, such representations require a large amount of data, which is not feasible for transmission on today's networks. Thus, efficient compression technologies are in high demand. This paper proposes an approach to compress 3D video data utilizing 2D video coding technology. The proposed solution was developed to address the needs of "tele-immersive" applications, such as VR, AR, or mixed reality with "Six Degrees of Freedom" capabilities. Volumetric video data is projected onto 2D image planes and compressed using standard 2D video coding solutions. A key benefit of this approach is its compatibility with readily available 2D video coding infrastructure. Furthermore, objective and subjective evaluation shows significant improvement in coding efficiency over the reference technology. The proposed solution was contributed to and evaluated in international standardization. Although it was not selected as the winning proposal, a very similar solution has since been developed.
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, we present a high data rate implementation of a digital predistortion (DPD) algorithm on a modern mobile multicore CPU containing an on-chip GPU. The proposed implementation is capable of running in real-time, thanks to the execution of the predistortion stage inside the GPU, and the execution of the learning stage on a separate CPU core. This configuration, combined with the low complexity DPD design, allows for more than 400 Msamples/s sample rates. This is sufficient for satisfying 5G new radio (NR) base station radio transmission specifications in the sub-6 GHz bands, where signal bandwidths up to 100 MHz are specified. The linearization performance is validated with RF measurements on two base station power amplifiers at 3.7 GHz, showing that the 5G NR downlink emission requirements are satisfied.
Research output: Contribution to journal › Article › Scientific › peer-review
This paper considers a scenario where we have multiple pre-trained detectors for detecting an event and a small dataset for training a combined detection system. We build the combined detector as a Boolean function of thresholded detector scores and implement it as a binary classification cascade. The cascade structure is computationally efficient because it provides the possibility of early termination. For the proposed Boolean combination function, the computational load of classification is reduced whenever the function becomes determinate before all the component detectors have been evaluated. We also propose an algorithm that selects all the needed thresholds for the component detectors within the proposed Boolean combination. We present results on two audio-visual datasets, which demonstrate the efficiency of the proposed combination framework. We achieve state-of-the-art accuracy with substantially reduced computation time in a laughter detection task, and our algorithm finds better thresholds for the component detectors within the Boolean combination than other algorithms found in the literature.
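A toy sketch of the cascade idea is shown below: the Boolean combination of thresholded detector scores is evaluated lazily, so classification stops as soon as the function becomes determinate. The detectors, thresholds, and the DNF form are placeholders, not the thresholds selected by the paper's algorithm.

```python
# Lazy evaluation of a Boolean combination of thresholded detectors (illustrative placeholders).
def cascade_detect(frame, detectors, thresholds, clauses):
    """clauses: DNF over detector indices, e.g. [[0, 1], [2]] means (d0 AND d1) OR d2."""
    cache = {}
    def fired(i):
        if i not in cache:                       # run each component detector at most once
            cache[i] = detectors[i](frame) >= thresholds[i]
        return cache[i]
    for clause in clauses:
        if all(fired(i) for i in clause):        # a satisfied clause decides the output...
            return True                          # ...without evaluating the remaining detectors
    return False
```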
Research output: Contribution to journal › Article › Scientific › peer-review
The increasing number of cores in Systems on Chip (SoC) has introduced challenges in software parallelization. As an answer to this, the dataflow programming model offers a concurrent and reusability-promoting approach for describing applications. In this work, a runtime for executing Dataflow Process Networks (DPN) on multicore platforms is proposed. The main difference between this work and existing methods is that the operating system is allowed to perform central processing unit (CPU) load balancing freely, instead of limiting thread migration between processing cores through CPU affinity. The proposed runtime is benchmarked on desktop and server multicore platforms using five different applications from the video coding and telecommunication domains. The results show that the proposed method offers significant improvements over the state of the art in terms of performance and reliability.
Research output: Contribution to journal › Article › Scientific › peer-review
Filtering and smoothing algorithms for linear discrete-time state-space models with skew-t-distributed measurement noise are proposed. The algorithms use a variational Bayes based posterior approximation with coupled location and skewness variables to reduce the error caused by the variational approximation. Although the variational update is done suboptimally using an expectation propagation algorithm, our simulations show that the proposed method gives a more accurate approximation of the posterior covariance matrix than an earlier proposed variational algorithm. Consequently, the novel filter and smoother outperform the earlier proposed robust filter and smoother and other existing low-complexity alternatives in accuracy and speed. We present both simulations and tests based on real-world navigation data, in particular GPS data in an urban area, to demonstrate the performance of the novel methods. Moreover, the extension of the proposed algorithms to cover the case where the distribution of the measurement noise is multivariate skew-t is outlined. Finally, the paper presents a study of theoretical performance bounds for the proposed algorithms.
Research output: Contribution to journal › Article › Scientific › peer-review
Managing the water quality of freshwaters is a crucial task worldwide. One of the most used methods to biomonitor water quality is to sample benthic macroinvertebrate communities, in particular to examine the presence and proportion of certain species. This paper presents a benchmark database for automatic visual classification methods to evaluate their ability for distinguishing visually similar categories of aquatic macroinvertebrate taxa. We make publicly available a new database, containing 64 types of freshwater macroinvertebrates, ranging in number of images per category from 7 to 577. The database is divided into three datasets, varying in number of categories (64, 29, and 9 categories). Furthermore, in order to accomplish a baseline evaluation performance, we present the classification results of Convolutional Neural Networks (CNNs) that are widely used for deep learning tasks in large databases. Besides CNNs, we experimented with several other well-known classification methods using deep features extracted from the data.
Research output: Contribution to journal › Article › Scientific › peer-review
We introduce a paradigm for nonlocal sparsity reinforced deep convolutional neural network denoising. It is a combination of a local multiscale denoising by a convolutional neural network (CNN) based denoiser and a nonlocal denoising based on a nonlocal filter (NLF), exploiting the mutual similarities between groups of patches. CNN models are leveraged with noise levels that progressively decrease at every iteration of our framework, while their output is regularized by a nonlocal prior implicit within the NLF. Unlike complicated neural networks that embed the nonlocality prior within the layers of the network, our framework is modular, and it uses standard pretrained CNNs together with standard nonlocal filters. An instance of the proposed framework, called NN3D, is evaluated over large grayscale image datasets showing state-of-the-art performance.
Research output: Contribution to journal › Article › Scientific › peer-review
Automatically generating a summary of sports video poses the challenge of detecting interesting moments, or highlights, of a game. Traditional sports video summarization methods leverage editing conventions of broadcast sports video that facilitate the extraction of high-level semantics. However, user-generated videos are not edited, and thus traditional methods are not suitable for generating a summary. In order to solve this problem, this work proposes a novel video summarization method that uses players' actions as a cue to determine the highlights of the original video. A deep neural network-based approach is used to extract two types of action-related features and to classify video segments into interesting or uninteresting parts. The proposed method can be applied to any sport in which games consist of a succession of actions. In particular, this work considers the case of Kendo (Japanese fencing) as an example sport to evaluate the proposed method. The method is trained using Kendo videos with ground truth labels that indicate the video highlights. The labels are provided by annotators with different levels of Kendo experience to demonstrate how the proposed method adapts to different needs. The performance of the proposed method is compared with several combinations of different features, and the results show that it outperforms previous summarization methods.
Research output: Contribution to journal › Article › Scientific › peer-review
Both internal and boundary feedback exponential stabilization to trajectories for semilinear parabolic equations in a given bounded domain are addressed. The values of the controls are linear combinations of a finite number of actuators which are supported in a small region. A condition on the family of actuators is given which guarantees the local stabilizability of the control system. It is shown that a linearization-based Riccati feedback stabilizing controller can be constructed. The results of numerical simulations are presented and discussed.
Research output: Contribution to journal › Article › Scientific › peer-review
This article investigates digital predistortion (DPD) linearization of hybrid beamforming large-scale antenna transmitters. We propose a novel DPD processing and learning technique for an antenna sub-array, which utilizes a combined signal of the individual power amplifier (PA) outputs in conjunction with a decorrelation-based learning rule. In effect, the proposed approach results in minimizing the nonlinear distortions in the direction of the intended receiver. This feature is highly desirable, since emissions in other directions are naturally weak due to beamforming. The proposed parameter learning technique requires only a single observation receiver, and therefore supports simple hardware implementation. It is also shown to clearly outperform the current state-of-the-art technique which utilizes only a single PA for learning. Analysis of the feedback network amplitude and phase imbalances reveals that the technique is robust even to high levels of such imbalances. Finally, we also show that the array system out-of-band emissions are well-behaving in all spatial directions, and essentially below those of the corresponding single-antenna transmitter, due to the combined effects of the DPD and beamforming.
Research output: Contribution to journal › Article › Scientific › peer-review
Public evaluation campaigns and datasets promote active development in target research areas, allowing direct comparison of algorithms. The second edition of the challenge on Detection and Classification of Acoustic Scenes and Events (DCASE 2016) has offered such an opportunity for development of state-of-the-art methods, and succeeded in drawing together a large number of participants from academic and industrial backgrounds. In this paper, we report on the tasks and outcomes of the DCASE 2016 challenge. The challenge comprised four tasks: acoustic scene classification, sound event detection in synthetic audio, sound event detection in real-life audio, and domestic audio tagging. We present in detail each task and analyse the submitted systems in terms of design and performance. We observe the emergence of deep learning as the most popular classification method, replacing the traditional approaches based on Gaussian mixture models and support vector machines. By contrast, feature representations have not changed substantially throughout the years, as mel frequency-based representations predominate in all tasks. The datasets created for and used in DCASE 2016 are publicly available and are a valuable resource for further research.
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, we model the salient object detection problem under a probabilistic framework, encoding the boundary connectivity saliency cue and smoothness constraints into an optimization problem. We show that this problem has a closed-form globally optimal solution, which estimates the salient object. We further show that, along with the probabilistic framework, the proposed method also enjoys a wide range of interpretations, i.e. graph cut, diffusion maps, and one-class classification. With an analysis according to these interpretations, we also find that our proposed method provides approximations to the global optimum of another criterion that integrates local/global contrast and large-area saliency cues. The proposed unsupervised approach achieves leading performance in most cases compared to state-of-the-art unsupervised algorithms over a large set of salient object detection datasets comprising around 17k images, for several evaluation metrics. Furthermore, the computational complexity of the proposed method is favorable/comparable to many state-of-the-art unsupervised techniques.
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper we propose a method for the separation of moving sound sources. The method is based on first tracking the sources and then estimating the source spectrograms using multichannel non-negative matrix factorization (NMF), and extracting the sources from the mixture by single-channel Wiener filtering. We propose a novel multichannel NMF model with time-varying mixing of the sources, denoted by spatial covariance matrices (SCM), and provide update equations for optimizing the model parameters by minimizing the squared Frobenius norm. The SCMs of the model are obtained based on the estimated directions of arrival of the tracked sources at each time frame. The evaluation is based on established objective separation criteria and uses real recordings of two and three simultaneous moving sound sources. The compared methods include conventional beamforming and ideal ratio mask separation. The proposed method is shown to exceed the separation quality of the other evaluated blind approaches according to all measured quantities. Additionally, we evaluate the method's susceptibility to tracking errors by comparing the separation quality achieved using annotated ground truth source trajectories.
Research output: Contribution to journal › Article › Scientific › peer-review
In this letter, we propose an iterative Kalman type algorithm based on posterior linearization. The proposed algorithm uses a nested loop structure to optimize the mean of the estimate in the inner loop and update the covariance, which is a computationally more expensive operation, only in the outer loop. The optimization of the mean update is done using a damped algorithm to avoid divergence. Our simulations show that the proposed algorithm is more accurate than existing iterative Kalman filters.
EXT="Raitoharju, Matti"
Research output: Contribution to journal › Article › Scientific › peer-review
Time-division duplex (TDD) based massive MIMO systems rely on the reciprocity of the wireless propagation channels when calculating the downlink precoders based on uplink pilots. However, the effective uplink and downlink channels incorporating the analog radio front-ends of the base station (BS) and user equipments (UEs) exhibit non-reciprocity due to non-identical behavior of the individual transmit and receive chains. When the downlink precoder is not aware of such channel non-reciprocity (NRC), system performance can be significantly degraded due to NRC-induced interference terms. In this work, we consider a general TDD-based massive MIMO system where frequency-response mismatches at both the BS and UEs, as well as mutual coupling mismatch at the BS large-array system, all coexist and induce channel NRC. Based on the NRC-impaired signal models, we first propose a novel iterative estimation method for acquiring both the BS and UE side NRC matrices, and then propose a novel NRC-aware downlink precoder design that utilizes the obtained estimates. Furthermore, an efficient pilot signaling scheme between the BS and UEs is introduced in order to facilitate executing the proposed estimation method and the NRC-aware precoding technique in practical systems. Comprehensive numerical results indicate substantially improved spectral efficiency performance when the proposed NRC estimation and NRC-aware precoding methods are adopted, compared to the existing state-of-the-art methods.
Research output: Contribution to journal › Article › Scientific › peer-review
In unsupervised settings, multi-view learning seeks a shared latent representation by taking the consensus and complementary principles into account. However, most existing multi-view unsupervised learning approaches do not explicitly emphasize the predictability of the latent space. In this paper, we propose a novel multi-view predictive latent space learning (MVP) model and apply it to multi-view clustering and unsupervised dimension reduction. The latent space is forced to be predictive by maximizing the correlation between the latent space and the feature space of each view. By learning a multi-view graph with adaptive view-weight learning, MVP effectively combines the complementary information from multi-view data. Experimental results on benchmark datasets show that MVP outperforms state-of-the-art multi-view clustering and unsupervised dimension reduction algorithms.
Research output: Contribution to journal › Article › Scientific › peer-review
This paper presents a model-based design method and a corresponding new software tool, the HTGS Model-Based Engine (HMBE), for designing and implementing dataflow-based signal processing applications on multi-core architectures. HMBE provides complementary capabilities to HTGS (Hybrid Task Graph Scheduler), a recently-introduced software tool for implementing scalable workflows for high performance computing applications on compute nodes with high core counts and multiple GPUs. HMBE integrates model-based design approaches, founded on dataflow principles, with advanced design optimization techniques provided in HTGS. This integration contributes to (a) making the application of HTGS more systematic and less time consuming, (b) incorporating additional dataflow-based optimization capabilities with HTGS optimizations, and (c) automating significant parts of the HTGS-based design process using a principled approach. In this paper, we present HMBE with an emphasis on the model-based design approaches and the novel dynamic scheduling techniques that are developed as part of the tool. We demonstrate the utility of HMBE via two case studies: an image stitching application for large microscopy images and a background subtraction application for multispectral video streams.
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, the concept of sparsity for complex-valued variables is introduced in three forms: directly in the complex domain, and for the two real-valued pairs phase/amplitude and real/imaginary parts of complex variables. The nonlocal block-matching technique is used for sparsity implementation and filter design for each type of sparsity. These filters are complex domain generalizations of the Block-Matching 3D collaborative (BM3D) filter, based on the high-order singular value decomposition (HOSVD) in order to generate group-wise adaptive analysis/synthesis transforms. Complex domain denoising is developed and studied as a test problem for comparing the designed filters as well as the different types of sparsity modeling.
Research output: Contribution to journal › Article › Scientific › peer-review
There has been a great effort to transfer linear discriminant techniques that operate on vector data to high-order data, generally referred to as Multilinear Discriminant Analysis (MDA) techniques. Many existing works focus on maximizing the inter-class variances relative to the intra-class variances defined on tensor data representations. However, there has not been any attempt to employ class-specific discrimination criteria for tensor data. In this paper, we propose a multilinear subspace learning technique suitable for applications requiring class-specific tensor models. The method maximizes the discrimination of each individual class in the feature space while retaining the spatial structure of the input. We evaluate the efficiency of the proposed method on two problems, i.e. facial image analysis and stock price prediction based on limit order book data.
Research output: Contribution to journal › Article › Scientific › peer-review
In this editorial a short introduction to the special issue on Big Media Data Analysis is given. The scope of this Editorial is to briefly present methodologies, tasks and applications of big media data analysis and to introduce the papers of the special issue. The special issue includes six papers that span various media analysis application areas like generic image description, medical image and video analysis, distance calculation acceleration and data collection.
EXT="Tefas, Anastasios"
Research output: Contribution to journal › Editorial › Scientific › peer-review
Denoising is often addressed via sparse coding with respect to an overcomplete dictionary. There are two main approaches when the dictionary is composed of translates of an orthonormal basis. The first, traditionally employed by techniques such as wavelet cycle spinning, separately seeks sparsity w.r.t. each translate of the orthonormal basis, solving multiple partial optimizations and obtaining a collection of sparse approximations of the noise-free image, which are aggregated together to obtain a final estimate. The second approach, recently employed by convolutional sparse representations, instead seeks sparsity over the entire dictionary via a global optimization. It is tempting to view the former approach as providing a suboptimal solution of the latter. In this letter, we analyze whether global sparsity is a desirable property, and under what conditions the global optimization provides a better solution to the denoising problem. In particular, our experimental analysis shows that the two approaches attain comparable performance in case of natural images and global optimization outperforms the simpler aggregation of partial estimates only when the image admits an extremely sparse representation. We explain this phenomenon by separately studying the bias and variance of these solutions, and by noting that the variance of the global solution increases very rapidly as the original signal becomes less and less sparse.
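As a concrete instance of the first approach, the sketch below performs wavelet cycle spinning with PyWavelets: each shifted copy of the image is thresholded separately and the partial estimates are aggregated by averaging. The wavelet, decomposition level, threshold, and shift grid are illustrative assumptions.

```python
# Cycle-spinning denoising: average wavelet-thresholded estimates over shifted copies.
import numpy as np
import pywt

def cycle_spin_denoise(img, shifts=4, thr=0.1, wavelet='db4', level=3):
    acc = np.zeros(img.shape, dtype=float)
    for dx in range(shifts):
        for dy in range(shifts):
            shifted = np.roll(np.roll(img, dx, axis=0), dy, axis=1)
            coeffs = pywt.wavedec2(shifted, wavelet, level=level)
            # soft-threshold the detail coefficients only; each shift yields a partial estimate
            den = [coeffs[0]] + [tuple(pywt.threshold(c, thr, mode='soft') for c in d)
                                 for d in coeffs[1:]]
            est = pywt.waverec2(den, wavelet)[:img.shape[0], :img.shape[1]]
            acc += np.roll(np.roll(est, -dx, axis=0), -dy, axis=1)   # undo the shift and aggregate
    return acc / shifts**2
```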
EXT="Carrera, Diego"
EXT="Boracchi, Giacomo"
Research output: Contribution to journal › Article › Scientific › peer-review
In this study, we propose an unsupervised method for dictionary learning in audio signals. The new method, called binary nonnegative matrix deconvolution (BNMD), is developed and used to discover patterns from magnitude-scale spectrograms. The BNMD models an audio spectrogram as a sum of delayed patterns having binary gains (activations). Only small subsets of patterns can be active for a given spectrogram excerpt. The proposed method was applied to speaker identification and separation tasks. The experimental results show that dictionaries obtained by the BNMD yield much higher speaker identification accuracies averaged over a range of SNRs from -6 dB to 9 dB (91.3%) than the NMD-based dictionaries (37.8-75.4%). The BNMD also gives a benefit over dictionaries obtained using vector quantization (VQ) (87.8%). For larger dictionaries the difference between the BNMD and the VQ becomes smaller. For the speech separation task, the BNMD dictionary gave a slight improvement over the VQ.
EXT="Hurmalainen, Antti"
Research output: Contribution to journal › Article › Scientific › peer-review
This paper presents an integrated self-aware computing model mitigating the power dissipation of a heterogeneous reconfigurable multicore architecture by dynamically scaling the operating frequency of each core. The power mitigation is achieved by equalizing the performance of all the cores for an uninterrupted exchange of data. The multicore platform consists of heterogeneous Coarse-Grained Reconfigurable Arrays (CGRAs) of application-specific sizes and a Reduced Instruction-Set Computing (RISC) core. The CGRAs and the RISC core are integrated with each other over a Network-on-Chip (NoC) of six nodes arranged in a topology of two rows and three columns. The RISC core constantly monitors and controls the performance of each CGRA accelerator by adjusting the operating frequencies unless the performance of all the CGRAs is optimally balanced over the platform. The CGRA cores on the platform are processing some of the most computationally-intensive signal processing algorithms while the RISC core establishes packet based synchronization between the cores for computation and communication. All the cores can access each other’s computational and memory resources while processing the kernels simultaneously and independently of each other. Besides general-purpose processing and overall platform supervision, the RISC processor manages performance equalization among all the cores which mitigates the overall dynamic power dissipation by 20.7 % for a proof-of-concept test.
Research output: Contribution to journal › Article › Scientific › peer-review
The paper addresses 2D phase and amplitude estimation of complex-valued signals – in particular, the estimation of modulo-2π interferometric phase images from periodic and noisy observations. These degradation mechanisms make phase image estimation a challenging problem. A sparse, nonlocal, data-adaptive imaging technique formalized in the complex domain is used for phase and amplitude image reconstruction. Following the patch-based procedure, the image is partitioned into small overlapping square patches. A Block Matching Three Dimensional (BM3D) technique is developed for forming complex domain sparse spectral representations of complex-valued data. High Order Singular Value Decomposition (HOSVD) applied to the BM3D groups enables the design of orthonormal complex domain 3D transforms which are data adaptive and different for each BM3D group. An iterative version of the complex domain BM3D is designed from a variational formulation of the problem. The convergence of this algorithm is shown. The effectiveness of the new sparse-coding-based algorithms is illustrated in simulation experiments where they demonstrate state-of-the-art performance.
Research output: Contribution to journal › Article › Scientific › peer-review
In the last few years, large-scale image retrieval has attracted a lot of attention from the multimedia community. Usual approaches addressing this task first generate an initial ranking of the reference images using fast approximations that do not take into consideration the spatial arrangement of local features in the image (e.g., the bag-of-words paradigm). The top positions of the rankings are then re-estimated with verification methods that deal with more complex information, such as the geometric layout of the image. This verification step allows pruning of many false positives at the expense of an increase in the computational complexity, which may prevent its application to large-scale retrieval problems. This paper describes a geometric method known as neighborhood matching (NM), which revisits the keypoint matching process by considering a neighborhood around each keypoint and improves the efficiency of a geometric verification step in the image search system. Multiple strategies are proposed and compared to incorporate NM into a large-scale image retrieval framework. A detailed analysis and comparison of these strategies and baseline methods have been investigated. The experiments show that the proposed method not only improves the computational efficiency, but also increases the retrieval performance and outperforms state-of-the-art methods in standard datasets, such as the Oxford 5k and 105k datasets, for which the spatial verification step has a significant impact on the system performance.
Research output: Contribution to journal › Article › Scientific › peer-review
Full use of the parallel computation capabilities of present and expected CPUs and GPUs requires the use of vector extensions. Yet many actors in dataflow systems for digital signal processing have internal state (or, equivalently, an edge that loops from the actor back to itself) that imposes serial dependencies between actor invocations, making vectorization across actor invocations impossible. Ideally, issues of inter-thread coordination required by serial data dependencies should be handled by code written by parallel programming experts that is separate from code specifying signal processing operations. The purpose of this paper is to present one approach for doing so in the case of actors that maintain state. We propose a methodology for using the parallel scan (also known as prefix sum) pattern to create algorithms for multiple simultaneous invocations of such an actor that result in vectorizable code. Two examples of applying this methodology are given: (1) infinite impulse response (IIR) filters and (2) finite state machines (FSMs). The correctness and performance of the resulting IIR filters and one class of FSMs are studied.
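A small sketch of the idea for the first-order IIR case is given below: the recurrence y[n] = a·y[n-1] + b·x[n] is rewritten as a scan over affine maps with an associative combine operator, so the sequential loop shown can be replaced by any parallel prefix-scan; the coefficients and the sanity check are illustrative.

```python
# First-order IIR as a scan over affine maps (m, c): y -> m*y + c.
import numpy as np

def combine(first, second):
    """Compose affine maps: apply `first`, then `second` (associative)."""
    m1, c1 = first
    m2, c2 = second
    return m1 * m2, m2 * c1 + c2

def iir_scan(x, a, b, y0=0.0):
    elems = [(a, b * xn) for xn in x]           # per-sample affine maps
    out, acc = [], (1.0, 0.0)                   # start from the identity map
    for e in elems:                             # inclusive scan (parallelizable via prefix sums)
        acc = combine(acc, e)
        out.append(acc[0] * y0 + acc[1])
    return np.array(out)

# sanity check against the direct recurrence
x = np.random.randn(64)
y, prev = [], 0.0
for xn in x:
    prev = 0.9 * prev + 0.5 * xn
    y.append(prev)
assert np.allclose(iir_scan(x, 0.9, 0.5), y)
```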
Research output: Contribution to journal › Article › Scientific › peer-review
We consider the discrete form of the one-dimensional phase retrieval (1-D DPhR) problem from the point of view of input magnitude data. The direct method can provide a solution to the 1-D DPhR problem if certain conditions are satisfied by the input magnitude data, namely the corresponding trigonometric polynomial must be nonnegative. To test positivity of a trigonometric polynomial a novel DFT-based criterion is proposed. We use this DFT criterion for different sets of input magnitude data to evaluate whether the direct method applied to the 1-D DPhR problem leads to a solution in all explored cases.
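One elementary numerical check in this spirit is sketched below: the real trigonometric polynomial is evaluated on a dense, zero-padded DFT grid and its minimum is tested for nonnegativity. This is only an illustration; the exact DFT-based criterion proposed in the paper may be formulated differently.

```python
# Grid-based nonnegativity test for a real, symmetric trigonometric polynomial (illustrative).
import numpy as np

def trig_poly_nonnegative(r, oversample=64, tol=1e-12):
    """r: coefficients r[0..N] of P(w) = r[0] + 2*sum_n r[n]*cos(n*w) (real, symmetric case)."""
    N = len(r) - 1
    grid = oversample * (2 * N + 1)
    full = np.zeros(grid)
    full[0] = r[0]
    if N:
        full[1:N + 1] = r[1:]
        full[-N:] = r[1:][::-1]                  # symmetric extension r[-n] = r[n]
    values = np.fft.fft(full).real               # samples of P(w) on a dense frequency grid
    return values.min() >= -tol                  # dense sampling makes this a practical test
```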
EXT="Rusu, Corneliu"
Research output: Contribution to journal › Article › Scientific › peer-review
Digital predistortion (DPD) is a widely adopted baseband processing technique in current radio transmitters. While DPD can effectively suppress unwanted spurious spectrum emissions stemming from imperfections of analog RF and baseband electronics, it also introduces extra processing complexity and poses challenges on efficient and flexible implementations, especially for mobile cellular transmitters, considering their limited computing power compared to basestations. In this paper, we present high data rate implementations of broadband DPD on modern embedded processors, such as mobile GPU and multicore CPU, by taking advantage of emerging parallel computing techniques for exploiting their computing resources. We further verify the suppression effect of DPD experimentally on real radio hardware platforms. Performance evaluation results of our DPD design demonstrate the high efficacy of modern general purpose mobile processors on accelerating DPD processing for a mobile transmitter.
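For illustration, the sketch below applies one common DPD structure, a memory-polynomial predistorter, to a block of baseband samples; the coefficient values are placeholders that would normally come from the learning stage, and the paper's specific DPD design and its parallel GPU/CPU mapping may differ.

```python
# Memory-polynomial predistorter applied sample-wise to complex baseband data (illustrative).
import numpy as np

def memory_polynomial_dpd(x, coeffs):
    """x: complex baseband samples; coeffs[m][p] weights basis |x[n-m]|^(2p) * x[n-m]."""
    y = np.zeros_like(x, dtype=complex)
    for m, taps in enumerate(coeffs):
        xm = np.roll(x, m)                          # delayed input branch (memory depth m)
        xm[:m] = 0
        for p, c in enumerate(taps):
            y += c * xm * np.abs(xm) ** (2 * p)     # odd-order nonlinear basis function
    return y

# e.g. 3 memory taps, nonlinearity orders 1/3/5 (placeholder coefficients)
coeffs = [[1.0 + 0j, -0.05 + 0.01j, 0.002 - 0.001j],
          [0.01 + 0j, -0.002 + 0j, 0.0],
          [0.001 + 0j, 0.0, 0.0]]
x = 0.8 * np.exp(1j * 2 * np.pi * 0.01 * np.arange(4096))
y_predistorted = memory_polynomial_dpd(x, coeffs)
```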
Research output: Contribution to journal › Article › Scientific › peer-review
Efficient sample rate conversion is of widespread importance in modern communication and signal processing systems. Although many efficient kinds of polyphase filterbank structures exist for this purpose, they are mainly geared toward serial, custom, dedicated hardware implementation for a single task. There is, therefore, a need for more flexible sample rate conversion systems that are resource-efficient, and provide high performance. To address these challenges, we present in this paper an all-software-based, fully parallel, multirate resampling method based on graphics processing units (GPUs). The proposed approach is well-suited for wireless communication systems that have simultaneous requirements on high throughput and low latency. Utilizing the multidimensional architecture of GPUs, our design allows efficient parallel processing across multiple channels and frequency bands at baseband. The resulting architecture provides flexible sample rate conversion that is designed to address modern communication requirements, including real-time processing of multiple carriers simultaneously.
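A usage-level sketch of polyphase sample rate conversion over a batch of channels is given below using SciPy's CPU implementation; it illustrates the per-channel polyphase filtering that a GPU design of this kind parallelizes, not the paper's GPU kernels. The rates and channel count are illustrative.

```python
# Polyphase resampling of a batch of complex baseband channels (illustrative rates).
import numpy as np
from scipy.signal import resample_poly

up, down = 1, 4                                   # e.g. 30.72 Msps -> 7.68 Msps
channels = np.random.randn(8, 65536) + 1j * np.random.randn(8, 65536)   # 8 baseband channels
resampled = resample_poly(channels, up, down, axis=-1)                   # polyphase FIR per channel
print(resampled.shape)                            # (8, 16384)
```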
Research output: Contribution to journal › Article › Scientific › peer-review
Dataflow programming has received increasing attention in the age of multicore and heterogeneous computing. Modular and concurrent dataflow program descriptions enable highly automated approaches for design space exploration, optimization and deployment of applications. A great advance in dataflow programming has been the recent introduction of the RVC-CAL language. Having been standardized by the ISO, the RVC-CAL dataflow language provides a solid basis for the development of tools, design methodologies and design flows. This paper proposes a novel design flow for mapping RVC-CAL dataflow programs to parallel and heterogeneous execution platforms. Through the proposed design flow the programmer can describe an application in the RVC-CAL language and map it to multi- and many-core platforms, as well as GPUs, for efficient execution. The functionality and efficiency of the proposed approach is demonstrated by a parallel implementation of a video processing application and a run-time reconfigurable filter for telecommunications. Experiments are performed on GPU and multicore platforms with up to 16 cores, and the results show that for high-performance applications the proposed design flow provides up to 4 × higher throughput than the state-of-the-art approach in multicore execution of RVC-CAL programs.
Research output: Contribution to journal › Article › Scientific › peer-review
Designing applications for scalability is key to improving their performance in hybrid and cluster computing. Scheduling code to utilize parallelism is difficult, particularly when dealing with data dependencies, memory management, data motion, and processor occupancy. The Hybrid Task Graph Scheduler (HTGS) is an abstract execution model, framework, and API that increases programmer productivity when implementing hybrid workflows for multi-core and multi-GPU systems. HTGS manages dependencies between tasks, represents CPU and GPU memories independently, overlaps computations with disk I/O and memory transfers, keeps multiple GPUs occupied, and uses all available compute resources. Through these abstractions, data motion and memory are explicit; this makes data locality decisions more accessible. To demonstrate the HTGS application program interface (API), we present implementations of two example algorithms: (1) a matrix multiplication that shows how easily task graphs can be used; and (2) a hybrid implementation of microscopy image stitching that reduces code size by ≈ 43% compared to a manually coded hybrid workflow implementation and showcases the minimal overhead of task graphs in HTGS. Both of the HTGS-based implementations show good performance. In image stitching the HTGS implementation achieves similar performance to the hybrid workflow implementation. Matrix multiplication with HTGS achieves 1.3x and 1.8x speedup over the multi-threaded OpenBLAS library for 16k × 16k and 32k × 32k size matrices, respectively.
Research output: Contribution to journal › Article › Scientific › peer-review
This paper proposes a complete lossless compression method for exploiting the redundancy of rectified light-field data. The light-field data consist of an array of rectified subaperture images, called views for short, which are segmented into regions according to an optimized partition of the central view. Each region of a view is predictively encoded using a specifically designed sparse predictor, exploiting the smoothness of each color component in the current view and the cross-similarities with the other color components and already encoded neighbor views. The views are encoded sequentially, using a spiral scanning order, each view being predicted from several similar neighbor views. The essential challenge for each predictor is choosing the most relevant regressors from a large number of possible regressors belonging to the neighbor views. The solution proposed here is to couple sparse predictor design with the minimum description length (MDL) principle, where the data description length is measured by an implementable code length, optimized for a class of probability models. The paper introduces a region-merging segmentation under the MDL criterion for partitioning the views into regions having their own specific sparse predictors. In the experiments, several fast sparse design methods are considered. The proposed scheme is evaluated over a database of plenoptic images, achieving better lossless compression ratios than straightforward usage of standard image and video compression methods for the spiral sequence of views.
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, modern CPU architecture with several different cache levels is described and current CPU performance limitations such as frequency increase bounds are discussed. As changes to the currently existing architecture are usually proposed as a way of increasing CPU performance, data rates of the internal and external CPU interfaces must be known. This information would help to assess the applicability of proposed solutions and to optimize them. This paper is aimed at obtaining real values of traffic on an L2–L3 cache interface inside a CPU and a CPU–RAM bus load, as well as showing the dependences of the total traffic on the studied interfaces on the number of active cores, CPU frequency, and test type. A measurement methodology using an Intel Performance Counter Monitor is provided and the equations used to obtain data rates from the internal CPU counters are explained. Both real-life and synthetic tests are described. The dependence of total traffic on the number of active cores and the dependence of total traffic on CPU frequency are provided as plots. The dependence of total traffic on test type is provided as a bar plot for multiple CPU frequencies.
INT=elt,"Komar, M. S."
Research output: Contribution to journal › Article › Scientific › peer-review
Tracking algorithms have important applications in detection of humans and vehicles for border security and other areas. For large-scale deployment of such algorithms, it is critical to provide methods for their cost- and energy-efficient realization. To this end, commodity mobile devices have significant potential for use as prototyping and testing platforms due to their low cost, widespread availability, and integration of advanced communications, sensing, and processing features. Prototypes developed on mobile platforms can be tested, fine-tuned, and demonstrated in the field and then provide reference implementations for application-specific disposable sensor node implementations that are targeted for deployment. In this paper, we develop a novel, adaptive tracking system that is optimized for energy-efficient, real-time operation on off-the-shelf mobile platforms. Our tracking system applies principles of dynamic data-driven application systems (DDDAS) to periodically monitor system operating characteristics and apply these measurements to dynamically adapt the specific classifier configurations that the system employs. Our resulting adaptive approach enables powerful optimization of trade-offs among energy consumption, real-time performance, and tracking accuracy based on time-varying changes in operational characteristics. Through experiments employing an Android-based tablet platform, we demonstrate the efficiency of our proposed tracking system design for multimode detection of human and vehicle targets.
Research output: Contribution to journal › Article › Scientific › peer-review
The standard median filter based on a symmetric moving window has only one tuning parameter: the window width. Despite this limitation, this filter has proven extremely useful and has motivated a number of extensions: weighted median filters, recursive median filters, and various cascade structures. The Hampel filter is a member of the class of decision filters that replace the central value in the data window with the median if it lies far enough from the median to be deemed an outlier. This filter depends on both the window width and an additional tuning parameter t, reducing to the median filter when t=0, so it may be regarded as another median filter extension. This paper adopts this view, defining and exploring the class of generalized Hampel filters obtained by applying the median filter extensions listed above: weighted Hampel filters, recursive Hampel filters, and their cascades. An important concept introduced here is that of an implosion sequence, a signal for which generalized Hampel filter performance is independent of the threshold parameter t. These sequences are important because the added flexibility of the generalized Hampel filters offers no practical advantage for implosion sequences. Partial characterization results are presented for these sequences, as are useful relationships between root sequences for generalized Hampel filters and their median-based counterparts. To illustrate the performance of this filter class, two examples are considered: one is simulation-based, providing a basis for quantitative evaluation of signal recovery performance as a function of t, while the other is a sequence of monthly Italian industrial production index values that exhibits glaring outliers.
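The basic (non-generalized) Hampel filter described above is easy to state in code; the sketch below assumes a MAD-based scale estimate and a symmetric window, which matches the usual definition, while the weighted, recursive and cascaded generalizations of the paper are not included.

```python
import numpy as np

def hampel_filter(x, half_width=3, t=3.0):
    """Basic Hampel filter: replace the window center with the window median
    when it deviates from the median by more than t scaled MADs.
    With t = 0 this reduces to the standard median filter."""
    x = np.asarray(x, dtype=float)
    y = x.copy()
    for i in range(half_width, len(x) - half_width):
        window = x[i - half_width : i + half_width + 1]
        med = np.median(window)
        mad = 1.4826 * np.median(np.abs(window - med))  # robust scale estimate
        if np.abs(x[i] - med) > t * mad:
            y[i] = med
    return y

# Example: a noisy ramp with two glaring outliers.
sig = np.linspace(0, 1, 50) + 0.01 * np.random.default_rng(1).standard_normal(50)
sig[10], sig[30] = 5.0, -4.0
print(hampel_filter(sig)[[10, 30]])   # outliers replaced by local medians
```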
Research output: Contribution to journal › Article › Scientific › peer-review
This paper presents an algorithm for unsupervised single-channel source separation of audio mixtures. The approach specifically addresses the challenging case of separation where no training data are available. By representing mixtures in the modulation spectrogram (MS) domain, we exploit underlying similarities in patterns present across frequency. A three-dimensional tensor factorization is able to take advantage of these redundant patterns, and is used to separate a mixture into an approximated sum of components by minimizing a divergence cost. Furthermore, we show that the basic tensor factorization can be extended with convolution in time to improve separation results, and we provide update rules to learn components in such a manner. Following factorization, sources are reconstructed in the audio domain from estimated components using a novel approach based on reconstruction masks that are learned using MS activations and then applied to a mixture spectrogram. We demonstrate that the proposed method produces superior separation performance to a spectrally based nonnegative matrix factorization approach, in terms of source-to-distortion ratio. We also compare separation with the perceptually motivated interference-related perceptual score metric and identify cases with higher performance.
Research output: Contribution to journal › Article › Scientific › peer-review
The classification of Human Epithelial (HEp-2) cell images, acquired through Indirect Immunofluorescence (IIF) microscopy, is an effective method to identify staining patterns in patient sera. Indeed, it can be used for diagnostic purposes, in order to reveal autoimmune diseases. However, the automated classification of IIF HEp-2 cell patterns represents a challenging task, due to the large intra-class and the small inter-class variability. Consequently, recent HEp-2 cell classification contests have greatly spurred the development of new IIF image classification systems. Here we propose an approach for the automatic classification of IIF HEp-2 cell images that fuses several texture descriptors through an ensemble of support vector machines combined by the sum rule. Its effectiveness is evaluated using the HEp-2 cells dataset used for the "Performance Evaluation of Indirect Immunofluorescence Image Analysis Systems" contest, hosted by the International Conference on Pattern Recognition in 2014: the accuracy on the testing set is 79.85%. The same dataset was used to test an ensemble of ternary-encoded local phase quantization descriptors, built by perturbation approaches: the accuracy on the training set is 84.16%. Finally, this ensemble was validated on 14 additional datasets, obtaining the best performance on 11 datasets. Our MATLAB code is available at https://www.dei.unipd.it/node/2357.
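The sum-rule fusion itself is straightforward; the sketch below assumes placeholder random "descriptors" and scikit-learn SVMs with probability outputs, and only illustrates combining per-descriptor classifiers by summing their posterior estimates, not the paper's actual descriptor ensemble.

```python
# Sketch of a sum-rule ensemble of SVMs, one per texture descriptor.
# The random "descriptor" features are placeholders for LBP/LPQ-type descriptors.
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
n, n_classes = 300, 6
y = rng.integers(0, n_classes, n)

# Three hypothetical descriptor sets computed from the same images.
descriptors = [rng.standard_normal((n, d)) + y[:, None] * 0.5 for d in (32, 64, 48)]

train, test = np.arange(0, 200), np.arange(200, n)
probs = np.zeros((len(test), n_classes))
for feats in descriptors:
    clf = SVC(kernel="rbf", probability=True).fit(feats[train], y[train])
    probs += clf.predict_proba(feats[test])      # sum rule over posteriors

pred = clf.classes_[np.argmax(probs, axis=1)]
print("fused accuracy:", np.mean(pred == y[test]))
```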
Research output: Contribution to journal › Article › Scientific › peer-review
Image interpolation offers an efficient way to compose a high-resolution (HR) image from the observed low-resolution (LR) image. Advanced interpolation techniques design the interpolation weighting coefficients by solving a minimum mean-square-error (MMSE) problem in which the local geometric similarity is often considered. However, using local geometric similarities cannot usually make the MMSE-based interpolation as reliable as expected. To solve this problem, we propose a robust interpolation scheme by using the nonlocal geometric similarities to construct the HR image. In our proposed method, the MMSE-based interpolation weighting coefficients are generated by solving a regularized least squares problem that is built upon a number of dual-reference patches drawn from the given LR image and regularized by the directional gradients of these patches. Experimental results demonstrate that our proposed method offers a remarkable quality improvement as compared to some state-of-the-art methods, both objectively and subjectively.
Research output: Contribution to journal › Article › Scientific › peer-review
Background and objectives: Due to the development of imaging systems, the amount of digital images obtained in the biological field has been growing in recent years. These images contain information that is not directly measurable, e.g. the area covered by a single cell. In most current imaging programs the regions of interest (ROI), e.g. individual cells, need to be manually outlined. Automating the processing and analysis of the images would ease researchers' workload and provide results that are more reliable. In this work our goal was to write software that automatically segments human cardiomyocytes from images, calculates their areas, and measures their variations in the directions of the largest and smallest spread. Results: We developed software that eased the workload of biomedical laboratory personnel so that they do not have to do manual image segmentation or learn to use software that requires programming skills. The software made a correct segmentation in most of the cases and outperformed the intensity-oriented baseline method written in ImageJ in 95% of comparisons. The baseline method estimated cell and background areas by averaging dark background and bright foreground areas. Conclusions: Our software can be used to calculate cell areas and extents when immunolabeled cells are imaged with a fluorescent microscope. In the future the functionality of the program could be extended with machine learning methods that use the user actions as teaching material in the cases where automatic segmentation fails.
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, we describe a method for the determination of a subspace of the feature space in kernel methods, which is suited to large-scale learning problems. Linear model learning in the obtained space corresponds to a nonlinear model learning process in the input space. Since the obtained feature space is determined only by exploiting properties of the training data, this approach can be used for generic nonlinear pattern recognition. That is, nonlinear data mapping can be considered to be a pre-processing step exploiting nonlinear relationships between the training data. Linear techniques can be subsequently applied in the new feature space and, thus, they can model nonlinear properties of the problem at hand. In order to appropriately address the inherent problem of kernel learning methods related to their time and memory complexities, we follow an approximate learning approach. We show that the method can lead to considerable operation speed gains and achieve very good performance. Experimental results verify our analysis.
Research output: Contribution to journal › Article › Scientific › peer-review
We denoise Poisson images with an iterative algorithm that progressively improves the effectiveness of variance-stabilizing transformations (VST) for Gaussian denoising filters. At each iteration, a combination of the Poisson observations with the denoised estimate from the previous iteration is treated as scaled Poisson data and filtered through a VST scheme. Due to the slight mismatch between a true scaled Poisson distribution and this combination, a special exact unbiased inverse is designed. We present an implementation of this approach based on the BM3D Gaussian denoising filter. With a computational cost at worst twice that of the noniterative scheme, the proposed algorithm provides significantly better quality, particularly at low signal-to-noise ratio, outperforming much costlier state-of-the-art alternatives.
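For orientation, a minimal non-iterative VST pipeline looks as follows; a Gaussian smoother stands in for the BM3D filter, and the simple asymptotically unbiased algebraic inverse replaces the exact unbiased inverse designed in the paper, so this is only a baseline sketch of the Anscombe-domain approach.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def anscombe(z):
    # Variance-stabilizing transform: Poisson -> approximately unit-variance Gaussian.
    return 2.0 * np.sqrt(z + 3.0 / 8.0)

def inverse_anscombe(d):
    # Asymptotically unbiased algebraic inverse (the paper instead uses an
    # exact unbiased inverse, which is more accurate at low counts).
    return (d / 2.0) ** 2 - 1.0 / 8.0

rng = np.random.default_rng(0)
truth = 5.0 * (1 + np.sin(np.linspace(0, 4 * np.pi, 256)))   # low-count intensities
noisy = rng.poisson(truth).astype(float)

# Stabilize, apply a generic Gaussian denoiser, then invert the VST.
denoised = inverse_anscombe(gaussian_filter(anscombe(noisy), sigma=3))
print("RMSE noisy:   ", np.sqrt(np.mean((noisy - truth) ** 2)))
print("RMSE denoised:", np.sqrt(np.mean((denoised - truth) ** 2)))
```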
Research output: Contribution to journal › Article › Scientific › peer-review
Multirate filter banks can be implemented efficiently using fast-convolution (FC) processing. The main advantage of the FC filter banks (FC-FB) compared with the conventional polyphase implementations is their increased flexibility, that is, the number of channels, their bandwidths, and the center frequencies can be independently selected. In this paper, an approach to optimize the FC-FBs is proposed. First, a subband representation of the FC-FB is derived. Then, the optimization problems are formulated with the aid of the subband model. Finally, these problems are conveniently solved with the aid of a general nonlinear optimization algorithm. Several examples are included to demonstrate the proposed overall design scheme as well as to illustrate the efficiency and the flexibility of the resulting FC-FB.
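The fast-convolution principle underlying FC-FB can be illustrated with single-channel overlap-save FFT filtering; the block length, prototype filter and single-rate, single-channel setting below are simplifying assumptions and do not capture the multichannel FC-FB structure or the optimization proposed in the paper.

```python
import numpy as np

def overlap_save(x, h, nfft=256):
    """FFT-based fast convolution (overlap-save), equal to np.convolve(x, h)[:len(x)]."""
    m = len(h)
    hop = nfft - m + 1                       # new samples consumed per block
    H = np.fft.rfft(h, nfft)
    x_pad = np.concatenate([np.zeros(m - 1), x, np.zeros(hop)])
    y = np.zeros(len(x))
    for start in range(0, len(x), hop):
        block = x_pad[start : start + nfft]
        yb = np.fft.irfft(np.fft.rfft(block, nfft) * H, nfft)
        valid = yb[m - 1:]                   # discard circular-wraparound samples
        n = min(hop, len(x) - start)
        y[start : start + n] = valid[:n]
    return y

rng = np.random.default_rng(0)
x = rng.standard_normal(1000)
h = np.hanning(31); h /= h.sum()             # simple lowpass prototype (assumed)
print(np.allclose(overlap_save(x, h), np.convolve(x, h)[: len(x)]))
```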
Research output: Contribution to journal › Article › Scientific › peer-review
The visual voice activity detection (V-VAD) problem in unconstrained environments is investigated in this paper. A novel method for V-VAD in the wild is proposed, exploiting local shape and motion information appearing at spatiotemporal locations of interest for facial video segment description, and the bag-of-words model for facial video segment representation. Facial video segment classification is subsequently performed using state-of-the-art classification algorithms. Experimental results on one publicly available V-VAD dataset demonstrate the effectiveness of the proposed method, since it achieves better generalization performance for unseen users when compared to recently proposed state-of-the-art methods. Additional results on a new unconstrained dataset provide evidence that the proposed method can be effective even in cases where other existing methods fail.
Research output: Contribution to journal › Article › Scientific › peer-review
This letter is devoted to the problem of rotation-invariant texture classification. A novel rotation-invariant feature, the symmetric dense microblock difference (SDMD), is proposed, which captures information at different orientations and scales. N-fold symmetry is introduced in the feature design configuration while retaining the random structure that provides discriminative power, and this symmetry is utilized to achieve rotation invariance. The SDMD is extracted using an image pyramid and encoded by the Fisher vector approach, resulting in a descriptor which captures variations at different resolutions without increasing the dimensionality. The proposed image representation is combined with a linear SVM classifier. Extensive experiments are conducted on four texture data sets (Brodatz, UMD, UIUC, and the Flickr material data set (FMD)) using standard protocols. The results demonstrate that our approach outperforms the state of the art in texture classification. The MATLAB code is available at http://www.cs.tut.fi/~mehta/symdmd.
Research output: Contribution to journal › Article › Scientific › peer-review
Modern communication systems have frequency bands that are shared with multiple channels. A typical front-end transceiver is responsible for processing multiple channels simultaneously within the band. Although it is simpler to design a dedicated sub-transceiver for each channel, the overall cost of implementation is prohibitive. It is more desirable to design a single system that can process many channels at the same time. However, it is difficult to realize such a transceiver in wireless communication systems, since it needs to accommodate different standards, data rates, sampling rates, etc. In this paper, we address the challenges of flexible, cost-effective, multi-channel implementation of wideband receiver systems. In particular, we develop a novel implementation by applying graphics processing unit (GPU) technology to a wideband receiver that processes multiple channels at the same time, including channelization and arbitrary resampling. Our proposed new receiver architecture is flexible and reconfigurable via software, while providing high throughput and low latency.
Research output: Contribution to journal › Article › Scientific › peer-review
Nanoscale electromechanical wireless communication with on-off keying in the very high frequency (VHF) band (30-300 MHz) is studied for a receiver using a carbon nanotube (CNT). Previous studies on this topic have only considered continuous wave (CW) on-off keying, which suffers from spectral widening due to sharp changes in the signal. The effects of inter-symbol interference (ISI), co-channel interference, and adjacent channel interference on the received signal statistics have not been analyzed. The rise and fall times associated with the filtering of the incoming signal by the mechanical frequency response of the receiver's CNT have also been ignored. In this paper, Fourier-series based modeling and statistical analysis of decision variables are performed. The results and modeling in this study enable performance evaluation of CNT-based receivers with an arbitrary number of interfering signals with arbitrary pulse shapes, and fully incorporate the transient signal components. Received signal statistics under interference are derived using the developed model. Numerical results are presented for the Hanning pulse and the trapezoidal pulse (which includes rectangular pulses corresponding to CW as a special case). The required guard intervals between pulses to mitigate ISI, the required frequency separation between channels, and the required spatial separation of co-channel networks (frequency reuse distance) are shown. These results show that a large frequency reuse distance is required, limiting efficient spectrum utilization. However, the ISI and adjacent channel interference can be controlled more easily with a proper selection of parameters.
INT=elt,"Bicen, A. Ozan"
Research output: Contribution to journal › Article › Scientific › peer-review
A multi-wavelength analysis for pulse waveform extraction using laser speckle is conducted. The proposed system consists of three coherent light sources (532 nm, 635 nm, 850 nm). A bench test composed of a moving skin-like phantom (silicone membrane) is used to compare the results obtained from the different wavelengths. The system is able to identify the skin-like phantom vibration frequency, within physiological values, with a minimum error of 0.5 mHz for the 635 nm and 850 nm wavelengths and a minimum error of 1.3 mHz for the 532 nm wavelength, using an FFT-based algorithm. The phantom velocity profile is estimated with an error ranging from 27% to 9% using a bidimensional correlation coefficient-based algorithm. An in vivo trial is also conducted, using the 532 nm and 635 nm laser sources; the 850 nm light source was not able to extract the pulse waveform. The heart rate is identified with a minimum error of 0.48 beats per minute for the 532 nm light source and a minimum error of 1.15 beats per minute for the 635 nm light source. Our work reveals that a laser speckle-based system with a 532 nm wavelength is able to extract the arterial pulse waveform with better results than those obtained with a 635 nm laser.
Research output: Contribution to journal › Article › Scientific › peer-review
As manufacturing industries are transforming towards service orientation, predicting the costs of product-service systems is becoming essential. Simulation is one possibility for evaluating the costs and risks involved in product-service systems, such as extended warranty agreements. We conducted a case study with a globally operating manufacturer of industrial goods who also provides services for the equipment. We created equipment performance simulation (EPSi) models and a tool, EPSitor, for using the models in predicting extended warranty costs. However, reliable simulation results require good quality maintenance and operation data from existing installations. We discovered that it is difficult to collect the data needed for simulations and there were many challenges with data quality. Quality problems were mainly observed in manually collected data. Insufficient data quality leads to a wider margin of error in the simulation models, which increases business risk. Identifying these challenges is the first step in transforming the data collection routines to support equipment performance simulations. The key to long-term business benefits of simulation is to acknowledge the importance of data quality and to establish efficient data collection routines. Future research should find ways to motivate maintenance technicians to collect good quality data. This would contribute to more accurate cost analysis and thus to better profitability of extended warranty contracts.
INT=mei,"Jokinen, Juuso"
Research output: Contribution to journal › Article › Scientific › peer-review
The normalised innovation squared (NIS) test, which is used to assess whether a Kalman filter's noise assumptions are consistent with realised measurements, can be applied online with real data, and does not require future data, repeated experiments or knowledge of the true state. In this work, it is shown that the NIS test is equivalent to three other model criticism procedures: it can be derived (i) as a Bayesian p-test for the prior predictive distribution, (ii) as a nested-model parameter significance test, and (iii) from a recently proposed filter residual test. A new NIS-like test corresponding to a posterior predictive Bayesian p-test is also presented.
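A minimal numerical illustration of the NIS consistency check is sketched below, assuming a scalar random-walk model with known noise variances; the two-sided chi-square bounds are the standard form of the test.

```python
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
T, q, r = 200, 0.01, 0.5              # steps, process and measurement variances (assumed)
x_true, meas = 0.0, []
for _ in range(T):
    x_true += rng.normal(0, np.sqrt(q))
    meas.append(x_true + rng.normal(0, np.sqrt(r)))

# Scalar Kalman filter (random-walk model) accumulating the NIS statistic.
x_est, P, nis = 0.0, 1.0, []
for z in meas:
    P += q                             # predict
    S = P + r                          # innovation variance
    v = z - x_est                      # innovation
    nis.append(v * v / S)              # normalised innovation squared
    K = P / S
    x_est += K * v
    P *= (1.0 - K)

# Two-sided 95% test: the summed NIS of a consistent filter is chi-square with T dof.
total = np.sum(nis)
lo, hi = chi2.ppf([0.025, 0.975], df=T)
print(f"sum NIS = {total:.1f}, 95% interval = [{lo:.1f}, {hi:.1f}]")
```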
Research output: Contribution to journal › Article › Scientific › peer-review
The problem of how to automatically provide a desired (required) visual quality in lossy compression of still images and video frames is considered in this paper. The quality can be measured based on different conventional and visual quality metrics. In this paper, we mainly employ human visual system (HVS) based metrics PSNR-HVS-M and MSSIM since both of them take into account several important peculiarities of HVS. To provide a desired visual quality with high accuracy, iterative image compression procedures are proposed and analyzed. An experimental study is performed for a large number of grayscale test images. We demonstrate that there exist several coders for which the number of iterations can be essentially decreased using a reasonable selection of the starting value and the variation interval for the parameter controlling compression (PCC). PCC values attained at the end of the iterative procedure may heavily depend upon the coder used and the complexity of the image. Similarly, the compression ratio also considerably depends on the above factors. We show that for some modern coders that take HVS into consideration it is possible to give practical recommendations on setting a fixed PCC to provide a desired visual quality in a non-iterative manner. The case when original images are corrupted by visible noise is also briefly studied.
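The iterative procedure can be sketched as a simple bisection over the parameter controlling compression; the example below uses the JPEG quality factor and plain PSNR as stand-ins for the coders and HVS-based metrics (PSNR-HVS-M, MSSIM) studied in the paper, and the target value and test image are arbitrary assumptions.

```python
import io
import numpy as np
from PIL import Image

def psnr(a, b):
    mse = np.mean((a.astype(float) - b.astype(float)) ** 2)
    return 10 * np.log10(255.0 ** 2 / mse)

def quality_for_target(img, target_db=40.0, lo=5, hi=95, iters=7):
    """Bisection on the compression-controlling parameter (JPEG quality factor)
    until the decoded image reaches the desired quality."""
    arr = np.asarray(img)
    for _ in range(iters):
        q = (lo + hi) // 2
        buf = io.BytesIO()
        img.save(buf, format="JPEG", quality=q)
        buf.seek(0)
        rec = np.asarray(Image.open(buf))
        if psnr(arr, rec) < target_db:
            lo = q + 1          # too much distortion -> raise the quality factor
        else:
            hi = q              # target met -> try a lower quality factor
    return hi

# Smooth test image (horizontal gradient); real use would iterate over a dataset.
gradient = np.tile(np.arange(128, dtype=np.uint8) * 2, (128, 1))
print("selected quality factor:", quality_for_target(Image.fromarray(gradient)))
```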
Research output: Contribution to journal › Article › Scientific › peer-review
Wireless standards are evolving rapidly due to the exponential growth in the number of portable devices along with applications with high data rate requirements. Adaptable software-based signal processing implementations for these devices can make the deployment of the constantly evolving standards faster and less expensive. The flagship technology of the IEEE WLAN family, IEEE 802.11ac, aims at achieving very high throughput in local area connectivity scenarios. This article presents a software-based implementation of the Multiple-Input Multiple-Output (MIMO) transmitter and receiver baseband processing conforming to the IEEE 802.11ac standard, which can achieve transmission bit rates beyond 1 Gbps. This work focuses on the physical-layer frequency-domain processing. Various configurations, including 2×2 and 4×4 MIMO, are considered for the implementation. To utilize the available data- and instruction-level parallelism, a DSP core with vector extensions is selected as the implementation platform. The feasibility of the presented software-based solution is then assessed by studying the number of clock cycles and the power consumption of the different scenarios implemented on this core. Such Software Defined Radio based approaches can potentially offer more flexibility, high energy efficiency, reduced design effort and thus shorter time-to-market cycles in comparison with conventional fixed-function hardware methods.
ORG=elt,0.5
ORG=tie,0.5
Research output: Contribution to journal › Article › Scientific › peer-review
Dataflow modeling offers a myriad of tools for designing and optimizing signal processing systems. A designer is able to take advantage of dataflow properties to effectively tune the system in connection with functionality and different performance metrics. However, a disparity in the specification of dataflow properties and the final implementation can lead to incorrect behavior that is difficult to detect. This motivates the problem of ensuring consistency between dataflow properties that are declared or otherwise assumed as part of dataflow-based application models, and the dataflow behavior that is exhibited by implementations that are derived from the models. In this paper, we address this problem by introducing a novel dataflow validation framework (DVF) that is able to identify disparities between an application’s formal dataflow representation and its implementation. DVF works by instrumenting the implementation of an application and monitoring the instrumentation data as the application executes. This monitoring process is streamlined so that DVF achieves validation without major overhead. We demonstrate the utility of our DVF through design and implementation case studies involving an automatic speech recognition application, a JPEG encoder, and an acoustic tracking application.
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, we present a novel rotation-invariant and computationally efficient texture descriptor called the Dominant Rotated Local Binary Pattern (DRLBP). Rotation invariance is achieved by computing the descriptor with respect to a reference in a local neighborhood. The reference is fast to compute, maintaining the computational simplicity of Local Binary Patterns (LBP). The proposed approach not only retains the complete structural information extracted by LBP, but also captures complementary information by utilizing the magnitude information, thereby achieving more discriminative power. For feature selection, we learn a dictionary of the most frequently occurring patterns from the training images, and discard redundant and non-informative features. To evaluate the performance we conduct experiments on three standard texture datasets: Outex12, Outex10 and KTH-TIPS. The performance is compared with state-of-the-art rotation-invariant texture descriptors, and the results show that the proposed method is superior to other approaches.
Research output: Contribution to journal › Article › Scientific › peer-review
Textures are typical elements of natural scene images, widely used in pattern recognition and image classification. Noise, often present in acquired images, deteriorates texture features (characteristics), and it is desirable both to suppress it and to preserve the texture. This task is quite difficult even for the most advanced filters, and the resulting denoising efficiency can be quite low. It is therefore desirable to predict the denoising efficiency before filtering, to decide whether it is worth filtering a given image or not. In this paper, we analyze several quantitative criteria (metrics) that can characterize filtering efficiency. The prediction strategy is described and its accuracy is studied. Several modern filtering techniques are analyzed and compared. Based on this, practical recommendations are given.
Research output: Contribution to journal › Article › Scientific › peer-review
The design, optimization, and validation of many image processing or image-based analysis systems often requires testing the system performance over a dataset of images corrupted by noise at different signal-to-noise ratio (SNR) regimes. A noise-free ground-truth image may not be available, and different SNRs are simulated by injecting extra noise into an already noisy image. However, noise in real-world systems is typically signal-dependent, with variance determined by the noise-free image. Thus, the noise to be injected also depends on the unknown ground-truth image. To circumvent this issue, we consider the additive injection of noise in the variance-stabilized range, where no previous knowledge of the ground-truth signal is necessary. Specifically, we design a special noise-injection operator that prevents the errors on expectation and variance that would otherwise arise when standard variance-stabilizing transformations are used for this task. Thus, the proposed operator is suitable for accurately injecting signal-dependent noise even into images acquired at very low counts.
Research output: Contribution to journal › Article › Scientific › peer-review
Bayesian networks have become popular for modeling probabilistic relationships between entities. As their structure can also be given a causal interpretation about the studied system, they can be used to learn, for example, regulatory relationships of genes or proteins in biological networks and pathways. Inference of the Bayesian network structure is complicated by the size of the model structure space, necessitating the use of optimization methods or sampling techniques, such as Markov chain Monte Carlo (MCMC) methods. However, convergence of MCMC chains is in many cases slow and can become an even harder issue as the dataset size grows. We show here how to improve convergence in the Bayesian network structure space by using an adjustable proposal distribution with the possibility to propose a wide range of steps in the structure space, and demonstrate improved network structure inference by analyzing phosphoprotein data from the human primary T cell signaling network.
EXT="Lähdesmäki, Harri"
Research output: Contribution to journal › Article › Scientific › peer-review
Green communication and energy saving have become critical issues in modern wireless communication systems, and the concepts of energy harvesting and energy transfer have recently been receiving much attention in academic research. In this paper, we study energy cooperation problems based on a save-then-transmit protocol and propose two energy cooperation schemes for different system models: a two-node communication model and a three-node relay communication model. In both models, none of the nodes transmitting information has a fixed energy supply; they gain energy only via wireless energy harvesting from nature. These nodes also follow a save-then-transmit protocol: in each timeslot, a fraction of time (referred to as the save-ratio) is devoted exclusively to energy harvesting, while the remaining fraction is used for data transmission. In order to maximize the system throughput, an energy transfer mechanism is introduced in our schemes, i.e., some nodes are permitted to share their harvested energy with other nodes by means of wireless energy transfer. Simulation results demonstrate that our proposed schemes outperform both the schemes with a fixed half save-ratio and the schemes without energy transfer in terms of throughput, and also characterize the dependencies of system throughput, transferred energy, and save-ratio on the energy harvesting rate.
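For the single-link case, the throughput trade-off governed by the save-ratio can be sketched numerically as below; the harvesting rate, channel gain and noise power are arbitrary assumptions, and the grid search merely illustrates why an intermediate save-ratio is optimal.

```python
import numpy as np

# Save-then-transmit, single link: a fraction rho of each slot harvests energy
# at rate e_h; transmission uses the stored energy over the remaining (1 - rho).
e_h   = 0.2     # energy harvesting rate (assumed)
gain  = 1.0     # channel gain (assumed)
noise = 0.1     # noise power (assumed)

rho = np.linspace(0.001, 0.999, 999)
tx_power = e_h * rho / (1.0 - rho)                 # harvested energy spread over tx time
throughput = (1.0 - rho) * np.log2(1.0 + gain * tx_power / noise)

best = np.argmax(throughput)
print(f"optimal save-ratio ~ {rho[best]:.3f}, throughput ~ {throughput[best]:.3f} bit/s/Hz")
```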
Research output: Contribution to journal › Article › Scientific › peer-review
Discriminative part-based models have become the dominant approach for visual object detection. Such models learn from a large number of positive and negative examples with annotated class labels and locations (bounding boxes). In contrast, we propose a part-based generative model that learns from a small number of positive examples. This is achieved by utilizing "privileged information": sparse class-specific landmarks with semantic meaning. Our method uses bio-inspired complex-valued Gabor features to describe local parts. Gabor features are transformed to part probabilities by an unsupervised Gaussian Mixture Model (GMM). GMM estimation is robustified for a small amount of data by a randomization procedure inspired by random forests. The GMM framework is also used to construct a probabilistic spatial model of part configurations. Our detector is invariant to translation, rotation and scaling. On the part level, invariance is achieved by pose quantization, which is more efficient than previously proposed feature transformations. In the spatial model, invariance is achieved by mapping parts to an "aligned object space". Using a small number of positive examples, our generative method performs comparably to the state-of-the-art discriminative method.
EXT="Riabchenko, Ekaterina"
Research output: Contribution to journal › Article › Scientific › peer-review
In this work, we present a novel method for approximating a normal distribution with a weighted sum of normal distributions. The approximation is used for splitting normally distributed components in a Gaussian mixture filter, so that the components have smaller covariances and cause smaller linearization errors when nonlinear measurements are used for the state update. Our splitting method uses weights from the binomial distribution as the component weights. The method preserves the mean and covariance of the original normal distribution, and in addition, the resulting probability density and cumulative distribution functions converge to those of the original normal distribution as the number of components is increased. Furthermore, an algorithm is presented to perform the splitting so as to keep the linearization error below a given threshold with a minimum number of components. The accuracy of the estimate provided by the proposed method is evaluated in four simulated single-update cases and one time series tracking case. In these tests, the proposed method is found to be more accurate than other Gaussian mixture filters found in the literature when the same number of components is used, and to be faster and more accurate than particle filters.
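A one-dimensional illustration of moment-preserving splitting with binomial weights is sketched below; the equally spaced component means and the spread parameter are illustrative assumptions rather than the paper's exact construction, but the weights sum to one and the mixture mean and variance match the original normal distribution.

```python
import numpy as np
from scipy.special import comb

def split_normal_1d(mu, var, n=4, spread=0.5):
    """Split N(mu, var) into n+1 binomially weighted normals with equal, smaller
    variance, preserving the overall mean and variance. 'spread' (std units per
    index step) is an illustrative choice; spread**2 * n / 4 < 1 is required."""
    w = comb(n, np.arange(n + 1)) / 2.0 ** n            # binomial(n, 1/2) weights
    sigma = np.sqrt(var)
    means = mu + spread * sigma * (np.arange(n + 1) - n / 2.0)
    # Mixture variance = component variance + sum_i w_i (mean_i - mu)^2.
    between = np.sum(w * (means - mu) ** 2)             # = spread^2 * var * n / 4
    comp_var = var - between
    return w, means, comp_var

w, means, comp_var = split_normal_1d(mu=1.0, var=4.0, n=4, spread=0.5)
print("weights sum :", w.sum())
print("mixture mean:", np.sum(w * means))
print("mixture var :", comp_var + np.sum(w * (means - 1.0) ** 2))
```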
ORG=ase,0.75
ORG=mat,0.25
Research output: Contribution to journal › Article › Scientific › peer-review
Bayesian inference often requires efficient numerical approximation algorithms, such as sequential Monte Carlo (SMC) and Markov chain Monte Carlo (MCMC) methods. The Gibbs sampler is a well-known MCMC technique, widely applied in many signal processing problems. Drawing samples efficiently from univariate full-conditional distributions is essential for the practical application of the Gibbs sampler. In this work, we present a simple, self-tuned and extremely efficient MCMC algorithm which produces virtually independent samples from these univariate target densities. The proposal density used is self-tuned and tailored to the specific target, but it is not adaptive. Instead, the proposal is adjusted during an initial optimization stage, following a simple and extremely effective procedure. Hence, we have named the newly proposed approach FUSS (Fast Universal Self-tuned Sampler), as it can be used to sample from any bounded univariate distribution and also from any bounded multivariate distribution, either directly or by embedding it within a Gibbs sampler. Numerical experiments, on several synthetic data sets (including a challenging parameter estimation problem in a chaotic system) and a high-dimensional financial signal processing problem, show its good performance in terms of speed and estimation accuracy.
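To illustrate the setting in which such a univariate sampler is used, the following is a textbook Gibbs sampler for a bivariate normal; it is not the FUSS algorithm itself, only the surrounding Gibbs loop whose full conditionals a universal sampler like FUSS would target in harder models.

```python
import numpy as np

# Textbook Gibbs sampler for a zero-mean bivariate normal with correlation r.
# Each full conditional is univariate normal; in harder models this is where a
# universal univariate sampler such as FUSS would be plugged in instead.
rng = np.random.default_rng(0)
r, n_samples = 0.8, 20000
x, y = 0.0, 0.0
samples = np.empty((n_samples, 2))
for i in range(n_samples):
    x = rng.normal(r * y, np.sqrt(1 - r**2))   # draw from p(x | y)
    y = rng.normal(r * x, np.sqrt(1 - r**2))   # draw from p(y | x)
    samples[i] = x, y

print("empirical correlation:", np.corrcoef(samples[1000:].T)[0, 1])  # ~0.8
```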
Research output: Contribution to journal › Article › Scientific › peer-review
Indoor positioning based on wireless local area network (WLAN) signals is often enhanced using pedestrian dead reckoning (PDR) based on an inertial measurement unit. The state evolution model in PDR is usually nonlinear. We present a new linear state evolution model for PDR. In simulated-data and real-data tests of tightly coupled WLAN-PDR positioning, the positioning accuracy with this linear model is better than with the traditional models when the initial heading is not known, which is a common situation. The proposed method is computationally light and is also suitable for smoothing. Furthermore, we present modifications to WLAN positioning based on Gaussian coverage areas and show how a Kalman filter using the proposed model can be used for integrity monitoring and (re)initialization of a particle filter.
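As background, a generic linear Kalman filter fed with position fixes (such as WLAN position estimates) is sketched below; the constant-velocity model and the noise levels are assumptions for illustration and are not the linear PDR state evolution model proposed in the paper.

```python
import numpy as np

# Generic linear Kalman filter for 2D position/velocity with position fixes.
dt = 1.0
F = np.block([[np.eye(2), dt * np.eye(2)], [np.zeros((2, 2)), np.eye(2)]])
H = np.hstack([np.eye(2), np.zeros((2, 2))])
Q = 0.1 * np.eye(4)          # process noise (assumed)
R = 4.0 * np.eye(2)          # position-fix noise, ~2 m std (assumed)

x = np.zeros(4)              # state [px, py, vx, vy]
P = 10.0 * np.eye(4)

def kf_step(x, P, z):
    # Predict with the linear state evolution model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update with the position measurement z.
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)
    x = x + K @ (z - H @ x)
    P = (np.eye(4) - K @ H) @ P
    return x, P

rng = np.random.default_rng(0)
true_pos = np.cumsum(np.ones((20, 2)), axis=0)          # walking diagonally at 1 m/s
for z in true_pos + rng.normal(0, 2.0, true_pos.shape):
    x, P = kf_step(x, P, z)
print("final position estimate:", x[:2], "true:", true_pos[-1])
```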
Research output: Contribution to journal › Article › Scientific › peer-review
This article combines algorithm development, thorough analysis and implementation of sign-bit (SB) estimation techniques for symbol timing, carrier frequency offset (CFO) and signal-to-noise ratio (SNR) in orthogonal frequency division multiplexing receivers. The SB estimation is compared in terms of performance and hardware complexity to an equivalent implementation with higher quantization. The techniques are demonstrated by simulation of an SB time/frequency and SB-SNR estimator for 3rd Generation Partnership Project long-term evolution (LTE) cell search in 65-nm technology operating at a nominal voltage of 1.2 V. According to post-layout power simulations with toggling information, the architecture estimates the corresponding CFO and SNR for as little as 479 μW average power for LTE-R8/10, while occupying a silicon area as small as 0.03 mm². Even though SB estimation experiences some relative performance penalty when compared to 8-bit quantization, this paper demonstrates various advantages and the potential of employing these techniques in low-complexity terminals.
Research output: Contribution to journal › Article › Scientific › peer-review
Filtering and smoothing algorithms for linear discrete-time state-space models with skewed and heavy-tailed measurement noise are presented. The algorithms use a variational Bayes approximation of the posterior distribution of models that have normal prior and skew-t-distributed measurement noise. The proposed filter and smoother are compared with conventional low-complexity alternatives in a simulated pseudorange positioning scenario. In the simulations the proposed methods achieve better accuracy than the alternative methods, the computational complexity of the filter being roughly 5 to 10 times that of the Kalman filter.
Research output: Contribution to journal › Article › Scientific › peer-review
Accurate fading characterization and channel capacity determination are of paramount importance in both conventional and emerging communication systems. The present work addresses the non-linearity of the propagation medium and its effects on the channel capacity. Such fading conditions are first characterized using information theoretic measures, namely, Shannon entropy, cross entropy and relative entropy. The corresponding effects on the channel capacity with and without power adaptation are then analyzed. Closed-form expressions are derived and validated through computer simulations. It is shown that the effects of nonlinearities are significantly larger than those of fading parameters such as the scattered-wave power ratio, and the correlation coefficient between the in-phase and quadrature components in each cluster of multipath components.
Research output: Contribution to journal › Article › Scientific › peer-review
This paper presents an analysis of the recently proposed sparse extreme learning machine (S-ELM) classifier and describes an optimization scheme that can be used to calculate the network output weights. This optimization scheme exploits intrinsic graph structures in order to describe geometric data relationships in the so-called ELM space. Kernel formulations of the approach operating in ELM spaces of arbitrary dimensions are also provided. It is shown that the application of the optimization scheme exploiting geometric data relationships in the original ELM space is equivalent to the application of the original S-ELM to a transformed ELM space. The experimental results show that the incorporation of geometric data relationships in S-ELM can lead to enhanced performance.
Research output: Contribution to journal › Article › Scientific › peer-review
Exemplar-based speech enhancement systems work by decomposing the noisy speech as a weighted sum of speech and noise exemplars stored in a dictionary and use the resulting speech and noise estimates to obtain a time-varying filter in the full-resolution frequency domain to enhance the noisy speech. To obtain the decomposition, exemplars sampled in lower dimensional spaces are preferred over the full-resolution frequency domain for their reduced computational complexity and the ability to better generalize to unseen cases. But the resulting filter may be sub-optimal as the mapping of the obtained speech and noise estimates to the full-resolution frequency domain yields a low-rank approximation. This paper proposes an efficient way to directly compute the full-resolution frequency estimates of speech and noise using coupled dictionaries: an input dictionary containing atoms from the desired exemplar space to obtain the decomposition and a coupled output dictionary containing exemplars from the full-resolution frequency domain. We also introduce modulation spectrogram features for the exemplar-based tasks using this approach. The proposed system was evaluated for various choices of input exemplars and yielded improved speech enhancement performances on the AURORA-2 and AURORA-4 databases. We further show that the proposed approach also results in improved word error rates (WERs) for the speech recognition tasks using HMM-GMM and deep-neural network (DNN) based systems.
Research output: Contribution to journal › Article › Scientific › peer-review
Regularized linear models are important classification methods for high-dimensional problems, where regularized linear classifiers are often preferred due to their ability to avoid overfitting. The degree of freedom of the model is determined by a regularization parameter, which is typically selected using counting-based approaches, such as K-fold cross-validation (CV). For large data, this can be very time consuming, and, for small sample sizes, the accuracy of the model selection is limited by the large variance of CV error estimates. In this paper, we study the applicability of a recently proposed Bayesian error estimator for the selection of the best model along the regularization path. We also propose an extension of the estimator that allows model selection in multiclass cases and study its efficiency with L1-regularized logistic regression and L2-regularized linear support vector machines. The model selection by the new Bayesian error estimator is experimentally shown to improve the classification accuracy, especially in small sample-size situations, and is able to avoid the excess variability inherent to traditional cross-validation approaches. Moreover, the method has significantly smaller computational complexity than cross-validation.
EXT="Tohka, Jussi"
Research output: Contribution to journal › Article › Scientific › peer-review
Novel analytic solutions are derived for integrals that involve the generalized Marcum Q -function, exponential functions and arbitrary powers. Simple closed-form expressions are also derived for specific cases of the generic integrals. The offered expressions are both convenient and versatile, which is particularly useful in applications relating to natural sciences and engineering, including wireless communications and signal processing. To this end, they are employed in the derivation of the average probability of detection in energy detection of unknown signals over multipath fading channels as well as of the channel capacity with fixed rate and channel inversion in the case of correlated multipath fading and switched diversity.
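When validating detection-probability expressions that involve the generalized Marcum Q-function, the function can be evaluated numerically through its standard relation to the noncentral chi-square survival function; the sketch below cross-checks that relation against the defining integral and is independent of the specific closed forms derived in the paper.

```python
import numpy as np
from scipy.stats import ncx2
from scipy.integrate import quad
from scipy.special import iv

def marcum_q(m, a, b):
    # Generalized Marcum Q via the noncentral chi-square survival function:
    # Q_m(a, b) = P(X > b^2), where X is chi-square with 2m dof and noncentrality a^2.
    return ncx2.sf(b**2, 2 * m, a**2)

def marcum_q_integral(m, a, b):
    # Direct numerical evaluation of the defining integral, for comparison.
    integrand = lambda x: x**m / a**(m - 1) * np.exp(-(x**2 + a**2) / 2) * iv(m - 1, a * x)
    return quad(integrand, b, np.inf)[0]

print(marcum_q(2, 1.5, 2.0), marcum_q_integral(2, 1.5, 2.0))   # values agree
```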
Research output: Contribution to journal › Article › Scientific › peer-review
Coordinate descent (CD) is a simple optimization technique suited to low-complexity requirements and to solving large problems. In its randomized version, CD was recently shown to be very effective for solving least-squares (LS) and other optimization problems. We propose here an adaptive version of randomized coordinate descent (RCD) for finding sparse LS solutions, from which we derive two algorithms, one based on the lasso criterion, the other using a greedy technique. Both algorithms employ a novel way of adapting the probabilities for choosing the coordinates, based on a matching pursuit criterion. Another new feature is that, in the lasso algorithm, the penalty term values are built without knowing the noise level or using other prior information. The proposed algorithms use efficient computations and have a tunable trade-off between complexity and performance through the number of CD steps per time instant. Besides a general theoretical convergence analysis, we present simulations that show good practical behavior, comparable to or better than that of state-of-the-art methods.
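A minimal randomized coordinate descent for the lasso criterion is sketched below with uniform coordinate-selection probabilities; the adaptive, matching-pursuit-based probabilities and the automatic construction of the penalty values described in the paper are not reproduced, and the test problem is synthetic.

```python
import numpy as np

def rcd_lasso(A, b, lam, n_iters=5000, seed=0):
    """Randomized coordinate descent for min_x 0.5*||Ax - b||^2 + lam*||x||_1.
    Coordinates are picked uniformly at random."""
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    r = b.copy()                                  # running residual b - A x
    col_sq = np.sum(A**2, axis=0)
    for _ in range(n_iters):
        j = rng.integers(n)
        rho = A[:, j] @ r + col_sq[j] * x[j]      # correlation excluding coordinate j
        x_new = np.sign(rho) * max(abs(rho) - lam, 0.0) / col_sq[j]   # soft threshold
        r += A[:, j] * (x[j] - x_new)
        x[j] = x_new
    return x

rng = np.random.default_rng(1)
A = rng.standard_normal((100, 50))
x_true = np.zeros(50); x_true[[3, 17, 42]] = [2.0, -1.5, 1.0]
b = A @ x_true + 0.05 * rng.standard_normal(100)
x_hat = rcd_lasso(A, b, lam=2.0)
print("recovered support:", np.nonzero(np.abs(x_hat) > 0.1)[0])
```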
Research output: Contribution to journal › Article › Scientific › peer-review
The compressed sensing (CS) theory has been successfully applied to image compression in the past few years as most image signals are sparse in a certain domain. In this paper, we focus on how to improve the sampling efficiency for CS-based image compression by using our proposed adaptive sampling mechanism on the block-based CS (BCS), especially the reweighted one. To achieve this goal, two solutions are developed at the sampling side and reconstruction side, respectively. The proposed sampling mechanism allocates the CS-measurements to image blocks according to the statistical information of each block so as to sample the image more efficiently. A generic allocation algorithm is developed to help assign CS-measurements and several allocation factors derived in the transform domain are used to control the overall allocation in both solutions. Experimental results demonstrate that our adaptive sampling scheme offers a very significant quality improvement as compared with traditional non-adaptive ones.
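The adaptive-allocation idea can be illustrated by assigning Gaussian measurements to blocks in proportion to a simple per-block statistic; the standard-deviation-based allocation and the toy blocks below are assumptions, since the paper derives its allocation factors in a transform domain, and reconstruction is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "image": one smooth block and one detailed block, flattened per block.
blocks = [np.full(64, 0.5) + 0.01 * rng.standard_normal(64),
          np.sin(np.linspace(0, 12 * np.pi, 64)) + 0.1 * rng.standard_normal(64)]

# Allocate a fixed measurement budget in proportion to each block's variability.
total_measurements = 48
stds = np.array([b.std() for b in blocks])
alloc = np.maximum(1, np.round(total_measurements * stds / stds.sum()).astype(int))
print("measurements per block:", alloc)          # detailed block gets far more

# Block-based CS sampling with per-block Gaussian measurement matrices.
measurements = [rng.standard_normal((m, len(b))) @ b for m, b in zip(alloc, blocks)]
print([m.shape for m in measurements])
```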
Research output: Contribution to journal › Article › Scientific › peer-review
Dataflow process networks provide a versatile model of computation for specifying signal processing applications in a platform-independent fashion. This attractive feature of dataflow has lately been realized in dataflow programming tools that allow synthesizing the same application specification both as fixed hardware circuits and as software for programmable processors. In practice, however, the specification granularity of the dataflow program remains an arbitrary choice of the designer. Dataflow specifications of the same application with equivalent I/O behaviour can range from a single dataflow actor to a very fine-grained network composed of elementary processing operations. A very fine-grained dataflow specification might result in a high-performance implementation when synthesized as hardware, but might perform poorly when executed on a programmable processor. This article presents actor merging as one solution to this performance portability problem of dataflow programs. In contrast to previous work on actor merging, this article presents a methodology that can also merge dynamic dataflow actors. To support these claims, results of experiments on several processing platforms and application examples ranging from telecommunications to video compression are reported.
Research output: Contribution to journal › Article › Scientific › peer-review
Intermodulation products arise as a result of low noise amplifier (LNA) and mixer nonlinearities in wideband receivers. In the presence of strong blockers, the intermodulation distortion can deteriorate the spectrum sensing performance by causing false alarms and degrading the detection probability. We theoretically analyze the impact of third-order nonlinearities on the detection and false alarm probabilities for both energy detectors and cyclostationary detectors under front-end LNA nonlinearities. We show that the degradation of the detection performance of both energy and cyclostationary detection due to nonlinearities is strongly dependent on the modulation type of the blockers. We then propose two DSP-enhanced receiver architectures to compensate for the impact of nonlinearities. The first approach is a post-processing technique which compensates for the effect of nonlinearities on the test statistic by adapting the sensing time and detection threshold. The second approach is a pre-processing method that compensates by correcting the received samples prior to computing the test statistic. This approach is based on adaptively estimating the intermodulation distortion, weighting it by a scalar constant and subtracting it from the subband of interest. We propose a method to adaptively compute the optimal weighting coefficient and show that it depends on the power and modulation of the blockers. Our results show that the pre-processing sample-based compensation method is more effective and that a clear dynamic range extension can be obtained by using intermodulation compensation without resorting to increasing the sensing time. We also study the impact of uncertainties in the knowledge or estimates of the nonlinearity parameters.
EXT="Hagh Ghadam, Ali Shahed"
Research output: Contribution to journal › Article › Scientific › peer-review
We construct multidimensional interpolating tensor product multiresolution analyses (MRAs) of the function spaces C_0(R^n, K), K = R or K = C, consisting of real- or complex-valued functions on R^n vanishing at infinity, and of the function spaces C_u(R^n, K) consisting of bounded and uniformly continuous functions on R^n. We also construct an interpolating dual MRA for both of these spaces. The theory of tensor products of Banach spaces is used. We generalize the Besov space norm equivalence from the one-dimensional case to our n-dimensional construction.
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, we discuss the connection of the kernel versions of the ELM classifier with infinite Single-hidden Layer Feedforward Neural networks and show that the original ELM kernel definition can be adopted for the calculation of the ELM kernel matrix for two of the most common activation functions, i.e., the RBF and the sigmoid functions. In addition, we show that a low-rank decomposition of the kernel matrix defined on the input training data can be exploited in order to determine an appropriate ELM space for input data mapping. The ELM space determined from this process can be subsequently used for network training using the original ELM formulation. Experimental results denote that the adoption of the low-rank decomposition-based ELM space determination leads to enhanced performance, when compared to the standard choice, i.e., random input weights generation.
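For reference, the standard kernel ELM closed form (output weights beta = (Omega + I/C)^(-1) T, with prediction via the kernel between test and training data) can be sketched as below on synthetic data; the low-rank decomposition-based ELM space determination proposed in the paper is not included.

```python
import numpy as np

def rbf_kernel(A, B, gamma=0.5):
    d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2 * A @ B.T
    return np.exp(-gamma * d2)

# Standard kernel ELM: beta = (Omega + I/C)^(-1) T, f(x) = K(x, X_train) beta.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)          # XOR-like labels (synthetic)
T = np.eye(2)[y]                                  # one-hot targets

C = 10.0
omega = rbf_kernel(X, X)
beta = np.linalg.solve(omega + np.eye(len(X)) / C, T)

X_test = rng.standard_normal((100, 2))
y_test = (X_test[:, 0] * X_test[:, 1] > 0).astype(int)
pred = np.argmax(rbf_kernel(X_test, X) @ beta, axis=1)
print("test accuracy:", np.mean(pred == y_test))
```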
Research output: Contribution to journal › Article › Scientific › peer-review
This paper describes a recently created image database, TID2013, intended for evaluation of full-reference visual quality assessment metrics. With respect to TID2008, the new database contains a larger number (3000) of test images, obtained from 25 reference images, 24 types of distortions for each reference image, and 5 levels for each type of distortion. Motivations for introducing 7 new types of distortions and one additional level of distortion are given, and examples of distorted images are presented. Mean opinion scores (MOS) for the new database have been collected by performing 985 subjective experiments with volunteers (observers) from five countries (Finland, France, Italy, Ukraine, and USA). The availability of MOS allows the use of the designed database as a fundamental tool for assessing the effectiveness of visual quality metrics. Furthermore, existing visual quality metrics have been tested with the proposed database, and the collected results have been analyzed using rank order correlation coefficients between MOS and the considered metrics. These correlation indices have been obtained both for the full set of distorted images and for specific image subsets, highlighting advantages and drawbacks of existing state-of-the-art quality metrics. Approaches to thorough performance analysis for a given metric are presented to detect practical situations or distortion types for which this metric is not adequate enough to human perception. The created image database and the collected MOS values are freely available for downloading and utilization for scientific purposes.
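The rank-order correlation analysis mentioned above amounts to computing Spearman and Kendall coefficients between MOS and metric scores; the arrays below are placeholders for TID2013 data, so only the computation itself is illustrated.

```python
import numpy as np
from scipy.stats import spearmanr, kendalltau

# Placeholder arrays standing in for mean opinion scores and the corresponding
# scores of a full-reference quality metric on the same distorted images.
rng = np.random.default_rng(0)
mos = rng.uniform(1, 8, 300)
metric = 0.8 * mos + rng.normal(0, 0.7, 300)     # imperfect, roughly monotone metric

srocc, _ = spearmanr(mos, metric)
krocc, _ = kendalltau(mos, metric)
print(f"SROCC = {srocc:.3f}, KROCC = {krocc:.3f}")
```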
Research output: Contribution to journal › Article › Scientific › peer-review
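The rank order correlation analysis mentioned above typically uses the Spearman and Kendall coefficients between MOS and metric scores; a brief sketch (with hypothetical score arrays) is given below.

    import numpy as np
    from scipy.stats import spearmanr, kendalltau

    def rank_correlations(mos, metric_scores):
        # Spearman (SROCC) and Kendall (KROCC) rank order correlations.
        srocc, _ = spearmanr(mos, metric_scores)
        krocc, _ = kendalltau(mos, metric_scores)
        return srocc, krocc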
Since video traffic is resource intensive, streaming video over low-bandwidth networks is a challenging issue, while video communication over LTE has become an open research topic due to LTE's high throughput capabilities. Video transmission requires low delay, and the wireless channel is time-varying, which motivates replacing a layer-separated design with a Cross-Layer Adaptation (CLA) principle. In this paper, an efficient analytical model that evaluates the behavior of the downlink LTE channel with CLA is presented. To the best of our knowledge, this is the first analytical model using the CLA principle that covers both the transmission process from the eNB to the User Equipment (UE) in the first phase and the video decoding process at the UE in the second phase. In order to ensure cross-layer adaptation in the model, the arrival rate varies based on the received video request, whereas the service probability changes according to the channel quality indicator sent from the UE. In the experimental part, the main performance measures obtained from the stationary distribution are analyzed.
Research output: Contribution to journal › Article › Scientific › peer-review
A novel adaptive compensation architecture for the frequency response mismatch of 2-channel time-interleaved ADCs (TI-ADC) is proposed for developing high-performance self-adaptive systems. The proposed approach improves on existing methods in the sense that the TI-ADC mismatch identification can be performed without allocating a region where only the TI-ADC mismatch spurs are present. This is accomplished by mapping the TI-ADC problem into an I/Q mismatch problem, which allows deploying complex statistical signal processing. As a proof of concept, the compensation architecture is demonstrated and tested on measured hardware data from a 16-bit TI-ADC.
Research output: Contribution to journal › Article › Scientific › peer-review
Super Multi-View (SMV) video content is composed of tens or hundreds of views that provide a light-field representation of a scene. This representation allows glasses-free visualization and eliminates many causes of discomfort existing in currently available 3D video technologies. Efficient video compression of SMV content is a key factor for enabling future 3D video services. This paper first compares several coding configurations for SMV content, and several inter-view prediction structures are also tested and compared. The experiments mainly suggest that large differences in coding efficiency can be observed from one configuration to another. Several ratios for the number of coded and synthesized views are compared, both objectively and subjectively. It is reported that view synthesis significantly affects the coding scheme. The number of views to skip highly depends on the sequence and on the quality of the associated depth maps. The reported ranges of bitrates required to obtain good quality for the tested SMV content are realistic and coherent with future 4K/8K needs. The reliability of the PSNR metric for SMV content is also studied. Objective and subjective results show that PSNR is able to reflect increases or decreases in subjective quality even in the presence of synthesized views. However, depending on the ratio of coded and synthesized views, the order of magnitude of the effective quality variation is biased by PSNR. Results indicate that PSNR is less tolerant to view synthesis artifacts than human viewers. Finally, preliminary observations are reported. First, the light-field conversion step does not seem to alter the objective results for compression. Secondly, the motion parallax does not seem to be impacted by specific compression artifacts. The perception of the motion parallax is only altered by variations of the typical compression artifacts along the viewing angle, in cases where the subjective image quality is already low. To the best of our knowledge, this paper is the first to carry out subjective experiments and to report results of SMV compression for light-field 3D displays. It provides first results showing that improvement of compression efficiency is required, as well as improvement of depth estimation and view synthesis algorithms, but that the use of SMV appears realistic according to next-generation compression technology requirements.
Research output: Contribution to journal › Article › Scientific › peer-review
We provide an overview of matrix and tensor factorization methods from a Bayesian perspective, with emphasis on both inference methods and modeling techniques. Factorization-based models and their many extensions, such as tensor factorizations, have proved useful in a broad range of applications, supporting a practical and computationally tractable framework for modeling. Especially in audio processing, tensor models facilitate, in a unified manner, the use of prior knowledge about signals, the data generation processes, as well as available data from different modalities. After a general review of tensor models, we describe the general statistical framework, give examples of several audio applications, and describe modeling strategies for key problems such as deconvolution, source separation, and transcription.
Research output: Contribution to journal › Article › Scientific › peer-review
As billions of sensors and smart meters connect to the Internet of Things (IoT), current wireless technologies are taking decisive steps to ensure their sustainable operation. One popular IoT scenario features a smart home service gateway, which becomes the central point of the user's home environment, facilitating a multitude of tasks. Given that most IoT devices connected to a residential gateway are small-scale and battery-powered, the key challenge is to extend their lifetime without recharging/replacing batteries. To this end, a novel radio technology named Bluetooth low energy (BLE) has recently been completed to enable energy-efficient data transfer. Another inspiring innovation is the capability of sensors to harvest wireless energy in their local environment. In this work, we envision a scenario where many in-home sensors communicate with a smart gateway over the BLE protocol, while at the same time harvesting RF energy transmitted from the gateway wirelessly via a dedicated radio interface. We thoroughly investigate the performance limitations of such a wireless energy transfer interface (WETI) with a dynamic analytical model and with important practical considerations. Our methodology delivers the upper bound on WETI operation coupled with BLE-based communication, which characterizes the ultimate system performance over the class of practical radio and energy resource management algorithms.
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, a novel nonlinear subspace learning technique for class-specific data representation is proposed. A novel data representation is obtained by applying nonlinear class-specific data projection to a discriminant feature space, where the data belonging to the class under consideration are enforced to be close to their class representation, while the data belonging to the remaining classes are enforced to be as far as possible from it. A class is represented by an optimized class vector, enhancing class discrimination in the resulting feature space. An iterative optimization scheme is proposed to this end, where both the optimal nonlinear data projection and the optimal class representation are determined in each optimization step. The proposed approach is tested on three problems relating to human behavior analysis: face recognition, facial expression recognition, and human action recognition. Experimental results demonstrate the effectiveness of the proposed approach, since the proposed class-specific reference discriminant analysis outperforms kernel discriminant analysis, kernel spectral regression, and class-specific kernel discriminant analysis, as well as support vector machine-based classification, in most cases.
Research output: Contribution to journal › Article › Scientific › peer-review
Linear Discriminant Analysis (LDA) and its nonlinear version Kernel Discriminant Analysis (KDA) are well-known and widely used techniques for supervised feature extraction and dimensionality reduction. They determine an optimal discriminant space for (non)linear data projection based on certain assumptions, e.g. using normal distributions (either in the input or in the kernel space) for each class and employing class representation by the corresponding class mean vectors. However, there might be other vectors that can be used for class representation, in order to increase class discrimination in the resulting feature space. In this paper, we propose an optimization scheme aiming at the optimal class representation, in terms of Fisher ratio maximization, for nonlinear data projection. Compared to the standard approach, the proposed optimization scheme increases class discrimination in the reduced-dimensionality feature space and achieves higher classification rates on publicly available data sets.
Research output: Contribution to journal › Article › Scientific › peer-review
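To make the optimization target above concrete, the sketch below evaluates a Fisher-style scatter ratio when each class is represented by an arbitrary vector instead of its mean. The paper optimizes such a criterion (in the kernel-induced space) with respect to the class representations; this snippet only shows the quantity being maximized, under assumed input shapes.

    import numpy as np

    def fisher_ratio(X, labels, reps):
        # X: (N, d) samples, labels: (N,) class ids, reps: dict class -> (d,) vector.
        classes = np.unique(labels)
        mu = X.mean(axis=0)
        Sw = sum(((X[labels == c] - reps[c]) ** 2).sum() for c in classes)
        Sb = sum((labels == c).sum() * ((reps[c] - mu) ** 2).sum() for c in classes)
        return Sb / Sw   # larger values indicate better class discrimination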
In this paper we propose a novel framework for human action recognition based on Bag of Words (BoWs) action representation, that unifies discriminative codebook generation and discriminant subspace learning. The proposed framework is able to, naturally, incorporate several (linear or non-linear) discrimination criteria for discriminant BoWs-based action representation. An iterative optimization scheme is proposed for sequential discriminant BoWs-based action representation and codebook adaptation based on action discrimination in a reduced dimensionality feature space where action classes are better discriminated. Experiments on five publicly available data sets aiming at different application scenarios demonstrate that the proposed unified approach increases the codebook discriminative ability providing enhanced action classification performance.
Research output: Contribution to journal › Article › Scientific › peer-review
A channelizer is used to separate users or channels in communication systems. A polyphase channelizer is a type of channelizer that uses polyphase filtering to filter, downsample, and downconvert simultaneously. With graphics processing unit (GPU) technology, we propose a novel GPU-based polyphase channelizer architecture that delivers high throughput. This architecture has advantages of providing reduced complexity and optimized parallel processing of many channels, while being configurable via software. This makes our approach and implementation particularly attractive for using GPUs as DSP accelerators for communication systems.
Research output: Contribution to journal › Article › Scientific › peer-review
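For readers unfamiliar with the structure, the snippet below sketches a maximally decimated M-channel polyphase channelizer in plain NumPy: the prototype filter is split into M polyphase branches, the input is commutated into M decimated streams, and an M-point transform across the branch outputs performs the simultaneous down-conversion. This is a generic textbook-style sketch, not the GPU implementation of the paper, and sign/ordering conventions may differ.

    import numpy as np

    def polyphase_channelizer(x, h, M):
        L = (len(h) // M) * M
        P = h[:L].reshape(-1, M).T               # M polyphase filter branches
        N = (len(x) // M) * M
        X = x[:N].reshape(-1, M).T               # commutate input into M streams
        # Filter each decimated stream with its polyphase branch.
        Y = np.stack([np.convolve(X[m], P[m])[:X.shape[1]] for m in range(M)])
        # Transform across branches: one output sample per channel per M inputs.
        return np.fft.ifft(Y, axis=0) * M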
Modern embedded systems show a clear trend towards the use of Multiprocessor System-on-Chip (MPSoC) architectures in order to handle performance and power consumption constraints. However, the design and validation of dedicated MPSoCs is an extremely hard and expensive task due to their complexity. Thus, the development of automated design processes is of the highest importance to satisfy the time-to-market pressure of embedded systems. This paper proposes an automated co-design flow based on the high-level language-based approach of the Reconfigurable Video Coding framework. The designer provides the application description in the RVC-CAL dataflow language, after which the presented co-design flow automatically generates a network of heterogeneous processors that can be synthesized on FPGA chips. The synthesized processors are Very Long Instruction Word-style processors. Such a methodology permits the rapid design of a many-core signal processing system which can take advantage of all levels of parallelism. The toolchain functionality has been demonstrated by synthesizing an MPEG-4 Simple Profile video decoder to two different FPGA boards. The decoder is realized as 18 processors that decode QCIF resolution video at 45 frames per second at a 50 MHz FPGA clock frequency. The results show that the given application can take advantage of every level of parallelism.
Research output: Contribution to journal › Article › Scientific › peer-review
Dataflow languages enable describing signal processing applications in a platform-independent fashion, which makes them attractive in today's multiprocessing era. RVC-CAL is a dynamic dataflow language that enables describing complex data-dependent programs such as video decoders. To date, design automation toolchains for RVC-CAL have enabled creating workstation software, dedicated hardware, and embedded application-specific multiprocessor implementations out of RVC-CAL programs. However, no solution has been presented for executing RVC-CAL applications on generic embedded multiprocessing platforms. This paper presents a dataflow-based multiprocessor communication model, an architecture prototype that uses it, and an automated toolchain for instantiating such a platform and the software for it. The complexity of the platform increases linearly as the number of processors is increased. The experiments in this paper use several instances of the proposed platform, with different numbers of processors. An MPEG-4 video decoder is mapped to the platform and executed on it. Benchmarks are performed on an FPGA board.
Research output: Contribution to journal › Article › Scientific › peer-review
This research investigates a passive wireless antenna sensor designed for strain and crack sensing. When the antenna experiences deformation, the antenna shape changes, causing a shift in the electromagnetic resonance frequency of the antenna. A radio frequency identification (RFID) chip is adopted for antenna signal modulation, so that a wireless reader can easily distinguish the backscattered sensor signal from unwanted environmental reflections. The RFID chip captures its operating power from an interrogation electromagnetic wave emitted by the reader, which allows the antenna sensor to be passive (battery-free). This paper first reports the latest simulation results on radiation patterns, surface current density, and electromagnetic field distribution. The simulation results are followed with experimental results on the strain and crack sensing performance of the antenna sensor. Tensile tests show that the wireless antenna sensor can detect small strain changes lower than 20 με, and can perform well at large strains higher than 10 000 με. With a high-gain reader antenna, the wireless interrogation distance can be increased up to 2.1 m. Furthermore, an array of antenna sensors is capable of measuring the strain distribution in close proximity. During emulated crack and fatigue crack tests, the antenna sensor is able to detect the growth of a small crack.
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, we present a view-independent action recognition method exploiting a low computational-cost volumetric action representation. Binary images depicting the human body during action execution are accumulated in order to produce the so-called action volumes. A novel time-invariant action representation is obtained by exploiting the circular shift invariance property of the magnitudes of the Discrete Fourier Transform coefficients. The similarity of an action volume with representative action volumes is exploited in order to map it to a lower-dimensional feature space that preserves the action class properties. Discriminant learning is, subsequently, employed for further dimensionality reduction and action class discrimination. By using such an action representation, the proposed approach performs fast action recognition. By combining action recognition results coming from different view angles, high recognition rates are obtained. The proposed method is extended to interaction recognition, i.e., to human action recognition involving two persons. The proposed approach is evaluated on a publicly available action recognition database using experimental settings simulating situations that may appear in real-life applications, as well as on a new nutrition support action recognition database.
Research output: Contribution to journal › Article › Scientific › peer-review
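The time-invariance property exploited above follows from the circular shift invariance of DFT magnitudes; the following short check illustrates it on an arbitrary vector.

    import numpy as np

    v = np.random.rand(64)                        # arbitrary feature vector
    shifted = np.roll(v, 17)                      # circularly shifted version
    assert np.allclose(np.abs(np.fft.fft(v)), np.abs(np.fft.fft(shifted)))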
As the variety of off-the-shelf processors expands, traditional implementation methods of systems for digital signal processing and communication are no longer adequate to achieve design objectives in a timely manner. There is a necessity for designers to easily track the changes in computing platforms, and apply them efficiently while reusing legacy code and optimized libraries that target specialized features in single processing units. In this context, we propose an integration workflow to schedule and implement Software Defined Radio (SDR) protocols that are developed using the GNU Radio environment on heterogeneous multiprocessor platforms. We show how to utilize Single Instruction Multiple Data (SIMD) units provided in Graphics Processing Units (GPUs) along with vector accelerators implemented in General Purpose Processors (GPPs). We augment a popular SDR framework (i.e, GNU Radio) with a library that seamlessly allows offloading of algorithm kernels mapped to the GPU without changing the original protocol description. Experimental results show how our approach can be used to efficiently explore design spaces for SDR system implementation, and examine the overhead of the integrated backend (software component) library.
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, we propose a novel method that performs dynamic action classification by exploiting the effectiveness of the Extreme Learning Machine (ELM) algorithm for single-hidden-layer feedforward neural network training. It involves data grouping and ELM-based data projection in multiple levels. Given a test action instance, a neural network is trained by using labeled action instances forming the groups that reside in the test sample's neighborhood. The action instances involved in this procedure are subsequently mapped to a new feature space, determined by the trained network outputs. This procedure is repeated a number of times determined by the test action instance at hand, until only a single class is retained. Experimental results demonstrate the effectiveness of the dynamic classification approach, compared to the static one, as well as the effectiveness of the ELM in the proposed dynamic classification setting.
Research output: Contribution to journal › Article › Scientific › peer-review
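For reference, the ELM building block used at each level of the dynamic scheme trains a single-hidden-layer network with random input weights and a least-squares output layer; a generic sketch is shown below (the grouping and multi-level projection logic of the paper is omitted, and the hidden-layer size is an arbitrary choice).

    import numpy as np

    def train_elm(X, T, n_hidden=100, seed=0):
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((X.shape[1], n_hidden))   # random input weights
        b = rng.standard_normal(n_hidden)
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))            # hidden-layer outputs
        beta = np.linalg.pinv(H) @ T                      # least-squares output weights
        return W, b, beta

    def elm_predict(X, W, b, beta):
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
        return H @ beta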
RVC-CAL is an actor-based dataflow language that enables concurrent, modular and portable description of signal processing algorithms. RVC-CAL programs can be compiled to implementation languages such as C/C++ and VHDL for producing software or hardware implementations. This paper presents a methodology for automatic discovery of piecewise-deterministic (quasi-static) execution schedules for RVC-CAL program software implementations. Quasi-static scheduling moves computational burden from the implementable run-time system to design-time compilation and thus enables making signal processing systems more efficient. The presented methodology divides the RVC-CAL program into segments and hierarchically detects quasi-static behavior from each segment: first at the level of actors and later at the level of the whole segment. Finally, a code generator creates a quasi-statically scheduled version of the program. The impact of segment based quasi-static scheduling is demonstrated by applying the methodology to several RVC-CAL programs that execute up to 58 % faster after applying the presented methodology.
Research output: Contribution to journal › Article › Scientific › peer-review
In recent work, a graphical modeling construct called "topological patterns" has been shown to enable concise representation and direct analysis of repetitive dataflow graph sub-structures in the context of design methods and tools for digital signal processing systems (Sane et al. 2010). In this paper, we present a formal design method for specifying topological patterns and deriving parameterized schedules from such patterns based on a novel schedule model called the scalable schedule tree. The approach represents an important class of parameterized schedule structures in a form that is intuitive for representation and efficient for code generation. Through application case studies involving image processing and wireless communications, we demonstrate our methods for topological pattern representation, scalable schedule tree derivation, and associated dataflow graph code generation.
Research output: Contribution to journal › Article › Scientific › peer-review
Development of multimedia systems that can be targeted to different platforms is challenging due to the need for rigorous integration between high-level abstract modeling, and low-level synthesis and optimization. In this paper, a new dataflow-based design tool called the targeted dataflow interchange format is introduced for retargetable design, analysis, and implementation of embedded software for multimedia systems. Our approach provides novel capabilities, based on principles of task-level dataflow analysis, for exploring and optimizing interactions across design components; object-oriented data structures for encapsulating contextual information for components; a novel model for representing parameterized schedules that are derived from repetitive graph structures; and automated code generation for programming interfaces and low-level customizations that are geared toward high-performance embedded-processing architectures. We demonstrate our design tool for cross-platform application design, parameterized schedule representation, and associated dataflow graph-code generation using a case study centered around an image registration application.
Research output: Contribution to journal › Article › Scientific › peer-review
In recent years, parameterized dataflow has evolved as a useful framework for modeling synchronous and cyclo-static graphs in which arbitrary parameters can be changed dynamically. Parameterized dataflow has proven to have significant expressive power for managing dynamics of DSP applications in important ways. However, efficient hardware synthesis techniques for parameterized dataflow representations are lacking. This paper addresses this void; specifically, the paper investigates efficient field programmable gate array (FPGA)-based implementation of parameterized cyclo-static dataflow (PCSDF) graphs. We develop a scheduling technique for throughput-constrained minimization of dataflow buffering requirements when mapping PCSDF representations of DSP applications onto FPGAs. The proposed scheduling technique is integrated with an existing formal schedule model, called the generalized schedule tree, to reduce schedule cost. To demonstrate our new, hardware-oriented PCSDF scheduling technique, we have designed a real-time base station emulator prototype based on a subset of long-term evolution (LTE), which is a key cellular standard.
Research output: Contribution to journal › Article › Scientific › peer-review
Video coding technology has evolved over the last 20 years, producing a variety of different and complex algorithms and coding standards. So far, the specification of such standards, and of the algorithms that build them, has been done case by case, providing monolithic textual and reference software specifications in different forms and programming languages. However, very little attention has been given to providing a specification formalism that explicitly presents common components between standards and the incremental modifications of such monolithic standards. The MPEG Reconfigurable Video Coding (RVC) framework is a new ISO standard currently in its final stage of standardization, aiming at providing video codec specifications at the level of library components instead of monolithic algorithms. The new concept is to be able to specify a decoder of an existing standard, or a completely new configuration that may better satisfy application-specific constraints, by selecting standard components from a library of standard coding algorithms. The possibility of dynamic configuration and reconfiguration of codecs also requires new methodologies and new tools for describing the new bitstream syntaxes and the parsers of such new codecs. The RVC framework is based on the usage of a new actor/dataflow-oriented language called CAL for the specification of the standard library and the instantiation of the RVC decoder model. This language has been specifically designed for modeling complex signal processing systems. CAL dataflow models expose the intrinsic concurrency of the algorithms by employing the notions of actor programming and dataflow. The paper gives an overview of the concepts and technologies building the standard RVC framework and the non-standard tools supporting the RVC model, from the instantiation and simulation of the CAL model to software and/or hardware code synthesis.
Research output: Contribution to journal › Article › Scientific › peer-review
The upcoming Reconfigurable Video Coding (RVC) standard from MPEG (ISO/IEC SC29WG11) defines a library of coding tools to specify existing or new compressed video formats and decoders. The coding tool library has been written in a dataflow/actor-oriented language named CAL. Each coding tool (actor) can be represented with an extended finite state machine, and the data communication between the tools is described as dataflow graphs. This paper proposes an approach to model the CAL actor network with Parameterized Synchronous Data Flow and to derive a quasi-static multiprocessor execution schedule for the system. In addition to proposing a scheduling approach for RVC, an extension to the well-known permutation flow shop scheduling problem that enables rapid run-time scheduling of RVC tasks is introduced.
Research output: Contribution to journal › Article › Scientific › peer-review
Electroencephalography is a non-invasive imaging modality in which a primary current density generated by the neural activity in the brain is to be reconstructed based on external electric potential measurements. This paper focuses on the finite element method (FEM) from both forward and inverse aspects. The goal is to establish a clear correspondence between the lowest-order Raviart-Thomas basis functions and dipole sources, as well as to show that the adopted FEM approach is computationally effective. Each basis function is associated with a dipole moment and a location. Four candidate locations are tested. Numerical experiments cover two different spherical multilayer head models, four mesh resolutions, and two different forward simulation approaches, one based on FEM and another based on the boundary element method (BEM) with standard dipoles as sources. The forward simulation accuracy is examined through column- and matrix-wise relative errors as well as through performance in inverse dipole localization. A closed-form approximation of the dipole potential was used as the reference forward simulation. The present approach is compared to the BEM and indirectly also to the recent FEM-based subtraction approach regarding accuracy, computation time, and accessibility of implementation.
Research output: Contribution to journal › Article › Scientific › peer-review
Dataflow descriptions have been used in a wide range of Digital Signal Processing (DSP) applications, such as multimedia processing and wireless communications. Among various forms of dataflow modeling, Synchronous Dataflow (SDF) is geared towards static scheduling of computational modules, which improves system performance and predictability. However, many DSP applications do not fully conform to the restrictions of SDF modeling. More general dataflow models, such as CAL (Eker and Janneck 2003), have been developed to describe dynamically-structured DSP applications. Such generalized models can express dynamically changing functionality, but lose the powerful static scheduling capabilities provided by SDF. This paper focuses on the detection of SDF-like regions in dynamic dataflow descriptions, in particular in the generalized specification framework of CAL. This is an important step for applying static scheduling techniques within a dynamic dataflow framework. Our techniques combine the advantages of different dataflow languages and tools, including CAL (Eker and Janneck 2003), DIF (Hsu et al. 2005) and CAL2C (Roquier et al. 2008). In addition to detecting SDF-like regions, we apply existing SDF scheduling techniques to exploit the static properties of these regions within enclosing dynamic dataflow models. Furthermore, we propose an optimized approach for mapping SDF-like regions onto parallel processing platforms such as multi-core processors.
Research output: Contribution to journal › Article › Scientific › peer-review
Dimensionality reduction is one of the basic operations in the toolbox of data analysts and designers of machine learning and pattern recognition systems. Given a large set of measured variables but few observations, an obvious idea is to reduce the degrees of freedom in the measurements by representing them with a smaller set of more condensed variables. Another reason for reducing the dimensionality is to reduce computational load in further processing. A third reason is visualization.
Research output: Contribution to journal › Article › Scientific › peer-review
Tools for designing signal processing systems with their semantic foundation in dataflow modeling often use high-level graphical user interfaces (GUIs) or text based languages that allow specifying applications as directed graphs. Such graphical representations serve as an initial reference point for further analysis and optimizations that lead to platform-specific implementations. For large-scale applications, the underlying graphs often consist of smaller substructures that repeat multiple times. To enable more concise representation and direct analysis of such substructures in the context of high level DSP specification languages and design tools, we develop the modeling concept of topological patterns, and propose ways for supporting this concept in a high-level language. We augment the dataflow interchange format (DIF) language-a language for specifying DSP-oriented dataflow graphs-with constructs for supporting topological patterns, and we show how topological patterns can be effective in various aspects of embedded signal processing design flows using specific application examples.
Research output: Contribution to journal › Article › Scientific › peer-review
The additive white Gaussian noise (AWGN) model is ubiquitous in signal processing. This model is often justified by central-limit theorem (CLT) arguments. However, whereas the CLT may support a Gaussian distribution for the random errors, it does not provide any justification for the assumed additivity and whiteness. As a matter of fact, data acquired in real applications can seldom be described with good approximation by the AWGN model, especially because errors are typically correlated and not additive. Failure to model accurately the noise leads to inaccurate analysis, ineffective filtering, and distortion or even failure in the estimation. This chapter provides an introduction to both signal-dependent and correlated noise and to the relevant models and basic methods for the analysis and estimation of these types of noise. Generic one-parameter families of distributions are used as the essential mathematical setting for the observed signals. The distribution families covered as leading examples include Poisson, mixed Poisson–Gaussian, various forms of signal-dependent Gaussian noise (including multiplicative families and approximations of the Poisson family), as well as doubly censored heteroskedastic Gaussian distributions. We also consider various forms of noise correlation, encompassing pixel and readout cross-talk, fixed-pattern noise, column/row noise, etc., as well as related issues like photo-response and gain nonuniformity. The introduced models and methods are applicable to several important imaging scenarios and technologies, such as raw data from digital camera sensors, various types of radiation imaging relevant to security and to biomedical imaging.
Research output: Chapter in Book/Report/Conference proceeding › Chapter › Scientific › peer-review
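As one concrete instance of the signal-dependent models discussed in the chapter, the widely used mixed Poisson-Gaussian model for raw sensor data has a noise variance that depends affinely on the underlying signal, var(z | y) = a*y + sigma^2; the short snippet below simulates it (the parameter values are arbitrary illustration choices).

    import numpy as np

    def simulate_poisson_gaussian(y, a=0.5, sigma=2.0, seed=0):
        # z = a * Poisson(y / a) + N(0, sigma^2), so var(z | y) = a * y + sigma^2.
        rng = np.random.default_rng(seed)
        return a * rng.poisson(y / a) + rng.normal(0.0, sigma, size=y.shape)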
In this chapter, we discuss the state of the art and future challenges in adaptive stream mining systems for computer vision. Adaptive stream mining in this context involves the extraction of knowledge from image and video streams in real-time, and from sources that are possibly distributed and heterogeneous. With advances in sensor and digital processing technologies, we are able to deploy networks involving large numbers of cameras that acquire increasing volumes of image data for diverse applications in monitoring and surveillance. However, to exploit the potential of such extensive networks for image acquisition, important challenges must be addressed in efficient communication and analysis of such data under constraints on power consumption, communication bandwidth, and end-to-end latency. We discuss these challenges in this chapter, and we also discuss important directions for research in addressing such challenges using dynamic, data-driven methodologies.
Research output: Chapter in Book/Report/Conference proceeding › Chapter › Scientific › peer-review
The emerging 5G New Radio (NR) networks are expected to enable huge improvements, e.g., in terms of capacity, number of connected devices, peak data rates and latency, compared to existing networks. At the same time, a new trend referred to as the RF convergence is aiming to jointly integrate communications and sensing functionalities into the same systems and hardware platforms. In this paper, we investigate the sensing prospects of 5G NR systems, with particular emphasis on the user equipment side and their potential for joint communications and environment mapping. To this end, a radio-based sensing approach utilizing the 5G NR uplink transmit signal and an efficient receiver processing and mapping scheme are proposed. An indoor scenario is then studied and evaluated through real-world RF measurements at 28 GHz mm-wave band, showing that impressive mapping performance can be achieved by the proposed system. The measurement data is available at a permanent open repository.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Positioning is considered one of the most important features and an enabler of various novel industry verticals in future radio systems. Since path loss or received signal strength-based measurements are widely available and accessible in the majority of wireless standards, path loss-based positioning has an important role among other positioning technologies. Conventionally, path loss-based positioning has two phases: i) fitting a path loss model to training data, if such training data is available, and ii) determining link distance estimates based on the path loss model and calculating the position estimate. However, in both phases, the maximum measurable path loss is limited by measurement noise. Such immeasurable samples are called censored path loss data, and such noisy data is commonly neglected in both the model fitting and the positioning phase. In the case of censored path loss, the loss is known to be above a known threshold level, and that information can be used in model fitting as well as in the positioning phase. In this paper, we examine and propose how to use censored path loss data in path loss model-based positioning and demonstrate with simulations the potential of the proposed approach for considerable improvements (over 30%) in positioning accuracy.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
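A minimal sketch of how censored samples can be kept in the model-fitting phase is given below: measured links contribute the usual Gaussian likelihood of a log-distance path loss model, while censored links only contribute the probability that the loss exceeded the measurable maximum (a Tobit-style likelihood). The model form, starting values, and variable names are assumptions for illustration, not the exact estimator of the paper.

    import numpy as np
    from scipy.stats import norm
    from scipy.optimize import minimize

    def fit_censored_pathloss(d_obs, pl_obs, d_cens, pl_max):
        # Model: PL(d) = A + 10*n*log10(d) + N(0, sigma^2).
        def nll(theta):
            A, n, log_sigma = theta
            sigma = np.exp(log_sigma)
            mu_obs = A + 10 * n * np.log10(d_obs)
            mu_cen = A + 10 * n * np.log10(d_cens)
            ll = norm.logpdf(pl_obs, mu_obs, sigma).sum()   # measured links
            ll += norm.logsf(pl_max, mu_cen, sigma).sum()   # censored links: PL > pl_max
            return -ll
        res = minimize(nll, x0=np.array([40.0, 3.0, np.log(6.0)]))
        A, n, log_sigma = res.x
        return A, n, np.exp(log_sigma)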
Future 5G networks will serve both terrestrial and aerial users, thanks to their network slicing and flexible numerology capabilities. The probability of Line-of-Sight (LoS) propagation will intuitively be higher for aerial users than for terrestrial users, and this will provide a trade-off between increased capacity and increased interference. Our paper theoretically analyzes this trade-off and proposes solutions based on downlink multiantenna beamforming and joint optimization of the signal-to-interference ratio of multiple aerial users. It is shown that Multiple-Input-Single-Output solutions offer the most convenient trade-off between complexity and capacity/interference performance. Simulation results are provided for mmWave bands and low-altitude aerial vehicles.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Powerful in-band interference can saturate a receiver's front-end and limit the usefulness of digital interference suppression methods that are bounded by the receiver's limited dynamic range. This is especially true for the self-interference (SI) encountered in full-duplex (FD) radios, but also in the case of strong interference between co-located radios. However, unlike in FD radios, receivers co-located with interference sources do not typically have direct access to the transmitted interference. This work analyzes the performance of a digitally-assisted analog interference mitigation method and its implementation for the suppression of frequency-modulated (FM) interference before quantization in global navigation satellite system (GNSS) receivers that are co-located with interference sources. Over-the-air measurement results are presented that illustrate the effects of interference mitigation on GPS L1 and Galileo E1 reception in a commercial off-the-shelf GNSS receiver and a software-defined GNSS receiver. The analysis covers the effects of the interference mitigation on the radio frequency (RF) front-end, acquisition, tracking, and positioning stages.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Deep learning models are capable of achieving state-of-the-art performance on a wide range of time series analysis tasks. However, their performance crucially depends on the employed normalization scheme, while they are usually unable to efficiently handle non-stationary features without first appropriately pre-processing them. These limitations impact the performance of deep learning models, especially when used for forecasting financial time series, due to their non-stationary and multimodal nature. In this paper we propose a data-driven adaptive normalization layer which is capable of learning the most appropriate normalization scheme that should be applied on the data. To this end, the proposed method first identifies the distribution from which the data were generated and then it dynamically shifts and scales them in order to facilitate the task at hand. The proposed normalization scheme is fully differentiable and it is trained in an end-to-end fashion along with the rest of the parameters of the model. The proposed method leads to significant performance improvements over several competitive normalization approaches, as demonstrated using a large-scale limit order book dataset.
EXT="Tefas, Anastasios"
EXT="Iosifidis, Alexandros"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
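An illustrative, heavily simplified reduction of the adaptive normalization idea is sketched below in PyTorch: per-sample summary statistics are passed through small learnable linear layers that decide how each feature is shifted and scaled, and the layer remains differentiable so it trains end-to-end with the rest of the network. The layer structure and names are assumptions; the published method may differ in its exact components.

    import torch
    import torch.nn as nn

    class AdaptiveNorm(nn.Module):
        def __init__(self, n_features):
            super().__init__()
            self.shift = nn.Linear(n_features, n_features, bias=False)
            self.scale = nn.Linear(n_features, n_features, bias=False)

        def forward(self, x):                      # x: (batch, time, features)
            mu = x.mean(dim=1)                     # per-sample feature means
            x = x - self.shift(mu).unsqueeze(1)    # learned adaptive shifting
            sigma = x.std(dim=1)                   # per-sample feature scales
            x = x / (self.scale(sigma).unsqueeze(1) + 1e-8)
            return x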
Fog computing brings computation and services to the edge of networks, enabling real-time applications. In order to provide a satisfactory quality of experience, the latency of fog networks needs to be minimized. In this paper, we consider a peer computation offloading problem for a fog network with unknown dynamics. Peer competition occurs when different fog nodes offload tasks to the same peer fog node (FN). The computation offloading problem is modeled as a sequential FN selection problem with delayed feedback. We construct an online learning policy based on the adversarial multi-armed bandit framework to deal with peer competition and delayed feedback. Simulation results validate the effectiveness of the proposed policy.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
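The adversarial (non-stochastic) multi-armed bandit framework mentioned above is typified by the EXP3 algorithm; the sketch below shows an EXP3-style selection and importance-weighted update, where with delayed feedback the update is simply applied once the task-completion reward arrives. This is a generic sketch of the framework, not the paper's exact policy.

    import numpy as np

    def exp3_select(weights, gamma=0.1, rng=np.random.default_rng()):
        K = len(weights)
        probs = (1 - gamma) * weights / weights.sum() + gamma / K
        arm = rng.choice(K, p=probs)              # chosen peer fog node
        return arm, probs[arm]

    def exp3_update(weights, arm, reward, prob, gamma=0.1):
        # Importance-weighted reward; call when the delayed feedback arrives.
        K = len(weights)
        weights[arm] *= np.exp(gamma * (reward / prob) / K)
        return weights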
In this paper, the potential of extending 5G New Radio physical layer solutions to support communications at sub-THz frequencies is studied. More specifically, we introduce the status of Third Generation Partnership Project studies related to operation on frequencies beyond 52.6 GHz and also note the recent proposal on spectrum horizons provided by the Federal Communications Commission (FCC) related to experimental licenses on the 95 GHz-3 THz frequency band. Then, we review the power amplifier (PA) efficiency and output power challenge together with the increased phase noise (PN) distortion effect in terms of the supported waveforms. As a practical example of waveform and numerology design from the perspective of PN robustness, link performance results using a 90 GHz carrier frequency are provided. The numerical results demonstrate that new, higher subcarrier spacings are required to support high throughput, which requires larger changes in the physical layer design. It is also observed that new phase-tracking reference signal designs are required to make the system robust against PN. The results illustrate that single-carrier frequency division multiple access is significantly more robust against PN and can provide clearly larger PA output power than cyclic-prefix orthogonal frequency division multiplexing, and is therefore a highly promising waveform for sub-THz communications.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this study, the machine washing durability of working glove-integrated passive RFID tags is evaluated. These glove-tags are embedded inside 3D-printed thermoplastic polyurethane platforms. The results are compared to tags embedded inside brush-painted encapsulant platforms. For a preliminary washing reliability evaluation, both types of glove-integrated platforms are washed in a washing machine 5 times. Although both platforms can protect glove-tags from the effects of water, the main reliability challenge is found to be the fragile antenna-IC attachments. This paper introduces the two platform materials and the achieved washing test results. These preliminary results determine the future direction of this research: the next step is to study suitable methods to strengthen the interconnections, so that these glove-tags can survive the harsh environment inside a washing machine.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
We present a headband loop antenna for wireless power transfer to multiple IMDs located in the cranial cavity at a depth of 10 mm from the skin. We characterize the wireless power transfer link in terms of the power gain and the power delivered to the IMD, when maximum SAR-compliant transmission power is fed to the headband antenna at a frequency of 5 MHz. We also consider two types of misalignment, i.e., lateral and angular, between the IMD antenna and the headband antenna and discuss their impact on the transducer gain, impedance matching, and the power delivered to the IMD.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
We present a compact circularly polarized (CP) antenna for wearable passive UHF RFID tags. The antenna is a square-shaped microstrip patch antenna in which we have applied corner truncation and slotting techniques in the top-layer conductor for achieving the CP property, and a shorting pin and loop structure for impedance matching. Despite using a low-permittivity textile as the antenna substrate, the antenna's footprint size is only 5-by-5 cm, which is approximately 15% of the operating wavelength. In on-body measurements, the antenna's axial ratio is 0.9 dB, and the measured attainable read range (reader's EIRP = 3.28 W) of the tag reaches 4.2 meters with a CP reader antenna and ranges from 2.9 meters to 3.4 meters for a linear reader antenna, depending on the rotation angle between the antennas.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The beyond-5G vehicular communications are expected not only to utilize the already explored millimeter-wave band but also to start harnessing the higher frequencies above 100 GHz ultimately targeting the so-called low terahertz band, 300 GHz-1 THz. In this paper, we perform a set of propagation measurements at 300 GHz band in representative vehicular environments. Particularly, we report on the reflection losses from the front, rear, and side of a regular vehicle. In addition, the penetration losses when propagating through, over, and under the vehicle are presented. Our study reveals that the vehicle body is extremely heterogeneous in terms of the propagation losses: the attenuation heavily depends on the trajectory of the 300 GHz signal through the vehicle. The reported measurement data may be used as a reference when developing the vehicle-specific channel and interference models for future wireless communications in the low terahertz band.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The interior structures of comets and asteroids, still poorly known, might hold a unique key to understanding the early Solar System. Considering the interaction of an illuminating electromagnetic wave with this kind of target, these objects are very large compared to the applicable wavelength. Consequently, tomographic imaging of such targets, i.e., reconstructing their interior structure via multiple measurements, constitutes a challenging inverse problem. To reach this objective and to develop and test inverse algorithms, we need to investigate electromagnetic fields that have interacted with structures analogous to real asteroids and comets. In this study, we focus on the acquisition of these fields considering three methods: calculated fields obtained with (1) time and (2) frequency domain methods, and (3) microwave measurements performed on an analogue model, i.e., a small-scale asteroid model.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper presents the first prototype of a passive RFID-based textile touchpad. Our unique solution takes advantage of ICs from passive UHF RFID technology. These components are combined into a textile-integrated IC array, which can be used for handwritten character recognition. As the solution is fully passive and gets all the needed energy from the RFID reader, it enables a maintenance-free and cost-effective user interface that can be integrated into clothing and into textiles around us.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Deploying sub-THz frequencies for mobile communications is one timely research area, due to the availability of very wide and contiguous chunks of the radio spectrum. However, at such extremely high frequencies, there are large challenges related to, e.g., phase noise, propagation losses as well as to energy-efficiency, since generating and radiating power with reasonable efficiency is known to be far more difficult than at lower frequencies. To address the energy-efficiency and power amplifier (PA) nonlinear distortion related challenges, modulation methods and waveforms with low peak-to-average-power ratio (PAPR) are needed. To this end, a new modulation approach is formulated and proposed in this paper, referred to as constrained phase-shift keying (CPSK). The CPSK concept builds on the traditional PSK constellations, while additional constraints are applied to the time domain symbol transitions in order to control and reduce the PAPR of the resulting waveform. This new modulation is then compared with pulse-shaped π/2-BPSK and ordinary QPSK, in the discrete Fourier transform (DFT) spread orthogonal frequency division multiplexing (DFT-s-OFDM) context, in terms of the resulting PAPR distributions and the achievable maximum PA output power, subject to constraints on the passband waveform quality and out-of-band emissions. The obtained results show that the proposed CPSK approach allows for reducing the PAPR and thereby achieving higher PA output powers, compared to QPSK, while still offering the same spectral efficiency. Overall, the CPSK concept offers a flexible modulation solution with controlled PAPR for the future sub-THz networks.
JUFOID=88220
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
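To make the PAPR comparison above concrete, the snippet below shows how candidate symbol sequences can be passed through a DFT-spread OFDM modulator and their peak-to-average power ratio evaluated; the parameters (64 allocated subcarriers, 256-point IFFT, no pulse shaping or oversampling) are arbitrary illustration choices, not the evaluation setup of the paper.

    import numpy as np

    def dft_s_ofdm(symbols, n_sc=64, n_fft=256):
        S = np.fft.fft(symbols[:n_sc]) / np.sqrt(n_sc)   # DFT spreading
        grid = np.zeros(n_fft, dtype=complex)
        grid[:n_sc] = S                                  # localized subcarrier mapping
        return np.fft.ifft(grid) * np.sqrt(n_fft)

    def papr_db(x):
        p = np.abs(x) ** 2
        return 10 * np.log10(p.max() / p.mean())

    # Example with ordinary QPSK symbols (a CPSK sequence would replace these):
    rng = np.random.default_rng(0)
    qpsk = (rng.choice([-1, 1], 64) + 1j * rng.choice([-1, 1], 64)) / np.sqrt(2)
    print(papr_db(dft_s_ofdm(qpsk)))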
There is a strong interest in utilizing commercial cellular networks to support unmanned aerial vehicles (UAVs) in sending control commands and communicating heavy traffic. Cellular networks are well suited for offering reliable and secure connections to the UAVs as well as facilitating traffic management systems to enhance safe operation. However, for the full-scale integration of UAVs that perform critical and high-risk tasks, more advanced solutions are required to improve wireless connectivity in mobile networks. In this context, integrated access and backhaul (IAB) is an attractive approach for the UAVs to enhance connectivity and traffic forwarding. In this paper, we study a novel approach to dynamic associations based on reinforcement learning at the edge of the network and compare it to alternative association algorithms. Considering the average data rate, our results indicate that the reinforcement learning methods improve the achievable data rate. The optimal parameters of the introduced algorithm are highly sensitive to the densities of donor next-generation NodeBs (DgNBs) and UAV IAB nodes, and need to be identified beforehand or estimated via a stateful search. However, its performance nearly converges to that of the ideal scheme with full knowledge of the data rates in dense deployments of DgNBs.
JUFOID=88220
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The prospective millimeter-wave (mmWave) networks are envisioned to heavily utilize relay nodes to improve their performance in certain scenarios. In addition to the stationary mmWave relays already considered by 3GPP as one of the main focuses, the community recently started to explore the use of unmanned aerial vehicle (UAV)-based mmWave relays. These aerial nodes provide greater flexibility in terms of the relay placement in different environments as well as the ability to optimize the deployment height thus maximizing the cell performance. At the same time, the use of UAV-based relays leads to additional deployment complexity and expenditures for the network operators. In this paper, taking into account 3GPP-standardized mmWave-specific propagation, blockage, and resource allocation we compare the capacity gains brought by the static and the UAV-based mmWave relays in different scenarios. For each of the relay types, we investigate both uniform and clustered distribution of human users. The developed mathematical framework and a numerical study reveal that the highest capacity gains when utilizing the UAV-based relays instead of the static ones are observed in clustered deployments (up to 31%), while the performance difference between the UAV-based and the static mmWave relays under a uniform distribution of users is just 3%.
JUFOID=88220
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Cortical spreading depression (CSD) is a slowly propagating wave of depolarization of brain cells, followed by temporarily silenced electrical brain activity. Major structural changes during CSD are linked to neuronal and possibly glial swelling. However, basic questions still remain unanswered. In particular, there are open questions regarding whether neurons or glial cells swell more, and how the cellular swelling affects the CSD wave propagation. In this study, we computationally explore how different parameters affect the swelling of neurons and astrocytes (star-shaped glial cells) during CSD and how the cell swelling alters the CSD wave spatial distribution. We apply a homogenized mathematical model that describes electrodiffusion in the intra- and extracellular space, and discretize the equations using a finite element method. The simulations are run with a two-compartment (extracellular space and neurons) and a three-compartment version of the model with astrocytes added. We consider cell swelling during CSD in four scenarios: (A) incorporating aquaporin-4 channels in the astrocytic membrane, (B) increasing the neuron/astrocyte ratio to 2:1, (C) blocking and increasing the Na+/K+-ATPase rate in the astrocytic compartment, and (D) blocking the Cl- channels in astrocytes. Our results show that increasing the water permeability in the astrocytes results in higher astrocytic swelling and lower neuronal swelling than in the default case. Further, elevated neuronal density increases the swelling in both neurons and astrocytes. Blocking the Na+/K+-ATPase in the astrocytes leads to an increased wave width and swelling in both compartments, which instead decrease when the pump rate is raised. Blocking the Cl- channels in the astrocytes results in neuronal swelling and a shrinkage of the astrocytes. Our results suggest a supporting role of astrocytes in preventing cellular swelling and CSD, as well as highlighting how dysfunctions in astrocytes might elicit CSD.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Providing sufficient mobile coverage during mass public events or critical situations is a highly challenging task for network operators. To fulfill the extreme capacity and coverage demands within a limited area, several augmenting solutions might be used. Among them, novel technologies like a fleet of compact base stations mounted on Unmanned Aerial Vehicles (UAVs) are gaining momentum because of their time- and cost-efficient deployment. Despite the fact that the concept of aerial wireless access networks has been investigated recently in many research studies, there are still numerous practical aspects that require further understanding and extensive evaluation. Taking this as a motivation, in this paper, we develop the concept of continuous wireless coverage provisioning by means of UAVs and assess its usability in mass scenarios with thousands of users. With our system-level simulations as well as a measurement campaign, we take into account a set of important parameters including weather conditions, UAV speed, weight, power consumption, and millimeter-wave (mmWave) antenna configuration. As a result, we provide more realistic data about the performance of the access and backhaul links together with practical lessons learned about the design and real-world applicability of UAV-enabled wireless access networks.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The limitations of state-of-the-art cellular modems prevent achieving low-power and low-latency Machine Type Communications (MTC) based on current power saving mechanisms alone. Recently, the concept of a wake-up scheme has been proposed to enhance the battery lifetime of 5G devices, while reducing the buffering delay. The existing wake-up algorithms use static operational parameters that are determined by the radio access network at the start of the user's session. In this paper, the average power consumption of the wake-up enabled MTC UE is modeled by using a semi-Markov process and then optimized through a delay-constrained optimization problem, by which the optimal wake-up cycle is obtained in closed form. Numerical results show that the proposed solution reduces the power consumption of an optimized Discontinuous Reception (DRX) scheme by up to 40% for a given delay requirement.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper explores the activity of coding with the smart toy robots Dash and Botley as a part of playful learning in the Finnish early education context. The findings of our study demonstrate how coding with the two toy robots was approached, conducted, and played by Finnish preschoolers aged 5-6 years. The main conclusion of the study is that preschoolers used the coding-related affordances of the toy robots mainly to develop gamified play around them: designing tracks for the toys, programming the toys to solve obstacle paths, and competing in player-generated contests of dexterity, speed, and physically mobile play.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
To meet the prospective demands of intelligent transportation systems (ITS), Release 14 (Rel-14) and Rel-15 of the Long Term Evolution (LTE) specifications include solutions for enhanced vehicle-to-everything (V2X) communications. While the technical enablers of Rel-14 are suitable for delivering basic safety messages, Rel-15 supports more demanding ITS services with stringent latency and reliability requirements. Starting in Rel-15 and continuing in Rel-16, 3GPP has been developing a novel radio interface for 5G systems, termed the New Radio (NR), which will enable ultra-reliable and low-latency communications suitable even for the most demanding ITS applications. In this paper, we overview the new V2X-specific features in Rel-15 and Rel-16. Further, we argue that future V2X and automotive radar systems may reuse common equipment, such as millimeter-wave antenna arrays. We finally discuss the vision of joint vehicular communications and radar sensing, as well as characterize unified channel access for millimeter-wave vehicular communications and radar sensing.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Due to the closely-spaced antenna elements in large-array or massive MIMO transmitters, antenna crosstalk is inevitable. This imposes additional challenges when seeking to linearize the power amplifiers at the transmitter through digital predistortion (DPD). In the commonly applied indirect learning architecture (ILA), the antenna crosstalk is known to result in a large number of additional basis functions (BFs) needed to account for all the coupling signal terms and achieve good linearization. In this article, we propose a novel closed-loop DPD architecture and associated parameter learning algorithms that can provide efficient linearization of digital MIMO transmitters under antenna crosstalk. The proposed solution does not need extra basis functions and is thus shown to provide large benefits in terms of computational complexity compared to the existing state of the art. Comprehensive numerical results are also provided, showing excellent linearization performance that outperforms the existing reference methods.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
We briefly introduce two submissions to the Illumination Estimation Challenge, held at the Int'l Workshop on Color Vision, affiliated with the 11th Int'l Symposium on Image and Signal Processing and Analysis. The Fourier-transform-based submission ranked 3rd, and the statistical gray-pixel-based one ranked 6th.
EXT="Chen, Ke"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Graphics Processing Units (GPUs) have been widely used in various fields of scientific computing, such as signal processing. GPUs have a hierarchical memory structure with memory layers that are shared between GPU processing elements. Partly due to this complex memory hierarchy, GPU programming is non-trivial, and several aspects must be taken into account, one being memory access patterns. One of the fastest GPU memory layers, shared memory, is grouped into banks to enable fast, parallel access for processing elements. Unfortunately, multiple threads of a GPU program may access the same shared memory bank simultaneously, causing a bank conflict. If this happens, program execution slows down, as memory accesses have to be rescheduled to determine which instruction to execute first. Bank conflicts are not handled automatically by the compiler, and hence the programmer must detect and deal with them prior to program execution. In this paper, we present an algebraic approach to detect bank conflicts and prove some theoretical results that can be used to predict when bank conflicts happen and how to avoid them. Our experimental results also illustrate the savings in computation time.
INT=comp,"Ferranti, Luca"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
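The bank-conflict condition described in the preceding abstract can be illustrated with a short brute-force check; this is not the algebraic method of the paper, only a sketch assuming 32 shared memory banks and 4-byte words (typical for CUDA-capable GPUs), with a hypothetical strided access pattern as input.

```python
import numpy as np

NUM_BANKS = 32      # typical number of shared memory banks on CUDA GPUs
WORD_BYTES = 4      # bank width in bytes

def count_bank_conflicts(byte_addresses):
    """Worst-case degree of bank conflict for one warp's accesses.

    A conflict occurs when threads of the same warp access different words
    that map to the same bank; accesses to the same word are broadcast and
    do not conflict. A return value of 1 means conflict-free.
    """
    words = np.asarray(byte_addresses) // WORD_BYTES
    banks = words % NUM_BANKS
    worst = 1
    for b in np.unique(banks):
        distinct_words = np.unique(words[banks == b])
        worst = max(worst, len(distinct_words))
    return worst

# Hypothetical example: 32 threads reading 4-byte floats with a stride of 2 words
addresses = [4 * 2 * t for t in range(32)]
print(count_bank_conflicts(addresses))  # prints 2 (two-way conflict)
```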
The Internet of Things (IoT) enables long-range outdoor networks, such as smart grid and municipal lighting, as well as short-range indoor systems for smart homes, residential security, and energy management. Wireless connectivity and standardized communication protocols have become an essential technology baseline for these diverse IoT applications. The focus of this work is wireless connectivity for smart metering systems. One of the recent protocols in this field is Wireless M-Bus, which is widely utilized for remote metering applications across Europe. Therefore, in this paper, we detail a novel multi-platform framework designed to serve as a data generator for the protocol in question. The developed software makes it possible to construct Wireless M-Bus telegrams with a high level of detail according to the EN 13757-4 specification and to schedule them for periodic transmission. The data generator is evaluated in a real-life scenario using a previously developed prototype, equipped with an IQRF TR72DA communication module and running the implemented software framework, which acts as a smart meter. Finally, the communication distance between the developed Wireless M-Bus prototype and a commercial gateway was evaluated in an indoor scenario at the Brno University of Technology, Faculty of Electrical Engineering and Communication.
EXT="Stusek, Martin"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Narrowband IoT (NB-IoT) is a radio access technology standardized by 3GPP in Release 13 to enable a large set of use cases for massive Machine-Type Communications (mMTC). Compared to legacy human-oriented 4G (LTE) communication systems, NB-IoT has game-changing features in terms of extended coverage, enhanced power saving modes, and a reduced set of available functionality. Together, these features allow connectivity for devices in challenging locations, enable long battery life, and reduce device complexity. This article addresses the development of a universal testing device for delay-tolerant services that allows in-depth verification of NB-IoT communication parameters. The presented outputs build upon our long-term cooperation with the Vodafone Czech Republic a.s. company.
EXT="Stusek, Martin"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, we discuss how the input magnitude data setting influences the behavior of the error-reduction algorithm in the one-dimensional discrete phase retrieval problem. We present experimental results related to the convergence or stagnation of the algorithm. We also discuss the distribution of the zeros of the solution, when a solution of the problem exists.
EXT="Rusu, Corneliu"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
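A generic error-reduction iteration for the 1D discrete phase retrieval problem is sketched below; it alternates between the given Fourier magnitudes and a time-domain support/non-negativity constraint. The support length, number of iterations, and toy signal are illustrative choices, not settings from the paper.

```python
import numpy as np

def error_reduction(magnitude, support_len, n_iter=500, seed=0):
    """Generic error-reduction loop for 1D discrete phase retrieval.

    magnitude   : measured DFT magnitudes (length N)
    support_len : signal is assumed zero outside the first support_len samples
    """
    rng = np.random.default_rng(seed)
    N = len(magnitude)
    # start from a random initial phase
    x = np.fft.ifft(magnitude * np.exp(1j * rng.uniform(0, 2 * np.pi, N)))
    for _ in range(n_iter):
        X = np.fft.fft(x)
        # impose the measured magnitudes, keep the current phase
        X = magnitude * np.exp(1j * np.angle(X))
        x = np.fft.ifft(X)
        # impose the time-domain constraints: real, non-negative, supported
        x = np.real(x)
        x[x < 0] = 0
        x[support_len:] = 0
    return x

# toy example: a non-negative length-8 signal zero-padded to length 32
true = np.zeros(32)
true[:8] = np.abs(np.random.default_rng(1).normal(size=8))
rec = error_reduction(np.abs(np.fft.fft(true)), support_len=8)
```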
360° videos are increasingly used for media and entertainment, but the best practices for editing them are not yet well established. In this paper, we present a study in which we investigated the user experience of 360° music videos viewed on a computer monitor and with VR goggles. The research was conducted as a laboratory experiment with 20 test participants. During the within-subject study, participants watched and evaluated four versions of the same 360° music video with different cutting rates. Based on the results, an average cutting rate of 26 seconds delivered the highest-quality user experience both on the computer monitor and with the VR goggles. The cutting rate matched the participants' mental models, and there was enough time to explore the environment without getting bored. Faster cutting rates made the users nervous, and a video consisting of a single shot was considered too static and boring.
jufoid=58079
EXT="Holm, Jukka"
INT=comp,"Remans, Mohammad Mushfiqur Rahman"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper proposes a low algorithmic latency adaptation of the deep clustering approach to speaker-independent speech separation. It consists of three parts: a) the usage of long short-term memory (LSTM) networks instead of the bidirectional variant used in the original work, b) using a short synthesis window (here 8 ms) required for low-latency operation, and c) using a buffer at the beginning of the audio mixture to estimate cluster centres corresponding to the constituent speakers, which are then utilized to separate the speakers within the rest of the signal. The buffer duration serves as an initialization phase, after which the system is capable of operating with 8 ms algorithmic latency. We evaluate our proposed approach on two-speaker mixtures from the Wall Street Journal (WSJ0) corpus. We observe that the use of LSTM yields around 1 dB lower source-to-distortion ratio (SDR) compared to the baseline bidirectional LSTM. Moreover, using an 8 ms synthesis window instead of 32 ms degrades the separation performance by around 2.1 dB compared to the baseline. Finally, we also report separation performance with different buffer durations, noting that separation can be achieved even for buffer durations as low as 300 ms.
int=comp,"Wang, Shanshan"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
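The buffer-based cluster centre estimation (part c of the abstract above) could look roughly like the following sketch; the embedding network itself is omitted, and the embedding dimensionality, buffer length, and two-speaker assumption are placeholders rather than settings from the paper.

```python
import numpy as np
from sklearn.cluster import KMeans

def buffer_masks(embeddings, buffer_frames, n_speakers=2):
    """embeddings: array of shape (frames, bins, emb_dim) from a deep clustering net.

    Cluster centres are fitted on the first `buffer_frames` frames only;
    every remaining TF bin is then assigned to its nearest centre to form
    one binary mask per speaker."""
    F, B, D = embeddings.shape
    km = KMeans(n_clusters=n_speakers, n_init=10, random_state=0)
    km.fit(embeddings[:buffer_frames].reshape(-1, D))
    labels = km.predict(embeddings.reshape(-1, D)).reshape(F, B)
    return [(labels == k).astype(float) for k in range(n_speakers)]
```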
The Time Difference of Arrival (TDoA) of a sound wavefront impinging on a microphone pair carries spatial information about the source. However, captured speech typically contains dynamic non-speech interference sources and noise. Therefore, the TDoA estimates fluctuate between speech and interference. Deep Neural Networks (DNNs) have been applied to Time-Frequency (TF) masking for Acoustic Source Localization (ASL) to filter out non-speech components from a speaker location likelihood function. However, the type of TF mask for this task is not obvious. Secondly, the DNN should estimate the TDoA values, but existing solutions estimate the TF mask instead. To overcome these issues, a direct formulation of the TF masking as a part of a DNN-based ASL structure is proposed. Furthermore, the proposed network operates in an online manner, i.e., producing estimates frame-by-frame. Combined with the use of recurrent layers, it exploits the sequential progression of speaker-related TDoAs. Training with different microphone spacings allows model re-use for different microphone pair geometries in inference. Real-data experiments with smartphone recordings of speech in interference demonstrate the network's generalization capability.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
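As a point of reference for the masked TDoA estimation discussed above, a GCC-PHAT estimator for one microphone pair with an optional per-bin weighting mask might be sketched as follows; the DNN that would produce the mask is outside the scope of the snippet, and the frame length and sampling rate are arbitrary.

```python
import numpy as np

def masked_gcc_phat(x1, x2, fs, mask=None, max_tdoa=None):
    """Estimate the TDoA (in seconds) between two frames with GCC-PHAT.

    `mask` is an optional per-frequency-bin weight in [0, 1] (e.g. a
    DNN-estimated speech mask) that down-weights non-speech content."""
    n = len(x1) * 2
    X1, X2 = np.fft.rfft(x1, n), np.fft.rfft(x2, n)
    cross = X1 * np.conj(X2)
    cross /= np.abs(cross) + 1e-12          # PHAT weighting
    if mask is not None:
        cross *= mask
    cc = np.fft.irfft(cross, n)
    max_shift = n // 2 if max_tdoa is None else int(max_tdoa * fs)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs
```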
The energy efficiency of modern MPSoCs is enhanced by complex hardware features such as Dynamic Voltage and Frequency Scaling (DVFS) and Dynamic Power Management (DPM). This paper introduces a new method, based on convex problem solving, that determines the most energy efficient operating point in terms of frequency and number of active cores in an MPSoC. The solution can challenge the popular approaches based on never-idle (or As-Slow-As-Possible (ASAP)) and race-to-idle (or As-Fast-As-Possible (AFAP)) principles. Experimental data are reported using a Samsung Exynos 5410 MPSoC and show a reduction in energy of up to 27 % when compared to ASAP and AFAP.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Developing accurate financial analysis tools can be useful both for speculative trading and for analyzing the behavior of markets and promptly responding to unstable conditions, ensuring the smooth operation of the financial markets. This has led to the development of various methods for analyzing and forecasting the behaviour of financial assets, ranging from traditional quantitative finance to more modern machine learning approaches. However, the volatile and unstable behavior of financial markets precludes the accurate prediction of future prices, reducing the performance of these approaches. In contrast, in this paper we propose a novel price trailing method that goes beyond traditional price forecasting by reformulating trading as a control problem, effectively overcoming the aforementioned limitations. The proposed method leads to robust agents that can withstand large amounts of noise, while still capturing the price trends and allowing profitable decisions to be made.
EXT="Tefas, Anastasios"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
1D Convolutional Neural Networks (CNNs) have recently become the state-of-the-art technique for crucial signal processing applications such as patient-specific ECG classification, structural health monitoring, anomaly detection in power electronics circuitry, and motor-fault detection. This is an expected outcome, as there are numerous advantages to using an adaptive and compact 1D CNN instead of its conventional (2D) deep counterparts. First of all, compact 1D CNNs can be efficiently trained with a limited dataset of 1D signals, while 2D deep CNNs, besides requiring 1D-to-2D data transformation, usually need datasets of massive size, e.g., on the "Big Data" scale, in order to prevent the well-known "overfitting" problem. 1D CNNs can be applied directly to the raw signal (e.g., current, voltage, vibration, etc.) without requiring any pre- or post-processing such as feature extraction, selection, dimension reduction, denoising, etc. Furthermore, due to the simple and compact configuration of such adaptive 1D CNNs, which perform only linear 1D convolutions (scalar multiplications and additions), a real-time and low-cost hardware implementation is feasible. This paper reviews the major signal processing applications of compact 1D CNNs with a brief theoretical background. We present their state-of-the-art performances and conclude by focusing on some major properties. Keywords - 1D CNNs, Biomedical Signal Processing, SHM.
EXT="Kiranyaz, Serkan"
EXT="Ince, Turker"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
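A minimal example of the kind of compact, adaptive 1D CNN discussed above, written in PyTorch; the layer widths, kernel sizes, input length, and number of classes are placeholders rather than values from any of the cited applications.

```python
import torch
import torch.nn as nn

class Compact1DCNN(nn.Module):
    """Small 1D CNN operating directly on a raw signal segment."""
    def __init__(self, in_channels=1, n_classes=5):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv1d(in_channels, 16, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.MaxPool1d(4),
            nn.Conv1d(16, 32, kernel_size=9, padding=4),
            nn.ReLU(),
            nn.AdaptiveAvgPool1d(1),   # global pooling -> fixed-size vector
        )
        self.classifier = nn.Linear(32, n_classes)

    def forward(self, x):              # x: (batch, channels, samples)
        return self.classifier(self.features(x).squeeze(-1))

model = Compact1DCNN()
logits = model(torch.randn(8, 1, 1024))   # e.g. 8 raw ECG segments of 1024 samples
```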
Forecasting time series has several applications in various domains. The vast amounts of data that are available nowadays provide the opportunity to use powerful deep learning approaches, but at the same time pose significant challenges of high dimensionality, velocity, and variety. In this paper, a novel logistic formulation of the well-known Bag-of-Features model is proposed to tackle these challenges. The proposed method is combined with deep convolutional feature extractors and is capable of accurately modeling the temporal behavior of time series, forming powerful forecasting models that can be trained in an end-to-end fashion. The proposed method was extensively evaluated using a large-scale financial time series dataset that consists of more than 4 million limit orders, outperforming other competitive methods.
EXT="Tefas, Anastasios"
EXT="Iosifidis, Alexandros"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Block partition structure is a critical module in video coding schemes for achieving significant compression performance gains. Under the exploration of the future video coding standard by the Joint Video Exploration Team (JVET), named Versatile Video Coding (VVC), a new Quad Tree Binary Tree (QTBT) block partition structure has been introduced. In addition to the QT block partitioning defined by the High Efficiency Video Coding (HEVC) standard, new horizontal and vertical BT partitions are enabled, which drastically increases the encoding time compared to HEVC. In this paper, we propose a fast QTBT partitioning scheme based on a machine learning approach. Complementary to techniques proposed in the literature to reduce the complexity of HEVC Quad Tree (QT) partitioning, the proposed solution uses Random Forest classifiers to determine, for each block, which partition mode between QT and BT is more likely to be selected. Using uncertainty zones of the classifier decisions, the proposed complexity reduction technique reduces the encoding time of the JEM-v7.0 software by 30% on average in the Random Access configuration with only a 0.57% Bjontegaard Delta Rate (BD-BR) increase.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In recent years, several studies have established a relationship between mammographic parenchymal patterns and breast cancer risk. However, there is a lack of publicly available data and software for objective comparison and clinical validation. This paper presents an open and adaptable implementation (OpenBreast v1.0) of a fully automatic computerized framework for mammographic image analysis for breast cancer risk assessment. OpenBreast implements mammographic image analysis in four stages: breast segmentation, detection of regions of interest, feature extraction, and risk scoring. For each stage, we provide implementations of several state-of-the-art methods. The pipeline is tested on a set of 305 full-field digital mammography images corresponding to 84 patients (51 cases and 49 controls) from the breast cancer digital repository (BCDR). OpenBreast achieves a competitive AUC of 0.846 in breast cancer risk assessment. In addition, used jointly with widely accepted risk factors such as patient age and breast density, mammographic image analysis using OpenBreast shows a statistically significant improvement in performance with an AUC of 0.876 (p < 0.001). Our framework will be made publicly available and is easy to extend with new methods.
EXT="Pertuz, Said"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Photonic neuromorphic hardware can provide significant performance benefits for Deep Learning (DL) applications by accelerating and reducing the energy requirements of DL models. However, photonic neuromorphic architectures employ different activation elements than those traditionally used in DL, slowing down the convergence of the training process for such architectures. In this paper, an initialization scheme is proposed for efficiently training deep photonic networks that employ quadratic sinusoidal activation functions. The proposed scheme can overcome these limitations, leading to faster and more stable training of deep photonic neural networks. The ability of the proposed method to improve the convergence of the training process is experimentally demonstrated using two different DL architectures and two datasets.
EXT="Tefas, Anastasios"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Ground Penetrating Radar (GPR) is generally used as a non-destructive inspection method for structures and for finding defects in concrete slabs. In this paper, GPR is used for the detection of water inside the cavities of concrete hollow-core slabs. We propose an algorithm that determines the water level inside the concrete slab by analyzing the time delays of the reflections originating from inside the cavity. The algorithm is based on utilizing prior knowledge about the geometry of the hollow-core slab. The presence of water was successfully detected, and an estimate of the height of the water surface was obtained with a GPR system operating at a central frequency of 2.7 GHz. Based on the experiments, the proposed method holds promise as a robust and accurate means of detecting water inside concrete slabs. Results, possible future research, and an analysis of the feasibility of GPR systems in water detection are presented and discussed.
jufoid=57477
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
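The core geometric relation behind converting a reflection delay into a depth estimate is simple enough to state directly; the relative permittivity used below is an assumed illustrative value for dry concrete, not a calibration from the paper.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s

def reflection_depth(two_way_delay_s, eps_r=6.0):
    """Depth of a reflector from a two-way GPR travel time.

    eps_r: assumed relative permittivity of the medium (here: dry concrete)."""
    v = C / eps_r ** 0.5          # propagation velocity in the medium
    return v * two_way_delay_s / 2.0

# e.g. a reflection arriving 2 ns after the surface return
print(f"{reflection_depth(2e-9):.3f} m")   # ~0.12 m for eps_r = 6
```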
Terahertz (THz) band communications, capable of achieving a theoretical capacity of up to several terabits per second, are one of the attractive enablers for beyond-5G wireless networks. THz systems will use extremely directional narrow beams, allowing not only to extend the communication range but also to partially secure the data already at the physical layer. The reason is that, in most cases, the Attacker has to be located within the transmitter beam in order to eavesdrop on the message. However, even the use of very narrow beams results in a considerably large area around the receiver where the Attacker can capture all the data. In this paper, we study how to decrease the message eavesdropping probability by leveraging the inherent multi-path nature of THz communications. We particularly propose sharing the data transmission over multiple THz propagation paths currently available between the communicating entities. We show that, at the cost of slightly reduced link capacity, the message eavesdropping probability in the described scheme decreases significantly, even when several Attackers operate in a cooperative manner. The proposed solution can be utilized for the transmission of sensitive data, as well as to secure the key exchange in THz band networks beyond 5G.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Intermittent millimeter-wave radio links caused by human-body blockage are an inherent feature of the 5G New Radio (NR) technology by 3GPP. To improve session continuity in these emerging systems, two mechanisms have recently been proposed, namely, multi-connectivity and guard bandwidth. The former allows establishing multiple spatially-diverse connections and switching between them dynamically, while the latter reserves a fraction of the system bandwidth for sessions changing their state from non-blocked to blocked, which ensures that the ongoing sessions have priority over the new ones. In this paper, we assess the joint performance of these two schemes for user- and system-centric metrics of interest. Our numerical results reveal that the multi-connectivity operation alone may not suffice to decrease the ongoing session drop probability considerably. On the other hand, the use of guard bandwidth significantly improves session continuity at the cost of somewhat compromising the new session drop probability and the system resource utilization. Surprisingly, a 5G NR system implementing both these techniques inherits their drawbacks. However, complementing it with an initial AP selection procedure effectively alleviates these limitations by maximizing the system resource utilization, while still providing sufficient flexibility to enable the desired trade-off between new and ongoing session drop probabilities.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper investigates the performance of energy detection-based spectrum sensing over Fisher-Snedecor F fading channels. To this end, an analytical expression for the corresponding average detection probability is first derived and then extended to account for collaborative spectrum sensing. The complementary receiver operating characteristics (ROC) are analyzed for different conditions of the average signal-to-noise ratio (SNR), time-bandwidth product, multipath fading, shadowing, and number of collaborating users. It is shown that the energy detection performance is strongly linked to the severity of the multipath fading and the amount of shadowing, whereby even small variations in either of these physical phenomena significantly impact the detection probability. Also, the versatile modeling capability of the Fisher-Snedecor F distribution is verified in the context of energy detection-based spectrum sensing, as it provides a considerably more accurate characterization than the conventional Rayleigh fading model. To confirm the validity of the analytical results presented in this paper, we compare them with the results of simulations.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
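The basic energy detector underlying this analysis can be simulated with a short Monte Carlo sketch; here the instantaneous SNR is drawn from a scaled F distribution as a stand-in for Fisher-Snedecor F composite fading, with arbitrarily chosen fading and shadowing parameters, and the closed-form expressions of the paper are not reproduced.

```python
import numpy as np
from scipy import stats

def energy_detection_roc(snr_db=0.0, n_samples=20, m=2.0, ms=5.0,
                         n_trials=20000, seed=0):
    """Monte Carlo ROC points (Pf, Pd) of an energy detector.

    The instantaneous SNR is drawn from a scaled F distribution (assumed
    stand-in for Fisher-Snedecor F fading with fading parameter m and
    shadowing parameter ms; the scaling below is valid for ms > 1)."""
    rng = np.random.default_rng(seed)
    snr_avg = 10 ** (snr_db / 10)
    gamma = snr_avg * stats.f(2 * m, 2 * ms).rvs(n_trials, random_state=rng) \
            * (ms - 1) / ms                       # normalize mean SNR to snr_avg
    noise = rng.normal(size=(n_trials, n_samples))
    signal = rng.normal(size=(n_trials, n_samples)) * np.sqrt(gamma)[:, None]
    e_h0 = np.sum(noise ** 2, axis=1)             # energy under noise only
    e_h1 = np.sum((signal + noise) ** 2, axis=1)  # energy under signal + noise
    thresholds = np.quantile(e_h0, np.linspace(0.5, 0.999, 30))
    pf = np.array([(e_h0 > t).mean() for t in thresholds])
    pd = np.array([(e_h1 > t).mean() for t in thresholds])
    return pf, pd
```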
Markov Decision Processes (MDPs) provide important capabilities for facilitating the dynamic adaptation of hardware and software configurations to the environments in which they operate. However, the use of MDPs in embedded signal processing systems is limited because of the large computational demands for solving this class of system models. This paper presents Sparse Parallel Value Iteration (SPVI), a new algorithm for solving large MDPs on resource-constrained embedded systems that are equipped with mobile GPUs. SPVI leverages recent advances in parallel solving of MDPs and adds sparse linear algebra techniques to significantly outperform the state-of-the-art. The method and its application are described in detail, and demonstrated with case studies that are implemented on an NVIDIA Tegra K1 System On Chip (SoC). The experimental results show execution time improvements in the range of 65%-78% for several applications. SPVI also lifts restrictions required by other MDP solver approaches, making it more widely compatible with large classes of optimization problems.
jufoid=71852
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
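The sparse value-iteration core that SPVI parallelizes can be written compactly with SciPy sparse matrices; this CPU-only sketch shows just the standard Bellman backup over CSR transition matrices, not the GPU kernels or the specific optimizations of SPVI.

```python
import numpy as np
from scipy import sparse

def value_iteration(P, R, gamma=0.95, tol=1e-6, max_iter=10000):
    """Standard value iteration with sparse transition matrices.

    P: list of (S x S) scipy.sparse CSR transition matrices, one per action
    R: (S x A) array of expected immediate rewards"""
    S, A = R.shape
    V = np.zeros(S)
    for _ in range(max_iter):
        # Q[s, a] = R[s, a] + gamma * sum_s' P_a[s, s'] * V[s']
        Q = np.column_stack([R[:, a] + gamma * P[a].dot(V) for a in range(A)])
        V_new = Q.max(axis=1)
        if np.max(np.abs(V_new - V)) < tol:
            V = V_new
            break
        V = V_new
    return V, Q.argmax(axis=1)   # optimal value function and greedy policy
```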
This paper proposes an active learning method to control a labeling process for efficient annotation of acoustic training material, which is used for training sound event classifiers. The proposed method performs K-medoids clustering over an initially unlabeled dataset, and the medoids, as local representatives, are presented to an annotator for manual annotation. The annotated label of a medoid is propagated to the other samples in its cluster for label prediction. After annotating the medoids, the annotation continues with the unexamined sounds for which the prediction results of two classifiers, a nearest-neighbor classifier and a model-based classifier, both trained with the annotated data, disagree. The segments with mismatched predictions are annotated in order of their distance to the nearest annotated sample, farthest first. The evaluation is made on a public environmental sound dataset. The labels obtained through the labeling process controlled by the proposed method are used to train a classifier with supervised learning. With the proposed method, only 20% of the data needs to be manually annotated to achieve the same accuracy as with all the data annotated. In addition, the proposed method clearly outperforms other active learning algorithms proposed for sound event classification in all the experiments, which simulate varying fractions of manually labeled data.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
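The first stage of the labeling process, annotating cluster medoids and propagating their labels, might be approximated as below; for simplicity the medoids are taken as the samples closest to k-means centroids rather than via true K-medoids, and the feature matrix, cluster count, and annotation oracle are placeholders.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import pairwise_distances_argmin_min

def annotate_medoids(X, n_clusters, oracle):
    """Cluster X, ask `oracle(index)` for the label of each medoid,
    and propagate that label to every sample in the medoid's cluster."""
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X)
    # medoid approximation: the sample nearest to each centroid
    medoid_idx, _ = pairwise_distances_argmin_min(km.cluster_centers_, X)
    propagated = np.empty(len(X), dtype=object)
    for c, m in enumerate(medoid_idx):
        propagated[km.labels_ == c] = oracle(m)   # manual annotation of the medoid
    return propagated, medoid_idx
```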
Detecting the class and the start and end times of sound events in real-world recordings is a challenging task. Current computer systems often show relatively high frame-wise accuracy but low event-wise accuracy. In this paper, we attempt to bridge this gap by explicitly including sequential information to improve the performance of a state-of-the-art polyphonic sound event detection system. We propose to 1) use delayed predictions of event activities as additional input features that are fed back to the neural network; 2) build N-grams to model the co-occurrence probabilities of different events; 3) use a sequential loss to train neural networks. Our experiments on a corpus of real-world recordings show that the N-grams can smooth the spiky output of a state-of-the-art neural network system and improve both the frame-wise and the event-wise metrics.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
We present an overview of the challenge entries for the Acoustic Scene Classification task of DCASE 2017 Challenge. Being the most popular task of the challenge, acoustic scene classification entries provide a wide variety of approaches for comparison, with a wide performance gap from top to bottom. Analysis of the submissions confirms once more the popularity of deep-learning approaches and mel frequency representations. Statistical analysis indicates that the top ranked system performed significantly better than the others, and that combinations of top systems are capable of reaching close to perfect performance on the given data.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper presents a low-latency neural network based speech enhancement system. Low-latency operation is critical for speech communication applications. The system uses the time-frequency (TF) masking approach to retain speech and remove the non-speech content from the observed signal. The ideal TF masks are obtained by supervised training of neural networks. As the main contribution, different neural network models are experimentally compared to investigate computational complexity and speech enhancement performance. The proposed system is trained and tested on noisy speech data where the signal-to-noise ratio (SNR) ranges from -5 dB to +5 dB, and the results show a significant reduction of non-speech content in the resulting signal while still meeting a low-latency operation criterion, which is here considered to be less than 20 ms.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
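The TF masking operation itself, ignoring the neural network that predicts the mask, reduces to the few lines below; the STFT parameters shown are short-frame values compatible with low-latency operation but are otherwise arbitrary, and `mask_fn` is a placeholder for a trained model.

```python
import numpy as np
from scipy.signal import stft, istft

def apply_tf_mask(noisy, mask_fn, fs=16000, frame=256, hop=128):
    """Enhance `noisy` by masking its STFT.

    `mask_fn` stands in for a trained network mapping |STFT| to mask
    values in [0, 1], applied element-wise to the complex spectrogram."""
    f, t, Z = stft(noisy, fs=fs, nperseg=frame, noverlap=frame - hop)
    mask = np.clip(mask_fn(np.abs(Z)), 0.0, 1.0)
    _, enhanced = istft(Z * mask, fs=fs, nperseg=frame, noverlap=frame - hop)
    return enhanced[:len(noisy)]
```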
Mean square error (MSE) has been the preferred choice of loss function in current deep neural network (DNN) based speech separation techniques. In this paper, we propose a new cost function with the aim of optimizing the extended short-time objective intelligibility (ESTOI) measure. We focus on applications where low algorithmic latency (≤ 10 ms) is important. We use long short-term memory networks (LSTM) and evaluate our proposed approach on four sets of two-speaker mixtures from the extended Danish hearing in noise (HINT) dataset. We show that the proposed loss function can offer improved or on-par objective intelligibility (in terms of ESTOI) compared to an MSE-optimized baseline, while resulting in lower objective separation performance in terms of the source-to-distortion ratio (SDR). We then propose an approach where the network is first initialized with weights optimized for the MSE criterion and then trained with the proposed ESTOI loss criterion. This approach mitigates some of the losses in objective separation performance while preserving the gains in objective intelligibility.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper proposes a novel method for separation of sound sources with ambisonic signals using multichannel non-negative matrix factorization (MNMF) for source spectrogram estimation. We present a novel frequency-independent spatial covariance matrix (SCM) model for spherical harmonic (SH) domain signals which makes the MNMF parameter estimation framework computationally feasible up to 3rd order SH signals. The evaluation is done with simulated SH domain mixtures by measuring the separation performance using objective criteria and comparing the proposed method against SH domain beamforming. The proposed method improves average separation performance over beamforming with post-filtering when using 1st and 2nd order SH signals while at higher orders performance among all tested methods is similar.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Parkinson's disease (PD) is a degenerative and long-term disorder of the central nervous system, which often causes motor symptoms, e.g., tremor, rigidity, and slowness. Currently, the diagnosis of PD is based on patient history and clinical examination. Technology-derived decision support systems utilizing, for example, sensor-rich smartphones can facilitate more accurate PD diagnosis. These technologies could provide less obtrusive and more comfortable remote symptom monitoring. Recent studies have shown that motor symptoms of PD can reliably be detected from data gathered via smartphones. The current study utilized an open-access dataset named 'mPower' to assess the feasibility of discriminating PD from non-PD by analyzing a single self-administered 20-step walking test. From this dataset, 1237 age- and gender-matched subjects (616 with PD) were selected and classified into PD and non-PD categories. Linear acceleration (ACC) and gyroscope (GYRO) signals were recorded by the built-in sensors of the smartphones. Walking bouts were extracted by thresholding the signal magnitude area of the ACC signals. Features were computed from both ACC and GYRO signals and fed into a random forest classifier of 128 trees. The classifier was evaluated using 100-fold cross-validation and provided an accumulated accuracy of 0.7 after 10k validations. The results show that PD and non-PD subjects can be separated based on a single, short, self-administered walking test gathered by the smartphones' built-in inertial measurement units.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
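The walking-bout extraction step mentioned above relies on thresholding the signal magnitude area (SMA) of the accelerometer signal; the window length and threshold in the sketch below are illustrative values, not the ones used in the study.

```python
import numpy as np

def signal_magnitude_area(acc, fs, win_s=1.0):
    """Per-window SMA of a 3-axis accelerometer signal (shape: samples x 3)."""
    win = int(win_s * fs)
    n_win = acc.shape[0] // win
    a = np.abs(acc[:n_win * win]).reshape(n_win, win, 3)
    return a.sum(axis=(1, 2)) / win          # one SMA value per window

def walking_windows(acc, fs, threshold=0.135):
    """Boolean flag per window: True where the SMA exceeds the threshold."""
    return signal_magnitude_area(acc, fs) > threshold
```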
Postural Instability (PI) is a major reason for fall in geriatric population as well as for people with diseases or disorders like Parkinson's, stroke etc. Conventional stability indicators like Berg Balance Scale (BBS) require clinical settings with skilled personnel's interventions to detect PI and finally classify the person into low, mid or high fall risk categories. Moreover these tests demand a number of functional tasks to be performed by the patient for proper assessment. In this paper a machine learning based approach is developed to determine fall risk with minimal human intervention using only Single Limb Stance exercise. The analysis is done based on the spatiotemporal dynamics of skeleton joint positions obtained from Kinect sensor. A novel posture modeling method has been applied for feature extraction along with some traditional time domain and metadata features to successfully predict the fall risk category. The proposed unobstrusive, affordable system is tested over 224 subjects and is able to achieve 75% mean accuracy on the geriatric and patient population.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Atrial fibrillation (AF) is the most common type of cardiac arrhythmia. Although not life-threatening itself, AF significantly increases the risk of stroke and myocardial infarction. Current tools available for screening and monitoring of AF are inadequate and an unobtrusive alternative, suitable for long-term use, is needed. This paper evaluates an atrial fibrillation detection algorithm based on wrist photoplethysmographic (PPG) signals. 29 patients recovering from surgery in the post-anesthesia care unit were monitored. 15 patients had sinus rhythm (SR, 67.5± 10.7 years old, 7 female) and 14 patients had AF (74.8± 8.3 years old, 8 female) during the recordings. Inter-beat intervals (IBI) were estimated from PPG signals. As IBI estimation is highly sensitive to motion or other types of noise, acceleration signals and PPG waveforms were used to automatically detect and discard unreliable IBI. AF was detected from windows of 20 consecutive IBI with 98.45±6.89% sensitivity and 99.13±1.79% specificity for 76.34±19.54% of the time. For the remaining time, no decision was taken due to the lack of reliable IBI. The results show that wrist PPG is suitable for long term monitoring and AF screening. In addition, this technique provides a more comfortable alternative to ECG devices.
INT=tut-bmt, "Yousefi, Zeinab Rezaei"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
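To make the IBI-based detection step concrete, a generic irregularity check over 20-beat windows is sketched below; this is not the detection algorithm evaluated in the paper, only an illustration using the root mean square of successive differences (RMSSD) with an assumed threshold.

```python
import numpy as np

def af_flags(ibi_s, win=20, rmssd_threshold=0.1):
    """Flag each non-overlapping window of `win` inter-beat intervals (seconds)
    as AF-suspect when the RMSSD of the window exceeds the threshold."""
    ibi_s = np.asarray(ibi_s, dtype=float)
    flags = []
    for start in range(0, len(ibi_s) - win + 1, win):
        w = ibi_s[start:start + win]
        rmssd = np.sqrt(np.mean(np.diff(w) ** 2))
        flags.append(rmssd > rmssd_threshold)
    return np.array(flags)
```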
This paper proposes a method for online estimation of time-varying room impulse responses (RIR) between multiple isolated sound sources and a far-field mixture. The algorithm is formulated as adaptive convolutive filtering in short-time Fourier transform (STFT) domain. We use the recursive least squares (RLS) algorithm for estimating the filter parameters due to its fast convergence rate, which is required for modeling rapidly changing RIRs of moving sound sources. The proposed method allows separation of reverberated sources from the far-field mixture given that their close-field signals are available. The evaluation is based on measuring unmixing performance (removal of reverberated source) using objective separation criteria calculated between the ground truth recording of the preserved sources and the unmixing result obtained with the proposed algorithm. We compare online and offline formulations for the RIR estimation and also provide evaluation with blind source separation algorithm only operating on the mixture signal.
jufoid=57409
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
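A per-frequency-bin RLS update of the kind used for the convolutive filter estimation could be sketched as follows; this shows only the textbook complex-valued RLS recursion for a single frequency bin, with the filter length and forgetting factor chosen arbitrarily.

```python
import numpy as np

class ComplexRLS:
    """Textbook complex-valued RLS adaptive filter for one STFT frequency bin."""
    def __init__(self, n_taps, forgetting=0.99, delta=1e-2):
        self.lmbda = forgetting
        self.w = np.zeros(n_taps, dtype=complex)        # filter coefficients
        self.P = np.eye(n_taps, dtype=complex) / delta  # inverse correlation matrix

    def update(self, u, d):
        """u: latest n_taps source STFT frames (newest first), d: mixture STFT frame."""
        u = np.asarray(u, dtype=complex)
        Pu = self.P @ u
        k = Pu / (self.lmbda + np.vdot(u, Pu))          # gain vector
        e = d - np.vdot(self.w, u)                      # a priori error
        self.w = self.w + k * np.conj(e)
        self.P = (self.P - np.outer(k, np.conj(u)) @ self.P) / self.lmbda
        return e                                        # residual (unmixed) term
```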
There is emerging interest in defining principles for signals on general graphs that are analogous to the basic principles of traditional signal processing. One example is the Graph Fourier Transform, which aims at decomposing a graph signal into its components based on a set of basis functions with corresponding graph frequencies. It has been observed that most of the important information of a graph signal is contained in the low-frequency band, which leads to several applications such as denoising, compression, etc. In this paper, we show that the low-frequency basis functions span the salient regions of an image, which can also be considered as important regions. Motivated by this, we present a simple, novel, and unsupervised method that utilizes a number of low-energy basis functions, and show that it improves the performance of seven state-of-the-art salient object detection methods on five datasets under four different evaluation criteria, with only minor exceptions.
jufoid=57409
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
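Obtaining the low-frequency GFT basis functions amounts to computing the smallest-eigenvalue eigenvectors of the graph Laplacian, as in the sketch below; the adjacency matrix and the number of basis functions are placeholders for whatever image graph and band size a method would actually use.

```python
import numpy as np
from scipy import sparse
from scipy.sparse.linalg import eigsh

def low_frequency_basis(W, k=10):
    """W: symmetric sparse adjacency (weight) matrix of the image graph.

    Returns the k Laplacian eigenvectors with the smallest eigenvalues,
    i.e. the lowest-frequency GFT basis functions (shift-invert could be
    used instead of which='SM' to speed this up on large graphs)."""
    d = np.asarray(W.sum(axis=1)).ravel()
    L = sparse.diags(d) - W                      # combinatorial graph Laplacian
    vals, vecs = eigsh(L, k=k, which='SM')
    return vals, vecs
```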
This paper studies the problem of full-reference visual quality assessment of denoised images, with a special emphasis on images with low contrast and noise-like texture. For such images, noise removal often also results in loss of image details or smoothing. A new test image database, FLT, containing 75 noise-free 'reference' images and 300 filtered ('distorted') images is developed. Each reference image, corrupted by additive white Gaussian noise, is denoised by the BM3D filter with four different values of the threshold parameter (four levels of noise suppression). After carrying out a perceptual quality assessment of the distorted images, the mean opinion scores (MOS) are obtained and compared with the values of known full-reference quality metrics. As a result, the Spearman Rank Order Correlation Coefficient (SROCC) between PSNR values and MOS has a value close to zero, and the SROCC between the values of known full-reference image visual quality metrics and MOS does not exceed 0.82 (which is reached by a new visual quality metric proposed in this paper). The FLT dataset is more complex than earlier datasets used for the assessment of visual quality of image denoising. Thus, it can be effectively used to design new image visual quality metrics for image denoising.
EXT="Lukin, Vladimir"
JUFOID=57409
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
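The rank correlation reported above is computed directly with SciPy; the arrays below are placeholders standing in for the objective metric values and the MOS collected for the distorted images.

```python
from scipy.stats import spearmanr

# metric_values: objective quality scores for the distorted images (hypothetical)
# mos_values:    corresponding mean opinion scores from the subjective test (hypothetical)
metric_values = [32.1, 29.8, 35.6, 31.2]
mos_values = [3.9, 3.1, 4.2, 3.5]

srocc, p_value = spearmanr(metric_values, mos_values)
print(f"SROCC = {srocc:.3f} (p = {p_value:.3g})")
```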
In this paper, we propose a joint framework for target localization and classification using a single generalized model for non-imaging based multi-modal sensor data. For target localizat