Research output: Contribution to journal › Meeting Abstract › Scientific › peer-review
Research output: Other conference contribution › Paper, poster or abstract › Scientific
Research output: Other conference contribution › Paper, poster or abstract › Scientific
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The band-pass filter is used to attenuate the breathing-originated signal from the heart-originated BCG signal. The bandwidths of the two signals slightly overlap, so complete attenuation of the breathing component is not possible without also altering the heart-originated BCG waveforms and the parameters obtained from the BCG. In our study, we investigated the optimal lower cut-off frequency, and 1.3 Hz was found to be a reasonable compromise between attenuating the breathing and altering the heart-originated BCG.
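A minimal sketch of such a filter (not the paper's exact design): a zero-phase Butterworth band-pass with the reported 1.3 Hz lower cut-off; the sampling rate, filter order, and 10 Hz upper cut-off are illustrative assumptions.

```python
import numpy as np
from scipy.signal import butter, filtfilt

fs = 100.0                                 # assumed sampling rate (Hz)
b, a = butter(N=4, Wn=[1.3, 10.0], btype="bandpass", fs=fs)

t = np.arange(0, 10, 1 / fs)
breathing = np.sin(2 * np.pi * 0.25 * t)   # ~0.25 Hz respiration component
bcg = 0.1 * np.sin(2 * np.pi * 5.0 * t)    # stand-in for heart-originated BCG
filtered = filtfilt(b, a, breathing + bcg) # breathing strongly attenuated
```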
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Contribution to journal › Editorial › Scientific
Research output: Contribution to journal › Editorial › Scientific
The papers in this special section are devoted to the growing field of acoustic scene classification and acoustic event recognition. Machine listening systems still fall short of human listeners in the analysis of realistic acoustic scenes. While sustained research efforts have been made for decades in speech recognition, speaker identification and, to a lesser extent, music information retrieval, the analysis of other types of sounds, such as environmental sounds, is the subject of growing interest from the community and is targeting an ever-increasing set of audio categories. This problem appears to be particularly challenging due to the large variety of potential sound sources in a scene, which may in addition have highly different acoustic characteristics, especially in bioacoustics. Furthermore, in realistic environments, multiple sources are often present simultaneously, and in reverberant conditions.
Research output: Contribution to journal › Article › Scientific
Research output: Contribution to journal › Article › Scientific
The emerging 5G New Radio (NR) networks are expected to enable huge improvements, e.g., in terms of capacity, number of connected devices, peak data rates and latency, compared to existing networks. At the same time, a new trend referred to as the RF convergence is aiming to jointly integrate communications and sensing functionalities into the same systems and hardware platforms. In this paper, we investigate the sensing prospects of 5G NR systems, with particular emphasis on the user equipment side and their potential for joint communications and environment mapping. To this end, a radio-based sensing approach utilizing the 5G NR uplink transmit signal and an efficient receiver processing and mapping scheme are proposed. An indoor scenario is then studied and evaluated through real-world RF measurements at 28 GHz mm-wave band, showing that impressive mapping performance can be achieved by the proposed system. The measurement data is available at a permanent open repository.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Positioning is considered one of the most important features and enablers of various novel industry verticals in future radio systems. Since path loss or received-signal-strength measurements are widely available and accessible in the majority of wireless standards, path loss-based positioning has an important role among positioning technologies. Conventionally, path loss-based positioning has two phases: i) fitting a path loss model to training data, if such data are available, and ii) determining link-distance estimates based on the path loss model and calculating the position estimate. However, in both phases, the maximum measurable path loss is limited by measurement noise. Such immeasurable samples are called censored path loss data, and such noisy data are commonly neglected in both the model fitting and the positioning phase. In the case of censored path loss, the loss is known to be above a known threshold level, and that information can be used in model fitting as well as in the positioning phase. In this paper, we examine and propose how to use censored path loss data in path loss model-based positioning and demonstrate with simulations the potential of the proposed approach for considerable improvements (over 30%) in positioning accuracy.
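A hedged sketch of the censored-fitting idea (not the paper's exact estimator): a log-distance path loss model with Gaussian shadowing fitted by maximum likelihood, where samples above the measurable threshold contribute only the probability mass of exceeding it.

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def neg_log_lik(theta, d, loss, censored, L_max):
    L0, n, sigma = theta
    mu = L0 + 10 * n * np.log10(d)                    # log-distance model
    ll_obs = norm.logpdf(loss[~censored], mu[~censored], sigma)
    ll_cen = norm.logsf(L_max, mu[censored], sigma)   # P(loss > L_max)
    return -(ll_obs.sum() + ll_cen.sum())

rng = np.random.default_rng(0)
d = rng.uniform(10, 300, 500)
true = (40.0, 3.0, 6.0)                               # L0 [dB], exponent, sigma [dB]
loss = true[0] + 10 * true[1] * np.log10(d) + rng.normal(0, true[2], d.size)
L_max = 100.0                                         # measurability threshold (assumed)
censored = loss > L_max
loss = np.minimum(loss, L_max)

fit = minimize(neg_log_lik, x0=(30.0, 2.0, 4.0),
               args=(d, loss, censored, L_max),
               bounds=[(0, 100), (1, 6), (0.1, 20)])
print(fit.x)   # estimates close to (40, 3, 6) despite the censoring
```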
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Future 5G networks will serve both terrestrial and aerial users, thanks to their network slicing and flexible numerology capabilities. The probability of Line-of-Sight (LoS) propagation will intuitively be higher for aerial users than for terrestrial users, and this creates a trade-off between increased capacity and increased interference. Our paper theoretically analyzes this trade-off and proposes solutions based on downlink multiantenna beamforming and joint optimization of the signal-to-interference ratio of multiple aerial users. It is shown that Multiple-Input-Single-Output solutions offer the most convenient trade-off between complexity and capacity/interference performance. Simulation results are provided for mmWave bands and low-altitude aerial vehicles.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Powerful in-band interference can saturate a receiver's front-end and limit the usefulness of digital interference suppression methods, which are bounded by the receiver's finite dynamic range. This is especially true for the self-interference (SI) encountered in full-duplex (FD) radios, but also in the case of strong interference between co-located radios. However, unlike FD radios, receivers co-located with interference sources do not typically have direct access to the transmitted interference. This work analyzes the performance of a digitally-assisted analog interference mitigation method and its implementation for the suppression of frequency-modulated (FM) interference before quantization in global navigation satellite system (GNSS) receivers that are co-located with interference sources. Over-the-air measurement results are presented that illustrate the effects of interference mitigation on GPS L1 and Galileo E1 reception in a commercial off-the-shelf GNSS receiver and a software-defined GNSS receiver. The analysis covers the effects of the interference mitigation on the radio frequency (RF) front-end, acquisition, tracking, and positioning stages.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Deep learning models are capable of achieving state-of-the-art performance on a wide range of time series analysis tasks. However, their performance crucially depends on the employed normalization scheme, and they are usually unable to efficiently handle non-stationary features without appropriate pre-processing. These limitations impact the performance of deep learning models, especially when used for forecasting financial time series, due to their non-stationary and multimodal nature. In this paper, we propose a data-driven adaptive normalization layer which is capable of learning the most appropriate normalization scheme to apply to the data. To this end, the proposed method first identifies the distribution from which the data were generated and then dynamically shifts and scales the data in order to facilitate the task at hand. The proposed normalization scheme is fully differentiable and is trained in an end-to-end fashion along with the rest of the parameters of the model. The proposed method leads to significant performance improvements over several competitive normalization approaches, as demonstrated using a large-scale limit order book dataset.
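A minimal sketch inspired by, but not identical to, the proposed layer: a differentiable normalization module that learns to shift and scale each input window from the window's own statistics, trainable end-to-end with the rest of the model.

```python
import torch
import torch.nn as nn

class AdaptiveNorm(nn.Module):
    def __init__(self, n_features):
        super().__init__()
        # learnable linear maps from window statistics to shift/scale
        self.shift = nn.Linear(n_features, n_features, bias=False)
        self.scale = nn.Linear(n_features, n_features, bias=False)

    def forward(self, x):                  # x: (batch, time, features)
        mean = x.mean(dim=1)               # summary statistics per window
        x = x - self.shift(mean).unsqueeze(1)                # adaptive shift
        std = x.norm(dim=1) / x.shape[1] ** 0.5
        x = x / (self.scale(std).unsqueeze(1).abs() + 1e-8)  # adaptive scale
        return x

layer = AdaptiveNorm(n_features=8)
out = layer(torch.randn(4, 100, 8))        # trained jointly with the model
```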
EXT="Tefas, Anastasios"
EXT="Iosifidis, Alexandros"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Fog computing brings computation and services to the edge of networks, enabling real-time applications. In order to provide a satisfactory quality of experience, the latency of fog networks needs to be minimized. In this paper, we consider a peer computation offloading problem for a fog network with unknown dynamics. Peer competition occurs when different fog nodes (FNs) offload tasks to the same peer FN. The computation offloading problem is modeled as a sequential FN selection problem with delayed feedback. We construct an online learning policy based on the adversarial multi-armed bandit framework to deal with peer competition and delayed feedback. Simulation results validate the effectiveness of the proposed policy.
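A hedged sketch of the adversarial-bandit flavour of FN selection (plain EXP3; the paper's policy additionally handles delayed feedback, which is not reproduced here). Rewards are assumed to be normalized to [0, 1], e.g. a mapped negative task latency.

```python
import numpy as np

def exp3(n_arms, n_rounds, get_reward, gamma=0.1, seed=0):
    rng = np.random.default_rng(seed)
    w = np.ones(n_arms)                          # arm weights
    for _ in range(n_rounds):
        p = (1 - gamma) * w / w.sum() + gamma / n_arms
        arm = rng.choice(n_arms, p=p)            # sample a fog node
        r = get_reward(arm)                      # reward in [0, 1]
        w[arm] *= np.exp(gamma * r / (p[arm] * n_arms))  # importance-weighted update
        w /= w.max()                             # rescale to avoid overflow
    return int(w.argmax())

# toy example: node 2 gives the highest reward (lowest latency) on average
rng = np.random.default_rng(1)
best = exp3(4, 5000, lambda a: rng.uniform(0, [0.4, 0.5, 0.9, 0.3][a]))
print(best)   # converges to node 2
```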
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, the potential of extending 5G New Radio physical layer solutions to support communications at sub-THz frequencies is studied. More specifically, we introduce the status of Third Generation Partnership Project studies related to operation on frequencies beyond 52.6 GHz, and also note the recent proposal on spectrum horizons provided by the Federal Communications Commission (FCC) related to experimental licenses on the 95 GHz-3 THz frequency band. Then, we review the power amplifier (PA) efficiency and output power challenge together with the increased phase noise (PN) distortion effect in terms of the supported waveforms. As a practical example of waveform and numerology design from the perspective of PN robustness, link performance results using a 90 GHz carrier frequency are provided. The numerical results demonstrate that new, higher subcarrier spacings are required to support high throughput, which requires larger changes in the physical layer design. It is also observed that new phase-tracking reference signal designs are required to make the system robust against PN. The results illustrate that single-carrier frequency division multiple access is significantly more robust against PN and can provide clearly larger PA output power than cyclic-prefix orthogonal frequency division multiplexing, and is therefore a highly promising waveform for sub-THz communications.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this study, the machine washing durability of working-glove-integrated passive RFID tags is evaluated. These glove-tags are embedded inside 3D-printed thermoplastic polyurethane platforms, and the results are compared to glove-tags embedded inside brush-painted encapsulant platforms. For a preliminary washing reliability evaluation, both types of glove-integrated platforms are machine-washed five times. Although both platforms can protect the glove-tags from the effects of water, the main reliability challenge is found to be the fragile antenna-IC attachments. This paper introduces the two platform materials and the achieved washing test results. These preliminary results determine the future direction of this research: the next step is to study suitable methods to strengthen the interconnections, so that these glove-tags can survive the harsh environment inside a washing machine.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
We present a headband loop antenna for wireless power transfer to multiple IMDs located in the cranial cavity at a depth of 10 mm from the skin. We characterize the wireless power transfer link in terms of the power gain and the power delivered to the IMD when the maximum SAR-compliant transmission power is fed to the headband antenna at a frequency of 5 MHz. We also consider two types of misalignment, i.e., lateral and angular, between the IMD antenna and the headband antenna, and discuss their impact on the transducer gain, the impedance matching, and the power delivered to the IMD.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
We present a compact circularly polarized (CP) antenna for wearable passive UHF RFID tags. The antenna is a square-shaped microstrip patch antenna where we have applied corner-truncation and slotting techniques in the top-layer conductor to achieve the CP property, and a shorting pin and loop structure for impedance matching. Despite using a low-permittivity textile as the antenna substrate, the antenna's footprint is only 5 cm by 5 cm, which is approximately 15% of the operating wavelength. In on-body measurements, the antenna's axial ratio is 0.9 dB, and the measured attainable read range (reader's EIRP = 3.28 W) of the tag reaches 4.2 meters with a CP reader antenna and ranges from 2.9 to 3.4 meters with a linear reader antenna, depending on the rotation angle between the antennas.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The beyond-5G vehicular communications are expected not only to utilize the already explored millimeter-wave band but also to start harnessing the higher frequencies above 100 GHz, ultimately targeting the so-called low terahertz band, 300 GHz-1 THz. In this paper, we perform a set of propagation measurements at the 300 GHz band in representative vehicular environments. Particularly, we report on the reflection losses from the front, rear, and side of a regular vehicle. In addition, the penetration losses when propagating through, over, and under the vehicle are presented. Our study reveals that the vehicle body is extremely heterogeneous in terms of the propagation losses: the attenuation heavily depends on the trajectory of the 300 GHz signal through the vehicle. The reported measurement data may be used as a reference when developing vehicle-specific channel and interference models for future wireless communications in the low terahertz band.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The interior structures of comets and asteroids, still poorly known, might hold a unique key to understanding the early Solar System. Considering the interaction of an illuminating electromagnetic wave with such targets, these objects are very large compared to the applicable wavelength. Consequently, tomographic imaging of such targets, i.e., reconstructing their interior structure via multiple measurements, constitutes a challenging inverse problem. To reach this objective and to develop and test inverse algorithms, we need to investigate electromagnetic fields that have interacted with structures analogous to real asteroids and comets. In this study, we focus on the acquisition of these fields considering three methods: calculated fields obtained with (1) time-domain and (2) frequency-domain methods, and (3) microwave measurements performed on an analogue model, i.e., a small-scale asteroid model.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper presents the first prototype of a passive RFID-based textile touchpad. Our unique solution takes advantage of ICs from passive UHF RFID technology. These components are combined into a textile-integrated IC array, which can be used for handwritten character recognition. As the solution is fully passive and gets all the needed energy from the RFID reader, it enables a maintenance-free and cost-effective user interface that can be integrated into clothing and into textiles around us.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Deploying sub-THz frequencies for mobile communications is a timely research area, due to the availability of very wide and contiguous chunks of the radio spectrum. However, at such extremely high frequencies, there are large challenges related to, e.g., phase noise, propagation losses, and energy efficiency, since generating and radiating power with reasonable efficiency is known to be far more difficult than at lower frequencies. To address the energy-efficiency and power amplifier (PA) nonlinear distortion challenges, modulation methods and waveforms with a low peak-to-average power ratio (PAPR) are needed. To this end, a new modulation approach is formulated and proposed in this paper, referred to as constrained phase-shift keying (CPSK). The CPSK concept builds on the traditional PSK constellations, while additional constraints are applied to the time-domain symbol transitions in order to control and reduce the PAPR of the resulting waveform. This new modulation is then compared with pulse-shaped π/2-BPSK and ordinary QPSK, in the discrete Fourier transform spread orthogonal frequency division multiplexing (DFT-s-OFDM) context, in terms of the resulting PAPR distributions and the achievable maximum PA output power, subject to constraints on the passband waveform quality and out-of-band emissions. The obtained results show that the proposed CPSK approach allows reducing the PAPR and thereby achieving higher PA output powers, compared to QPSK, while still offering the same spectral efficiency. Overall, the CPSK concept offers a flexible modulation solution with controlled PAPR for future sub-THz networks.
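A hedged illustration of the PAPR comparison discussed above, for ordinary QPSK vs. π/2-BPSK in a DFT-s-OFDM transmitter; CPSK itself is not reproduced, and the DFT/IFFT sizes below are illustrative assumptions (without pulse shaping the π/2-BPSK gain is typically modest).

```python
import numpy as np

rng = np.random.default_rng(0)
M, N = 256, 1024                       # DFT-spread size, IFFT size (assumed)

def dft_s_ofdm_papr(symbols):
    spread = np.fft.fft(symbols) / np.sqrt(M)      # DFT precoding
    grid = np.zeros(N, complex)
    grid[:M] = spread                              # localized subcarrier mapping
    x = np.fft.ifft(grid) * np.sqrt(N)
    return 10 * np.log10(np.max(np.abs(x) ** 2) / np.mean(np.abs(x) ** 2))

def qpsk(n):
    return (rng.choice([1, -1], n) + 1j * rng.choice([1, -1], n)) / np.sqrt(2)

def pi2_bpsk(n):                                   # BPSK with per-symbol pi/2 rotation
    return rng.choice([1, -1], n) * np.exp(1j * np.pi / 2 * np.arange(n))

papr_qpsk = [dft_s_ofdm_papr(qpsk(M)) for _ in range(2000)]
papr_bpsk = [dft_s_ofdm_papr(pi2_bpsk(M)) for _ in range(2000)]
print(np.percentile(papr_qpsk, 99), np.percentile(papr_bpsk, 99))  # pi/2-BPSK is lower
```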
JUFOID=88220
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
There is strong interest in utilizing commercial cellular networks to support unmanned aerial vehicles (UAVs), both for sending control commands and for carrying heavy traffic. Cellular networks are well suited to offering reliable and secure connections to UAVs, as well as facilitating traffic management systems that enhance safe operation. However, for the full-scale integration of UAVs that perform critical and high-risk tasks, more advanced solutions are required to improve wireless connectivity in mobile networks. In this context, integrated access and backhaul (IAB) is an attractive approach for UAVs to enhance connectivity and traffic forwarding. In this paper, we study a novel approach to dynamic associations based on reinforcement learning at the edge of the network and compare it to alternative association algorithms. Considering the average data rate, our results indicate that the reinforcement learning methods improve the achievable data rate. The optimal parameters of the introduced algorithm are highly sensitive to the densities of donor next-generation NodeBs (DgNBs) and UAV IAB nodes, and need to be identified beforehand or estimated via a stateful search. However, its performance nearly converges to that of an ideal scheme with full knowledge of the data rates in dense deployments of DgNBs.
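A minimal sketch of learning-based association in the spirit described above (a simple epsilon-greedy average-reward learner; the paper's exact algorithm, parameters, and per-node rates are not reproduced and the numbers below are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
mean_rate = np.array([20.0, 35.0, 28.0])     # assumed per-DgNB rates (Mbit/s)
q = np.zeros(3)                              # running rate estimate per node
count = np.zeros(3)
eps = 0.1                                    # exploration probability

for _ in range(2000):
    node = rng.integers(3) if rng.random() < eps else int(q.argmax())
    rate = rng.normal(mean_rate[node], 5.0)  # observed data rate this slot
    count[node] += 1
    q[node] += (rate - q[node]) / count[node]

print(q.argmax())   # converges to the best donor (index 1 here)
```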
JUFOID=88220
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The prospective millimeter-wave (mmWave) networks are envisioned to heavily utilize relay nodes to improve their performance in certain scenarios. In addition to the stationary mmWave relays already considered by 3GPP as one of the main focuses, the community has recently started to explore the use of unmanned aerial vehicle (UAV)-based mmWave relays. These aerial nodes provide greater flexibility in terms of relay placement in different environments, as well as the ability to optimize the deployment height, thereby maximizing the cell performance. At the same time, the use of UAV-based relays leads to additional deployment complexity and expenditures for the network operators. In this paper, taking into account 3GPP-standardized mmWave-specific propagation, blockage, and resource allocation, we compare the capacity gains brought by the static and the UAV-based mmWave relays in different scenarios. For each of the relay types, we investigate both uniform and clustered distributions of human users. The developed mathematical framework and a numerical study reveal that the highest capacity gains from utilizing the UAV-based relays instead of the static ones are observed in clustered deployments (up to 31%), while the performance difference between the UAV-based and the static mmWave relays under a uniform distribution of users is just 3%.
JUFOID=88220
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Cortical spreading depression (CSD) is a slowly propagating wave of depolarization of brain cells, followed by temporarily silenced electrical brain activity. Major structural changes during CSD are linked to neuronal and possibly glial swelling. However, basic questions still remain unanswered. In particular, there are open questions regarding whether neurons or glial cells swell more, and how the cellular swelling affects the CSD wave propagation. In this study, we computationally explore how different parameters affect the swelling of neurons and astrocytes (star-shaped glial cells) during CSD and how the cell swelling alters the CSD wave spatial distribution. We apply a homogenized mathematical model that describes electrodiffusion in the intra- and extracellular space, and discretize the equations using a finite element method. The simulations are run with a two-compartment (extracellular space and neurons) and a three-compartment version of the model with astrocytes added. We consider cell swelling during CSD in four scenarios: (A) incorporating aquaporin-4 channels in the astrocytic membrane, (B) increasing the neuron/astrocyte ratio to 2:1, (C) blocking and increasing the Na+/K+-ATPase rate in the astrocytic compartment, and (D) blocking the Cl- channels in astrocytes. Our results show that increasing the water permeability in the astrocytes results in higher astrocytic swelling and lower neuronal swelling than in the default case. Further, elevated neuronal density increases the swelling in both neurons and astrocytes. Blocking the Na+/K+-ATPase in the astrocytes leads to an increased wave width and swelling in both compartments, which instead decrease when the pump rate is raised. Blocking the Cl- channels in the astrocytes results in neuronal swelling and a shrinkage in the astrocytes. Our results suggest a supporting role of astrocytes in preventing cellular swelling and CSD, as well as highlighting how dysfunctions in astrocytes might elicit CSD.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Providing sufficient mobile coverage during mass public events or critical situations is a highly challenging task for network operators. To fulfill the extreme capacity and coverage demands within a limited area, several augmenting solutions might be used. Among them, novel technologies such as fleets of compact base stations mounted on Unmanned Aerial Vehicles (UAVs) are gaining momentum because of their time- and cost-efficient deployment. Despite the fact that the concept of aerial wireless access networks has been investigated in many recent research studies, numerous practical aspects still require further understanding and extensive evaluation. Taking this as a motivation, in this paper we develop the concept of continuous wireless coverage provisioning by means of UAVs and assess its usability in mass scenarios with thousands of users. With our system-level simulations as well as a measurement campaign, we take into account a set of important parameters, including weather conditions, UAV speed, weight, power consumption, and millimeter-wave (mmWave) antenna configuration. As a result, we provide more realistic data about the performance of the access and backhaul links, together with practical lessons learned about the design and real-world applicability of UAV-enabled wireless access networks.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The limitations of state-of-the-art cellular modems prevent achieving low-power and low-latency Machine Type Communications (MTC) based on current power saving mechanisms alone. Recently, the concept of a wake-up scheme has been proposed to enhance the battery lifetime of 5G devices while reducing the buffering delay. Existing wake-up algorithms use static operational parameters that are determined by the radio access network at the start of the user's session. In this paper, the average power consumption of a wake-up-enabled MTC user equipment (UE) is modeled using a semi-Markov process and then optimized through a delay-constrained optimization problem, by which the optimal wake-up cycle is obtained in closed form. Numerical results show that the proposed solution reduces the power consumption of an optimized Discontinuous Reception (DRX) scheme by up to 40% for a given delay requirement.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper explores the activity of coding with smart toy robots Dash and Botley as a part of playful learning in the Finnish early education context. The findings of our study demonstrate how coding with the two toy robots was approached, conducted and played by Finnish preschoolers aged 5-6 years. The main conclusion of the study is that preschoolers used the toy robots with affordances related to coding mainly in developing gamified play around them by designing tracks for the toys, programming the toys to solve obstacle paths, and competing in player-generated contests of dexterity, speed and physically mobile play.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
To meet the prospective demands of intelligent transportation systems (ITS), Release 14 (Rel-14) and Rel-15 of the Long Term Evolution (LTE) specifications include solutions for enhanced vehicle-to-everything (V2X) communications. While the technical enablers of Rel-14 are suitable for delivering basic safety messages, Rel-15 supports more demanding ITS services with stringent latency and reliability requirements. Starting in Rel-15 and continuing in Rel-16, 3GPP has been developing a novel radio interface for 5G systems, termed New Radio (NR), which will enable ultra-reliable and low-latency communications suitable even for the most demanding ITS applications. In this paper, we overview the new V2X-specific features in Rel-15 and Rel-16. Further, we argue that future V2X and automotive radar systems may reuse common equipment, such as millimeter-wave antenna arrays. We finally discuss the vision of joint vehicular communications and radar sensing, and characterize unified channel access for millimeter-wave vehicular communications and radar sensing.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Due to the closely spaced antenna elements in large-array or massive MIMO transmitters, antenna crosstalk is inevitable. This imposes additional challenges when seeking to linearize the power amplifiers at the transmitter through digital predistortion (DPD). In the commonly applied indirect learning architecture (ILA), the antenna crosstalk is known to result in a large number of additional basis functions (BFs) in order to account for all the coupling signal terms and achieve good linearization. In this article, we propose a novel closed-loop DPD architecture and associated parameter learning algorithms that can provide efficient linearization of digital MIMO transmitters under antenna crosstalk. The proposed solution does not need extra basis functions and is thus shown to provide large benefits in terms of computational complexity compared to the existing state of the art. Comprehensive numerical results are also provided, showing excellent linearization performance that outperforms the existing reference methods.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
We briefly introduce two submissions to the Illumination Estimation Challenge of the International Workshop on Color Vision, affiliated with the 11th International Symposium on Image and Signal Processing and Analysis. The Fourier-transform-based submission ranked 3rd, and the statistical gray-pixel-based one ranked 6th.
EXT="Chen, Ke"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Graphics Processing Units (GPUs) have been widely used in various fields of scientific computing, such as signal processing. GPUs have a hierarchical memory structure with memory layers that are shared between GPU processing elements. Partly due to this complex memory hierarchy, GPU programming is non-trivial, and several aspects must be taken into account, one being memory access patterns. One of the fastest GPU memory layers, shared memory, is grouped into banks to enable fast, parallel access for processing elements. Unfortunately, multiple threads of a GPU program may access the same shared memory bank simultaneously, causing a bank conflict. If this happens, program execution slows down, as memory accesses have to be rescheduled to determine which instruction to execute first. Bank conflicts are not handled automatically by the compiler, and hence the programmer must detect and deal with them prior to program execution. In this paper, we present an algebraic approach to detect bank conflicts and prove some theoretical results that can be used to predict when bank conflicts happen and how to avoid them. Our experimental results also illustrate the savings in computation time.
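A small illustration (not the paper's algorithm) of the algebraic flavour of bank-conflict prediction for strided access: thread i touches word i*stride, words map to banks modulo the bank count (32 on most NVIDIA GPUs), and the conflict degree equals gcd(stride, n_banks).

```python
from math import gcd

def conflict_degree(stride, n_threads=32, n_banks=32):
    """Maximum number of threads in a warp hitting the same bank."""
    banks = [(i * stride) % n_banks for i in range(n_threads)]
    return max(banks.count(b) for b in set(banks))

# Classic consequence: odd strides are always conflict-free.
for s in (1, 2, 3, 4, 8, 32):
    assert conflict_degree(s) == gcd(s, 32)
    print(s, conflict_degree(s))
```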
INT=comp,"Ferranti, Luca"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The Internet of Things (IoT) enables long-range outdoor networks, such as smart grid and municipal lighting, as well as short-range indoor systems for smart homes, residential security, and energy management. Wireless connectivity and standardized communication protocols are an essential technology baseline for these diverse IoT applications. The focus of this work is wireless connectivity for smart metering systems. One of the recent protocols in this field is Wireless M-Bus, which is widely utilized for remote metering applications across Europe. In this paper, we detail a novel multi-platform framework designed to serve as a data generator for this protocol. The developed software allows constructing Wireless M-Bus telegrams with a high level of detail according to the EN 13757-4 specification and then scheduling them for periodic transmission. The data generator was evaluated in a real scenario using a previously developed prototype equipped with an IQRF TR72DA communication module, acting as a smart meter running the implemented software framework. As a result, the communication distance between the developed Wireless M-Bus prototype and a commercial gateway was tested in an indoor scenario at the Brno University of Technology, Faculty of Electrical Engineering and Communication.
EXT="Stusek, Martin"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Narrowband IoT (NB-IoT) is a radio access technology standardized by 3GPP in Release 13 to enable a large set of use cases for massive Machine-Type Communications (mMTC). Compared to legacy human-oriented 4G (LTE) communication systems, NB-IoT has game-changing features in terms of extended coverage, enhanced power saving modes, and a reduced set of available functionality. Together, these features allow for connectivity of devices in challenging positions, enable long battery life, and reduce device complexity. This article addresses the development of a universal testing device for delay-tolerant services, allowing in-depth verification of NB-IoT communication parameters. The presented outputs build upon our long-term cooperation with the Vodafone Czech Republic a.s. company.
EXT="Stusek, Martin"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, we discuss how the input magnitude data setting influences the behavior of the error-reduction algorithm in the case of the one-dimensional discrete phase retrieval problem. We present experimental results related to the convergence or stagnation of the algorithm. We also discuss the distribution of the zeros of the solution, when a solution of the problem exists.
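A minimal sketch of the classical error-reduction iteration for the 1-D discrete phase retrieval problem: alternate between enforcing the known DFT magnitudes and the time-domain constraints (here a real, nonnegative, support-limited signal; the paper's exact constraint set may differ).

```python
import numpy as np

rng = np.random.default_rng(0)
n, support = 64, 16
x_true = np.zeros(n)
x_true[:support] = rng.random(support)
mag = np.abs(np.fft.fft(x_true))          # known magnitude data

x = rng.random(n)                         # random initial guess
for _ in range(500):
    X = np.fft.fft(x)
    X = mag * np.exp(1j * np.angle(X))    # impose known magnitudes
    x = np.fft.ifft(X).real
    x[support:] = 0                       # impose support constraint
    x[x < 0] = 0                          # impose nonnegativity

print(np.linalg.norm(np.abs(np.fft.fft(x)) - mag))  # residual decreases (or stagnates)
```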
EXT="Rusu, Corneliu"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
360° videos are increasingly used for media and entertainment, but best practices for editing them are not yet well established. In this paper, we present a study investigating the user experience of 360° music videos viewed on a computer monitor and with VR goggles. The research was conducted as a laboratory experiment with 20 test participants. During the within-subject study, participants watched and evaluated four versions of the same 360° music video with different cutting rates. Based on the results, an average cutting rate of 26 seconds delivered the highest-quality user experience both on a computer monitor and with VR goggles. This cutting rate matched participants' mental models, and there was enough time to explore the environment without getting bored. Faster cutting rates made users nervous, and a video consisting of a single shot was considered too static and boring.
jufoid=58079
EXT="Holm, Jukka"
INT=comp,"Remans, Mohammad Mushfiqur Rahman"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper proposes a low algorithmic latency adaptation of the deep clustering approach to speaker-independent speech separation. It consists of three parts: a) using long short-term memory (LSTM) networks instead of the bidirectional variant used in the original work, b) using a short synthesis window (here 8 ms) required for low-latency operation, and c) using a buffer at the beginning of the audio mixture to estimate cluster centres corresponding to the constituent speakers, which are then utilized to separate the speakers within the rest of the signal. The buffer duration serves as an initialization phase, after which the system is capable of operating with 8 ms algorithmic latency. We evaluate the proposed approach on two-speaker mixtures from the Wall Street Journal (WSJ0) corpus. We observe that the use of LSTM yields around 1 dB lower source-to-distortion ratio (SDR) compared to the baseline bidirectional LSTM. Moreover, using an 8 ms synthesis window instead of 32 ms degrades the separation performance by around 2.1 dB compared to the baseline. Finally, we also report separation performance with different buffer durations, noting that separation can be achieved even for buffer durations as low as 300 ms.
int=comp,"Wang, Shanshan"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The Time Difference of Arrival (TDoA) of a sound wavefront impinging on a microphone pair carries spatial information about the source. However, captured speech typically contains dynamic non-speech interference sources and noise. Therefore, the TDoA estimates fluctuate between speech and interference. Deep Neural Networks (DNNs) have been applied to Time-Frequency (TF) masking for Acoustic Source Localization (ASL) to filter out non-speech components from a speaker location likelihood function. However, the type of TF mask for this task is not obvious. Secondly, the DNN should estimate the TDoA values, but existing solutions estimate the TF mask instead. To overcome these issues, a direct formulation of TF masking as a part of a DNN-based ASL structure is proposed. Furthermore, the proposed network operates in an online manner, i.e., producing estimates frame by frame. Combined with the use of recurrent layers, it exploits the sequential progression of speaker-related TDoAs. Training with different microphone spacings allows model re-use for different microphone pair geometries at inference time. Real-data experiments with smartphone recordings of speech in interference demonstrate the network's generalization capability.
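For context, a standard (non-DNN) building block: estimating the TDoA of a microphone pair with GCC-PHAT; the paper's network refines this kind of estimate by masking out non-speech time-frequency components.

```python
import numpy as np

def gcc_phat(sig, ref, fs, max_tau=None):
    n = sig.size + ref.size
    S = np.fft.rfft(sig, n) * np.conj(np.fft.rfft(ref, n))
    cc = np.fft.irfft(S / (np.abs(S) + 1e-12), n)    # phase transform weighting
    max_shift = n // 2 if max_tau is None else min(int(fs * max_tau), n // 2)
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (cc.argmax() - max_shift) / fs            # TDoA in seconds

fs, delay = 16000, 12                                # 12-sample true delay
x = np.random.default_rng(0).standard_normal(4096)
print(gcc_phat(np.roll(x, delay), x, fs) * fs)       # ~12 samples
```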
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The energy efficiency of modern MPSoCs is enhanced by complex hardware features such as Dynamic Voltage and Frequency Scaling (DVFS) and Dynamic Power Management (DPM). This paper introduces a new method, based on convex problem solving, that determines the most energy-efficient operating point in terms of frequency and number of active cores in an MPSoC. The solution can challenge the popular approaches based on the never-idle (or As-Slow-As-Possible (ASAP)) and race-to-idle (or As-Fast-As-Possible (AFAP)) principles. Experimental data are reported using a Samsung Exynos 5410 MPSoC and show a reduction in energy of up to 27% when compared to ASAP and AFAP.
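A hedged sketch of the kind of convex energy model such methods minimize (illustrative coefficients, not the paper's measured model): energy for a fixed workload W as a function of frequency f and number of active cores n.

```python
from scipy.optimize import minimize_scalar

W = 1e9                       # workload in cycles (assumption)
P_static = 0.5                # platform static power [W] (assumption)
c_dyn, c_leak = 1e-27, 0.2    # per-core dynamic (~f^3) and leakage terms (assumption)

def energy(f, n):
    t = W / (n * f)           # execution time with n cores at f Hz
    return (n * (c_dyn * f ** 3 + c_leak) + P_static) * t

best = []
for n in (1, 2, 4):           # energy(f, n) is convex in f for f > 0
    r = minimize_scalar(lambda f: energy(f, n), bounds=(2e8, 2e9), method="bounded")
    best.append((n, r.x, r.fun))
print(min(best, key=lambda b: b[2]))   # most energy-efficient (n, f) pair
```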
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Developing accurate financial analysis tools can be useful both for speculative trading and for analyzing the behavior of markets and promptly responding to unstable conditions, ensuring the smooth operation of the financial markets. This has led to the development of various methods for analyzing and forecasting the behavior of financial assets, ranging from traditional quantitative finance to more modern machine learning approaches. However, the volatile and unstable behavior of financial markets precludes the accurate prediction of future prices, reducing the performance of these approaches. In contrast, in this paper we propose a novel price-trailing method that goes beyond traditional price forecasting by reformulating trading as a control problem, effectively overcoming the aforementioned limitations. The proposed method leads to robust agents that can withstand large amounts of noise, while still capturing the price trends and allowing for profitable decisions.
EXT="Tefas, Anastasios"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
1D Convolutional Neural Networks (CNNs) have recently become the state-of-the-art technique for crucial signal processing applications such as patient-specific ECG classification, structural health monitoring, anomaly detection in power electronics circuitry, and motor-fault detection. This is an expected outcome, as there are numerous advantages to using an adaptive and compact 1D CNN instead of a conventional (2D) deep counterpart. First of all, compact 1D CNNs can be efficiently trained with a limited dataset of 1D signals, while 2D deep CNNs, besides requiring 1D-to-2D data transformation, usually need datasets of massive size, e.g., on the "Big Data" scale, in order to prevent the well-known "overfitting" problem. 1D CNNs can be applied directly to the raw signal (e.g., current, voltage, vibration, etc.) without requiring any pre- or post-processing such as feature extraction, selection, dimension reduction, denoising, etc. Furthermore, due to the simple and compact configuration of such adaptive 1D CNNs that perform only linear 1D convolutions (scalar multiplications and additions), a real-time and low-cost hardware implementation is feasible. This paper reviews the major signal processing applications of compact 1D CNNs with a brief theoretical background. We present their state-of-the-art performances and conclude by focusing on some major properties. Keywords - 1D CNNs, Biomedical Signal Processing, SHM.
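A minimal sketch of the kind of compact, adaptive 1D CNN discussed above (layer sizes and the two-class head are illustrative assumptions), applied directly to a raw 1D signal without hand-crafted feature extraction:

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv1d(1, 16, kernel_size=9, stride=2), nn.ReLU(),
    nn.Conv1d(16, 16, kernel_size=9, stride=2), nn.ReLU(),
    nn.AdaptiveAvgPool1d(1), nn.Flatten(),
    nn.Linear(16, 2),                  # e.g. normal vs. anomalous beat
)

raw = torch.randn(8, 1, 360)           # batch of raw 1-second ECG segments
logits = model(raw)                    # (8, 2) class scores
```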
EXT="Kiranyaz, Serkan"
EXT="Ince, Turker"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Forecasting time series has several applications in various domains. The vast amount of data available nowadays provides the opportunity to use powerful deep learning approaches, but at the same time poses significant challenges of high dimensionality, velocity and variety. In this paper, a novel logistic formulation of the well-known Bag-of-Features model is proposed to tackle these challenges. The proposed method is combined with deep convolutional feature extractors and is capable of accurately modeling the temporal behavior of time series, forming powerful forecasting models that can be trained in an end-to-end fashion. The proposed method was extensively evaluated using a large-scale financial time series dataset that consists of more than 4 million limit orders, outperforming other competitive methods.
EXT="Tefas, Anastasios"
EXT="Iosifidis, Alexandros"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The block partition structure is a critical module in a video coding scheme for achieving significant compression performance gains. In the exploration of the future video coding standard by the Joint Video Exploration Team (JVET), named Versatile Video Coding (VVC), a new Quad Tree Binary Tree (QTBT) block partition structure has been introduced. In addition to the QT block partitioning defined by the High Efficiency Video Coding (HEVC) standard, new horizontal and vertical BT partitions are enabled, which drastically increases the encoding time compared to HEVC. In this paper, we propose a fast QTBT partitioning scheme based on a machine learning approach. Complementary to techniques proposed in the literature to reduce the complexity of HEVC Quad Tree (QT) partitioning, the proposed solution uses Random Forest classifiers to determine, for each block, which partition mode between QT and BT is more likely to be selected. Using uncertainty zones of the classifier decisions, the proposed complexity reduction technique is able to reduce the encoding time of the JEM-v7.0 software by 30% on average in the Random Access configuration, with only a 0.57% Bjontegaard Delta Rate (BD-BR) increase.
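A hedged sketch of the decision rule described above (not the paper's trained model; the features, labels, and thresholds below are toy assumptions): a random forest predicts QT vs. BT per block, and low-confidence blocks fall into an "uncertainty zone" where both partitions are still evaluated via rate-distortion optimization.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.random((5000, 4))              # e.g. block variance, gradients, QP...
y = (X[:, 0] + 0.3 * rng.standard_normal(5000) > 0.5).astype(int)  # 0=QT, 1=BT

clf = RandomForestClassifier(n_estimators=100).fit(X[:4000], y[:4000])
proba = clf.predict_proba(X[4000:])[:, 1]

low, high = 0.3, 0.7                   # uncertainty-zone thresholds (assumed)
skip_qt = proba > high                 # confident BT: skip QT evaluation
skip_bt = proba < low                  # confident QT: skip BT evaluation
full_rdo = ~(skip_qt | skip_bt)        # uncertainty zone: test both
print(full_rdo.mean())                 # fraction of blocks needing full RDO
```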
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In recent years, several studies have established a relationship between mammographic parenchymal patterns and breast cancer risk. However, there is a lack of publicly available data and software for objective comparison and clinical validation. This paper presents an open and adaptable implementation (OpenBreast v1.0) of a fully automatic computerized framework for mammographic image analysis for breast cancer risk assessment. OpenBreast implements mammographic image analysis in four stages: breast segmentation, detection of regions of interest, feature extraction, and risk scoring. For each stage, we provide implementations of several state-of-the-art methods. The pipeline is tested on a set of 305 full-field digital mammography images corresponding to 84 patients (51 cases and 49 controls) from the breast cancer digital repository (BCDR). OpenBreast achieves a competitive AUC of 0.846 in breast cancer risk assessment. In addition, used jointly with widely accepted risk factors such as patient age and breast density, mammographic image analysis using OpenBreast shows a statistically significant improvement in performance, with an AUC of 0.876 (p < 0.001). Our framework will be made publicly available, and it is easy to incorporate new methods.
EXT="Pertuz, Said"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Photonic neuromorphic hardware can provide significant performance benefits for Deep Learning (DL) applications by accelerating and reducing the energy requirements of DL models. However, photonic neuromorphic architectures employ different activation elements than those traditionally used in DL, slowing down the convergence of the training process for such architectures. An initialization scheme that can be used to efficiently train deep photonic networks that employ quadratic sinusoidal activation functions is proposed in this paper. The proposed initialization scheme can overcome these limitations, leading to faster and more stable training of deep photonic neural networks. The ability of the proposed method to improve the convergence of the training process is experimentally demonstrated using two different DL architectures and two datasets.
EXT="Tefas, Anastasios"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Ground Penetrating Radar (GPR) is generally used as a non-destructive method of inspection for structures and for finding defects in concrete slabs. In this paper, GPR is used in the detection of water inside the cavities of concrete hollow core slabs. We propose an algorithm that determines the water level inside the concrete slab by analyzing the time delays of the reflections originating from inside the cavity. The algorithm is based on utilizing prior knowledge about the geometry of the hollow core slab. The presence of water was successfully detected and an estimate for the height of the water surface was obtained with a GPR system operating with a central frequency of 2.7 GHz. Based on the experiments, the proposed method holds promise in providing a robust and accurate method for the detection of water inside the concrete slabs. Results, possible future research and analysis of the feasibility of GPR systems in water detection are presented and discussed.
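A back-of-the-envelope illustration of the time-delay reasoning above: a reflection arriving t seconds after the cavity-top echo corresponds to a two-way path inside the cavity (the numbers below are illustrative, not from the paper's measurements).

```python
c = 3e8                      # propagation speed in air [m/s]

def gap_height(delay_s, eps_r=1.0):
    """One-way distance from cavity top to the water surface."""
    return c * delay_s / (2 * eps_r ** 0.5)

# e.g. a 1.0 ns extra delay inside an air-filled cavity:
print(gap_height(1.0e-9))    # ~0.15 m air gap above the water surface
```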
jufoid=57477
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Terahertz (THz) band communications, capable of achieving a theoretical capacity of up to several terabits per second, are one of the attractive enablers for beyond-5G wireless networks. THz systems will use extremely directional narrow beams, allowing not only to extend the communication range but also to partially secure the data already at the physical layer. The reason is that, in most cases, the Attacker has to be located within the transmitter beam in order to eavesdrop on the message. However, even the use of very narrow beams results in a considerably large area around the receiver where the Attacker can capture all the data. In this paper, we study how to decrease the message eavesdropping probability by leveraging the inherent multi-path nature of THz communications. We particularly propose sharing the data transmission over multiple THz propagation paths currently available between the communicating entities. We show that, at the cost of slightly reduced link capacity, the message eavesdropping probability in the described scheme decreases significantly, even when several Attackers operate in a cooperative manner. The proposed solution can be utilized for the transmission of sensitive data, as well as to secure the key exchange in THz band networks beyond 5G.
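A toy calculation of the intuition above (the per-path interception probability is an assumption, and independence across paths is idealized): if a message is split across N propagation paths and an eavesdropper intercepts any given path with probability p, capturing the whole message requires all N parts.

```python
p, N = 0.3, 3
print(p ** N)    # 0.027 vs. 0.3 for single-path transmission
```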
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Intermittent millimeter-wave radio links caused by human-body blockage are an inherent feature of the 3GPP 5G New Radio (NR) technology. To improve session continuity in these emerging systems, two mechanisms have recently been proposed, namely, multi-connectivity and guard bandwidth. The former allows establishing multiple spatially-diverse connections and switching between them dynamically, while the latter reserves a fraction of the system bandwidth for sessions changing their state from non-blocked to blocked, which ensures that ongoing sessions have priority over new ones. In this paper, we assess the joint performance of these two schemes for user- and system-centric metrics of interest. Our numerical results reveal that the multi-connectivity operation alone may not suffice to reduce the ongoing session drop probability considerably. On the other hand, the use of guard bandwidth significantly improves session continuity by somewhat compromising the new session drop probability and the system resource utilization. Surprisingly, a 5G NR system implementing both these techniques inherits their drawbacks. However, complementing it with an initial AP selection procedure effectively alleviates these limitations by maximizing the system resource utilization, while still providing sufficient flexibility to enable the desired trade-off between new and ongoing session drop probabilities.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper investigates the performance of energy detection-based spectrum sensing over Fisher-Snedecor F fading channels. To this end, an analytical expression for the corresponding average detection probability is first derived and then extended to account for collaborative spectrum sensing. The complementary receiver operating characteristics (ROC) are analyzed for different conditions of the average signal-to-noise ratio (SNR), time-bandwidth product, multipath fading, shadowing, and number of collaborating users. It is shown that the energy detection performance is strongly linked to the severity of the multipath fading and the amount of shadowing, whereby even small variations in either of these physical phenomena significantly impact the detection probability. Also, the versatile modeling capability of the Fisher-Snedecor F distribution is verified in the context of energy detection-based spectrum sensing, as it provides considerably more accurate characterization than the conventional Rayleigh fading model. To confirm the validity of the analytical results presented in this paper, we compare them with the results of simulations.
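A hedged numerical cross-check of the kind of quantity analyzed above (the threshold, time-bandwidth product, and fading parameters are assumptions): the average detection probability of a conventional energy detector, obtained by Monte Carlo averaging of the conditional Marcum-Q expression over F-distributed instantaneous SNR.

```python
import numpy as np
from scipy.stats import f as fisher_f, ncx2

u, lam = 5, 30.0            # time-bandwidth product, detection threshold (assumed)
snr_bar = 10 ** (10 / 10)   # 10 dB average SNR

def pd_conditional(snr):
    # P_d = Q_u(sqrt(2*snr), sqrt(lam)), written via the noncentral
    # chi-square survival function: Q_M(a, b) = ncx2.sf(b^2, 2M, a^2)
    return ncx2.sf(lam, 2 * u, 2 * snr)

rng = np.random.default_rng(0)
m, ms = 2.0, 5.0            # fading severity / shadowing parameters (assumed)
g = fisher_f.rvs(2 * m, 2 * ms, size=200_000, random_state=rng)
g *= snr_bar / g.mean()     # normalize channel power to the desired average SNR
print(pd_conditional(g).mean())   # average detection probability
```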
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Markov Decision Processes (MDPs) provide important capabilities for facilitating the dynamic adaptation of hardware and software configurations to the environments in which they operate. However, the use of MDPs in embedded signal processing systems is limited because of the large computational demands of solving this class of system models. This paper presents Sparse Parallel Value Iteration (SPVI), a new algorithm for solving large MDPs on resource-constrained embedded systems that are equipped with mobile GPUs. SPVI leverages recent advances in parallel solving of MDPs and adds sparse linear algebra techniques to significantly outperform the state of the art. The method and its application are described in detail and demonstrated with case studies implemented on an NVIDIA Tegra K1 System on Chip (SoC). The experimental results show execution time improvements in the range of 65%-78% for several applications. SPVI also lifts restrictions required by other MDP solver approaches, making it more widely compatible with large classes of optimization problems.
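A minimal sketch of value iteration with a sparse transition model, the core computation that SPVI parallelizes on the GPU (this is a CPU version using scipy.sparse; the sizes, density, and rewards are illustrative assumptions).

```python
import numpy as np
import scipy.sparse as sp

n_states, n_actions, gamma = 1000, 4, 0.95
rng = np.random.default_rng(0)

# sparse P[a]: each state transitions to only a few successors
P = [sp.random(n_states, n_states, density=0.005, random_state=rng, format="csr")
     for _ in range(n_actions)]
# row-normalize so each non-empty row is a probability distribution
P = [sp.diags(1.0 / np.maximum(p.sum(axis=1).A.ravel(), 1e-12)) @ p for p in P]
R = rng.random((n_actions, n_states))      # reward per (action, state)

V = np.zeros(n_states)
for _ in range(200):
    Q = np.stack([R[a] + gamma * (P[a] @ V) for a in range(n_actions)])
    V_new = Q.max(axis=0)                  # Bellman backup
    if np.abs(V_new - V).max() < 1e-6:
        break
    V = V_new
policy = Q.argmax(axis=0)                  # greedy policy from the final Q
```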
jufoid=71852
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper proposes an active learning method to control a labeling process for efficient annotation of acoustic training material, which is used for training sound event classifiers. The proposed method performs K-medoids clustering over an initially unlabeled dataset, and the medoids, as local representatives, are presented to an annotator for manual annotation. The annotated label of a medoid propagates to the other samples in its cluster for label prediction. After annotating the medoids, the annotation continues with the unexamined sounds whose prediction results from two classifiers are mismatched; the two classifiers, a nearest-neighbor classifier and a model-based classifier, are both trained with the annotated data. The annotation of the segments with mismatched predictions is ordered by the distance to the nearest annotated sample, farthest first. The evaluation is made on a public environmental sound dataset. The labels obtained through a labeling process controlled by the proposed method are used to train a classifier, using supervised learning. With the proposed method, only 20% of the data needs to be manually annotated to achieve the accuracy obtained with all of the data annotated. In addition, the proposed method clearly outperforms other active learning algorithms proposed for sound event classification throughout all the experiments, simulating varying fractions of manually labeled data.
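A hedged sketch of the first stage described above: simple alternating k-medoids clustering over unlabeled embeddings, with each medoid's manual label propagated to its cluster (the paper may use a different k-medoids variant; the embeddings and oracle below are stand-ins).

```python
import numpy as np
from scipy.spatial.distance import cdist

def k_medoids(X, k, n_iter=50, seed=0):
    rng = np.random.default_rng(seed)
    medoids = rng.choice(len(X), k, replace=False)
    for _ in range(n_iter):
        assign = cdist(X, X[medoids]).argmin(axis=1)
        # medoid of each cluster: point minimizing total within-cluster distance
        new = np.array([np.argmin(cdist(X[assign == j], X[assign == j]).sum(0))
                        for j in range(k)])
        new = np.array([np.flatnonzero(assign == j)[new[j]] for j in range(k)])
        if np.array_equal(new, medoids):
            break
        medoids = new
    return medoids, assign

X = np.random.default_rng(1).random((500, 16))   # stand-in sound embeddings
medoids, assign = k_medoids(X, k=20)
oracle = (X[:, 0] > 0.5).astype(int)             # stand-in for the human annotator
propagated = oracle[medoids][assign]             # medoid labels propagate to clusters
print((propagated == oracle).mean())             # label prediction accuracy
```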
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Detecting the class, and the start and end times, of sound events in real-world recordings is a challenging task. Current computer systems often show relatively high frame-wise accuracy but low event-wise accuracy. In this paper, we attempt to bridge this gap by explicitly including sequential information to improve the performance of a state-of-the-art polyphonic sound event detection system. We propose to 1) use delayed predictions of event activities as additional input features that are fed back to the neural network; 2) build N-grams to model the co-occurrence probabilities of different events; 3) use sequential loss to train neural networks. Our experiments on a corpus of real-world recordings show that the N-grams can smooth the spiky output of a state-of-the-art neural network system and improve both the frame-wise and the event-wise metrics.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
We present an overview of the challenge entries for the Acoustic Scene Classification task of the DCASE 2017 Challenge. As the most popular task of the challenge, acoustic scene classification entries provide a wide variety of approaches for comparison, with a wide performance gap from top to bottom. Analysis of the submissions confirms once more the popularity of deep learning approaches and mel frequency representations. Statistical analysis indicates that the top-ranked system performed significantly better than the others, and that combinations of the top systems are capable of reaching close to perfect performance on the given data.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper presents a low-latency neural-network-based speech enhancement system. Low-latency operation is critical for speech communication applications. The system uses the time-frequency (TF) masking approach to retain speech and remove non-speech content from the observed signal. The ideal TF masks are obtained by supervised training of neural networks. As the main contribution, different neural network models are experimentally compared to investigate the trade-off between computational complexity and speech enhancement performance. The proposed system is trained and tested on noisy speech data with signal-to-noise ratios (SNRs) ranging from -5 dB to +5 dB, and the results show a significant reduction of non-speech content in the resulting signal while still meeting a low-latency operation criterion, here considered to be less than 20 ms.
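A skeleton of the TF-masking pipeline described above (the mask here is just a placeholder; in the paper it is predicted by a trained neural network, and the frame length is an assumption):

```python
import numpy as np
from scipy.signal import stft, istft

fs = 16000
noisy = np.random.default_rng(0).standard_normal(fs)   # stand-in noisy speech
f, t, Y = stft(noisy, fs, nperseg=256)                  # analysis

mask = np.clip(np.abs(Y) / (np.abs(Y).max() + 1e-12), 0, 1)  # placeholder mask
_, enhanced = istft(mask * Y, fs, nperseg=256)          # masked synthesis
```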
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Mean square error (MSE) has been the preferred choice of loss function in current deep neural network (DNN) based speech separation techniques. In this paper, we propose a new cost function that aims to optimize the extended short-time objective intelligibility (ESTOI) measure. We focus on applications where low algorithmic latency (≤ 10 ms) is important. We use long short-term memory networks (LSTMs) and evaluate our proposed approach on four sets of two-speaker mixtures from the extended Danish hearing in noise (HINT) dataset. We show that the proposed loss function can offer improved or on-par objective intelligibility (in terms of ESTOI) compared to an MSE-optimized baseline, while resulting in lower objective separation performance in terms of the source-to-distortion ratio (SDR). We then propose an approach where the network is first initialized with weights optimized for the MSE criterion and then trained with the proposed ESTOI loss criterion. This approach mitigates some of the loss in objective separation performance while preserving the gains in objective intelligibility.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper proposes a novel method for separation of sound sources with ambisonic signals using multichannel non-negative matrix factorization (MNMF) for source spectrogram estimation. We present a novel frequency-independent spatial covariance matrix (SCM) model for spherical harmonic (SH) domain signals which makes the MNMF parameter estimation framework computationally feasible up to 3rd order SH signals. The evaluation is done with simulated SH domain mixtures by measuring the separation performance using objective criteria and comparing the proposed method against SH domain beamforming. The proposed method improves average separation performance over beamforming with post-filtering when using 1st and 2nd order SH signals while at higher orders performance among all tested methods is similar.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Parkinson's disease (PD) is a degenerative, long-term disorder of the central nervous system which often causes motor symptoms, e.g., tremor, rigidity, and slowness. Currently, the diagnosis of PD is based on patient history and clinical examination. Technology-derived decision support systems utilizing, for example, sensor-rich smartphones can facilitate more accurate PD diagnosis. These technologies could provide less obtrusive and more comfortable remote symptom monitoring. Recent studies have shown that motor symptoms of PD can be reliably detected from data gathered via smartphones. The current study utilized an open-access dataset named 'mPower' to assess the feasibility of discriminating PD from non-PD by analyzing a single self-administered 20-step walking test. From this dataset, 1237 age- and gender-matched subjects (616 with PD) were selected and classified into PD and non-PD categories. Linear acceleration (ACC) and gyroscope (GYRO) signals were recorded by the built-in sensors of the smartphones. Walking bouts were extracted by thresholding the signal magnitude area of the ACC signals. Features were computed from both ACC and GYRO signals and fed into a random forest classifier with 128 trees. The classifier was evaluated using 100-fold cross-validation and provided an accumulated accuracy rate of 0.7 after 10k validations. The results show that PD and non-PD subjects can be separated based on a single short-lasting self-administered walking test gathered by smartphones' built-in inertial measurement units.
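A minimal sketch of the classification stage, assuming precomputed per-test feature matrices (the file names are hypothetical placeholders, and the paper's 100-fold protocol is simplified to 10-fold here):

    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import cross_val_score

    # Hypothetical inputs: one row of ACC/GYRO features per walking test.
    X = np.load("walk_features.npy")   # placeholder path
    y = np.load("pd_labels.npy")       # 1 = PD, 0 = non-PD

    clf = RandomForestClassifier(n_estimators=128, random_state=0)
    scores = cross_val_score(clf, X, y, cv=10)
    print(scores.mean())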
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Postural instability (PI) is a major cause of falls in the geriatric population as well as in people with diseases or disorders such as Parkinson's and stroke. Conventional stability indicators like the Berg Balance Scale (BBS) require clinical settings and skilled personnel to detect PI and to classify the person into low, mid, or high fall-risk categories. Moreover, these tests demand a number of functional tasks to be performed by the patient for proper assessment. In this paper, a machine learning based approach is developed to determine fall risk with minimal human intervention, using only the Single Limb Stance exercise. The analysis is based on the spatiotemporal dynamics of skeleton joint positions obtained from a Kinect sensor. A novel posture modeling method has been applied for feature extraction, along with some traditional time-domain and metadata features, to successfully predict the fall-risk category. The proposed unobtrusive, affordable system is tested on 224 subjects and achieves 75% mean accuracy on the geriatric and patient population.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Atrial fibrillation (AF) is the most common type of cardiac arrhythmia. Although not life-threatening in itself, AF significantly increases the risk of stroke and myocardial infarction. Current tools available for screening and monitoring of AF are inadequate, and an unobtrusive alternative suitable for long-term use is needed. This paper evaluates an atrial fibrillation detection algorithm based on wrist photoplethysmographic (PPG) signals. 29 patients recovering from surgery in the post-anesthesia care unit were monitored. 15 patients had sinus rhythm (SR, 67.5±10.7 years old, 7 female) and 14 patients had AF (74.8±8.3 years old, 8 female) during the recordings. Inter-beat intervals (IBI) were estimated from the PPG signals. As IBI estimation is highly sensitive to motion and other types of noise, acceleration signals and PPG waveforms were used to automatically detect and discard unreliable IBI. AF was detected from windows of 20 consecutive IBI with 98.45±6.89% sensitivity and 99.13±1.79% specificity for 76.34±19.54% of the time. For the remaining time, no decision was taken due to the lack of reliable IBI. The results show that wrist PPG is suitable for long-term monitoring and AF screening. In addition, this technique provides a more comfortable alternative to ECG devices.
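The windowed-IBI idea can be illustrated with a toy detector that flags windows of 20 intervals with high normalized RMSSD; the paper's actual detector and threshold are not reproduced here:

    import numpy as np

    def af_flags(ibi, win=20, thr=0.08):
        """Toy AF detector: flag windows of consecutive inter-beat intervals
        (IBI, in seconds) whose RMSSD, normalized by the mean IBI, exceeds a
        threshold. Only reliable IBI should be passed in, per the abstract."""
        flags = []
        for i in range(0, len(ibi) - win + 1):
            w = ibi[i:i + win]
            rmssd = np.sqrt(np.mean(np.diff(w) ** 2))
            flags.append(rmssd / np.mean(w) > thr)
        return np.array(flags)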
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper proposes a method for online estimation of time-varying room impulse responses (RIR) between multiple isolated sound sources and a far-field mixture. The algorithm is formulated as adaptive convolutive filtering in short-time Fourier transform (STFT) domain. We use the recursive least squares (RLS) algorithm for estimating the filter parameters due to its fast convergence rate, which is required for modeling rapidly changing RIRs of moving sound sources. The proposed method allows separation of reverberated sources from the far-field mixture given that their close-field signals are available. The evaluation is based on measuring unmixing performance (removal of reverberated source) using objective separation criteria calculated between the ground truth recording of the preserved sources and the unmixing result obtained with the proposed algorithm. We compare online and offline formulations for the RIR estimation and also provide evaluation with blind source separation algorithm only operating on the mixture signal.
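For reference, one complex-valued RLS step of the kind applied per STFT bin might look as follows (a generic textbook formulation, not the paper's exact implementation):

    import numpy as np

    def rls_update(w, P, x, d, lam=0.98):
        """One recursive-least-squares step for a single STFT bin.

        w : current filter taps (complex); x : vector of recent close-field
        source STFT values of the same length; d : far-field mixture sample;
        lam : forgetting factor, which lets the filter track time-varying
        RIRs. P is typically initialized to a scaled identity matrix."""
        Px = P @ x
        k = Px / (lam + np.vdot(x, Px).real)      # gain vector
        e = d - np.vdot(w, x)                     # a priori error
        w = w + k * np.conj(e)
        P = (P - np.outer(k, np.conj(x)) @ P) / lam
        return w, P, e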
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
There is an emerging interest in defining principles for signals on general graphs that are analogous to the basic principles of traditional signal processing. One example is the Graph Fourier Transform, which decomposes a graph signal into components based on a set of basis functions with corresponding graph frequencies. It has been observed that most of the important information of a graph signal is contained in the low-frequency band, which leads to several applications such as denoising and compression. In this paper, we show that the low-frequency basis functions span the salient regions of an image, which can also be considered its important regions. Motivated by this, we present a novel, simple, and unsupervised method that utilizes a number of low-energy basis functions, and show that it improves the performance of seven state-of-the-art salient object detection methods on five datasets under four different evaluation criteria, with only minor exceptions.
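As a small illustration of the basis functions in question, the k smoothest Graph Fourier basis vectors can be obtained from the combinatorial Laplacian (a standard construction; the paper's specific image-derived graph is not reproduced):

    import numpy as np

    def low_frequency_basis(W, k):
        """Return the k lowest-frequency Graph Fourier basis functions.

        W is a symmetric adjacency (weight) matrix; the basis vectors are
        the eigenvectors of the combinatorial Laplacian L = D - W with the
        smallest eigenvalues, i.e. the smoothest signals on the graph."""
        L = np.diag(W.sum(axis=1)) - W
        eigvals, eigvecs = np.linalg.eigh(L)
        return eigvecs[:, :k], eigvals[:k]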
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper studies the problem of full-reference visual quality assessment of denoised images, with a special emphasis on images with low contrast and noise-like texture. For such images, noise removal often also results in the loss or smoothing of image details. A new test image database, FLT, containing 75 noise-free 'reference' images and 300 filtered ('distorted') images is developed. Each reference image, corrupted by additive white Gaussian noise, is denoised by the BM3D filter with four different values of the threshold parameter (four levels of noise suppression). After carrying out a perceptual quality assessment of the distorted images, the mean opinion scores (MOS) are obtained and compared with the values of known full-reference quality metrics. As a result, the Spearman Rank Order Correlation Coefficient (SROCC) between PSNR values and MOS is close to zero, and the SROCC between the values of known full-reference image visual quality metrics and MOS does not exceed 0.82 (which is reached by a new visual quality metric proposed in this paper). The FLT dataset is more complex than earlier datasets used for assessing visual quality of image denoising. Thus, it can be effectively used to design new image visual quality metrics for image denoising.
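For reference, the SROCC figure of merit used above can be computed directly; metric_values and mos are placeholders for parallel sequences over the distorted images:

    from scipy.stats import spearmanr

    def srocc(metric_values, mos):
        """Spearman rank-order correlation between objective metric values
        and mean opinion scores."""
        rho, _ = spearmanr(metric_values, mos)
        return rho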
EXT="Lukin, Vladimir"
JUFOID=57409
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, we propose a joint framework for target localization and classification using a single generalized model for non-imaging based multi-modal sensor data. For target localization, we exploit both sensor data and estimated dynamics within a local neighborhood. We validate the capabilities of our framework by using a multi-modal dataset, which includes ground truth GPS information (e.g., time and position) and data from co-located seismic and acoustic sensors. Experimental results show that our framework achieves better classification accuracy compared to recent fusion algorithms using temporal accumulation and achieves more accurate target localizations than multilateration.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Due to the increased popularity of augmented and virtual reality experiences, the interest in representing the real world in an immersive fashion has never been higher. Distributing such representations enables users all over the world to freely navigate in never-before-seen media experiences. Unfortunately, such representations require an amount of data that is not feasible for transmission on today's networks. Thus, efficient compression technologies are in high demand. This paper proposes an approach to compress 3D video data utilizing 2D video coding technology. The proposed solution was developed to address the needs of 'tele-immersive' applications, such as virtual (VR), augmented (AR) or mixed (MR) reality with Six Degrees of Freedom (6DoF) capabilities. Volumetric video data is projected onto 2D image planes and compressed using standard 2D video coding solutions. A key benefit of this approach is its compatibility with readily available 2D video coding infrastructure. Furthermore, objective and subjective evaluation shows significant improvement in coding efficiency over the reference technology.
INT=sgn,"Sheikhi-Pour, Nahid"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, we exploit the 3D-beamforming features of multiantenna equipment employed in fifth generation (5G) networks, operating in the millimeter wave (mmW) band, for accurate positioning and tracking of users. We consider sequential estimation of users' positions, and propose a two-stage extended Kalman filter (EKF) that is based on reference signal received power (RSRP) measurements. In particular, beamformed downlink (DL) reference signals (RSs) are transmitted by multiple base stations (BSs) and measured by user equipments (UEs) employing receive beamforming. The so-obtained beam-RSRP (BRSRP) measurements are reported to the BSs where the corresponding directions of departure (DoDs) are sequentially estimated by a novel EKF. Such angle estimates from multiple BSs are subsequently fused on a central entity into 3D position estimates of UEs by means of another (second-stage) EKF. The proposed positioning scheme is scalable since the computational burden is shared among different network entities, namely transmission/reception points (TRPs) and 5G-NR Node B (gNB), and may be accomplished with the signalling currently specified for 5G. We assess the performance of the proposed algorithm on a realistic outdoor 5G deployment with a detailed ray tracing propagation model based on the METIS Madrid map. Numerical results with a system operating at 39 GHz show that sub-meter 3D positioning accuracy is achievable in future mmW 5G networks.
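Both stages rely on the standard EKF recursion; a generic sketch (not the paper's specific state and measurement models) is:

    import numpy as np

    def ekf_step(x, P, f, F, h, H, z, Q, R):
        """Generic extended Kalman filter predict/update step, of the kind
        used twice in a two-stage scheme (DoD tracking, then position fusion).

        f/h are the process and measurement functions; F/H their Jacobians
        evaluated at the current estimate; Q/R the noise covariances."""
        # Predict
        x_pred = f(x)
        P_pred = F @ P @ F.T + Q
        # Update
        y = z - h(x_pred)                        # innovation
        S = H @ P_pred @ H.T + R
        K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
        x_new = x_pred + K @ y
        P_new = (np.eye(len(x)) - K @ H) @ P_pred
        return x_new, P_new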
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Building a complete inertial navigation system using the limited-quality data provided by current smartphones has been regarded as challenging, if not impossible. This paper shows that by careful crafting and accounting for the weak information in the sensor samples, smartphones are capable of pure inertial navigation. We present a probabilistic approach for orientation- and use-case-free inertial odometry, which is based on double-integrating rotated accelerations. The strength of the model is in learning additive and multiplicative IMU biases online. We are able to track the phone position, velocity, and pose in real time and in a computationally lightweight fashion by solving the inference with an extended Kalman filter. The information fusion is completed with zero-velocity updates (if the phone remains stationary), altitude correction from barometric pressure readings (if available), and pseudo-updates constraining the momentary speed. We demonstrate our approach using an iPad and iPhone in several indoor dead-reckoning applications and in a measurement tool setup.
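The core mechanization, rotating body-frame accelerations and double-integrating, can be sketched as follows (first-order attitude update only; the online bias learning and EKF fusion described above are omitted):

    import numpy as np

    def strapdown_step(p, v, R, acc, gyro, dt, g=np.array([0.0, 0.0, -9.81])):
        """One strapdown step: rotate the body-frame specific force to the
        world frame, add gravity back, and double-integrate. Illustration
        only; repeated first-order updates slowly de-orthogonalize R."""
        wx, wy, wz = gyro * dt
        Omega = np.array([[0, -wz, wy],
                          [wz, 0, -wx],
                          [-wy, wx, 0]])
        R = R @ (np.eye(3) + Omega)              # attitude propagation
        a_world = R @ acc + g                    # remove gravity effect
        p = p + v * dt + 0.5 * a_world * dt**2
        v = v + a_world * dt
        return p, v, R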
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper presents an algorithm for multiple source localization using a beamforming-inspired spatial covariance model (SCM) and complex non-negative matrix factorization (CNMF). In this work, we assume that the source signals are known in advance, whereas the mixing filter is modeled by a weighted sum of direction of arrival (DOA) kernels which encode the phase and amplitude differences between microphones for every possible source direction. The direction of arrival (i.e., azimuth and elevation) of each source is estimated using CNMF. The proposed system is evaluated on the DOA estimation task using two datasets covering a large number of configurations (number of channels, number of simultaneous sources, reverberation time, microphone spacing, source types and angular positions of the sources). Finally, a comparison to other state-of-the-art methods is performed, showing the robustness of the proposed method.
EXT="Carabias-Orti, J. J."
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Many approaches to compressive video recovery proceed iteratively, treating the difference between the previous estimate and the ideal video as residual noise to be filtered. We go beyond the common white-noise modeling by adaptively modeling the residual as stationary spatiotemporally correlated noise. This adaptive noise model is updated at each iteration and is highly anisotropic in space and time; we leverage it with respect to the transform spectra of a motion-compensated video denoiser. Experimental results demonstrate that our proposed adaptive correlated noise model outperforms state-of-the-art methods both quantitatively and qualitatively.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
To meet the capacity demand of future cellular networks, which is expected to increase substantially by 2020, the use of the mmWave band has been suggested. Although this higher frequency band offers larger amounts of spectrum and will lead to much higher capacity in 5G networks, it has some propagation constraints, such as limited coverage and sensitivity to Line of Sight (LoS) blockage. A possible solution to these issues could be small-cell densification combined with access points carried by Unmanned Aerial Vehicles (UAVs). This paper presents a performance evaluation of a UAV-assisted mmWave network in urban environments using ns-3 simulations.
INT=elt,"Khosravi, Zeinab"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Data clustering is a fundamental machine learning problem. Community structure is common in social and biological networks. In this article we propose a novel data clustering algorithm that uses this phenomenon in the mutual k-nearest neighbor (MKNN) graph constructed from the input dataset. We use authentic scores, a metric that measures the strength of an edge in a social network graph, to rank all the edges in the MKNN graph. By removing the edges gradually in the order of their authentic scores, we collapse the MKNN graph into components to find the clusters. The proposed method has two major advantages compared to other popular data clustering algorithms. First, it is robust to noise in the data. Second, it finds clusters of arbitrary shape. We evaluated our algorithm on synthetic noisy datasets, synthetic 2D datasets and real-world image datasets. Results on the noisy datasets show that the proposed algorithm clearly outperforms the competing algorithms in terms of Normalized Mutual Information (NMI) scores. The proposed algorithm is the only one that does not fail on any data in the synthetic 2D datasets, which are specifically designed to expose the limitations of clustering algorithms. On the real-world image datasets, the best NMI scores achieved by the proposed algorithm exceed those of all competing algorithms. The proposed algorithm has a computational complexity of O(k³n + kn log(kn)) and a space complexity of O(kn), which is better than or equal to that of the most popular clustering algorithms.
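Constructing the MKNN graph itself is straightforward; a sketch using scikit-learn follows (the authentic-score edge ranking and the edge-removal loop are not reproduced here):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def mutual_knn_graph(X, k):
        """Build the mutual k-nearest-neighbor (MKNN) graph: an undirected
        edge (i, j) exists only if i is among j's k nearest neighbors AND
        vice versa. Returns a symmetric boolean adjacency matrix."""
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
        _, idx = nn.kneighbors(X)                # idx[:, 0] is the point itself
        n = len(X)
        directed = np.zeros((n, n), dtype=bool)
        for i, neigh in enumerate(idx[:, 1:]):
            directed[i, neigh] = True
        return directed & directed.T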
EXT="Kiranyaz, Serkan"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper presents a method for improving the accuracy of extended GNSS satellite orbit predictions with convolutional neural networks (CNN). Satellite orbit predictions are used in self-assisted GNSS to reduce the Time to First Fix of a satellite positioning device. We describe the models we use to predict the satellite orbit and present the improvement method that uses CNN. The CNN estimates future prediction errors of our model and these estimates are used to correct our orbit predictions. We also describe how the neural network can be implemented into our prediction algorithm. In tests with GPS and BeiDou data, the method significantly improves orbit prediction accuracy. For example, the 68% error quantile of 7 day orbit prediction errors of GPS satellites was reduced by 45% on average.
INT=mat,"Jaakko Pihlajasalo"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
We have focused our paper on the aspects important in adapting an Information System (IS) to the user's cultural background. We are interested both in the factors related to IS development and in the use of IS. Increasingly, ISs are being developed and used in a global context. We have perceived differences in expectations of functionalities, architecture, structural properties, information search practices, web-based system properties, and user interfaces. One conclusion would be that a high quality IS reflects user behavior in its use context. In that case, the system has to model its user one way or another. Until now, the topic has been handled without meaningful effort to model user behavior. Current publications cover a wide variety of rules on how to take into account cultural differences in the IS context. In this paper, our aim is to study the current state-of-the-art of user modeling - modeling the human being as an IS user. We start with general aspects related to the role of the user in IS development and alternatives to adaptable systems. The findings are applicable in the educational context as well. More and more, the use of computers and ISs is becoming an essential part of studies: the use of MOOCs (Massively Open Online Courses) as a part or replacement for traditional face-to-face classes; flipped learning methodology emphasizing the significance of self-learning; and blended learning, including quite often computerized study content. Our focus is on the global context, in which students represent different cultures and the IS is globally available.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Close to 100% employment among students and easy access to an abundance of information on the Internet have essentially changed students' learning practices and their prior knowledge, especially in the rapidly progressing field of Software Engineering. In the workplace they have to use the technologies employed by their enterprise, but often do not understand the scientific and/or technological principles on which these technologies are based. They seek explanations on the Internet, but online information is often of low quality, one-sided, and presented with business targets in mind: to attract more users to technologies developed and sold by a business enterprise. The university therefore has to explain the basic principles behind technologies that students already know and have used, and to correct some popular beliefs that software vendors promote in their own business interests. Non-formal sources of knowledge, such as workplace training and the Internet, do not reduce the teacher's task; they force teachers to constantly study everything new that appears in the field, and thus increase the teacher's workload. Students' increasing use of non-formal sources of knowledge implies a need to flip the process: instead of being taught, students are set to learn from provided detailed tutorials. The use of the Internet and work experience have made self-study and searching Internet sources very customary for current students, and such flipping worked very well in a game programming course provided by the first author.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Originating from civilian/commercial wireless networks, the progressive concept of same-frequency simultaneous transmission and reception (SF-STAR), a.k.a. in-band full-duplex operation, also has high potential on the future battlefield. The prospects of a military full-duplex radio (MFDR) are not limited to enhancing the spectral efficiency of tactical communications, which would already be a significant advancement considering the universal congestion of the electromagnetic spectrum. Perhaps even more importantly, armed forces could gain a major technical advantage by employing multifunction MFDRs that are capable of jointly conducting signals intelligence, electronic warfare, and tactical communications owing to their SF-STAR capability. This study focuses on one specific promising application, where a radio transceiver performs spectrum monitoring and signal surveillance for potential hostile transmissions while simultaneously performing an electronic attack against opposing forces' receivers in the same frequency band. In particular, we demonstrate by experiments in a laboratory environment that MFDR technology can be successfully used for detecting an attempt to remotely control an improvised explosive device while also preventing its activation by transmitting a jamming signal.
INT=elt,"Turunen, Matias"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, we propose a new regularization scheme for the well-known Support Vector Machine (SVM) classifier that operates on the training sample level. The proposed approach is motivated by the fact that Maximum Margin-based classification defines decision functions as a linear combination of the selected training data and, thus, the variations on training sample selection directly affect generalization performance. We show that the exploitation of the proposed regularization scheme is well motivated and intuitive. Experimental results show that the proposed regularization scheme outperforms standard SVM in human action recognition tasks as well as classical recognition problems.
INT=sgn,"Tran, Dat Thanh"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The paper proposes an image segmentation method for lossless compression of plenoptic images. Each light-field image captured by the plenoptic camera is processed to obtain a stack of subaperture images. Each subaperture image is encoded using a gradient-based detector, which classifies the image edges and designs refined contexts for improved prediction and segmentation. The paper's main contribution is a new segmentation method that generates a preliminary segmentation, either by scaling the intensity differences or by using a quantum-cut-based algorithm, and merges it with an edge-ranking-based segmentation. The results show around 2% improved performance compared to the state-of-the-art on a dataset of 118 plenoptic images.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, a computational color constancy method is proposed that estimates the illuminant chromaticity of a scene by pooling many local estimates. To this end, first, for each image in a dataset, we form an image pyramid consisting of several scales of the original image. Next, local patches of a fixed size are extracted from each scale in this image pyramid. Then, a convolutional neural network is trained to estimate the illuminant chromaticity per patch. Finally, two more consecutive trainings are conducted, where the estimation is made per image by taking the mean (first training) and the median (second training) of the local estimates. The proposed method is shown to outperform the state-of-the-art on a widely used color constancy dataset.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Traditional challenges in deploying end-to-end streaming systems are made harder when considering 360-degree media content. One of these challenges relates to the lack of commonly accepted standardized methodologies for subjective 360-degree video quality assessment, especially oriented towards streaming services. The contribution of this paper falls in the area of subjective assessment of 360-degree video. Starting from traditional standardized test methodologies originally designed for 2D/3D video, we tailored a methodology oriented towards Virtual Reality (VR) streaming services. The methodology inherits much from existing ITU standards for subjective video quality evaluation; the additions incorporate the special properties of 360-degree video, namely omnidirectionality, as opposed to traditional video. With this goal in mind, a new metric called the Similarity Ring Metric (SRM) is introduced. It measures the degree of similarity in watching patterns of a single subject, or between different subjects, across several subjective assessment tests. This metric enables an inclusion or rejection criterion for test results in subjective assessment sessions. We also present visual fatigue results related to a subjective quality experiment on 360-degree video.
EXT="Curcio, Igor D.D."
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, we present a new software tool, called HTGS Model-based Engine (HMBE), for the design and implementation of multicore signal processing applications. HMBE provides complementary capabilities to HTGS (Hybrid Task Graph Scheduler), which is a recently-introduced software tool for implementing scalable workflows for high performance computing applications. HMBE integrates advanced design optimization techniques provided in HTGS with model-based approaches that are founded on dataflow principles. Such integration contributes to (a) making the application of HTGS more systematic and less time consuming, (b) incorporating additional dataflow-based optimization capabilities with HTGS optimizations, and (c) automating significant parts of the HTGS-based design process. In this paper, we present HMBE with an emphasis on novel dynamic scheduling techniques that are developed as part of the tool. We demonstrate the utility of HMBE through a case study involving an image stitching application for large scale microscopy images.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Detection of targets using low-power embedded devices has important applications in border security and surveillance. In this paper, we build on recent algorithmic advances in sensor fusion, and present the design and implementation of a novel, multi-mode embedded signal processing system for detection of people and vehicles using acoustic and seismic sensors. Here, by "multi-mode", we mean that the system has available a complementary set of configurations that are optimized for different trade-offs. The multi-mode capability delivered by the proposed system is useful for supporting long lifetime (long-term, energy-efficient "standby" operation), while also supporting optimized accuracy during critical time periods (e.g., when a potential threat is detected). In our target detection system, we apply a strategically configured suite of single- and dual-modality signal processing techniques together with dataflow-based design optimization for energy-efficient, real-time implementation. Through experiments using a Raspberry Pi platform, we demonstrate the capability of our target detection system to provide efficient operational trade-offs among detection accuracy, energy efficiency, and processing speed.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The goal of block matching is to find small parts (blocks) of an image that are similar to a given pattern (template). Many full search (FS) equivalent algorithms are based on transforms; however, they limit the template size to a power of two. In this paper, we consider a fast block matching algorithm based on the orthonormal tree-structured Haar transform (OTSHT), which makes it possible to use a template of arbitrary size. We evaluate the pruning performance, computational complexity, and design of the tree. The pruning performance is compared to that of the algorithm based on the orthonormal Haar transform (OHT).
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
High resolution disparity images are stored in floating point raw files, where the number of bits per pixel is typically 32, although the number of used bits when converted to a fixed point representation is lower, e.g., between 24 and 26 in the dataset used in our experiments. In order to compress images with such a high dynamic range, the bitplanes of the original image are combined into integer images with at most 16 bits, for which existing compressors are readily available. We first introduce a context predictive compressor (CPC) which can operate on integer images having more than 16 bits. The proposed overall compression scheme applies a reversible linear transformation to the image as a first decorrelation step, and then splits the transformed image into integer images with smaller dynamic range, which are finally encoded. We experiment with split-into-2 and split-into-3 schemes, using combinations of several existing compressors for the integer image components, and show that the newly introduced CPC operating on the least significant bitplanes, combined with CERV operating on the most significant bitplanes, always achieves the best compression, with final lossless results between 8 and 12 bits per pixel.
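The split-into-2 idea can be illustrated as follows for an image with up to 24 useful bits (a schematic example; the paper's reversible transform and entropy coders are not included):

    import numpy as np

    def split2(img24):
        """Split an integer image with up to 24 useful bits into two planes
        that 16-bit-capable compressors can handle: the 8 most significant
        bits and the 16 least significant bits."""
        img24 = img24.astype(np.uint32)
        high = (img24 >> 16).astype(np.uint8)    # most significant bitplanes
        low = (img24 & 0xFFFF).astype(np.uint16) # least significant bitplanes
        return high, low

    def merge2(high, low):
        """Lossless inverse of split2."""
        return (high.astype(np.uint32) << 16) | low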
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Iterative methods are well-known approaches for solving the one-dimensional phase retrieval problem. Among them, the error-reduction algorithm is often used, since it can easily incorporate support constraints. Unfortunately, this method often stagnates. Recently we formulated the extended form of the one-dimensional discrete phase retrieval problem and conjectured that stagnation can be avoided by oversampling. Simulations have indicated that the conjecture is true. In this work we prove the convergence of the error-reduction algorithm in the proposed extended one-dimensional discrete phase retrieval framework.
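For reference, the classic error-reduction iteration whose convergence is analyzed reads as follows in a 1-D discrete setting (a textbook sketch, not the paper's extended formulation):

    import numpy as np

    def error_reduction(mag, support, iters=200, seed=0):
        """Classic error-reduction iteration for 1-D phase retrieval:
        alternately enforce the measured Fourier magnitudes and the
        time-domain support constraint.

        mag is the measured |FFT| (possibly oversampled); support is a
        boolean mask of the same length."""
        rng = np.random.default_rng(seed)
        x = rng.standard_normal(len(mag)) * support
        for _ in range(iters):
            X = np.fft.fft(x)
            X = mag * np.exp(1j * np.angle(X))   # magnitude constraint
            x = np.fft.ifft(X).real
            x = x * support                      # support constraint
        return x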
EXT="Rusu, Corneliu"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The main target of this research work is to study the provision of indoor service (coverage) using outdoor base stations at higher frequencies, i.e., 10 GHz, in the context of a single-building scenario. In outdoor-to-indoor propagation, an angular wall loss model is used in the General Building Penetration (GBP) model for estimating the additional loss at the intercept point of the building's exterior wall. A novel angular wall loss model based on separate incidence angles in the azimuth and elevation planes is proposed in this paper. In the second part of this study, an Extended Building Penetration (EBP) model is proposed, and its performance is compared with the GBP model. In the EBP model, an additional fifth path, known as the 'direct path', is included in the GBP model. Based on the evaluation results, the impact of the direct path is found to be significant for indoor users located at approximately the same height as the transmitter.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The main target of this paper is to perform a multidimensional analysis of multipath propagation at higher frequencies, i.e., 15 GHz and 28 GHz, using 'sAGA', a 3D ray tracing tool. A real-world outdoor Line of Sight (LOS) microcellular environment from the city of Yokosuka, Japan is considered for the analysis. The simulation data acquired from the 3D ray tracing tool include the received signal strength, the power angular spectrum and the power delay profile. The different propagation mechanisms were closely analyzed. The simulation results show the difference in propagation between 15 GHz and 28 GHz and draw special attention to the impact of diffuse scattering at 28 GHz. In a simple outdoor microcellular environment with a valid LOS link between the transmitter and receiver, a path loss difference of around 5.7 dB was found between operation at 15 GHz and at 28 GHz. However, the propagation loss at the higher frequency can be compensated by using antennas with narrower beamwidth and larger gain.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
A novel reduced-complexity digital predistortion (DPD) solution is presented in this paper. The proposed DPD can suppress the unwanted distortion due to power amplifier (PA) nonlinearity and I/Q modulator impairments in direct-conversion transmitters using reduced-bandwidth filtered basis functions. Moreover, the DPD parameter estimation is based on very simple decorrelation-based closed-loop processing and reduced-bandwidth observation, further reducing the overall complexity. The proposed DPD can be used in large-array or massive MIMO systems with a large number of radio transceivers and PAs, where reducing the complexity of the DPD processing is critical.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Multilabel ranking is an important machine learning task with many applications, such as content-based image retrieval (CBIR). However, when the number of labels is large, traditional algorithms are either infeasible or show poor performance. In this paper, we propose a simple yet effective multilabel ranking algorithm based on the k-nearest neighbor paradigm. The proposed algorithm ranks labels according to the probabilities of label association among the neighboring samples around a query sample. Unlike traditional approaches, we take only positive samples into consideration and determine the model parameters by directly optimizing ranking loss measures. We evaluated the proposed algorithm on four popular multilabel datasets. It achieves equivalent or better performance than other instance-based learning algorithms. When applied to a CBIR system with a dataset of one million samples and over 190 thousand labels, much larger than any multilabel dataset used earlier, the proposed algorithm clearly outperforms the competing algorithms.
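The core neighborhood-based ranking can be sketched as below (simplified: the paper additionally restricts attention to positive samples and tunes its parameters by optimizing ranking losses directly):

    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def rank_labels(X_train, Y_train, x_query, k=10):
        """Rank labels for a query by the empirical label frequencies among
        its k nearest training samples.

        Y_train is a binary (n_samples x n_labels) indicator matrix."""
        nn = NearestNeighbors(n_neighbors=k).fit(X_train)
        _, idx = nn.kneighbors(x_query.reshape(1, -1))
        scores = Y_train[idx[0]].mean(axis=0)    # label association probability
        return np.argsort(scores)[::-1]          # labels, best first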
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, a self-backhauling radio access system is studied and analyzed. In particular, we consider a scenario where a full-duplex access node is serving mobile users simultaneously in uplink and downlink, while also maintaining a wireless backhaul connection. The full-duplex capability of the access node, together with large antenna arrays, allows it to do all of this using the same center frequency. The minimum transmit powers for such a system are solved in a closed form under the condition that certain Quality of Service (QoS) requirements, defined in terms of minimum uplink and downlink data rates, are fulfilled. It is demonstrated with numerical results that, by using the derived expressions for the optimal transmit powers, the probability of fulfilling the QoS requirements is greatly increased, while simultaneously the overall transmit power usage of the system is significantly reduced when compared to a benchmark scheme.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, we develop new multiclass classification algorithms for detecting people and vehicles by fusing data from a multimodal, unattended ground sensor node. The specific types of sensors that we apply in this work are acoustic and seismic sensors. We investigate two alternative approaches to multiclass classification in this context - the first is based on applying Dempster-Shafer Theory to perform score-level fusion, and the second involves the accumulation of local similarity evidences derived from a feature-level fusion model that combines both modalities. We experiment with the proposed algorithms using different datasets obtained from acoustic and seismic sensors in various outdoor environments, and evaluate the performance of the two algorithms in terms of receiver operating characteristic and classification accuracy. Our results demonstrate overall superiority of the proposed new feature-level fusion approach for multiclass discrimination among people, vehicles and noise.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Urbanisation can modify the local climate, increasing the temperature of cities compared to rural areas. This phenomenon is known as the Urban Heat Island (UHI), and this paper introduces a methodology to investigate the spatial variability of air and surface temperatures across London. In particular, this study aims to investigate whether a widely used spatial resolution (1 km) is appropriate for heat-related health risk studies. Data from vehicle transects and ASTER thermal images were overlaid on a reference grid of 1 km, as used by UHI simulation models. The results showed higher variability of air temperature within some specific modelled grid cells in the city centre, while surface temperatures presented higher variability at the London borders. This investigation suggests that LST has larger variation levels and more grid cells with sub-grid variation above 1°C compared to air temperature measurements.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The quality of visual content displayed on 3D autostereoscopic displays, such as light field displays, essentially depends on factors that are not present for 3D stereoscopic or 2D displays, such as angular resolution. A higher number of views in a given field of view enables a smoother, continuous motion parallax, but evidently requires more resources to transmit and display. However, in several cases a sufficiently high number of views might not even be available, and light field reconstruction is thus required to increase the density of intermediate views. In this paper we introduce the results of a study aiming to measure the perceptual difference between light field reconstruction and different angular resolutions via a series of subjective image quality assessments. The analysis also calls attention to the transmission requirements of content for light field displays.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Virtual reality (VR) provides unprecedented immersive experience using high-resolution spherical stereoscopic panoramic video. Such an experience is achieved by using head-mounted display (HMD) which has very strict latency bounds in order to respond promptly to user movements. Conventional streaming of VR video requires large bandwidth because the entire captured panorama is transmitted. However, only a limited field-of-view (FOV) is displayed by an HMD, resulting in wastage of bandwidth. To alleviate the problem, this paper proposes a High Efficiency Video Coding (HEVC) compliant approach for efficient coding and streaming of stereoscopic VR content. The proposed method is based on partitioning video pictures into tiles, where only the required tiles corresponding to the primary viewport are transmitted in high resolution, while the remaining parts are transmitted in low resolution. Furthermore, this method enables coding stereoscopic video contents using a conventional HEVC codec, while still achieving significant compression gain by means of adopting inter-view prediction only in intra random access point (IRAP) pictures. Using this method, the predicted view can be decoded independently of the main view, hence allowing simultaneous decoding instances. Experimental results demonstrate that the proposed approach is able to substantially improve compression efficiency and streaming bitrate performance.
EXT="Vadakital, Vinod Kumar Malamal"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this work, a novel digital channelizer design is developed through the use of a compact, system-level modeling approach. The model efficiently captures key properties of a digital channelizer system and its time-varying operation. The model applies powerful Markov Decision Process (MDP) techniques in new ways for design optimization of reconfigurable channelization processing. The result is a promising methodology for design and implementation of digital channelizers that adapt dynamically to changing use cases and stochastic environments while optimizing simultaneously for multiple conflicting performance goals. The method is used to employ an MDP to generate a runtime reconfiguration policy for a time-varying environment. Through extensive simulations, the robustness of the adaptation is demonstrated in comparison with the prior state of the art.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, we describe an advanced real-time cancellation architecture for efficient digital-domain suppression of self-interference in inband full-duplex devices. The digital canceller takes into account the nonlinear distortion produced by the transmitter power amplifier, and is thereby a robust solution for low-cost implementations. The developed real-time digital canceller implementation is then evaluated with actual RF measurements, where it is complemented with a real-time adaptive RF canceller. The obtained results show that the RF canceller and the developed digital canceller implementation can together cancel the residual self-interference below the receiver noise floor in real-time for a 20 MHz cancellation bandwidth.
INT=elt,"Piilila, Mauno"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Dataflow modeling techniques facilitate many aspects of design exploration and optimization for signal processing systems, such as efficient scheduling, memory management, and task synchronization. The lightweight dataflow (LWDF) programming methodology provides an abstract programming model that supports dataflow-based design and implementation of signal processing hardware and software components and systems. Previous work on LWDF techniques has emphasized their application to DSP software implementation. In this paper, we present new extensions of the LWDF methodology for effective integration with hardware description languages (HDLs), and we apply these extensions to develop efficient methods for low power DSP hardware implementation. Through a case study of a deep neural network application for vehicle classification, we demonstrate our proposed LWDF-based hardware design methodology, and its effectiveness in low power implementation of complex signal processing systems.
INT=tie,"Xie, Renjie"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
A divergence similarity between two color images, based on the Jensen-Shannon divergence, is presented to measure color-distribution similarity. Subjective assessment experiments were conducted to obtain mean opinion scores (MOS) for the test images. It was found that the divergence similarity and MOS values showed statistically significant correlations.
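For concreteness, the Jensen-Shannon divergence between two color histograms, the kind of color-distribution comparison described above, can be computed as:

    import numpy as np

    def js_divergence(p, q):
        """Jensen-Shannon divergence between two color histograms
        (normalized internally); symmetric and bounded in [0, 1] bits."""
        p = p / p.sum()
        q = q / q.sum()
        m = 0.5 * (p + q)
        def kl(a, b):
            mask = a > 0                         # 0 * log(0) terms vanish
            return np.sum(a[mask] * np.log2(a[mask] / b[mask]))
        return 0.5 * kl(p, m) + 0.5 * kl(q, m)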
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The current trend in high performance and embedded signal processing consists of designing increasingly complex heterogeneous hardware architectures with non-uniform communication resources. In order to make hardware and software design decisions, early evaluations of the system's non-functional properties are needed. These evaluations of system efficiency require high-level information on both the algorithms and the architecture. In this paper, we define the notion of a Model of Architecture (MoA) and study the combination of a Model of Computation (MoC) and an MoA to provide a design space exploration environment for studying algorithmic and architectural choices. A cost is computed from the mapping of an application, represented by a model conforming to a MoC, onto an architecture, represented by a model conforming to an MoA. The cost is composed of a processing-related part and a communication-related part. It is an abstract scalar value to be minimized and can represent any non-functional requirement of a system, such as memory, energy, throughput or latency.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, we propose a supervised subspace learning method that exploits the rich representation power of deep feedforward networks. In order to derive a fast yet efficient learning scheme, we employ deep randomized neural networks, which have recently been shown to provide a good compromise between training speed and performance. To optimally determine the learnt subspace, we formulate a regression problem where we employ target vectors designed to encode both the labeling information available for the training data and the geometric properties of the training data when represented in the feature space determined by the network's last hidden layer outputs. We experimentally show that the proposed approach is able to outperform deep randomized neural networks trained using the standard network target vectors.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Rényi's entropies play a significant role in many signal processing applications. Plug-in kernel density estimation methods have been employed to estimate such entropies with good results. However, they become computationally intractable in higher dimensions, because of the requirement to store intermediate probability density values for a large number of data points. We propose a method to reduce the number of samples in a plug-in kernel density estimation method for Rényi's entropies of real exponents, and to improve on the result of the standard plug-in kernel density method. To this end, we derive a univariate estimator using a Hermite expansion of sums of Gaussian kernels and a hierarchical clustering of the samples. On simulated data from a univariate Gaussian distribution, our method performs better than a k-nearest neighbour algorithm and other kernel density estimation methods.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
A satellite navigation receiver traditionally searches for positioning signals using an acquisition procedure. In situations in which the required information is only a binary decision on whether at least one positioning signal is present, the procedure represents an unnecessarily complex solution. This paper presents a different approach to the binary detection problem with significantly reduced computational complexity. The approach is based on a novel decision metric which is utilized to design two binary detectors. The first detector operates under the theoretical assumption of additive white Gaussian noise and is evaluated by means of Receiver Operating Characteristics. The second also considers additional interference and is suitable for operation in a real environment. Its performance is verified using a signal captured by a receiver front-end.
EXT="Raasakka, Jussi"
EXT="Peltola, Pekka"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Recent device shipment trends strongly indicate that the number of Web-enabled devices other than PCs and smartphones is growing rapidly. As the dominant era of these two traditional device categories ends, people will soon commonly use various types of Internet-connected devices in their daily lives, with no single device dominating. Since today's devices are mostly standalone and only stay in sync in limited ways, new approaches are needed for mastering the complexity arising from a world of many types of devices, created by different manufacturers and implementing competing standards. Today, the most common denominator for dealing with the differences is using clouds. Unfortunately, while the cloud is well suited for numerous activities, there are also serious limitations, especially when considering systems that consist of numerous battery-powered computing devices with limited connectivity. In this paper, we provide insight into our research, in which fully cloud-based orchestration of cooperating devices is partitioned into more local actions, where constant communication with the cloud backend can be at least partially omitted.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, results are given for the prediction of image denoising efficiency for a filter based on the discrete cosine transform (DCT) in the case of spatially correlated additive Gaussian noise (SCGN). The considered noise model is analyzed for different degrees of spatial correlation, which produce varying non-homogeneous noise spectra. The PSNR metric is used to assess denoising efficiency. It is shown that the prediction of denoising efficiency is highly accurate for data distorted by noise with different degrees of spatial correlation, and requires low computational resources.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
With the increasing amount of data being published on the Web, it is difficult to analyze its content within a short time. Topic modeling techniques can summarize textual data that contains several topics. Both the label (such as a category or tag) and word co-occurrence play a significant role in understanding textual data. However, many conventional topic modeling techniques are limited by the bag-of-words assumption. In this paper, we develop a probabilistic model called Bigram Labeled Latent Dirichlet Allocation (BL-LDA) to address the limitation of the bag-of-words assumption. The proposed BL-LDA incorporates bigrams into the Labeled LDA (L-LDA) technique. Extensive experiments on Yelp data show that the proposed scheme performs better than L-LDA in terms of accuracy.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
A novel filter based on the four most similar neighbors (MSN) is proposed in this paper; it considers all pixels of the sliding window except the central pixel, after taking first-order absolute differences from the central pixel. The proposed filter consists of two steps: noise detection followed by filtering. In noise detection, first-order absolute differences are calculated and sorted in ascending order. Clusters of equal size are formed from the most similar pixels, and fuzzy rules are then applied to detect noise in the current pixel. Threshold parameters are set adaptively. In the filtering phase, a median-based fuzzy filter is used to restore the corrupted pixels. Experimental results show that the proposed filter outperforms several state-of-the-art filters for random-valued impulse noise removal.
INT=elt,"Ali, Mubashir"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper proposes a fully digital post-processing solution for cancelling nonlinear distortion and mirror-frequency interference in wideband direct-conversion receivers (DCRs). Favorable cost, integrability, and power efficiency have made DCRs a popular choice in communication systems. It is also an emerging trend in radar systems since digital post-processing enables sufficient performance. The proposed method cancels the most essential distortion adaptively during normal receiver operation without any prior information. Improved cancellation performance compared to the state-of-the-art is achieved considering inband and neighboring band distortion induced by the strong received signals. This is verified and demonstrated with extensive simulations and true RF hardware measurements.
EXT="Singh, Simran"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Location-based and geo-context-aware services form a new, fast-growing domain of commercially successful ICT solutions. These services play a key role in IoT scenarios and in the development of smart spaces and proactive solutions. One of the most attractive application areas is e-Tourism. More people can afford travelling, and over the last few decades we have seen continuous growth of tourist activity. At the same time, demand for tourist services has increased hugely in both quantity and quality. Many experts foresee that this growth can no longer be met by traditional approaches. Similarly to the change in ticket and hotel booking, it is expected that we will soon witness a major transformation of the whole industry towards an e-Tourism-driven market, where the roles of traditional service providers, e.g., tourist agents and guides, will disappear or change significantly. The Internet of Things (IoT) is an integral part of the Future Internet ecosystem and has a major impact on the development of e-Tourism services. IoT provides an infrastructure to uniquely identify and link physical objects with virtual representations. As a result, any physical object can have a virtual reflection in the service space. This creates an opportunity to replace actions on physical objects with operations on their virtual reflections, which is faster, cheaper, and more comfortable for the user. In this paper we summarize our research in the field, share ideas for innovative e-Tourism services, and present the Geo2Tag LBS platform, which allows easy and fast development of such services.
EXT="Balandin, Sergey"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper describes an extended Kalman filter based algorithm for the fusion of monocular vision measurements, inertial rate sensor measurements, and camera motion. The motion of the camera between successive images generates a baseline for range computations by triangulation. The depth estimation accuracy is strongly affected by the mutual observer and feature point geometry, the measurement accuracy of the observer motion parameters, and the line of sight to a feature point. The simulation study investigates how the estimation accuracy is affected by the following parameters: linear and angular velocity measurement errors, camera noise, and observer path. These results yield requirements for the instrumentation and observation scenarios. It was found that under favorable conditions the error in distance estimation does not exceed 2% of the distance to a feature point.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The paper considers the possible use of computer vision systems for INS aiding. Two methods of obtaining navigation data from an image sequence are analyzed. The first method uses the features of architectural elements in indoor and urban environments to generate object attitude parameters. The second method is based on extracting general features from the image and is more widely applicable. Besides the orientation parameters, the second method estimates object displacement and can thus be used as a visual odometry technique. The described algorithms can be used to develop small-sized MEMS navigation systems that operate efficiently in urban conditions.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Noise reduction is often performed at an early stage of the image processing path. In order to keep the processing delays small in different computing platforms, it is important that the noise reduction is performed swiftly. In this paper, the block-matching and three-dimensional filtering (BM3D) denoising algorithm is implemented on heterogeneous computing platforms using OpenCL and CUDA frameworks. To our knowledge, these implementations are the first successful open source attempts to use GPU computation for BM3D denoising. The presented GPU implementations are up to 7.5 times faster than their respective CPU implementations. At the same time, the experiments illustrate general design challenges in using massively parallel processing platforms for the calculation of complex imaging algorithms.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The interest towards programming streaming applications using dataflow models of computation has been increasing steadily in recent years. Among the numerous dataflow formalisms, the ISO-standardized RVC-CAL dataflow language has offered a solid basis for programming tool development and research. To date, RVC-CAL programming tools have enabled transforming dataflow programs into concurrent executables for multicore processors, as well as generating synthesizable hardware descriptions. In this paper it is shown how the RVC-CAL dataflow language can be used for programming graphics processing units (GPUs) with high efficiency. Considering the processing architectures of recent mobile and desktop computing devices, this advance is of high importance, as most consumer devices nowadays contain a graphics processing unit. To evaluate the proposed solution, the paper presents a video processing application case study. At best, the solution is shown to provide a speedup of 42× over single-threaded CPU execution.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Dictionary learning is usually approached by looking at the support of the sparse representations. Recent years have brought improvements in dictionary learning obtained by investigating the cosupport via the analysis-based cosparse model. In this paper we present a new cosparse learning algorithm for orthogonal dictionary blocks that provides significant improvements in dictionary recovery and shrinks the representation error. Furthermore, we show the beneficial effects of using this algorithm inside existing methods that build the dictionary as a structured union of orthonormal bases.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Texture analysis provides quantitative information describing the properties of a digital image. The value of texture analysis has been tested in various medical applications, mostly using magnetic resonance images because of the amount of information the method is capable of providing. However, there is no established practice for defining the region of interest (ROI) within which the texture parameters are calculated, and many parameters appear to depend on the ROI size. We studied the effect of the ROI size with magnetic resonance head images from 64 healthy adults and with artificial noise images. According to our results, ROI size has a significant effect on the computed value of several second-order texture features. We conclude that comparisons between ROIs of different sizes will therefore lead to falsely optimistic classification between the analyzed tissues.
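The dependence on ROI size can be probed with an experiment in the spirit of the paper's artificial-noise test. The sketch below computes one second-order (co-occurrence) feature, horizontal-neighbor contrast, over nested ROIs of a synthetic noise image; the quantization to 32 gray levels and the single pixel offset are illustrative choices, not the paper's settings.

    import numpy as np

    def glcm_contrast(patch, levels=32):
        # Second-order texture feature from a gray-level co-occurrence
        # matrix (GLCM) built for horizontally adjacent pixel pairs.
        q = (patch.astype(np.float64) / 256.0 * levels).astype(int).clip(0, levels - 1)
        glcm = np.zeros((levels, levels))
        for a, b in zip(q[:, :-1].ravel(), q[:, 1:].ravel()):
            glcm[a, b] += 1
        glcm /= glcm.sum()
        i, j = np.indices(glcm.shape)
        return float(((i - j) ** 2 * glcm).sum())

    rng = np.random.default_rng(0)
    noise = rng.integers(0, 256, (256, 256))        # artificial noise image
    for size in (16, 32, 64, 128):
        print(size, glcm_contrast(noise[:size, :size]))  # feature behavior as ROI size varies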
EXT="Dastidar, Prasun"
EXT="Sikiö, Minna"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Mobile TV has recently received a lot of attention worldwide with the advances in technologies such as Digital Multimedia Broadcasting (DMB), Digital Video Broadcasting - Handheld (DVB-H) and MediaFLO. 3DTV, on the other hand, is a new approach to watching TV that introduces the third dimension for a more realistic and interactive experience. With the merging of these two technologies it will be possible to offer 3DTV products and services on portable platforms with switchable 2D/3D autostereoscopic displays. The paper presents the European Mobile3DTV project approach toward achieving this convergence. The project specifically addresses mobile 3DTV delivery over the DVB-H system. It develops a technology demonstration system comprising suitable stereo-video content-creation techniques; efficient, scalable and flexible stereo-video encoders with error resilience and error-concealment capabilities, tailored for robust transmission over DVB-H; and the corresponding stereo-video decoders and players running on a portable terminal device equipped with an autostereoscopic display.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Iterative methods have for decades been perhaps the most popular approaches to solving the phase retrieval problem. Unfortunately, the iterative methods often stagnate, and this also happens in the case of the 1-D Discrete Phase Retrieval (1-D DPhR) problem. Recently it has been shown that certain requirements on the input magnitude data might be one of the reasons why the direct method cannot solve the 1-D DPhR problem. In this work we present some difficulties that can be encountered when implementing the iterative method for finding a solution to the 1-D DPhR problem, and we formulate the extended form of the 1-D DPhR problem. Simulations support the conjecture.
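For concreteness, one such iterative scheme whose stagnation motivates this kind of study is the classical error-reduction iteration, which alternates between enforcing the known DFT magnitudes and a finite-support constraint in the signal domain. The signal length and support size in this minimal numpy sketch are assumed purely for illustration.

    import numpy as np

    def error_reduction(mag, n_support, iters=500, seed=0):
        # Alternate between the magnitude constraint in the DFT domain and
        # a real, finite-support constraint in the signal domain.
        rng = np.random.default_rng(seed)
        N = len(mag)
        X = mag * np.exp(2j * np.pi * rng.random(N))   # random initial phases
        for _ in range(iters):
            x = np.fft.ifft(X).real                    # enforce a real signal
            x[n_support:] = 0.0                        # enforce finite support
            X = np.fft.fft(x)
            X = mag * np.exp(1j * np.angle(X))         # restore the known magnitudes
        return x

    true = np.array([1.0, -2.0, 0.5, 3.0, 0.0, 1.0, -1.0, 2.0])
    mag = np.abs(np.fft.fft(true, 16))                 # length-8 signal, zero-padded DFT
    est = error_reduction(mag, n_support=8)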
EXT="Rusu, Corneliu"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, we propose an optimization scheme aiming at optimal nonlinear data projection in terms of Fisher ratio maximization. To this end, we formulate an iterative optimization scheme consisting of two processing steps: optimal data projection calculation and optimal class representation determination. Compared to the standard approach employing the class mean vectors for class representation, the proposed optimization scheme increases class discrimination in the reduced-dimensionality feature space. We evaluate the proposed method on standard classification problems, as well as on the classification of human actions and faces, and show that it is able to achieve better generalization performance than the standard approach.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, we propose a method for video summarization based on human activity description. We formulate this problem as one of automatic video segment selection based on a learning process that employs salient video segment paradigms. For this one-class classification problem, we introduce a novel variant of the One-Class Support Vector Machine (OC-SVM) classifier that exploits subclass information in the OC-SVM optimization problem, in order to jointly minimize the data dispersion within each subclass and determine the optimal decision function. We evaluate the proposed approach on three Hollywood movies, where the performance of the proposed SOC-SVM algorithm is compared with that of the OC-SVM. Experimental results indicate that the proposed approach is able to outperform OC-SVM-based video segment selection.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
We propose a low complexity method for estimating direction of arrival (DOA) when the positions of the array sensors are affected by errors with known magnitude bound. This robust DOA method is based on solving an optimization problem whose solution is obtained in two stages. First, the problem is relaxed and the corresponding power estimation has an expression similar to that of standard beamforming. If the relaxed solution does not satisfy the magnitude bound, an approximation is made by projection. Unlike other robust DOA methods, no eigenvalue decomposition is necessary and the complexity is similar to that of MVDR. For low and medium SNR, the proposed method competes well with more complex methods and is clearly better than MVDR.
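Since the proposed method is stated to have complexity similar to MVDR, the baseline is easy to make concrete. The sketch below computes the standard MVDR pseudo-spectrum for a linear array from a sample covariance matrix; the array geometry and wavelength are assumed for illustration, and the paper's relaxation and projection steps for bounded sensor-position errors are not reproduced.

    import numpy as np

    def mvdr_spectrum(R, positions, angles_deg, wavelength=1.0):
        # Standard MVDR (Capon) pseudo-spectrum: P(theta) = 1 / (a^H R^-1 a).
        Rinv = np.linalg.inv(R)
        powers = []
        for th in np.deg2rad(angles_deg):
            a = np.exp(-2j * np.pi * positions * np.sin(th) / wavelength)
            powers.append(1.0 / np.real(a.conj() @ Rinv @ a))
        return np.array(powers)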
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Non-negative matrix factorisations are used in several branches of signal processing and data analysis for separation and classification. Sparsity constraints are commonly set on the model to promote discovery of a small number of dominant patterns. In group sparse models, atoms considered to belong to a consistent group are permitted to activate together, while activations across groups are suppressed, reducing the number of simultaneously active sources or other structures. Whereas most group sparse models require explicit division of atoms into separate groups without addressing their mutual relations, we propose a constraint that permits dynamic relationships between atoms or groups, based on any defined distance measure. The resulting solutions promote approximation with components considered similar to each other. Evaluation results are shown for speech enhancement and noise robust speech and speaker recognition.
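As background for the proposed constraint, here is a minimal sketch of a non-negative matrix factorisation with a plain L1 sparsity penalty on the activations (Euclidean cost, multiplicative updates). The paper replaces such a uniform penalty with a similarity-weighted one that permits dynamic relationships between atoms or groups; that coupling is not implemented in this sketch.

    import numpy as np

    def sparse_nmf(V, rank, beta=0.1, iters=200, seed=0):
        # Multiplicative updates for V ~= W @ H with an L1 penalty (beta)
        # promoting sparse activations H.
        rng = np.random.default_rng(seed)
        F, T = V.shape
        W = rng.random((F, rank))
        H = rng.random((rank, T))
        for _ in range(iters):
            H *= (W.T @ V) / (W.T @ W @ H + beta + 1e-12)   # sparse activation update
            W *= (V @ H.T) / (W @ (H @ H.T) + 1e-12)        # dictionary (atom) update
        return W, H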
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Groups of mutually similar image blocks are the key element in nonlocal image processing. In this work, the spatial coordinates of grouped blocks are leveraged in two distinct parts of the transform-domain collaborative filtering within the BM3D algorithm. First, we introduce an adaptive 1-D transform for 3-D collaborative filtering based on sampling 2-D smooth functions at the positions of grouped blocks. This adaptive transform is applied for improved decorrelation of the 2-D spectra of the grouped blocks. Second, we propose a directional sharpening procedure whose strength varies adaptively according to the relative orientation of the transform basis functions with respect to the group coordinates. Experiments confirm the efficacy of the proposed adaptations, for denoising as well as for sharpening of noisy images.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Deep neural network (DNN) based acoustic modelling has been successfully used for a variety of automatic speech recognition (ASR) tasks, thanks to its ability to learn higher-level information using multiple hidden layers. This paper investigates the recently proposed exemplar-based speech enhancement technique using coupled dictionaries as a pre-processing stage for DNN-based systems. In this setting, the noisy speech is decomposed as a weighted sum of atoms in an input dictionary containing exemplars sampled from a domain of choice, and the resulting weights are applied to a coupled output dictionary containing exemplars sampled in the short-time Fourier transform (STFT) domain to directly obtain the speech and noise estimates for speech enhancement. In this work, settings using an input dictionary of exemplars sampled from the STFT, Mel-integrated magnitude STFT, and modulation envelope spectra are evaluated. Experiments performed on the AURORA-4 database revealed that these pre-processing stages can improve the performance of DNN-HMM-based ASR systems with both clean and multi-condition training.
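A minimal sketch of the coupled-dictionary step is given below, assuming the input dictionary stacks speech exemplars before noise exemplars and that the non-negative weights are found with KL-style multiplicative updates against a fixed dictionary; the actual dictionaries, domains, and update rule are those of the cited technique.

    import numpy as np

    def coupled_enhance(x, D_in, D_out, n_speech, iters=100):
        # Decompose the noisy observation x on the input dictionary, then
        # apply the same weights to the coupled STFT-domain dictionary.
        # Columns 0..n_speech-1 hold speech exemplars, the rest noise.
        w = np.ones(D_in.shape[1])
        for _ in range(iters):
            ratio = np.maximum(x, 1e-12) / np.maximum(D_in @ w, 1e-12)
            w *= (D_in.T @ ratio) / np.maximum(D_in.sum(axis=0), 1e-12)
        speech = D_out[:, :n_speech] @ w[:n_speech]
        noise = D_out[:, n_speech:] @ w[n_speech:]
        return speech, noise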
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
For real-time or near-real-time applications, sound source separation can be performed on-line, where new frames of incoming data for a mixture signal are processed as they arrive, at very low delay. We propose an approach that generates the separation filters for short synthesis frames to achieve low-latency source separation, based on a compositional model of the audio mixture to be separated. Filter parameters are derived from a longer temporal context than the current processing frame through the use of a longer analysis frame. A pair of dictionaries is used, one for analysis and one for reconstruction. With this approach we are able to increase separation performance while retaining the low latency provided by the use of short synthesis frames. The proposed data handling scheme and parameters can be adjusted to achieve real-time performance, given sufficient computational power. Low-latency output allows a human listener to use the results of such a separation scheme directly, without a perceptible delay. With the proposed method, separated source-to-distortion ratios (SDRs) can be improved by over 1 dB for latencies below 20 ms, without any effect on latency.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, the design and fabrication process of a wearable circuit, on a flexible LCP substrate, that harvests the ambient energy emitted by a two-way radio are discussed in detail. The circuit is fabricated by combining circuit traces, made by masking with inkjet printing technology, and lumped circuit components. The input power for the RF-DC conversion circuit is analytically computed from the measured S-parameters of the Tx-Rx propagation channel. A maximum output power of 43.2 mW, with an RF-DC conversion efficiency of 82.5% and an open-circuit voltage of 17.87 V, is achieved with an E-field energy harvester placed 7 cm away from an off-the-shelf 1 W two-way talk radio.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, the design of a novel flexible RF energy harvester utilizing hybrid printed electronics technology is presented for the first time. The proposed RF energy harvester operates in the UHF RFID band (868 MHz ∼ 915 MHz) for far-field RF energy harvesting applications. A hybrid printed electronics concept is introduced that takes advantage of both the flexibility of low-cost printed electronics and the high performance of ICs. The passive components of the RF energy harvester, such as the circuit layout and the antenna, are printed on a flexible low-cost polymer substrate utilizing a catalyst-based inkjet printing process for the fabrication of the copper metallization layers. The surface-mount devices (SMDs) are soldered onto the printed circuit board. The proposed approach demonstrates the feasibility of implementing low-cost flexible printed electronics for the Internet of Things (IoT) and stand-alone ('zero-power') wireless sensor platforms.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
A system design is presented for radio frequency (RF) energy harvesting on wireless sensor network (WSN) nodes, where all electronics reside inside a 3D structure and the antennas lie on its surfaces. Additive manufacturing techniques are used for the packaging and antenna fabrication: a 3D-printed cross-shaped structure is built that folds into a cuboid in an 'origami' fashion and retains its shape at room temperature. Inkjet printing is used to fabricate antennas directly on the surfaces of the 3D-printed plastic, enabling fully additive manufacturing of the structure. Multiple antennas on the cube's surfaces can be used for RF energy harvesting of signals arriving from totally orthogonal directions, with the use of an appropriate harvester. The system modules (cube, antenna, harvester) are described and characterized, offering a proof of concept for combining fabrication techniques to build systems for demanding RF applications.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper demonstrates the feasibility of manufacturing high-performance radio-frequency passive components on cellulose substrates by exploiting two novel technologies: vertically integrated inkjet printing and the copper laminate method. Both processes are substrate independent and thus also suitable for fabricating circuits on paper; moreover, in a future perspective, they can easily be combined in order to exploit their complementarity. Passive components such as capacitors and inductors are described, with Q values up to 22, never before reported on cellulose substrates, and Self-Resonant Frequencies (SRF) up to 4 GHz. The obtained capacitance and inductance per unit area are 0.8 pF/mm² and 43 nH/mm², respectively.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The complex design spaces associated with state-of-the-art, multicore signal processing systems pose significant challenges in realizing designs with high productivity and quality. The Partial Expansion Graph (PEG) implementation model was developed to help address these challenges by enabling more efficient exploration of the scheduling design space for multicore digital signal processors. The PEG allows designers and design tools to systematically adjust and adapt the amount of parallelism exposed from applications depending on the targeted platform. In this paper, we develop new algorithms for scheduling and mapping systems implemented using PEGs. Collectively, these algorithms operate in three steps. First, the amount of data parallelism in the application graph is tuned systematically over many iterations to profit from the available cores in the target platform. Then a mapping algorithm that uses graph analysis is developed to distribute data and task parallel instances over different cores while trying to balance the load of all processing units to make use of pipeline parallelism. Finally, we use a novel technique for performance evaluation by implementing the scheduler and a customizable solution on the programmable platform. We demonstrate the utility of our PEG-based scheduling and mapping algorithms through experiments on real application models and various synthetic graphs.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper presents a lossless compression method that compresses the vessels and the remaining eye fundus separately in retinal images. Retinal images contain valuable information for several distinct medical diagnosis tasks, where the features of interest can be, e.g., the cotton wool spots in the eye fundus or the volume of the vessels over concentric circular regions. It is assumed that one of the existing segmentation methods has provided the segmentation of the vessels. The proposed compression method losslessly transmits the segmentation image, and then transmits the eye fundus part, or the vessels image, or both, conditioned on the vessel segmentation. The independent compression of the two color image segments is performed using a sparse predictive method. Experiments are provided over a database of retinal images containing manual and estimated segmentations. The codelength of encoding the overall image, including the segmentation and the image segments, proves to be smaller than the codelength for the entire image obtained by JPEG2000 and other publicly available compressors.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this preliminary study we examine the suitability of hierarchical multi-class support vector machine strategies for the classification of induced pluripotent stem cell (iPSC) colony images. The iPSC technology offers remarkable possibilities for safe and patient-specific drug therapy without ethical problems. However, growing iPSCs is a sensitive process and abnormalities may occur during growth; these abnormalities need to be recognized, so the problem reduces to image classification. We have a collection of 80 iPSC colony images, each pre-labeled by an expert as bad, good, or semigood. We use intensity histograms as classification features, evaluated both from the whole image and from the colony area only, giving two datasets, and we apply two feature reduction procedures to both datasets. In classification we examine how different hierarchical constructions affect the results. In a thorough evaluation, the best accuracy was around 54%, obtained with the linear kernel function; between different hierarchical structures there are in many cases no significant differences in the results. We conclude that intensity histograms are a good baseline for the classification of iPSC colony images, but more sophisticated feature extraction and reduction methods, together with other classification methods, need to be researched in the future.
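The feature pipeline is straightforward to sketch: a normalized intensity histogram feeding a linear-kernel SVM. In the sketch below, scikit-learn's SVC handles the three classes with its built-in one-vs-one scheme, which is only a stand-in for the hierarchical constructions studied in the paper, and the synthetic arrays merely mark where the 80 expert-labeled colony images would go.

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.model_selection import cross_val_score

    def intensity_histogram(image, bins=64):
        # Normalized grayscale intensity histogram used as the feature vector.
        h, _ = np.histogram(image, bins=bins, range=(0, 255))
        return h / h.sum()

    rng = np.random.default_rng(0)
    images = rng.integers(0, 256, (80, 128, 128))   # placeholder for the iPSC colony images
    labels = rng.integers(0, 3, 80)                 # placeholder bad/semigood/good labels
    X = np.array([intensity_histogram(im) for im in images])
    scores = cross_val_score(SVC(kernel="linear"), X, labels, cv=5)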
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, we propose a method for human action recognition in unconstrained environments based on stereoscopic videos. We describe a video representation scheme that exploits the enriched visual and disparity information available for such data. Each stereoscopic video is represented by multiple vectors, evaluated on video locations corresponding to different disparity zones. By using these vectors, multiple action descriptions can be determined that either correspond to specific disparity zones or combine information appearing in different disparity zones in the classification phase. Experimental results indicate that the proposed approach enhances action classification performance compared to the standard approach, and achieves state-of-the-art performance on the Hollywood 3D database designed for the recognition of complex actions in unconstrained environments.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Training dictionaries for sparse representations is a time consuming task, due to the large size of the data involved and to the complexity of the training algorithms. We investigate a parallel version of the approximate K-SVD algorithm, where multiple atoms are updated simultaneously, and implement it using OpenCL, for execution on graphics processing units (GPU). This not only allows reducing the execution time with respect to the standard sequential version, but also gives dictionaries with which the training data are better approximated. We present numerical evidence supporting this somewhat surprising conclusion and discuss in detail several implementation choices and difficulties.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The micro-Doppler spectrogram depends on parts of a target moving and rotating in addition to the main body motion (e.g., spinning rotor blades) and is thus characteristic of the target type. In this study, the micro-Doppler spectrogram is exploited to distinguish between birds and small unmanned aerial vehicles (UAVs). The focus here is on micro-Doppler features enabling fast classification of birds and mini-UAVs. In a second classification step, it is desired to exploit micro-Doppler features to further characterize the type of UAV, e.g., fixed-wing vs. rotary-wing. In this paper, potentially robust features supporting the first classification step, i.e., the separation of birds and UAVs, are discussed. The Singular Value Decomposition appears to be a powerful tool for extracting such features, since the information content of the micro-Doppler spectrogram is preserved in the singular vectors. Some examples of micro-Doppler feature extraction via Singular Value Decomposition are given.
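The SVD step is simple to illustrate: compute a micro-Doppler spectrogram of the slow-time radar return and keep the leading singular vectors and values as a compact description. The STFT parameters below are assumed for illustration; which functions of the singular vectors serve as the actual bird/UAV classification features is specific to the study.

    import numpy as np
    from scipy.signal import spectrogram

    def svd_features(iq, fs, k=3):
        # Micro-Doppler spectrogram of the complex radar return, followed by
        # an SVD; the k leading singular vectors and values summarize the
        # target's modulation structure.
        f, t, S = spectrogram(iq, fs=fs, nperseg=256, noverlap=192,
                              return_onesided=False)
        U, s, Vt = np.linalg.svd(np.abs(S), full_matrices=False)
        return U[:, :k], s[:k]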
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
A method to adjust the mean-squared-error (MSE) value for coded video quality assessment is investigated in this work by incorporating subjective human visual experience. First, we propose a linear model between the mean opinion score (MOS) and a logarithmic function of the MSE value of coded video under a range of coding rates. This model is validated by experimental data. With further simplification, the model contains only one parameter to be determined by video characteristics. Next, we adopt a machine learning method to learn this parameter. Specifically, we select features to classify video content into groups, where videos in each group are more homogeneous in their characteristics. Then, a proper model parameter can be trained and predicted within each video group. Experimental results on a coded video database are given to demonstrate the effectiveness of the proposed algorithm.
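A model of this shape is easy to state and fit. Assuming the form MOS ≈ a + b·log10(MSE), which is one plausible reading of the linear-in-log-MSE model (the exact parameterization and the single content-dependent parameter of the simplified model are the paper's), a least-squares fit looks as follows; the data arrays are hypothetical.

    import numpy as np

    def fit_mos_model(mse, mos):
        # Least-squares fit of MOS ~= a + b * log10(MSE); b is expected to
        # be negative, since perceived quality drops as MSE grows.
        A = np.column_stack([np.ones_like(mse), np.log10(mse)])
        coeff, *_ = np.linalg.lstsq(A, mos, rcond=None)
        return coeff  # [a, b]

    mse = np.array([10.0, 40.0, 160.0, 640.0])   # hypothetical coded-video MSEs
    mos = np.array([4.5, 3.8, 3.0, 2.1])         # hypothetical subjective scores
    a, b = fit_mos_model(mse, mos)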
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper introduces a novel multicore scheduling method that leverages a parameterized dataflow Model of Computation (MoC). This method, which we have named Just-In-Time Multicore Scheduling (JIT-MS), aims to efficiently schedule Parameterized and Interfaced Synchronous DataFlow (PiSDF) graphs on multicore architectures. The method exploits features of PiSDF to find locally static regions that exhibit predictable communications. This paper uses a multicore signal processing benchmark to demonstrate that the JIT-MS scheduler can exploit more parallelism than a conventional multicore task scheduler based on task creation and dispatch. Experimental results of the JIT-MS on an 8-core Texas Instruments Keystone Digital Signal Processor (DSP) are compared with those obtained from the OpenMP implementation provided by Texas Instruments. Results show latency improvements of up to 26% for multicore signal processing systems.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Full use of the parallel computation capabilities of present and expected CPUs and GPUs requires the use of vector extensions. Yet many actors in dataflow systems for digital signal processing have internal state (or, equivalently, an edge that loops from the actor back to itself), which imposes serial dependencies between actor invocations and makes vectorizing across invocations impossible. Ideally, the inter-thread coordination required by serial data dependencies should be handled by code written by parallel programming experts, kept separate from the code specifying the signal processing operations. The purpose of this paper is to present one approach for doing so in the case of actors that maintain state. We propose a methodology for using the parallel scan (also known as prefix sum) pattern to create algorithms for multiple simultaneous invocations of such an actor that result in vectorizable code. Two examples of applying this methodology are given: (1) infinite impulse response filters and (2) finite state machines. The correctness and performance of the resulting IIR filters are studied.
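The key observation behind the scan formulation is that a first-order recurrence y[n] = b·x[n] + a·y[n-1] is a chain of affine maps, and composition of affine maps is associative, which is exactly the property a parallel scan needs. In the minimal sketch below the scan runs sequentially for clarity; the associative combine operator is what a vectorized or multi-lane implementation would apply in parallel.

    import numpy as np

    def iir_scan(x, a, b):
        # Each input sample contributes the affine map y -> a*y + b*x[n];
        # composing maps with 'combine' is associative, so the running state
        # could be produced by a prefix scan instead of this serial loop.
        def combine(p, q):
            return (p[0] * q[0], q[0] * p[1] + q[1])
        state = (1.0, 0.0)                 # identity map y -> y
        y = np.empty(len(x))
        for n, xn in enumerate(x):
            state = combine(state, (a, b * xn))
            y[n] = state[1]                # initial condition y[-1] assumed zero
        return y

    print(iir_scan(np.array([1.0, 0.0, 0.0, 0.0]), a=0.5, b=1.0))
    # impulse response: 1.0, 0.5, 0.25, 0.125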
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Recently, the deployment of small cells has been considered an effective solution for enhancing capacity in existing cellular networks. However, massive deployment of small cells also incurs severe interference and increased energy consumption, which degrades the energy efficiency of the system. In this paper, we analyze the energy efficiency of cognitive small cell networks and propose a traffic-aware distributed sensing and access scheme for cognitive small cell base stations (SBSs). The proposed scheme adopts the concept of dynamic data driven applications systems (DDDAS). In the DDDAS paradigm, a model of the underlying design space is managed dynamically, updated periodically based on measurements of data, and used to drive measurement functions and the adaptation of system configurations. Through careful integration of DDDAS-based design principles, SBSs have the ability to configure their sensing and access parameters according to the traffic patterns that are actually encountered. Simulation results show that our proposed DDDAS-based scheme can achieve significantly higher energy efficiency than conventional spectrum sharing schemes in cognitive small cell networks.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The electronic devices we use on a daily basis collect sensitive information without preserving users' privacy. In this paper, we propose the lord of the sense (LotS), a privacy-preserving reputation system for participatory sensing applications. Our system maintains the privacy and anonymity of information through cryptographic techniques and combines voting approaches to support users' reputation. Furthermore, LotS maintains accountability by tracing back a misbehaving user while maintaining k-anonymity. A detailed security analysis is presented, together with the current advantages and disadvantages of our system.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This work deals with an adaptive and localized time-frequency representation of time-series signals based on rational functions. The proposed rational Discrete Short Time Fourier Transform (DSTFT) is used for extracting discriminative features from EEG data. We take advantage of bagging ensemble learning and the Alternating Decision Tree (ADTree) classifier to detect seizure segments in the presence of seizure-free segments. The effectiveness of different rational systems is compared with the classical Short Time Fourier Transform (STFT). The comparative study demonstrates that the Malmquist-Takenaka rational system outperforms the STFT, while providing a tunable time-frequency representation of the EEG signals and a smaller Mean Square Error (MSE) in the inverse transform.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Audio segmentation is a well-known problem that can be considered from various angles. In the context of this paper, the audio segmentation problem is to extract small 'homogeneous' pieces of audio in which the content does not change in terms of the audio events present. The proposed method is compared with a well-known segmentation method, Bayesian Information Criterion (BIC) based Divide-and-Conquer, in terms of average segment duration and computational complexity.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper we propose an algorithm for training Single-hidden Layer Feedforward Neural networks. Based on the observation that the learning process of such networks can be considered a non-linear mapping of the training data to a high-dimensional feature space, followed by a data projection to a low-dimensional space where classification is performed by a linear classifier, we extend the Extreme Learning Machine (ELM) algorithm to exploit the training data dispersion in its optimization process. The proposed Minimum Variance Extreme Learning Machine classifier is evaluated on human action recognition, where we compare its performance with that of other ELM-based classifiers, as well as the kernel Support Vector Machine classifier.
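For reference, the baseline ELM that the proposed classifier extends fits in a few lines: random input weights, a nonlinear hidden layer, and output weights obtained by least squares. The sigmoid activation and hidden-layer size below are assumptions, and the minimum-variance term added by the paper is not included.

    import numpy as np

    def elm_train(X, T, hidden=100, seed=0):
        # X: (samples, features); T: (samples, classes) one-hot targets.
        rng = np.random.default_rng(seed)
        W = rng.standard_normal((X.shape[1], hidden))   # random input weights, never trained
        b = rng.standard_normal(hidden)
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))          # hidden-layer outputs
        beta, *_ = np.linalg.lstsq(H, T, rcond=None)    # output weights by least squares
        return W, b, beta

    def elm_predict(X, W, b, beta):
        H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
        return H @ beta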
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper we analyze the effect of the calibration period, or lack of, on self-interference channel estimation in the digital domain of in-band full-duplex radio transceivers. In particular, we consider a scenario where the channel estimation must be performed without a separate calibration period, which means that the received signal of interest will act as an additional noise source from the estimation perspective. We will explicitly analyze its effect, and quantify the increase in the parameter estimation variance, or sample size, if similar accuracy for the self-interference channel estimate is to be achieved as with a separate calibration period. In addition, we will analyze how the calibration period, or its absence, affects the overall achievable rates. Full waveform simulations are then used to determine the validity of the obtained results, as well as to provide numerical results regarding the achievable rates. It is shown that, even though a substantial increase in the parameter sample size is required if there is no calibration period, the achievable rates are still comparable for the two scenarios.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The Adaptive Loop Filter (ALF) is a filter in the High Efficiency Video Coding (HEVC) standard that improves subjective and objective image quality. The ALF has been shown to be computationally complex, and its complexity was reduced during the HEVC development process. In the HEVC Test Model HM-7.0, ALF is a 9×7 cross + 3×3 square shaped filter. This paper presents a programmable application-specific instruction processor for the ALF. The proposed processor processes 1920×1080p luminance frames at 30 frames per second when operated at a clock frequency of 311 MHz. Low power consumption and a low gate count make the proposed processor suitable for embedded devices. The processor program code is written in pure C language, which allows versatile use of the circuit and updates to the filter functionality without modifying the processor design. To the authors' best knowledge this is the first programmable solution for ALF on embedded devices.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, we present a designer-configurable, resource efficient FPGA architecture for OFDM system implementation. Our design achieves a significant improvement in resource efficiency for a given data rate. This efficiency improvement is achieved through careful analysis of how FFT computation is performed within the context of OFDM systems, and streamlining memory management and control logic based on this analysis. In particular, our OFDM-targeted FFT design eliminates redundant buffer memory, and simplifies control logic to save FPGA resources. We have synthesized and tested our design using the Xilinx ISE 13.4 synthesis tool, and compared the results with the Xilinx FFT v7.1, which is a widely used commercial FPGA IP core. We have demonstrated that our design provides at least 8.8% enhancement in terms of resource efficiency compared to Xilinx FFT v7.1 when it is embedded within the same OFDM configuration.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
An improved reversible data hiding scheme that forms part of the JPEG coding process is introduced. Generally, one of the common constraints imposed on digital watermarking in the frequency domain is the small payload that can be embedded without causing high degradation of the JPEG stego image. Moreover, even with a small hidden payload the stego image file size will increase to some extent. No existing data hiding technique compliant with JPEG makes it possible to define the file size of the watermarked image in advance. Therefore, in this paper we propose to use rate-distortion theory, minimizing coding distortion subject to a coding rate constraint. An iterative algorithm based on a Lagrangian formulation is applied to obtain a vector of quality factors, one for each 8 × 8 block, that scale the JPEG standard quantization table. The experimental results show the advantage of the proposed watermarking scheme in terms of data payload versus quality and file size compared with state-of-the-art data hiding schemes, and furthermore clarify the improvements of its optimized counterpart.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
While single-view human action recognition has attracted considerable research attention over the last three decades, multi-view action recognition is still a less explored field. This paper provides a comprehensive survey of multi-view human action recognition approaches. The approaches are reviewed following an application-based categorization: methods are categorized based on their ability to operate with a fixed or an arbitrary number of cameras. Finally, benchmark databases frequently used for the evaluation of multi-view approaches are briefly described.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Clustering-based Discriminant Analysis (CDA) is a well-known technique for supervised feature extraction and dimensionality reduction. CDA determines an optimal discriminant subspace for linear data projection based on the assumptions of normal subclass distributions and subclass representation by using the mean subclass vector. However, in several cases, there might be other subclass representative vectors that could be more discriminative, compared to the mean subclass vectors. In this paper we propose an optimization scheme aiming at determining the optimal subclass representation for CDA-based data projection. The proposed optimization scheme has been evaluated on standard classification problems, as well as on two publicly available human action recognition databases providing enhanced class discrimination, compared to the standard CDA approach.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper we present a dynamic classification scheme involving Single-hidden Layer Feedforward Neural (SLFN) network-based non-linear data mapping and test sample-specific labeled data selection in multiple levels. The number of levels is dynamically determined by the test sample under consideration, while the use of Extreme Learning Machine (ELM) algorithm for SLFN network training leads to fast operation. The proposed dynamic classification scheme has been applied to human action recognition by employing the Bag of Visual Words (BoVW)-based action video representation providing enhanced classification performance compared to the static classification approach.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Dimensionality reduction of high-dimensional data for visualization has recently been formalized as an information retrieval task, where the original neighbors of data points are retrieved from the low-dimensional display and the visualization is optimized to maximize flexible tradeoffs between precision and recall of the retrieval, avoiding misses and false neighbors. The approach has yielded well-performing visualization methods as well as information retrieval interpretations of earlier neighbor embedding methods. However, most of the methods are based on slow gradient search approaches, whereas fast methods are crucial, for example, in interactive applications. In this paper we propose a fast multiplicative update rule for visualization optimized for information retrieval, and show in experiments that it yields results as good as the previous state-of-the-art gradient-based approach, but much faster.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This study describes a multimodality image-based platform to guide photodynamic therapy of prostate cancer using the WST 11 TOOKAD Soluble drug. The platform integrates a pre-treatment planning tool based on magnetic resonance imaging and a per-treatment guidance tool based on transrectal ultrasound images. Evaluation of the platform on clinical data showed that prediction of the therapy outcome was possible with an accuracy of 90%.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
DBComposer is an R package with a graphical user interface (GUI) to analyze and integrate human gene expression microarray data. With DBComposer, the data can be easily annotated, preprocessed and analyzed in several ways. DBComposer can also serve as a personal expression microarray database allowing users to store multiple datasets together for later retrieval or data analysis. It takes advantage of many R packages for statistics and visualizations, and provides a flexible framework to implement custom workflows to extend the data analysis capabilities.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
A channelizer is part of a receiver front-end subsystem, commonly found in various communication systems, that separates different users or channels. A modern channelizer uses the advantages of polyphase filter banks to process multiple channels at the same time, performing down-conversion, downsampling, and filtering all at once. However, due to limitations imposed by the structure and requirements of channelizers, their use poses significant challenges owing to the inflexibility of conventional, intensively hardware-based implementation techniques. With advances in graphics processing unit (GPU) technology, we now have the potential to deliver high computational throughput along with the flexibility of a software-based implementation. In this paper, we demonstrate how this potential can be exploited by presenting a novel GPU-based channelizer implementation. Our implementation incorporates methods for eliminating complex buffer management and for performing arbitrary resampling on all channels simultaneously. We also introduce the notion of simultaneously processing many channels as a high data rate parallel receiver system using blocks of threads on the GPU. The multi-channel, flexible, high-throughput, and arbitrary-resampling characteristics of our GPU-based channelizer make it attractive for a variety of communication receiver applications.
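The core polyphase structure is compact enough to sketch. Below, a maximally decimated M-channel channelizer splits a prototype lowpass filter into M polyphase branches, filters decimated input streams, and separates the channels with an inverse FFT across branches. This numpy/scipy sketch follows one common commutator and transform convention; the arbitrary resampling and the GPU thread-block mapping are the paper's contribution and are not reproduced here.

    import numpy as np
    from scipy.signal import lfilter

    def polyphase_channelizer(x, h, M):
        # Split the prototype lowpass h into M polyphase branches, filter the
        # decimated input streams, and separate channels with an IFFT.
        L = (len(h) // M) * M
        E = h[:L].reshape(-1, M).T            # row m holds h[m], h[m+M], ...
        N = (len(x) // M) * M
        xb = x[:N].reshape(-1, M).T[::-1]     # commutator: branch m sees x[n*M + M-1-m]
        Y = np.stack([lfilter(E[m], 1.0, xb[m]) for m in range(M)])
        return np.fft.ifft(Y, axis=0) * M     # one output row per channel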
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
While research on the design of heterogeneous concurrent systems has a long and rich history, a unified design methodology and tool support have not emerged so far, and thus the creation of such systems remains a difficult, time-consuming and error-prone process. The absence of principled support for system evaluation and optimization at high abstraction levels makes the quality of the resulting implementation highly dependent on the experience or prejudices of the designer. In this work we present TURNUS, a unified dataflow design space exploration framework for heterogeneous parallel systems. It provides high-level modelling and simulation methods and tools for system-level performance estimation and optimization. TURNUS represents the outcome of several years of research in the area of co-design exploration for multimedia stream applications. It is demonstrated how the initial high-level abstraction of the design facilitates the use of different analysis and optimization heuristics, which guide the designer during the validation and optimization stages without requiring low-level implementations of parts of the application. Our framework currently yields exploration and optimization results in terms of algorithmic optimization, rapid performance estimation, application throughput, buffer size dimensioning, and power optimization.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
RVC-CAL is a dataflow language that has acquired an ecosystem of sophisticated design tools. Previous works have shown that RVC-CAL-based applications can automatically be deployed to multiprocessor platforms, as well as to hardware descriptions, with high efficiency. However, as RVC-CAL is a concurrent language, code generation for a single processor core requires careful application analysis and scheduling. Although much work has been done in this area, to date no publication has reported that programs generated from RVC-CAL could rival handwritten programs on single-core processors. This paper proposes performance optimization of RVC-CAL applications by actor merging at the source code level. The proposed methodology is demonstrated with an IEEE 802.15.4 (ZigBee) transmitter case study. The transmitter baseband software, previously written in C, is rewritten in RVC-CAL and optimized with the proposed methodology. Experiments show that on a VLIW-flavored processor the RVC-CAL-based program achieves the performance of the manually written software.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Due to the increased complexity of dynamics in modern DSP applications, dataflow-based design methodologies require significant enhancements in modeling and scheduling techniques to provide for efficient and flexible handling of dynamic behavior. In this paper, we address this problem through a new framework that is based on integrating two complementary modeling techniques, core functional dataflow (CFDF) and parameterized synchronous dataflow (PSDF). We apply, in a systematically integrated way, the structured mode-based dynamic dataflow modeling capability of CFDF together with the features of PSDF for dynamic parameter reconfiguration and quasi-static scheduling. We refer to this integrated methodology for mode - and dynamic-parameter - based modeling and scheduling as core functional parameterized synchronous dataflow (CF-PSDF). Through a wireless communication case study involving MIMO detection, we demonstrate the utility of design and implementation using CF-PSDF graphs. Experimental results on this case study demonstrate the efficiency and flexibility of our proposed new CF-PSDF based design methodology.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Strict run-time and resource constraints in wireless sensor networks (WSNs) introduce complex design problems that need to be addressed systematically. Recent processor platforms for WSNs have groups of peripheral devices that are used for data sensing and processing. Hardware interrupts are commonly used as an efficient method for handling data acquisition from such peripherals. Dynamic control of multiple interrupts and efficient handling of power consumption on embedded processors are important issues when implementing dynamic, data-driven signal processing applications, where the structure of processing subsystems may need to be adapted at run-time based on characteristics of the input data and the associated operating conditions. To address these issues, we introduce a dataflow-based design approach that integrates interrupt-based signal acquisition in the context of parameterized synchronous dataflow (PSDF) modeling. This application of PSDF provides a useful foundation for the structured development of power- and energy-efficient wireless sensor network systems for dynamic, data-driven applications systems (DDDAS), including DDDAS that employ intensive acquisition and processing of signals from heterogeneous sensors. To demonstrate our proposed signal-processing-oriented, dataflow-based design approach - which we refer to as DDPSDF (data-driven PSDF) - we have implemented an embedded speech recognition system using the proposed DDPSDF techniques. We demonstrate that by applying our DDPSDF approach, energy- and resource-efficient embedded software can be derived systematically from high-level models of DDDAS functional structure.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The zero-intermediate-frequency zero-crossing demodulator (ZIFZCD) is extensively used for demodulating continuous phase frequency shift keying (CPFSK) signals in low-power, low-cost devices. The ZIFZCD has previously been implemented as hardwired circuits, and many variations of the algorithm have been suggested for different modulation methods and channel conditions. To support all these variants, a programmable processor based implementation of the ZIFZCD is needed. This paper describes a programmable software implementation of the ZIFZCD on an application-specific processor (ASP). The ASP is based on the transport triggered architecture (TTA) and, due to its simplicity, provides an ideal low-power platform for the ZIFZCD implementation. The designed processor operates at a maximum clock frequency of 250 MHz and has a gate count of 134 kGE for a 32-bit TTA processor and 76 kGE for a 16-bit processor. The demodulator has been developed as part of an open source radio implementation for wireless sensor nodes.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Visualization of multivariate data sets is often done by mapping data onto a low-dimensional display with nonlinear dimensionality reduction (NLDR) methods. Many NLDR methods are designed for tasks like manifold learning rather than low-dimensional visualization, and can perform poorly in visualization. We have introduced a formalism where NLDR for visualization is treated as an information retrieval task, and a novel NLDR method called the Neighbor Retrieval Visualizer (NeRV) which outperforms previous methods. The remaining concern is that NeRV has quadratic computational complexity with respect to the number of data. We introduce an efficient learning algorithm for NeRV where relationships between data are approximated through mixture modeling, yielding efficient computation with near-linear computational complexity with respect to the number of data. The method inherits the information retrieval interpretation from the original NeRV, it is much faster to optimize as the number of data grows, and it maintains good visualization performance.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Multidimensional synchronous dataflow (MDSDF) provides an effective model of computation for a variety of multidimensional DSP systems that have static dataflow structures. In this paper, we develop new methods for optimized implementation of MDSDF graphs on embedded platforms that employ multiple levels of parallelism to enhance performance at different levels of granularity. Our approach allows designers to systematically represent and transform multi-level parallelism specifications from a common, MDSDF-based application level model. We demonstrate our methods with a case study of image histogram implementation on a graphics processing unit (GPU). Experimental results from this study show that our approach can be used to derive fast GPU implementations, and enhance trade-off analysis during design space exploration.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Local Binary Pattern (LBP) is a texture operator used in preprocessing for object detection, tracking, face recognition and fingerprint matching. Many of these applications run on embedded devices, which poses limitations on implementation complexity and power consumption. As LBP features are computed pixelwise, high performance is required for real-time extraction of LBP features from high-resolution video. This paper presents an application-specific instruction processor for LBP extraction. The compact yet powerful processor is capable of extracting LBP features from 1280 × 720p (30 fps) video at a reasonable 304 MHz clock rate. With its low power consumption and an area of less than 16k gates, the processor is suitable for embedded devices. Experiments present resource and power consumption measured on an FPGA board, along with processor synthesis results. In terms of latency, our processor requires 17.5× fewer clock cycles per LBP feature than a workstation implementation and only 2.0× more than a hardwired ASIC.
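The pixelwise operator itself is simple, which is what makes a compact application-specific processor attractive; a minimal numpy sketch of the basic 3×3, 8-neighbor LBP code (thresholding each neighbor against the centre pixel and packing the resulting bits) is given below.

    import numpy as np

    def lbp_image(img):
        # Basic 3x3 LBP: bit k is set when neighbor k >= centre pixel.
        c = img[1:-1, 1:-1]
        offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
                   (1, 1), (1, 0), (1, -1), (0, -1)]
        code = np.zeros(c.shape, dtype=np.int32)
        for bit, (di, dj) in enumerate(offsets):
            n = img[1 + di:img.shape[0] - 1 + di, 1 + dj:img.shape[1] - 1 + dj]
            code += (n >= c).astype(np.int32) << bit
        return code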
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Increasing use of multiprocessor system-on-chip (MPSoC) technology is an important trend in the design and implementation of signal processing systems. However, the design of efficient DSP software for MPSoC platforms involves complex inter-related steps, including data decomposition, memory management, and inter-task and inter-thread synchronization. These design steps are challenging, especially under strict constraints on performance and power consumption, and tight time to market pressures. To facilitate these steps, we have developed a new dataflow based design flow within the targeted dataflow interchange format (TDIF) design tool. Our new MPSoC-oriented design flow, called TDIF-PPG, is geared towards analysis and mapping of embedded DSP applications on MPSoCs. An important feature of TDIF-PPG is its capability to integrate graph level parallelism for DSP system flowgraphs and actor level parallelism for DSP functional modules into the application mapping processing. Here, graph level parallelism is exposed by the dataflow graph application representation in TDIF, and actor level parallelism is modeled by a novel model for multiprocessor dataflow graph implementation that we call the parallel processing group (PPG) model. We demonstrate our approach through actor and subsystem design for software defined radio.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper presents a novel implementation of graphics processing unit (GPU) based symbol timing recovery that uses polyphase interpolators to detect symbol timing error. Symbol timing recovery is a compute-intensive procedure that detects and corrects the timing error in a coherent receiver. We provide optimal sample-time timing recovery using a maximum likelihood (ML) estimator to minimize the timing error. Because this is an iterative and adaptive system that relies on feedback, we present an accelerated implementation that uses a GPU for timing error detection (TED), enabling fast error detection by exploiting the 2D filter structure found in the polyphase interpolator. We present this hybrid/heterogeneous CPU-GPU architecture, computing a low-complexity, low-noise matched filter (MF) while simultaneously performing TED. We then compare the performance of CPU- and GPU-based timing recovery for different interpolation rates, showing that the GPU accelerates detection by up to a factor of 35. We further improve throughput by applying GPU optimizations and block processing, all while maintaining the lowest possible sampling rate.
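The 2D filter structure mentioned above comes from viewing the interpolation filter as a bank of fractional-delay subfilters (phases × taps), which maps naturally onto parallel hardware. A minimal NumPy sketch of such a polyphase interpolator is given below; the prototype filter, phase count, and tap count are illustrative choices, not the paper's parameters.

```python
import numpy as np
from scipy.signal import firwin

M, TAPS = 32, 8                            # phases, taps per phase
proto = firwin(M * TAPS, 1.0 / M)          # prototype low-pass filter
bank = proto.reshape(TAPS, M).T * M        # row m = fractional-delay subfilter m

def interpolate(x, n, mu):
    """Approximate the signal value at fractional index n + mu (up to the
    bank's fixed group delay; alignment conventions vary). Assumes
    TAPS//2 <= n <= len(x) - TAPS//2."""
    taps = bank[int(mu * M)]               # pick the nearest phase
    seg = x[n - TAPS // 2 + 1: n + TAPS // 2 + 1]
    return np.dot(taps, seg[::-1])         # convolution-style inner product
```

In a timing-recovery loop, the detected timing error drives the choice of mu (and eventually n), and all phases of the bank can be evaluated concurrently, which is what a GPU implementation exploits.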
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In networked signal processing systems, network nodes that perform embedded processing on sensory inputs and other data interact across wired or wireless communication networks. In such applications, the processing on individual network nodes can be described in terms of dataflow graphs. However, to analyze the correctness and performance of these applications, designers must understand not only the characteristics of the individual "node-level" dataflow graphs, but also the interactions across them as they communicate over the network. In this paper, we develop a new simulation environment, called the NS-2/TDIF SIMulation environment (NT-SIM), that provides integrated co-simulation of networked signal processing systems. NT-SIM systematically combines the network analysis capabilities of the Network Simulator (ns) with the scheduling capabilities of a dataflow-based framework, thereby enabling more comprehensive simulation and analysis of networked signal processing systems. We present a case study that concretely demonstrates the utility of NT-SIM in the context of a heterogeneous signal processing system design.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In recent work, a graphical modeling construct called "topological patterns" has been shown to enable concise representation and direct analysis of repetitive dataflow graph sub-structures in the context of design methods and tools for digital signal processing systems [1]. In this paper, we present a formal design method for specifying topological patterns and deriving parameterized schedules from such patterns based on a novel schedule model called the scalable schedule tree. The approach represents an important class of parameterized schedule structures in a form that is intuitive for representation and efficient for code generation. We demonstrate our methods for topological pattern representation, scalable schedule tree derivation, and associated dataflow graph code generation using a case study for image processing.
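As a rough illustration of the idea, and only one plausible reading of it rather than the paper's formal model, a schedule tree can be pictured as loop nodes carrying fixed or parameterized iteration counts over actor-firing leaves, from which a loop-based schedule is generated recursively:

```python
from dataclasses import dataclass, field
from typing import Callable, List, Union

@dataclass
class Fire:                       # leaf: one actor firing
    actor: str

@dataclass
class Loop:                       # internal node: (parameterized) repetition
    count: Union[int, str]        # fixed count or a parameter name
    body: List["Node"] = field(default_factory=list)

Node = Union[Fire, Loop]

def generate(node: Node, params: dict, emit: Callable[[str], None]):
    """Walk the tree and emit a loop-based schedule."""
    if isinstance(node, Fire):
        emit(f"fire {node.actor}")
        return
    n = node.count if isinstance(node.count, int) else params[node.count]
    for _ in range(n):
        for child in node.body:
            generate(child, params, emit)

# schedule (N (2 A) B): repeat N times {fire A twice, then fire B}
generate(Loop("N", [Loop(2, [Fire("A")]), Fire("B")]), {"N": 3}, print)
```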
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The RVC-CAL dataflow language has recently become standardized through its use as the official language of Reconfigurable Video Coding (RVC), a recent MPEG standard. The tools developed for RVC-CAL enable the transformation of RVC-CAL dataflow programs into C and VHDL (among other targets), enabling implementations on instruction processors and through HDL synthesis. This paper introduces new tools that automatically create heterogeneous multiprocessor networks from RVC-CAL dataflow programs. Each processor in the network performs the functionality of one RVC-CAL actor. The processors are of the Transport Triggered Architecture (TTA) type, for which a complete co-design toolset exists. The existing tools enable customizing the processors according to the requirements of individual dataflow actors. The functionality of the tool chain has been demonstrated by synthesizing an MPEG-4 Simple Profile video decoder to an FPGA. This particular decoder is automatically realized as a network of 21 tiny, heterogeneous processors.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
A variety of multiprocessor architectures have proliferated, even for off-the-shelf computing platforms. To improve performance and productivity for common heterogeneous systems, we have developed a workflow to generate efficient solutions. By starting with a formal description of an application and the mapping problem, we are able to generate a range of designs that efficiently trade off latency and throughput. In this approach, efficient utilization of SIMD cores is achieved by applying extensive block processing in conjunction with efficient mapping and scheduling. We demonstrate our approach through an integration into the GNU Radio environment for software defined radio system design.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Methods for blind estimation of signal-dependent noise parameters from scatter-plots by polynomial regression are considered. Some new modifications, as well as known ones, are discussed, and their performance is compared on test images with simulated signal-dependent noise. Recommendations on method application and parameter settings are given.
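A minimal sketch of the scatter-plot approach, assuming a simple regression of local variance on local mean over image blocks (block size and polynomial degree are arbitrary here, and no robust rejection of edge-dominated blocks is performed):

```python
import numpy as np

def estimate_noise_poly(img, block=8, deg=1):
    """Fit local variance as a polynomial of local mean; for deg=1 the
    coefficients estimate the signal-dependent and additive noise terms."""
    means, variances = [], []
    h, w = img.shape
    for r in range(0, h - block + 1, block):
        for c in range(0, w - block + 1, block):
            b = img[r:r + block, c:c + block].astype(np.float64)
            means.append(b.mean())
            variances.append(b.var(ddof=1))
    return np.polyfit(means, variances, deg)   # highest degree first

# simulated signal-dependent noise with var = 0.5 * signal
truth = np.tile(np.linspace(20, 200, 256), (256, 1))
noisy = truth + np.random.randn(256, 256) * np.sqrt(0.5 * truth)
print(estimate_noise_poly(noisy))   # approx. [0.5, 0.0]
```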
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Embedded signal processing has witnessed explosive growth in recent years in both scientific and consumer applications, driving the need for complex, high-performance signal processing systems that are largely application-driven. To efficiently implement these systems on programmable platforms such as digital signal processors (DSPs), it is important to analyze and optimize the application design from the early stages of the design process. A key performance concern for designers is choosing the data format. In this work, we propose a systematic and efficient design flow involving model-based design to analyze application data sets and precision requirements. We demonstrate this design flow with an exploration study into the required precision for eigenvalue decomposition (EVD) using the Jacobi algorithm. We demonstrate that with a high degree of structured analysis and automation, we are able to analyze the data set to derive an efficient data format, and to optimize important parts of the algorithm with respect to precision.
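For reference, a plain double-precision cyclic Jacobi EVD of the kind targeted by the study is sketched below; a precision-exploration flow would substitute reduced-precision arithmetic into selected steps (e.g., the rotation computation) and measure the effect.

```python
import numpy as np

def jacobi_evd(A, sweeps=10, eps=1e-12):
    """Cyclic Jacobi eigendecomposition of a symmetric matrix A:
    each Givens rotation zeroes one off-diagonal pair."""
    A = A.astype(np.float64).copy()
    n = A.shape[0]
    V = np.eye(n)
    for _ in range(sweeps):
        for p in range(n - 1):
            for q in range(p + 1, n):
                if abs(A[p, q]) < eps:
                    continue
                theta = 0.5 * np.arctan2(2 * A[p, q], A[q, q] - A[p, p])
                c, s = np.cos(theta), np.sin(theta)
                J = np.eye(n)
                J[p, p] = J[q, q] = c
                J[p, q], J[q, p] = s, -s
                A = J.T @ A @ J            # zeroes A[p, q]
                V = V @ J
    return np.diag(A), V

A = np.random.randn(6, 6); A = A + A.T
evals, V = jacobi_evd(A)
assert np.allclose(V @ np.diag(evals) @ V.T, A, atol=1e-8)
```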
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Sensor node processing in resource-aware sensor networks is often critically dependent on dynamic signal processing functionality - i.e., signal processing functionality in which computational structure must be dynamically assessed and adapted based on time-varying environmental conditions, operating constraints or application requirements. In dynamic signal processing systems, it is important to provide flexibility for run-time adaptation of application behavior and execution characteristics, but in the domain of resource-aware sensor networks, such flexibility cannot come with significant costs in terms of power consumption overhead or reduced predictability. In this paper, we review a variety of complementary models of computation that are being developed as part of the dataflow interchange format (DIF) project to facilitate efficient and reliable implementation of dynamic signal processing systems. We demonstrate these methods in the context of resource-aware sensor networks.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
CAL is a dataflow-oriented language for writing high-level specifications of signal processing applications. The language has recently been standardized and selected for the new MPEG Reconfigurable Video Coding standard.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
IEEE 802.11 wireless networks have received much attention over the past several years. Still, certain aspects of wireless network behavior have not been studied thoroughly. For example, understanding the MAC-layer packet delay distribution remains challenging, yet obtaining such a distribution is highly beneficial for modeling the QoS provided by wireless networks. This paper proposes a way of obtaining the MAC delay distribution in single-hop networks. The proposed approach is based on the theory of terminating renewal processes and delivers an approximation of good precision.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Contribution: organisation=sgn,FACT1=1
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The additive white Gaussian noise (AWGN) model is ubiquitous in signal processing. This model is often justified by central-limit theorem (CLT) arguments. However, whereas the CLT may support a Gaussian distribution for the random errors, it does not provide any justification for the assumed additivity and whiteness. As a matter of fact, data acquired in real applications can seldom be described with good approximation by the AWGN model, especially because errors are typically correlated and not additive. Failure to accurately model the noise leads to inaccurate analysis, ineffective filtering, and distortion or even failure in the estimation. This chapter provides an introduction to both signal-dependent and correlated noise and to the relevant models and basic methods for the analysis and estimation of these types of noise. Generic one-parameter families of distributions are used as the essential mathematical setting for the observed signals. The distribution families covered as leading examples include Poisson, mixed Poisson–Gaussian, various forms of signal-dependent Gaussian noise (including multiplicative families and approximations of the Poisson family), as well as doubly censored heteroskedastic Gaussian distributions. We also consider various forms of noise correlation, encompassing pixel and readout cross-talk, fixed-pattern noise, column/row noise, etc., as well as related issues like photo-response and gain nonuniformity. The introduced models and methods are applicable to several important imaging scenarios and technologies, such as raw data from digital camera sensors and various types of radiation imaging relevant to security and biomedical imaging.
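As a small concrete example of signal-dependent (non-AWGN) noise, the sketch below simulates the mixed Poisson-Gaussian model commonly used for raw sensor data; the gain and read-noise values are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

def poisson_gaussian(y, alpha=0.1, sigma=2.0):
    """z = alpha * Poisson(y / alpha) + N(0, sigma^2), so the conditional
    variance is signal-dependent: var(z | y) = alpha * y + sigma^2."""
    return alpha * rng.poisson(y / alpha) + rng.normal(0.0, sigma, np.shape(y))

y = np.full(100_000, 50.0)                  # constant true signal
z = poisson_gaussian(y)
print(z.var(), 0.1 * 50 + 2.0 ** 2)         # empirical vs. model variance (~9)
```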
Research output: Chapter in Book/Report/Conference proceeding › Chapter › Scientific › peer-review
In this chapter, we discuss the state of the art and future challenges in adaptive stream mining systems for computer vision. Adaptive stream mining in this context involves the extraction of knowledge from image and video streams in real-time, and from sources that are possibly distributed and heterogeneous. With advances in sensor and digital processing technologies, we are able to deploy networks involving large numbers of cameras that acquire increasing volumes of image data for diverse applications in monitoring and surveillance. However, to exploit the potential of such extensive networks for image acquisition, important challenges must be addressed in efficient communication and analysis of such data under constraints on power consumption, communication bandwidth, and end-to-end latency. We discuss these challenges in this chapter, and we also discuss important directions for research in addressing such challenges using dynamic, data-driven methodologies.
Research output: Chapter in Book/Report/Conference proceeding › Chapter › Scientific › peer-review
Maps of received signal strength (RSS) from a wireless transmitter can be used for positioning or for planning wireless infrastructure. The RSS values measured at a single point are not always the same but follow some distribution, which varies from point to point. Existing approaches in the literature either neglect this variation or require many measurements at every point to map it, making measurement collection very laborious. We propose to use Gaussian mixtures (GMs) to model the joint distribution of position and RSS value. The proposed model is more versatile than methods found in the literature, as it models the joint distribution of RSS measurements and the location space. This allows us to model the distribution of RSS values at every point of space without making many measurements at each point. In addition, GMs allow us to compute conditional probabilities and position posteriors in closed form. The proposed models can represent any RSS attenuation pattern, which is useful for positioning in multifloor buildings. Our tests with WLAN signals show that positioning with the proposed algorithm provides accurate position estimates. We conclude that the proposed algorithm can provide useful information about the distributions of RSS values for different applications.
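The closed-form conditioning that makes GMs attractive here can be sketched as follows: every component of a GM over the joint vector [x, y, RSS] is conditioned on the observed RSS value and reweighted by its RSS likelihood, which yields a GM posterior over position. The 2-D position / 1-D RSS layout and the names below are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.stats import multivariate_normal

def condition_gm(weights, means, covs, rss):
    """Condition a GM over [x, y, rss] on an observed RSS value;
    returns the weights, means, and covariances of the position posterior."""
    post_w, post_mu, post_cov = [], [], []
    for w, mu, S in zip(weights, means, covs):
        S_pp, S_pr, S_rr = S[:2, :2], S[:2, 2], S[2, 2]
        post_w.append(w * multivariate_normal.pdf(rss, mu[2], S_rr))
        post_mu.append(mu[:2] + S_pr / S_rr * (rss - mu[2]))
        post_cov.append(S_pp - np.outer(S_pr, S_pr) / S_rr)
    post_w = np.asarray(post_w) / np.sum(post_w)
    return post_w, post_mu, post_cov

# point estimate: posterior mean, sum_k w_k * mu_k
```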
Research output: Contribution to journal › Article › Scientific › peer-review
Decline in respiratory regulation is a primary forewarning of the onset of physiological aberrations. In clinical environments, the obtrusive nature and cost of instrumentation have hindered the integration of continuous respiration monitoring into standard practice. Photoplethysmography (PPG) presents a non-invasive, optical method of assessing blood flow dynamics in the peripheral vasculature. Incidentally, respiration couples into the PPG signal as a surrogate constituent, enabling respiratory rate (RR) estimation. The physiological processes of respiration emerge as distinctive oscillations in various parameters extracted from the PPG signal. We propose a novel algorithm designed to account for intermittent diminishment of the respiration-induced variabilities (RIVs) by a fusion-based enhancement of wavelet synchrosqueezed spectra. We combine the information in the intrinsic mode functions (IMFs) of five RIVs to enhance mutually occurring instantaneous frequencies of the spectra. The respiratory rate estimate is obtained by tracking the spectral ridges with a particle filter. We evaluated the method with a dataset recorded from 29 young adult subjects (mean: 24.17 y, SD: 4.19 y) containing diverse, voluntary, and periodically metronome-assisted respiratory patterns. Bayesian inference on the fusion-enhanced Respiration Induced Frequency Variability (RIFV) indicated an MAE and RMSE of 1.764 and 3.996 BPM, respectively. The fusion approach improved the MAE and RMSE of RIFV by 0.185 BPM (95% HDI: 0.0285-0.3488, effect size: 0.548) and 0.250 BPM (95% HDI: 0.0733-0.431, effect size: 0.653), respectively, with more pronounced improvements to the other RIVs. We conclude that the fusion of variability signals proves important to IMF localization in the spectral estimation of RR.
INT=bmte,"Pirhonen, Mikko"
Research output: Contribution to journal › Article › Scientific › peer-review
Audio source separation is usually achieved by estimating the short-time Fourier transform (STFT) magnitude of each source, and then applying a spectrogram inversion algorithm to retrieve time-domain signals. In particular, the multiple input spectrogram inversion (MISI) algorithm has been exploited successfully in several recent works. However, this algorithm suffers from two drawbacks, which we address in this letter. First, it was originally introduced in a heuristic fashion: we propose here a rigorous optimization framework from which MISI is derived, thus proving the convergence of this algorithm. Second, while MISI operates offline, we propose an online version of MISI called oMISI, which is suitable for low-latency source separation, an important requirement for, e.g., hearing aid applications. oMISI also allows one to use alternative phase initialization schemes exploiting the temporal structure of audio signals. Experiments conducted on a speech separation task show that oMISI performs as well as its offline counterpart, thus demonstrating its potential for real-time source separation.
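A minimal offline MISI sketch using SciPy's STFT shows the two ingredients at work, phase re-estimation and redistribution of the mixing error; the window length is an arbitrary choice, and the magnitude arrays are assumed to lie on the same STFT grid as the mixture.

```python
import numpy as np
from scipy.signal import stft, istft

def misi(mixture, mags, n_iter=50, nperseg=1024):
    """Multiple input spectrogram inversion (offline sketch)."""
    L = len(mixture)
    def to_time(Z):                        # inverse STFT, trimmed/padded to L
        x = istft(Z, nperseg=nperseg)[1]
        return x[:L] if len(x) >= L else np.pad(x, (0, L - len(x)))
    X = stft(mixture, nperseg=nperseg)[2]
    S = [m * np.exp(1j * np.angle(X)) for m in mags]   # mixture-phase init
    for _ in range(n_iter):
        s_time = [to_time(Sj) for Sj in S]
        err = (mixture - sum(s_time)) / len(S)         # shared mixing error
        for j in range(len(S)):
            Sj = stft(s_time[j] + err, nperseg=nperseg)[2]
            S[j] = mags[j] * np.exp(1j * np.angle(Sj)) # restore magnitudes
    return [to_time(Sj) for Sj in S]
```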
EXT="Magron, Paul"
Research output: Contribution to journal › Article › Scientific › peer-review
Facial pacing systems aim to reanimate paralyzed facial muscles with electrical stimulation. To aid the development of such systems, the frontalis muscle responsible for eyebrow raising was transcutaneously stimulated in 12 healthy participants using four waveforms: square wave, square wavelet, sine wave, and sinusoidal wavelet. The aim was to investigate the effects of the waveform on muscle activation magnitude, perceived discomfort, and the relationship between the stimulus signal amplitude and the magnitude of evoked movement. The magnitude of movement was measured offline using video recordings and compared to the magnitude of maximum voluntary movement (MVM) of eyebrows. Results showed that stimulations evoked forehead movement at a magnitude comparable to the MVM in 67% of the participants and close to comparable (80% of the MVM) in 92%. All the waveforms were equally successful in evoking movements. Perceived discomfort did not differ between the waveforms in relation to the movement magnitude, but some individual preferences did exist. Further, regression analysis showed a statistically significant linear relation between stimulation amplitudes and the evoked movement in 98% of the cases. As the waveforms performed equally well in evoking muscle activity, the waveform in pacing systems could be selected by emphasizing technical aspects such as the possibility to suppress stimulation artifacts from simultaneous electromyography measurement.
DUPL=53532026
Research output: Contribution to journal › Article › Scientific › peer-review
The authors consider the problem of compressive sensed video recovery via an iterative thresholding algorithm. Traditionally, it is assumed that some fixed sparsifying transform is applied at each iteration of the algorithm. To improve the recovery performance, the thresholding could be applied at each iteration with several different transforms, producing several estimates for each pixel; the resulting pixel value is then computed from the obtained estimates by simple averaging. However, calculating these estimates significantly increases the reconstruction complexity. Therefore, the authors propose a heuristic approach in which, at each iteration, only one transform is randomly selected from some set of transforms. First, they present simple examples using the block-based 2D discrete cosine transform as the sparsifying transform, and show that random selection of the block size at each iteration significantly outperforms the case where a fixed block size is used. Second, building on these simple examples, they apply the proposed approach when video block-matching and 3D filtering (VBM3D) is used for the thresholding, and show that random transform selection within VBM3D improves the recovery performance compared with recovery based on VBM3D with a fixed transform.
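The randomized-transform step can be sketched for the DCT case as below: each call draws one block size at random and performs blockwise hard thresholding. Within the full iterative recovery this would follow the data-consistency update; the image dimensions are assumed divisible by every candidate block size, and the threshold is illustrative.

```python
import numpy as np
from scipy.fft import dctn, idctn

def threshold_random_blocks(x, thr, block_sizes=(8, 16, 32), rng=None):
    """Blockwise 2-D DCT hard thresholding with a randomly drawn block size."""
    rng = rng or np.random.default_rng()
    b = int(rng.choice(block_sizes))
    out = np.empty_like(x, dtype=np.float64)
    for r in range(0, x.shape[0], b):
        for c in range(0, x.shape[1], b):
            coeffs = dctn(x[r:r + b, c:c + b], norm='ortho')
            coeffs[np.abs(coeffs) < thr] = 0.0      # hard threshold
            out[r:r + b, c:c + b] = idctn(coeffs, norm='ortho')
    return out
```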
EXT="Belyaev, Evgeny"
Research output: Contribution to journal › Article › Scientific › peer-review
This paper studies vehicle attribute recognition by appearance. In the literature, image-based target recognition has been extensively investigated in many use cases, such as facial recognition, but less so in the field of vehicle attribute recognition. We survey a number of algorithms that identify vehicle properties ranging from coarse-grained level (vehicle type) to fine-grained level (vehicle make and model). Moreover, we discuss two alternative approaches for these tasks, including straightforward classification and a more flexible metric learning method. Furthermore, we design a simulated real-world scenario for vehicle attribute recognition and present an experimental comparison of the two approaches.
Research output: Contribution to journal › Article › Scientific › peer-review
Efficient mitigation of power amplifier (PA) nonlinear distortion in multi-user hybrid precoding based broadband mmWave systems is an open research problem. In this article, we carry out detailed signal and distortion modeling in broadband multi-user hybrid MIMO systems with a bank of nonlinear PAs in each subarray, while also taking the inevitable crosstalk between the antenna/PA branches into account. Building on the derived models, we adopt and describe an efficient closed-loop (CL) digital predistortion (DPD) solution that utilizes only a single-input DPD unit per transmit chain or subarray, despite crosstalk, thus providing a substantial complexity benefit compared to state-of-the-art multi-dimensional DPD solutions. We show that under spatially correlated multipath propagation, each single-input DPD unit can provide linearization towards every intended user, or more generally, towards all spatial directions where coherent propagation takes place, and that the adopted CL DPD system is robust against crosstalk. Extensive numerical results building on practical measurement-based mmWave PA models are provided, demonstrating and verifying the excellent linearization performance of the overall DPD system in different evaluation scenarios.
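As a generic illustration of the single-input DPD concept rather than the authors' exact closed-loop estimator, the sketch below builds an odd-order memory-polynomial basis and fits a post-inverse of the PA by least squares in indirect-learning style; the model orders and the gain-normalization convention are illustrative.

```python
import numpy as np

def mp_basis(x, K=7, M=4):
    """Odd-order memory-polynomial basis: columns x[n-m] * |x[n-m]|^(k-1)."""
    N, cols = len(x), []
    for m in range(M):                               # memory taps
        xm = np.concatenate([np.zeros(m, dtype=complex), x[:N - m]])
        for k in range(1, K + 1, 2):                 # odd orders only
            cols.append(xm * np.abs(xm) ** (k - 1))
    return np.stack(cols, axis=1)

def fit_dpd(pa_in, pa_out, gain, K=7, M=4):
    """Indirect learning: fit a post-inverse from the gain-normalized PA
    output back to the PA input, then copy it as the predistorter."""
    Phi = mp_basis(pa_out / gain, K, M)
    coef, *_ = np.linalg.lstsq(Phi, pa_in, rcond=None)
    return coef

def predistort(x, coef, K=7, M=4):
    return mp_basis(x, K, M) @ coef
```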
EXT="Abdelaziz, Mahmoud"
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, we propose a novel method for projecting data from multiple modalities to a new subspace optimized for one-class classification. The proposed method iteratively transforms the data from the original feature space of each modality to a new common feature space along with finding a joint compact description of data coming from all the modalities. For data in each modality, we define a separate transformation to map the data from the corresponding feature space to the new optimized subspace by exploiting the available information from the class of interest only. We also propose different regularization strategies for the proposed method and provide both linear and non-linear formulations. The proposed Multimodal Subspace Support Vector Data Description outperforms all the competing methods using data from a single modality or fusing data from all modalities in four out of five datasets.
EXT="Iosifidis, Alexandros"
Research output: Contribution to journal › Article › Scientific › peer-review
The Shearlet Transform (ST) has been instrumental in Densely-Sampled Light Field (DSLF) reconstruction, as it sparsifies the underlying Epipolar-Plane Images (EPIs). The sought sparsification is implemented through an iterative regularization, which tends to be slow because of the time spent on domain transformations over dozens of iterations. To overcome this limitation, this letter proposes a novel self-supervised DSLF reconstruction method, CycleST, which employs ST and cycle consistency. Specifically, CycleST is composed of an encoder-decoder network and a residual learning strategy that restore the shearlet coefficients of densely-sampled EPIs using EPI-reconstruction and cycle-consistency losses. CycleST is a self-supervised approach that can be trained solely on Sparsely-Sampled Light Fields (SSLFs) with small disparity ranges (⩽ 8 pixels). Experimental results of DSLF reconstruction on SSLFs with large disparity ranges (16-32 pixels) demonstrate the effectiveness and efficiency of the proposed CycleST method. Furthermore, CycleST achieves at least a ∼9x speedup over ST.
Research output: Contribution to journal › Article › Scientific › peer-review
We propose a novel classifier accuracy metric: the Bayesian Area Under the Receiver Operating Characteristic Curve (CBAUC). The method estimates the area under the ROC curve and is related to the recently proposed Bayesian Error Estimator. The metric can assess the quality of a classifier using only the training dataset without the need for computationally expensive cross-validation. We derive a closed-form solution of the proposed accuracy metric for any linear binary classifier under the Gaussianity assumption, and study the accuracy of the proposed estimator using simulated and real-world data. These experiments confirm that the closed-form CBAUC is both faster and more accurate than conventional AUC estimators.
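The paper's CBAUC derivation is not reproduced here, but the flavor of a training-set-only, closed-form AUC can be seen from the standard plug-in result for a linear score under class-conditional Gaussianity, sketched below with illustrative names; the Bayesian estimator refines this plug-in idea.

```python
import numpy as np
from scipy.stats import norm

def gaussian_auc(w, mu0, cov0, mu1, cov1):
    """Closed-form AUC of the linear score w^T x when the two classes are
    Gaussian: P(w^T x1 > w^T x0) for independent x0 ~ N(mu0, cov0),
    x1 ~ N(mu1, cov1)."""
    delta = w @ (mu1 - mu0)
    spread = np.sqrt(w @ (cov0 + cov1) @ w)
    return norm.cdf(delta / spread)

# plug-in use: estimate means/covariances from the training set only,
# so no cross-validation loop is needed to score a candidate classifier
```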
EXT="Tohka, Jussi"
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, we propose, describe, and test a modification of the K-SVD algorithm. Given a set of training data, the proposed algorithm computes an overcomplete dictionary by minimizing the β-divergence (β ≥ 1) between the data and its representation as linear combinations of dictionary atoms, under strict sparsity restrictions. For the special case β = 2, the proposed algorithm minimizes the Frobenius norm and is therefore equivalent to the original K-SVD algorithm. We describe the modifications needed and discuss possible shortcomings of the new algorithm. The algorithm is tested with random matrices and with an example based on speech separation.
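For reference, the objective can be sketched as the β-divergence below; β = 2 reduces to half the squared Frobenius distance, which has the same minimizer as the Frobenius norm mentioned above, and β = 1 gives the generalized Kullback-Leibler divergence (entries are assumed strictly positive).

```python
import numpy as np

def beta_divergence(x, y, beta):
    """Sum of elementwise beta-divergences d_beta(x || y), beta >= 1."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    if beta == 1:                                  # generalized KL
        return np.sum(x * np.log(x / y) - x + y)
    if beta == 2:                                  # (half) squared Frobenius
        return 0.5 * np.sum((x - y) ** 2)
    return np.sum((x ** beta + (beta - 1) * y ** beta
                   - beta * x * y ** (beta - 1)) / (beta * (beta - 1)))
```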
Research output: Contribution to journal › Article › Scientific › peer-review
Partial shading conditions significantly affect the functionality of solar power plants despite the presence of multiple maximum power point tracking (MPPT) systems. The primary cause of this problem is the presence of local maxima in the power–current and/or power–voltage characteristic curves, which restrict the functionality of conventional MPPT systems. This article proposes a modified algorithm based on the simplified equivalent circuit of solar cells to improve the functionality of traditional MPPT systems. The algorithm regularly monitors the photo-current of each solar module. The upper and lower boundaries of the regulating parameter, such as current or voltage, are determined precisely, which helps locate the global maximum. During a sequential search, the control system accurately determines the lower and upper boundaries of the global maximum. Simultaneously, the MPPT system increases the photovoltaic current up to one of these boundaries and applies one of the conventional algorithms. Additionally, the control system regularly monitors the photovoltaic characteristics and adjusts the limits of the regulating parameter in response to any change in the global maximum location. The proposed method locates the global maximum boundaries quickly and precisely, and tracks the global maximum even under fast-changing partial shading conditions. The improved performance and overall efficiency are validated by a simulation study with variable solar irradiance.
Research output: Contribution to journal › Article › Scientific › peer-review
In this work, we consider the problem of single-query 6-DoF camera pose estimation, i.e. estimating the position and orientation of a camera by using reference images and a point cloud. We perform a systematic comparison of three state-of-the-art strategies for 6-DoF camera pose estimation: feature-based, photometric-based and mutual-information-based approaches. Two standard datasets with self-driving setups are used for experiments, and the performance of the studied methods is evaluated in terms of success rate, translation error and maximum orientation error. Building on the analysis of the results, we evaluate a hybrid approach that combines feature-based and mutual-information-based pose estimation methods to benefit from their complementary properties for pose estimation. Experiments show that (1) in cases with large appearance change between query and reference, the hybrid approach outperforms feature-based and mutual-information-based approaches by an average increment of 9.4% and 8.7% in the success rate, respectively; (2) in cases where query and reference images are captured at similar imaging conditions, the hybrid approach performs similarly as the feature-based approach, but outperforms both photometric-based and mutual-information-based approaches with a clear margin; (3) the feature-based approach is consistently more accurate than mutual-information-based and photometric-based approaches when at least 4 consistent matching points are found between the query and reference images.
EXT="Matas, Jiri"
Research output: Contribution to journal › Article › Scientific › peer-review
This paper presents a novel multi-sensor non-stationary EEG model. It is obtained by combining state-of-the-art mono-sensor newborn EEG simulators; a multilayer newborn head model comprised of four homogeneous concentric spheres; a multi-sensor propagation scheme based on array processing and optical dispersion to calculate inter-channel attenuation and delay; and, lastly, a multi-variable optimization paradigm using particle swarm optimization and Monte-Carlo simulations to validate the model under optimal conditions. Multi-sensor EEG of 7 newborns, comprised of seizure and background epochs, is analyzed using time-space, time-frequency, power-map, and multi-sensor causality techniques. The outcomes of these methods are validated by medical insights and serve as a backbone for any assumptions and as performance benchmarks against which the model is evaluated. The results obtained with the developed model show 85.7% averaged time-frequency correlation (the selected measure of similarity with real EEG) with 5.9% standard deviation, and an averaged error of 34.6% with 8% standard deviation. These performances indicate that the proposed model provides a suitable fit to real EEG in terms of probability density function, inter-sensor attenuation and translation, and multi-sensor causality. They also demonstrate the model's flexibility to generate new unseen samples using user-defined parameters, making it suitable for other relevant applications.
Research output: Contribution to journal › Article › Scientific › peer-review
As the Internet of Vehicles matures and acquires its social flavor, novel wireless connectivity enablers are being demanded for reliable data transfer in high-rate applications. The recently ratified New Radio communications technology operates in millimeter-wave (mmWave) spectrum bands and offers sufficient capacity for bandwidth-hungry services. However, seamless operation over mmWave is difficult to maintain on the move, since such extremely high frequency radio links are susceptible to unexpected blockage by various obstacles, including vehicle bodies. As a result, proactive mode selection, that is, migration from infrastructure- to vehicle-based connections and back, is becoming vital to avoid blockage situations. Fortunately, the very social structure of interactions between the neighboring smart cars and their passengers may be leveraged to improve session continuity by relaying data via proximate vehicles. This paper conceptualizes the socially inspired relaying scenarios, conducts underlying mathematical analysis, continues with a detailed 3-D modeling to facilitate proactive mode selection, and concludes by discussing a practical prototype of a vehicular mmWave platform.
Research output: Contribution to journal › Article › Scientific › peer-review
Given the recent surge in developments of deep learning, this paper provides a review of the state-of-the-art deep learning techniques for audio signal processing. Speech, music, and environmental sound processing are considered side-by-side, in order to point out similarities and differences between the domains, highlighting general methods, problems, key references, and potential for cross fertilization between areas. The dominant feature representations (in particular, log-mel spectra and raw waveform) and deep learning models are reviewed, including convolutional neural networks, variants of the long short-term memory architecture, as well as more audio-specific neural network models. Subsequently, prominent deep learning application areas are covered, i.e., audio recognition (automatic speech recognition, music information retrieval, environmental sound detection, localization and tracking) and synthesis and transformation (source separation, audio enhancement, generative models for speech, sound, and music synthesis). Finally, key issues and future questions regarding deep learning applied to audio signal processing are identified.