Gamification is increasingly becoming a pertinent aspect of UI and UX design. However, a persistent gap in both the research and the application of gamification concerns the role of individual differences in susceptibility to gamification and its varied designs. To address this gap, this study reviews the extant corpus of research on tailored gamification (42 studies). The findings of the review indicate that most studies in the field focus on user modeling for future personalization, adaptation, or recommendation of game elements. This user model usually contains the users' play preferences (i.e., player types) and is mostly applied in educational settings. The main contributions of this paper are a standardized terminology for the game elements used in tailored gamification, a discussion of the most suitable game elements for each user characteristic, and a research agenda that includes dynamic modeling, exploring multiple characteristics simultaneously, and understanding the effects of other aspects of the interaction on user experience.
Research output: Contribution to journal › Review Article › Scientific › peer-review
Gamification is increasingly employed in learning environments as a way to increase student motivation and, consequently, learning outcomes. However, while research on the effectiveness of gamification in education has been growing, there are blind spots regarding which types of gamification may be suitable for different educational contexts. This study investigates the effects of challenge-based gamification on learning in the area of statistics education. We developed a gamification approach, called Horses for Courses, composed of the main game design patterns related to challenge-based gamification: points, levels, challenges, and a leaderboard. Having conducted a 2 (reading: yes vs. no) x 2 (gamification: yes vs. no) between-subjects experiment, we present a quantitative analysis of the performance of 365 students from two different academic majors: Electrical and Computer Engineering (n=279) and Business Administration (n=86). The results of our experiments show that challenge-based gamification had a positive impact on student learning compared to traditional teaching methods (i.e., no treatment or a treatment involving only reading exercises). The effect was larger for female students and for students at the School of Electrical and Computer Engineering.
Research output: Contribution to journal › Article › Scientific › peer-review
Context: Companies frequently invest effort in removing technical issues believed to impact software qualities, such as anti-patterns or coding style violations. Objective: We aim to analyze the diffuseness of SonarQube issues in software systems and to assess their impact on code change-proneness and fault-proneness, also considering their different types and severities. Methods: We conducted a case study among 33 Java projects from the Apache Software Foundation repository. Results: We analyzed 726 commits containing 27K faults and 12M changes in Java files. The projects violated 173 SonarQube rules, generating more than 95K SonarQube issues in more than 200K classes. Classes not affected by SonarQube issues are less change-prone than affected ones, but the difference between the groups is small. Non-affected classes are slightly more change-prone than classes affected by SonarQube issues of type Code Smell or Security Vulnerability. As for fault-proneness, there is no difference between non-affected and affected classes. Moreover, we found incongruities in the types and severities assigned by SonarQube. Conclusion: Our results can help practitioners understand which SonarQube issues should be refactored and help researchers address the remaining gaps. The results can also support companies and tool vendors in identifying SonarQube issues as accurately as possible.
EXT="Lenarduzzi, Valentina"
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, the potential of extending 5G New Radio physical layer solutions to support communications at sub-THz frequencies is studied. More specifically, we introduce the status of third generation partnership project studies related to operation on frequencies beyond 52.6 GHz and note the recent proposal on spectrum horizons provided by the federal communications commission (FCC) related to experimental licenses on the 95 GHz-3 THz frequency band. Then, we review the power amplifier (PA) efficiency and output power challenge together with the increased phase noise (PN) distortion effect in terms of the supported waveforms. As a practical example of waveform and numerology design from the perspective of PN robustness, link performance results at a 90 GHz carrier frequency are provided. The numerical results demonstrate that new, higher subcarrier spacings are required to support high throughput, which in turn requires larger changes in the physical layer design. It is also observed that new phase-tracking reference signal designs are required to make the system robust against PN. The results illustrate that single-carrier frequency division multiple access is significantly more robust against PN and can provide clearly larger PA output power than cyclic-prefix orthogonal frequency division multiplexing, and is therefore a highly promising waveform for sub-THz communications.
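To make the numerology point concrete, the sketch below (not from the paper; the 4096-point FFT size is an assumption) shows how scaling the subcarrier spacing in 5G NR fashion (15 kHz x 2^mu) shortens the OFDM symbol and widens the bandwidth spanned by a fixed-size FFT, which is why higher spacings are attractive at sub-THz carriers:

```python
# Hedged illustration: 5G NR-style subcarrier spacing scaling and its effect on
# symbol duration and occupied bandwidth (FFT size of 4096 is assumed here).
for mu in range(0, 8):
    scs_khz = 15 * 2 ** mu            # subcarrier spacing: 15 kHz * 2^mu
    symbol_us = 1e3 / scs_khz         # useful OFDM symbol duration = 1 / SCS
    bw_mhz = 4096 * scs_khz / 1e3     # bandwidth spanned by a 4096-point FFT
    print(f"mu={mu}: SCS={scs_khz} kHz, symbol={symbol_us:.2f} us, BW={bw_mhz:.1f} MHz")
```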
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Serverless, the new buzzword, has been gaining a lot of attention from developers and industry. Cloud vendors such as AWS and Microsoft have hyped the architecture almost everywhere, from practitioners' conferences to local events to blog posts. In this work, we introduce serverless functions (also known as Function-as-a-Service or FaaS) together with bad practices experienced by practitioners, members of the Tampere Serverless Meetup group.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Networking devices such as switches and routers have traditionally had fixed functionality: they implement the union of the network protocols matching the application and market segment for which they were designed, and the possibility of adding new functionality is limited. One of the aims of Software Defined Networking is to make packet processing devices programmable, which enables innovation and rapid deployment of novel networking protocols. The first step in the processing of packets is packet parsing. In this paper, we present a custom processor for packet parsing. The parser is protocol-independent and can be programmed to parse any sequence of headers. It does so without the use of a Ternary Content Addressable Memory. As a result, the area and power consumption are noticeably smaller than in the state of the art. Moreover, its output is the same as that of the parser used in the Reconfigurable Match Tables (RMT) architecture. With an area no larger than that of the parsers in the RMT architecture, it sustains an aggregate throughput of 3.4 Tbps in the worst case, an improvement by a factor of 5.
Research output: Contribution to journal › Article › Scientific › peer-review
The popularity of tools for analyzing Technical Debt, and particularly of SonarQube, is increasing rapidly. SonarQube proposes a set of coding rules, each of which represents something wrong in the code that will soon be reflected in a fault or will increase maintenance effort. However, our local companies were not confident in the usefulness of the rules proposed by SonarQube and contracted us to investigate the fault-proneness of these rules. In this work we aim to understand which SonarQube rules are actually fault-prone and which machine learning models can be adopted to accurately identify fault-prone rules. We designed and conducted an empirical study on 21 well-known mature open-source projects. We applied the SZZ algorithm to label the fault-inducing commits. We analyzed the fault-proneness by comparing the classification power of seven machine learning models. Among the 202 rules defined for Java by SonarQube, only 25 can be considered to have relatively low fault-proneness. Moreover, violations considered as 'bugs' by SonarQube were generally not fault-prone and, consequently, the fault-prediction power of the model proposed by SonarQube is extremely low. The rules applied by SonarQube for calculating technical debt should be thoroughly investigated, and their harmfulness needs to be further confirmed. Therefore, companies should carefully consider which rules they really need to apply, especially if their goal is to reduce fault-proneness.
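As a rough illustration of the kind of analysis described above (a sketch under stated assumptions, not the authors' pipeline: the data, feature encoding and labels are synthetic stand-ins), fault-proneness can be assessed by training classifiers on per-commit rule-violation counts labeled, e.g., via SZZ:

```python
# Hypothetical sketch: compare classifiers that predict fault-inducing commits
# from SonarQube rule-violation counts (synthetic data stands in for real projects).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(500, 202))   # counts for 202 rules in 500 commits (synthetic)
y = rng.integers(0, 2, size=500)        # 1 = fault-inducing, e.g., as labeled by SZZ (synthetic)

for name, model in [("logistic regression", LogisticRegression(max_iter=1000)),
                    ("random forest", RandomForestClassifier(n_estimators=100))]:
    auc = cross_val_score(model, X, y, cv=5, scoring="roc_auc").mean()
    print(f"{name}: mean AUC = {auc:.2f}")
```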
EXT="Lenarduzzi, Valentina"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
We propose a full processing pipeline for acquiring anthropometric measurements from 3D measurements. The first stage of our pipeline is a commercial point cloud scanner. In the second stage, a pre-defined body model is fitted to the captured point cloud; we have generated one male and one female model from the SMPL library. The fitting process is based on a non-rigid iterative closest point algorithm that minimizes the overall energy of point-distance and local stiffness terms. In the third stage, we measure multiple circumference paths on the fitted model surface and use a nonlinear regressor to produce the final estimates of the anthropometric measurements. We scanned 194 male and 181 female subjects, and the proposed pipeline provides mean absolute errors from 2.5 to 16.0 mm depending on the anthropometric measurement.
Research output: Contribution to journal › Article › Scientific › peer-review
The present contribution proposes a spectrally efficient censor-based cooperative spectrum sensing (C-CSS) approach in a sustainable cognitive radio network that consists of multiple antenna nodes and experiences imperfect sensing and reporting channels. In this context, exact analytic expressions are first derived for the corresponding probability of detection, probability of false alarm, and secondary throughput, assuming that each secondary user (SU) sends its detection outcome to a fusion center only when it has detected a primary signal. Capitalizing on the findings of the analysis, the effects of critical measures, such as the detection threshold, the number of SUs, and the number of employed antennas, on the overall system performance are also quantified. In addition, the optimal detection threshold for each antenna based on the Neyman-Pearson criterion is derived and useful insights are developed on how to maximize the system throughput with a reduced number of SUs. It is shown that the C-CSS approach provides two distinct benefits compared with the conventional sensing approach, i.e., without censoring: i) the sensing tail problem, which exists in imperfect sensing environments, can be mitigated; and ii) fewer SUs are ultimately required to obtain higher secondary throughput, rendering the system more sustainable.
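For orientation, the textbook single-antenna energy-detection expressions over an AWGN channel (a simplification relative to the multi-antenna, censoring-based setting analyzed above; u, λ, and γ denote the time-bandwidth product, detection threshold, and SNR) are:

```latex
P_{fa} = \frac{\Gamma\!\left(u, \frac{\lambda}{2}\right)}{\Gamma(u)},
\qquad
P_{d} = Q_{u}\!\left(\sqrt{2\gamma}, \sqrt{\lambda}\right),
```

where Γ(·,·) is the upper incomplete gamma function and Q_u(·,·) is the generalized Marcum Q-function.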
Research output: Contribution to journal › Article › Scientific › peer-review
Background: Pull requests are a common practice for making contributions and reviewing them in both open-source and industrial contexts.
Objective: Our goal is to understand whether quality flaws such as code smells, anti-patterns, security vulnerabilities, and coding style violations in a pull request's code affect the chance of its acceptance when reviewed by a maintainer of the project.
Method: We conducted a case study among 28 Java open-source projects, analyzing the presence of 4.7 M code quality flaws in 36 K pull requests. We analyzed further correlations by applying logistic regression and six machine learning techniques. Moreover, we manually validated 10% of the pull requests to get further qualitative insights on the importance of quality issues in cases of acceptance and rejection.
Results: Unexpectedly, quality flaws measured by PMD turned out not to affect the acceptance of a pull request at all. As suggested by other works, factors such as the reputation of the maintainer and the importance of the delivered feature might matter more for pull request acceptance than code quality.
Conclusions: Researchers have already investigated the influence of developers' reputation on pull request acceptance. This is the first work investigating code style violations and, specifically, PMD rules. We recommend that researchers further investigate this topic to understand whether different measures or different tools could provide more useful measures.
EXT="Lenarduzzi, Valentina"
INT=comp,"Nikkola, Vili"
Research output: Contribution to journal › Article › Scientific › peer-review
Background: The migration from a monolithic system to microservices requires a deep refactoring of the system. Therefore, such a migration usually has a big economic impact and companies tend to postpone several activities during this process, mainly to speed up the migration itself, but also because of the demand for releasing new features.
Objective: We monitored the technical debt of an SME while it migrated from a legacy monolithic system to an ecosystem of microservices. Our goal was to analyze changes in the code technical debt before and after the migration to microservices.
Method: We conducted a case study analyzing more than four years of the history of a twelve-year-old project (280K Lines of Code) where two teams extracted five business processes from the monolithic system as microservices. For the study, we first analyzed the technical debt with SonarQube and then performed a qualitative study with company members to understand the perceived quality of the system and the motivation for possibly postponed activities.
Results: The migration to microservices helped to reduce the technical debt in the long run. Despite an initial spike in the technical debt due to the development of the new microservices, after a relatively short period of time the technical debt tended to grow more slowly than in the monolithic system.
EXT="Lenarduzzi, Valentina"
Research output: Contribution to journal › Review Article › Scientific › peer-review
In this paper, we investigate the energy efficiency of the conventional collaborative compressive sensing (CCCS) scheme, focusing on balancing the tradeoff between energy efficiency and detection accuracy in a cognitive radio environment. In particular, we derive the achievable throughput, energy consumption and energy efficiency of the CCCS scheme, and formulate an optimization problem to determine the optimal values of the parameters that maximize the energy efficiency of the CCCS scheme. The maximization of energy efficiency is posed as a multi-variable, non-convex optimization problem, and we provide approximations to reduce it to a convex optimization problem; we highlight that the errors due to these approximations are negligible. Subsequently, we analytically characterize the tradeoff between dimensionality reduction and collaborative sensing performance of the CCCS scheme, i.e., the implicit tradeoff between energy saving and detection accuracy. It is shown that the resulting loss due to compression can be recovered through collaboration, which improves the overall energy efficiency of the system.
Research output: Contribution to journal › Article › Scientific › peer-review
This paper studies vehicle attribute recognition by appearance. In the literature, image-based target recognition has been extensively investigated in many use cases, such as facial recognition, but less so in the field of vehicle attribute recognition. We survey a number of algorithms that identify vehicle properties ranging from the coarse-grained level (vehicle type) to the fine-grained level (vehicle make and model). Moreover, we discuss two alternative approaches to these tasks: straightforward classification and a more flexible metric learning method. Furthermore, we design a simulated real-world scenario for vehicle attribute recognition and present an experimental comparison of the two approaches.
Research output: Contribution to journal › Article › Scientific › peer-review
This paper explores the activity of coding with the smart toy robots Dash and Botley as part of playful learning in the Finnish early education context. The findings of our study demonstrate how coding with the two toy robots was approached, conducted and played by Finnish preschoolers aged 5-6 years. The main conclusion of the study is that the preschoolers used the coding-related affordances of the toy robots mainly to develop gamified play around them: designing tracks for the toys, programming the toys to solve obstacle paths, and competing in player-generated contests of dexterity, speed and physically mobile play.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The limitations of state-of-the-art cellular modems prevent achieving low-power and low-latency Machine Type Communications (MTC) based on current power saving mechanisms alone. Recently, the concept of a wake-up scheme has been proposed to enhance the battery lifetime of 5G devices while reducing the buffering delay. Existing wake-up algorithms use static operational parameters that are determined by the radio access network at the start of the user's session. In this paper, the average power consumption of a wake-up enabled MTC UE is modeled using a semi-Markov process and then optimized through a delay-constrained optimization problem, by which the optimal wake-up cycle is obtained in closed form. Numerical results show that the proposed solution reduces the power consumption of an optimized Discontinuous Reception (DRX) scheme by up to 40% for a given delay requirement.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Providing sufficient mobile coverage during mass public events or critical situations is a highly challenging task for network operators. To fulfill the extreme capacity and coverage demands within a limited area, several augmenting solutions may be used. Among them, novel technologies such as a fleet of compact base stations mounted on Unmanned Aerial Vehicles (UAVs) are gaining momentum because of their time- and cost-efficient deployment. Despite the fact that the concept of aerial wireless access networks has recently been investigated in many research studies, numerous practical aspects still require further understanding and extensive evaluation. Taking this as a motivation, in this paper we develop the concept of continuous wireless coverage provisioning by means of UAVs and assess its usability in mass scenarios with thousands of users. With our system-level simulations as well as a measurement campaign, we take into account a set of important parameters including weather conditions, UAV speed, weight, power consumption, and millimeter-wave (mmWave) antenna configuration. As a result, we provide more realistic data about the performance of the access and backhaul links together with practical lessons learned about the design and real-world applicability of UAV-enabled wireless access networks.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In Global Software Development (GSD), the additional complexity caused by global distance requires processes to ease collaboration difficulties, reduce communication overhead, and improve control. How development tasks are broken down, shared and prioritized is key to project success. While the related literature provides some support for architects involved in GSD, guidelines are far from complete. This paper presents a GSD Architectural Practice Framework reflecting the views of software architects, all of whom are working in a distributed setting. In-depth interviews with architects from seven different GSD organizations revealed a complex set of challenges and practices. We found that designing software for distributed teams requires careful selection of practices that support understanding of, and adherence to, defined architectural plans across sites. Teams used Scrum, which aided communication, and Continuous Integration, which helped solve synchronization issues. However, teams deviated from the design, causing conflicts. Furthermore, a balance is needed between the self-organizing Scrum team methodology and the need to impose architectural design decisions across distributed sites. The research presented provides an enhanced understanding of architectural practices in GSD companies. Our GSD Architectural Practice Framework gives practitioners a cohesive set of warnings which, for the most part, are matched by recommendations.
Research output: Contribution to journal › Article › Scientific › peer-review
A major challenge in modelling and simulation is the need to combine expertise in both software technologies and a given scientific domain. When High-Performance Computing (HPC) is required to solve a scientific problem, software development becomes a problematic issue. Considering the complexity of software for HPC, it is useful to identify programming languages that can alleviate this issue. Because the existing literature on the topic of HPC is very dispersed, we performed a Systematic Mapping Study (SMS) in the context of the European COST Action cHiPSet. This literature study maps characteristics of various programming languages for data-intensive HPC applications, including category, typical user profiles, effectiveness, and type of articles. We organised the SMS in two phases. In the first phase, relevant articles were identified by an automated keyword-based search in eight digital libraries. This led to an initial sample of 420 papers, which was then narrowed down in a second phase by human inspection of article abstracts, titles and keywords to 152 relevant articles published in the period 2006–2018. The analysis of these articles enabled us to identify 26 programming languages referred to in 33 of the relevant articles. We compared the outcome of the mapping study with the results of our questionnaire-based survey that involved 57 HPC experts. The mapping study and the survey revealed that the desired features of programming languages for data-intensive HPC applications are portability, performance and usability. Furthermore, we observed that the majority of the programming languages used in the context of data-intensive HPC applications are text-based general-purpose programming languages. Typically these have a steep learning curve, which makes them difficult to adopt. We believe that the outcome of this study will inspire future research and development in programming languages for data-intensive HPC applications.
Research output: Contribution to journal › Article › Scientific › peer-review
Today's dominant design for the Internet of Things (IoT) is a Cloud-based system, where devices transfer their data to a back-end and in return receive instructions on how to act. This view is challenged when delays caused by communication with the back-end become an obstacle for IoT applications with, for example, stringent timing constraints. In contrast, Fog Computing approaches, where devices communicate and orchestrate their operations collectively and closer to the origin of data, lack adequate tools for programming secure interactions between humans and their proximate devices at the network edge. This paper fills the gap by applying the Action-Oriented Programming (AcOP) model to this task. While the AcOP model was originally proposed for Cloud-based infrastructures, it is here re-designed around the notions of coalescence and disintegration, which enable the devices to collectively and autonomously execute their operations in the Fog by serving humans in a peer-to-peer fashion. The Cloud's role has been minimized: it is leveraged as a development and deployment platform.
EXT="Mäkitalo, Niko"
EXT="Mikkonen, Tommi"
Research output: Contribution to journal › Article › Scientific › peer-review
Hardware acceleration for IPsec, the well-known VPN solution, has already been widely researched. Still, the topic is not fully covered, and increasing latency, throughput, and feature requirements call for further evaluation. We propose an IPsec accelerator architecture on an FPGA and explain the details that need to be considered for a production-ready design. This research considers IPsec packet processing, without IKE, offloaded to an FPGA in an SDN network. Reported performance for 64-byte packets in related work is 1–2 Gbps throughput with 0.2 ms latency for software solutions, and 1–4 Gbps with unreported latencies for hardware solutions. Our proposed architecture is capable of hosting 1000 concurrent tunnels and achieves 10 Gbps throughput with only 10 µs latency in our test network. The proposed design is therefore efficient even for voice or video encryption. The architecture is especially suited for data centers and locations with a vast number of concurrent IPsec tunnels. The research confirms that FPGA-based hardware acceleration increases performance and is feasible to integrate with the rest of the server infrastructure.
EXT="Viitamäki, Vili"
EXT="Kulmala, Ari"
Research output: Contribution to journal › Article › Scientific › peer-review
Mobile app markets have been touted as the fastest growing marketplaces in the world. Every day thousands of apps are published to join millions of others on app stores. The competition for top-grossing apps and market visibility is fierce. The way an app is visually represented can greatly affect the amount of attention an icon receives and its consequent commercial performance. The icon of the app is therefore of crucial importance, as it is the first point of contact with the potential user/customer amidst the flood of information. Apps that fail to arouse attention through their icons endanger their commercial performance in a market where consumers browse past hundreds of icons daily. Using a semantic differential scale (22 adjective pairs), we investigate the relationship between consumer perceptions of app icons and icon successfulness, measured by 1) overall evaluation of the icon, 2) willingness to click the icon, 3) willingness to download the imagined app and 4) willingness to purchase the app. The study design was a vignette study with random participant (n = 569) assignment to evaluate 4 icons (n = 2276) from a total of 68 pre-selected game app icons across 4 categories (concrete, abstract, character and text). Results show that consumers are more likely to interact with app icons that are aesthetically pleasing and convey good quality. In particular, app icons that are perceived as unique, realistic and stimulating lead to more clicks, downloads and purchases.
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, we introduce a new fading model which is capable of characterizing both the shadowing of the dominant component and composite shadowing which may exist in wireless channels. More precisely, this new model assumes a κ-μ envelope where the dominant component is fluctuated by a Nakagami-m random variable (RV) which is preceded (or succeeded) by a secondary round of shadowing brought about by an inverse Nakagami-m RV. We conveniently refer to this as the double shadowed κ-μ fading model. In this context, novel closed-form and analytical expressions are developed for a range of channel related statistics, such as the probability density function, cumulative distribution function, and moments. All of the derived expressions have been validated through Monte-Carlo simulations and reduction to a number of well-known special cases. It is worth highlighting that the proposed fading model offers remarkable flexibility as it includes the κ-μ, η-μ, Rician shadowed, double shadowed Rician, κ-μ shadowed, κ-μ/inverse gamma and η-μ/inverse gamma distributions as special cases.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Graphics Processing Units (GPUs) have been widely used in various fields of scientific computing, such as signal processing. GPUs have a hierarchical memory structure with memory layers that are shared between GPU processing elements. Partly due to this complex memory hierarchy, GPU programming is non-trivial, and several aspects must be taken into account, one being memory access patterns. One of the fastest GPU memory layers, shared memory, is grouped into banks to enable fast, parallel access for processing elements. Unfortunately, multiple threads of a GPU program may access the same shared memory bank simultaneously, causing a bank conflict. If this happens, program execution slows down as memory accesses have to be rescheduled to determine which instruction to execute first. Bank conflicts are not handled automatically by the compiler, and hence the programmer must detect and deal with them prior to program execution. In this paper, we present an algebraic approach to detect bank conflicts and prove some theoretical results that can be used to predict when bank conflicts happen and how to avoid them. Our experimental results also illustrate the savings in computation time.
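As a small illustration of the phenomenon (a sketch under common assumptions: 32 banks, 4-byte words, CUDA-style mapping bank = word index mod 32; this is not the algebraic method developed in the paper), the conflict degree of one warp's access pattern can be checked directly:

```python
# Hedged sketch: worst-case shared-memory bank conflict degree for one warp,
# assuming 32 banks and 4-byte words (bank = word_index % 32).
def conflict_degree(addresses, num_banks=32, word_bytes=4):
    banks = {}
    for addr in addresses:
        word = addr // word_bytes
        banks.setdefault(word % num_banks, set()).add(word)
    # The slowdown factor equals the largest number of distinct words per bank.
    return max(len(words) for words in banks.values())

warp_size = 32
for stride in (1, 2, 32, 33):
    addrs = [4 * stride * t for t in range(warp_size)]   # thread t reads element t*stride
    print(f"stride {stride}: conflict degree {conflict_degree(addrs)}")
```

Strides of 1 or 33 give a conflict-free degree of 1, whereas a stride of 32 serializes the whole warp onto one bank.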
INT=comp,"Ferranti, Luca"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Partial shading conditions significantly affect the functionality of solar power plants despite the presence of multiple maximum power point tracking systems. The primary cause of this problem is the presence of local maxima in the power–current and/or power–voltage characteristic curves, which restrict the functionality of conventional maximum power point tracking systems. The present article proposes a modified algorithm based on the simplified equivalent circuit of solar cells to improve the functionality of traditional maximum power point tracking systems. This algorithm provides a method for regularly monitoring the photo-current of each solar module. The upper and lower boundaries of the regulating parameter, such as current or voltage, are determined precisely, which helps locate the global maximum. During a sequential search, the control system accurately determines the lower and upper boundaries of the global maximum. Simultaneously, the maximum power point tracking system increases the photovoltaic current up to one of these boundaries and applies one of the conventional algorithms. Additionally, the control system regularly monitors the photovoltaic characteristics and changes the limits of the regulating parameter in response to any change in the location of the global maximum. The proposed method locates the global-maximum boundaries quickly and precisely and tracks the global maximum even under fast-changing partial shading conditions. The improved performance and overall efficiency are validated by a simulation study for variable solar irradiance.
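The basic idea of bounding the region that contains the global maximum before handing over to a conventional tracker can be sketched as follows (a hypothetical two-peak P-V curve and a plain coarse scan; not the authors' monitoring algorithm):

```python
# Hedged sketch: coarse scan of a partially shaded P-V curve to bracket the global
# maximum, after which a conventional tracker (e.g., P&O) refines within the bounds.
import numpy as np

def pv_power(v):
    # Hypothetical two-peak power-voltage characteristic (arbitrary units).
    return 40 * v * np.exp(-((v - 12) / 5) ** 2) + 3 * v * np.exp(-((v - 26) / 3) ** 2)

v_grid = np.linspace(0.0, 35.0, 200)      # sequential search over the regulating parameter
p_grid = pv_power(v_grid)
i_best = int(np.argmax(p_grid))
lower = v_grid[max(i_best - 1, 0)]
upper = v_grid[min(i_best + 1, len(v_grid) - 1)]
print(f"global maximum bracketed in [{lower:.2f} V, {upper:.2f} V], P ~ {p_grid[i_best]:.1f}")
```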
Research output: Contribution to journal › Article › Scientific › peer-review
Propositional and modal inclusion logic are formalisms that belong to the family of logics based on team semantics. This article investigates the model checking and validity problems of these logics. We identify complexity bounds for both problems, covering both lax and strict team semantics. By doing so, we come close to finalizing the programme that aims to completely classify the complexities of the basic reasoning problems for modal and propositional dependence, independence and inclusion logics.
Research output: Contribution to journal › Article › Scientific › peer-review
Semiconductor devices based upon silicon have powered the modern electronics revolution through advanced manufacturing processes. However, the requirement of high temperatures to create crystalline silicon devices has restricted its use in a number of new applications, such as printed and flexible electronics. Thus, developments in high-mobility solution-processable metal oxides, surpassing α-Si in many instances, are opening a new era for flexible and wearable electronics. However, the high operating voltages and relatively high deposition temperatures required for metal oxides remain impediments for flexible devices. Here, we report the fabrication of low operating voltage, flexible thin film transistors (TFTs) using a solution-processed indium oxide (In2O3) channel material with a room-temperature-deposited anodized high-κ aluminum oxide (Al2O3) gate dielectric. The flexible TFTs operate at a low voltage Vds of 2 V, with a threshold voltage Vth of 0.42 V, an on/off ratio of 10^3 and a subthreshold swing (SS) of 420 mV/dec. The electron mobility (μ), extracted from the saturation regime, is 2.85 cm^2/V·s and the transconductance, gm, is 38 μS.
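For reference, the textbook saturation-regime relations commonly used to extract such figures of merit (a standard sketch with assumed notation W, L, and C_ox for channel width, length, and gate capacitance per unit area; the paper's exact extraction procedure may differ) are:

```latex
I_{D,\mathrm{sat}} = \frac{W \mu C_{\mathrm{ox}}}{2L}\,(V_{GS}-V_{th})^{2},
\qquad
\mu = \frac{2L}{W C_{\mathrm{ox}}}\left(\frac{\partial \sqrt{I_{D,\mathrm{sat}}}}{\partial V_{GS}}\right)^{2},
\qquad
g_m = \frac{\partial I_D}{\partial V_{GS}},
\qquad
SS = \left(\frac{\partial \log_{10} I_D}{\partial V_{GS}}\right)^{-1}.
```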
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The recent Omnidirectional MediA Format (OMAF) standard, which specifies the delivery of 360° video content, supports only equirectangular projection (ERP) and cubemap projection and their region-wise packing with a limitation on video decoding capability to the maximum resolution of 4K (e.g., 4,096 × 2,048). Streaming of 4K ERP content allows only a limited viewport resolution, which is lower than the resolution of many current head-mounted displays (HMDs). Therefore, to take full advantage of high-resolution HMDs, delivery of 360° video content beyond 4K resolution needs to be enabled. In this regard, we propose two specific mixed-resolution packing schemes of 6K (e.g., 6,144 × 3,072) and 8K (e.g., 8,192 × 4,096) ERP content and their realization in tile-based streaming, while complying with the 4K decoding constraint and the High Efficiency Video Coding standard. The proposed packing schemes offer 6K and 8K effective resolution at the viewport. Using our proposed test methodology, experimental results indicate that the proposed layouts significantly decrease streaming bitrates when compared to mixed-quality viewport-adaptive streaming of 4K ERP. Our results further indicate that 8K-effective packing outperforms 6K-effective packing especially in high-quality videos.
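A quick sanity check of the resolution arithmetic (assuming the 4K decoding budget is taken as the 4,096 x 2,048 luma-sample figure quoted above) shows why full-picture decoding of 6K or 8K ERP is infeasible and mixed-resolution region-wise packing is needed:

```python
# Hedged arithmetic: luma-sample budget of a "4K-capable" decoder vs. full 6K/8K ERP pictures.
budget = 4096 * 2048                                 # ~8.4 M luma samples per picture
for name, w, h in [("6K ERP", 6144, 3072), ("8K ERP", 8192, 4096)]:
    samples = w * h
    print(f"{name}: {samples / 1e6:.1f} M samples = {samples / budget:.1f}x the 4K budget")
```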
EXT="Zare, Alireza"
EXT="Aminlou, Alireza"
Research output: Contribution to journal › Article › Scientific › peer-review
Background. Architectural smells and code smells are symptoms of bad code or design that can cause different quality problems, such as faults, technical debt, or difficulties with maintenance and evolution. Some studies show that code smells and architectural smells often appear together in the same file. The correlation between code smells and architectural smells, however, is not clear yet; some studies on a limited set of projects have claimed that architectural smells can be derived from code smells, while other studies claim the opposite. Objective. The goal of this work is to understand whether architectural smells are independent from code smells or can be derived from a code smell or from one category of them. Method. We conducted a case study analyzing the correlations among 19 code smells, six categories of code smells, and four architectural smells. Results. The results show that architectural smells are correlated with code smells only in a very low number of occurrences and therefore cannot be derived from code smells. Conclusion. Architectural smells are independent from code smells, and therefore deserve special attention by researchers, who should investigate their actual harmfulness, and practitioners, who should consider whether and when to remove them.
Research output: Contribution to journal › Article › Scientific › peer-review
High-level synthesis (HLS) tools aim to produce hardware designs from software descriptions, with the goal of lowering the bar for FPGA usage by software engineers. Despite their recent progress, however, HLS tools still require FPGA-target-specific pragmas and other modifications to the originally processor-targeting source code. Customized soft-core-based overlay architectures provide a software-programmable layer on top of the FPGA fabric. The benefit of this approach is that a platform-independent compiler target is presented to the programs, which lowers the porting burden, and repurposing the same configuration online is natural: one simply switches the executed program. The main drawbacks, as with any overlay architecture, are the additional overheads the overlay imposes on resource consumption and maximum operating frequency. In this paper we show how, by utilizing the efficient structure of Transport-Triggered Architectures (TTA), soft cores can be customized automatically to benefit from the flexible FPGA fabric while still presenting a comfortable software layer to the users. Compared to previously published non-specialized TTA soft cores, the results indicate equal or better execution times, while the program image size is reduced by up to 49% and overall resource utilization improves from 10% to 60%.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Blockchain technology is currently penetrating many sides of the modern ICT community. Most of the devices involved in blockchain-related processes are specially designed and target only the mining aspect. At the same time, wearable and mobile devices may also become part of blockchain operation, especially during charging time. This paper considers the possibility of using a large number of constrained devices to support the operation of the blockchain. The utilization of such devices is expected to improve the efficiency of the system and to attract a more substantial number of users. The authors propose a novel consensus algorithm based on a combination of Proof-of-Work (PoW), Proof-of-Activity (PoA), and Proof-of-Stake (PoS). The paper first overviews the existing strategies and then describes the developed cryptographic primitives used to build a blockchain involving mobile devices. A brief numerical evaluation of the designed system is also provided.
EXT="Zhidanov, Konstantin"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, we discuss how the input magnitude data influence the behavior of the error-reduction algorithm in the case of the one-dimensional discrete phase retrieval problem. We present experimental results related to the convergence or stagnation of the algorithm. We also discuss the distribution of the zeros of the solution, when a solution of the problem exists.
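For readers unfamiliar with the method, the classical error-reduction iteration alternates between imposing the given Fourier magnitudes and the signal-domain constraints; a minimal sketch (assuming a real, nonnegative signal with known support, which may differ from the exact setting studied in the paper) is shown below:

```python
# Hedged sketch of the error-reduction iteration for 1-D discrete phase retrieval.
import numpy as np

rng = np.random.default_rng(1)
n, support = 32, 8
x_true = np.zeros(n)
x_true[:support] = rng.random(support)
mag = np.abs(np.fft.fft(x_true))                      # the given Fourier magnitude data

x = rng.random(n)                                     # random initial estimate
for _ in range(500):
    X = mag * np.exp(1j * np.angle(np.fft.fft(x)))    # impose the measured magnitudes
    x = np.fft.ifft(X).real                           # back to the signal domain
    x[support:] = 0.0                                 # impose the support constraint
    x[x < 0] = 0.0                                    # impose nonnegativity

err = np.linalg.norm(np.abs(np.fft.fft(x)) - mag) / np.linalg.norm(mag)
print(f"relative Fourier-magnitude error after 500 iterations: {err:.2e}")
```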
EXT="Rusu, Corneliu"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Tracking the location of people and their mobile devices creates opportunities for new and exciting ways of interacting with public technology. For instance, users can transfer content from public displays to their mobile device without touching it, because location tracking allows automatic recognition of the target device. However, many uncertainties remain regarding how users feel about interactive displays that track them and their mobile devices, and whether their experiences vary based on the setting. To close this research gap, we conducted a 24-participant user study. Our results suggest that users are largely willing - even excited - to adopt novel location-tracking systems. However, users expect control over when and where they are tracked, and want the system to be transparent about its ownership and data collection. Moreover, the deployment setting plays a much bigger role in people's willingness to use interactive displays when location tracking is involved.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
As the Internet of Vehicles matures and acquires its social flavor, novel wireless connectivity enablers are being demanded for reliable data transfer in high-rate applications. The recently ratified New Radio communications technology operates in millimeter-wave (mmWave) spectrum bands and offers sufficient capacity for bandwidth-hungry services. However, seamless operation over mmWave is difficult to maintain on the move, since such extremely high frequency radio links are susceptible to unexpected blockage by various obstacles, including vehicle bodies. As a result, proactive mode selection, that is, migration from infrastructure- to vehicle-based connections and back, is becoming vital to avoid blockage situations. Fortunately, the very social structure of interactions between the neighboring smart cars and their passengers may be leveraged to improve session continuity by relaying data via proximate vehicles. This paper conceptualizes the socially inspired relaying scenarios, conducts underlying mathematical analysis, continues with a detailed 3-D modeling to facilitate proactive mode selection, and concludes by discussing a practical prototype of a vehicular mmWave platform.
Research output: Contribution to journal › Article › Scientific › peer-review
In the present contribution, we propose a novel opportunistic ambient backscatter communication (ABC) framework for radio frequency (RF)-powered cognitive radio (CR) networks. This framework considers opportunistic spectrum sensing (SS) integrated with ABC and harvest-then-transmit (HTT) operation strategies. Novel analytic expressions are derived for the average throughput, the average energy consumption and the energy efficiency (EE) in the considered setup. These expressions are given in closed form and have a tractable algebraic representation which renders them convenient to handle both analytically and numerically. In addition, we formulate an optimization problem to maximize the EE of the CR system operating in mixed ABC and HTT modes, for a given set of constraints, including primary interference and imperfect SS constraints. Capitalizing on this, we determine the optimal set of parameters, which comprises the optimal detection threshold, the optimal degree of trade-off between the CR system operating in the ABC and HTT modes, and the optimal data transmission time. Extensive results from respective computer simulations are also presented to corroborate the corresponding analytic results and to demonstrate the performance gain of the proposed model in terms of EE.
Research output: Contribution to journal › Article › Scientific › peer-review
Coarse-grained reconfigurable architectures and other exposed-datapath architectures, such as transport-triggered architectures, come with a high energy efficiency promise for accelerating data-oriented workloads. Their main drawback results from the push of complexity from the architecture to the programmer; compiler techniques that start from a higher-level programming language and robustly generate efficient code for such architectures are still an open research area. In this article we survey the known main sources of challenges and outline a generic processor architecture template that covers the most common architecture variations, along with a proposal for a common code generation framework for such challenging architectures.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Most applications and services rely on central authorities. This introduces a single point of failure to the system. The central authority must be trusted to keep the data stored by the application available at any given time. More importantly, the privacy of the user depends on the service provider's ability to keep the data safe. A decentralized system could be a solution that removes the dependency on a central authority. Moreover, due to the rapid growth of mobile device usage, decentralization must not be limited only to desktop computers. In this work we study the possibility of using mobile devices as a decentralized file-sharing platform without any central authorities. This was done by implementing Asterism, a peer-to-peer file-sharing mobile application based on the InterPlanetary File System. We validate the results by deploying the application and measuring its network usage and power consumption on multiple different devices. Results show that mobile devices can be used to implement a worldwide distributed file-sharing network. However, the file-sharing application generated large amounts of network traffic even when no files were shared, caused by the chattiness of the underlying peer-to-peer network protocol. Consequently, the constant network traffic prevented the mobile devices from entering deep sleep mode, and the battery life of the devices was greatly degraded.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Artificial Intelligence (AI) is one of the current emerging technologies. In the history of computing, AI has been in a similar role before, almost every decade since the 1950s, when the programming language Lisp was invented and used to implement self-modifying applications. The second time AI was described as one of the frontier technologies was in the 1970s, when Expert Systems (ES) were developed. A decade later, AI was again at the forefront when the Japanese government initiated its research and development effort to develop an AI-based computer architecture called the Fifth Generation Computer System (FGCS). Currently, in the 2010s, AI is again on the frontier in the form of (self-)learning systems manifesting in robot applications, smart hubs, intelligent data analytics, etc. What is the reason for this cyclic reincarnation of AI? This paper gives a brief description of the history of AI and answers that question. The current AI “cycle” has the capability to change the world in many ways. In the context of the CE conference, it is important to understand the changes it will cause in education, in the skills expected in different professions, and in society at large.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The present contribution analyzes the performance of non-orthogonal multiple access (NOMA)-based user cooperation with simultaneous wireless information and power transfer (SWIPT). In particular, we consider a two-user NOMA-based cooperative SWIPT scenario, in which the near user acts as a SWIPT-enabled relay that assists the far user. In this context, we derive analytic expressions for the pairwise error probability (PEP) of both users assuming both the amplify-and-forward (AF) and decode-and-forward (DF) relay protocols. The derived expressions are given in closed form and have a tractable algebraic representation, which renders them convenient to handle both analytically and numerically. In addition, we derive a simple asymptotic closed-form expression for the PEP in the high signal-to-noise ratio (SNR) regime, which provides useful insights into the impact of the involved parameters on the overall system performance. Capitalizing on this, we subsequently quantify the maximum achievable diversity order of both users. Numerical and simulation results corroborate the derived analytic expressions. Furthermore, the offered results provide interesting insights into the error rate performance of each user, which are expected to be useful in future designs and deployments of NOMA-based SWIPT systems.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The agricultural sector in Finland has been lagging behind in digital development. Development has long been based on increasing production by investing in larger machines. Over the past decade, change has begun to take place in the direction of digitalization. One of the challenges is that different manufacturers are trying to pull farmers' data into their own closed cloud services. In the worst case, farmers may lose an overall view of their farms and opportunities for deeper data analysis because their data are scattered across different services. This paper describes the goals and previously studied challenges of the 'MIKÄ DATA' project. The project will build an intelligent data service for farmers, based on the Oskari platform. In the 'Peltodata' service, farmers can see their own field data and many other data sources layer by layer. The project focuses on studying machine learning techniques to develop harvest yield prediction and to find correlations between the various data sources. The 'Peltodata' service will be ready at the end of 2019.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Software Defined Networking (SDN) is a new networking paradigm in which the control plane and data plane are decoupled. Throughout the recent years, a number of architectures have emerged for protocol-independent packet processing. One such architecture is the Protocol Independent Switch Architecture (PISA). It is a programmable and protocol-independent architecture composed of a number of Match and Action stages. Inside each of these stages is a crossbar to generate the search key and another crossbar to provide the input to the Action Units. In this paper, we design and explore alternative interconnection schemes with the aim of finding the most area- and power-efficient interconnection structure. Moreover, we propose further modifications to the interconnection structure, as a result of which the on-chip area of both match and action crossbars will be reduced by more than 70 % and power dissipation will be reduced by 25.8 % and 23.1 % for match and action crossbars respectively.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Universities are still mainly preparing students for a world where 'do something useful', i.e. 'do something with your hands', was the main principle and work was done during strictly regulated hours. But the world has changed, and the traditional areas of human activity (which are also the main targets of university courses) are rapidly diminishing. Virtual products have become more important: computer programs, mobile apps, social networks, new types of digital currencies, the IoT (a voice in your bathroom suggesting you buy the next model of Alexa), video games, interactive TV, virtual reality, etc. Most of these new areas are not present in current curricula, and there are problems with incorporating them: (working) students know (some aspects of) these areas better than many university teachers, since the corresponding knowledge is not yet present in textbooks; it is present only on the Internet. The Internet strongly influences both what we teach and how we teach.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
One of the biggest problems in managing devices for the Internet of Things (IoT) is the ability of a management server to independently discover and retrieve data models for vendor-specific devices. At the same time, several device management approaches also lack means for device vendors to share their data models in a consistent manner. This paper presents the design and implementation of a repository that can flexibly accommodate many needs with regard to these issues and allows device vendors to publish semantically similar data models as well as attach meta-data to these models. A Machine-to-Machine (M2M) communication interface also allows a management server to communicate with the repository. We show how these techniques can be used with the Lightweight Machine-to-Machine (LWM2M) standard.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Terahertz (THz) band communications, capable of achieving a theoretical capacity of up to several terabits per second, are one of the attractive enablers for beyond-5G wireless networks. THz systems will use extremely directional narrow beams, allowing not only to extend the communication range but also to partially secure the data already at the physical layer. The reason is that, in most cases, the Attacker has to be located within the transmitter beam in order to eavesdrop on the message. However, even the use of very narrow beams results in a considerably large area around the receiver where the Attacker can capture all the data. In this paper, we study how to decrease the message eavesdropping probability by leveraging the inherent multi-path nature of THz communications. We particularly propose sharing the data transmission over multiple THz propagation paths currently available between the communicating entities. We show that, at the cost of a slightly reduced link capacity, the message eavesdropping probability in the described scheme decreases significantly even when several Attackers operate in a cooperative manner. The proposed solution can be utilized for the transmission of sensitive data, as well as to secure key exchange in THz band networks beyond 5G.
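The intuition can be seen with a back-of-the-envelope calculation (a simplification with independent, equally likely interceptions per path; the paper's geometric, cooperative-attacker model is more detailed): if the message is split so that every share is needed, the eavesdropping probability falls as p^N with the number of paths N:

```python
# Hedged illustration: message split over N paths, attacker must capture all shares.
for p in (0.3, 0.1):                       # assumed per-path interception probability
    for n_paths in (1, 2, 3, 4):
        print(f"p={p}, paths={n_paths}: eavesdropping probability {p ** n_paths:.4f}")
```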
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, we investigate the performance of conventional cooperative sensing (CCS) and superior selective reporting (SSR)-based cooperative sensing in an energy harvesting (EH)-enabled heterogeneous cognitive radio network (HCRN). In particular, we derive expressions for the achievable throughput of both schemes and formulate nonlinear integer programming problems, in order to find the throughput-optimal set of spectrum sensors scheduled to sense a particular channel, given primary user (PU) interference and EH constraints. Furthermore, we present novel solutions for the underlying optimization problems based on the cross-entropy (CE) method, and compare the performance with exhaustive search and greedy algorithms. Finally, we discuss the tradeoff between the average achievable throughput of the SSR and CCS schemes, and highlight the regime where the SSR scheme outperforms the CCS scheme. Notably, we show that there is an inherent tradeoff between the channel available time and the detection accuracy. Our numerical results show that, as the number of spectrum sensors increases, the channel available time gains a higher priority in an HCRN, as opposed to detection accuracy.
Research output: Contribution to journal › Article › Scientific › peer-review
Internet-of-Things (IoT) objects are expected to exceed 75 billion by 2020, and a large part of the expansion is expected to be at a finer granularity than existing silicon-based IoT objects (i.e. tablets and cell phones) can deliver [1]. Currently, placing a room light or a thermostat on the internet for remote control is considered progressive. However, if printed electronics can achieve performance increases, then IoT objects could be affixed to almost anything, such as coffee creamer cartons, cereal boxes, or that missing sock. Each of these IoT objects could drive a sensor, perhaps for position, temperature or pressure: essentially a multitude of applications. In order for IoT objects to emulate a simple postage stamp, with self-powering from energy scavenging and local energy storage, all housed in a non-toxic flexible form factor, advances in solution-processable devices need to occur.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Digital predistortion (DPD) has important applications in wireless communication for smart systems, such as Internet of Things (IoT) applications for smart cities. DPD is used in wireless communication transmitters to counteract distortions that arise from nonlinearities, such as those related to amplifier characteristics and local oscillator leakage. In this paper, we propose an algorithm-architecture-integrated framework for the design and implementation of adaptive DPD systems. The proposed framework provides energy-efficient, real-time DPD performance and enables efficient reconfiguration of DPD architectures so that communication can be dynamically optimized based on time-varying communication requirements. Our adaptive DPD design framework applies Markov Decision Processes (MDPs) in novel ways to generate optimized runtime control policies for DPD systems. We present a GPU-based adaptive DPD system that is derived using our design framework and demonstrate its efficiency through extensive experiments.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The unprecedented proliferation of smart devices, together with novel communication, computing, and control technologies, has paved the way for A-IoT. This development involves new categories of capable devices, such as high-end wearables, smart vehicles, and consumer drones, aiming to enable efficient and collaborative utilization within the smart city paradigm. While massive deployments of these objects may enrich people's lives, unauthorized access to said equipment is potentially dangerous. Hence, highly secure human authentication mechanisms have to be designed. At the same time, human beings desire comfortable interaction with the devices they own on a daily basis, thus demanding authentication procedures to be seamless and user-friendly, mindful of contemporary urban dynamics. In response to these unique challenges, this work advocates for the adoption of multi-factor authentication for A-IoT, such that multiple heterogeneous methods - both well established and emerging - are combined intelligently to grant or deny access reliably. We thus discuss the pros and cons of various solutions as well as introduce tools to combine the authentication factors, with an emphasis on challenging smart city environments. We finally outline the open questions to shape future research efforts in this emerging field.
Research output: Contribution to journal › Article › Scientific › peer-review
In this work, we propose a framework for improving the performance of any deep neural network that may suffer from vanishing gradients. To address the vanishing gradient issue, we study a framework in which we insert an intermediate output branch after each layer in the computational graph and use the corresponding prediction loss for feeding the gradient to the early layers. The framework, which we name Elastic network, is tested with several well-known networks on the CIFAR10 and CIFAR100 datasets, and the experimental results show that the proposed framework improves the accuracy of both shallow networks (e.g., MobileNet) and deep convolutional neural networks (e.g., DenseNet). We also identify the types of networks where the framework does not improve the performance and discuss the reasons. Finally, as a by-product, the computational complexity of the resulting networks can be adjusted in an elastic manner by selecting the output branch according to the current computational budget.
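A toy PyTorch sketch of the idea, assuming a small fully connected network rather than the convolutional models used in the paper: an intermediate output branch is attached after every layer and the branch losses are summed, so gradients reach the early layers directly.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ElasticMLP(nn.Module):
    """Illustrative network with one output branch per hidden layer."""
    def __init__(self, in_dim=32, hidden=64, n_layers=4, n_classes=10):
        super().__init__()
        self.layers = nn.ModuleList()
        self.branches = nn.ModuleList()
        d = in_dim
        for _ in range(n_layers):
            self.layers.append(nn.Linear(d, hidden))
            self.branches.append(nn.Linear(hidden, n_classes))
            d = hidden

    def forward(self, x):
        outputs = []
        for layer, branch in zip(self.layers, self.branches):
            x = torch.relu(layer(x))
            outputs.append(branch(x))   # one prediction per depth
        return outputs

model = ElasticMLP()
x, y = torch.randn(8, 32), torch.randint(0, 10, (8,))
loss = sum(F.cross_entropy(out, y) for out in model(x))
loss.backward()   # every branch feeds gradient back to the early layers
```

At inference time, one can stop at an earlier branch to trade accuracy for computation, which is the "elastic" complexity adjustment mentioned above.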
INT=comp,"Bai, Yue"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper presents an analysis of an efficient parallel implementation of the active-set Newton algorithm (ASNA), which is used to estimate the nonnegative weights of linear combinations of the atoms in a large-scale dictionary to approximate an observation vector by minimizing the Kullback–Leibler divergence between the observation vector and the approximation. The performance of ASNA has been demonstrated in previous works against other state-of-the-art methods. The implementations analysed in this paper have been developed in C, using parallel programming techniques to obtain better performance in multicore architectures than the original MATLAB implementation. A hardware analysis is also performed to assess the influence of CPU frequency and the number of CPU cores on the proposed implementations. The new implementations allow the ASNA algorithm to tackle real-time problems thanks to the reduction in execution time obtained.
Research output: Contribution to journal › Article › Scientific › peer-review
EXT="Viitanen, Timo"
Research output: Contribution to journal › Article › Scientific › peer-review
ALMARVI is a collaborative European research project funded by Artemis involving 16 industrial as well as academic partners across 4 countries, working together to address various computational challenges in image and video processing in 3 application domains: healthcare, surveillance and mobile. This paper is an editorial for a special issue discussing the integrated system created by the partners to serve as a cross-domain solution for the project. The paper also introduces the partner articles published in this special issue, which discuss the various technological developments achieved within ALMARVI spanning all system layers, from hardware to applications. We illustrate the challenges faced within the project based on use cases from the three targeted application domains, and how these map to the 4 main project objectives, which address 4 challenges faced by high-performance image and video processing systems: massive data rate, low power consumption, composability, and robustness. We present a system stack composed of algorithms, design frameworks and platforms as a solution to these challenges. Finally, the use cases from the three different application domains are mapped onto the system stack solution and are evaluated based on their performance for each of the 4 ALMARVI objectives.
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, we present a high data rate implementation of a digital predistortion (DPD) algorithm on a modern mobile multicore CPU containing an on-chip GPU. The proposed implementation is capable of running in real-time, thanks to the execution of the predistortion stage inside the GPU, and the execution of the learning stage on a separate CPU core. This configuration, combined with the low complexity DPD design, allows for more than 400 Msamples/s sample rates. This is sufficient for satisfying 5G new radio (NR) base station radio transmission specifications in the sub-6 GHz bands, where signal bandwidths up to 100 MHz are specified. The linearization performance is validated with RF measurements on two base station power amplifiers at 3.7 GHz, showing that the 5G NR downlink emission requirements are satisfied.
INT=comp,"Meirhaeghe, Alexandre"
Research output: Contribution to journal › Article › Scientific › peer-review
Farm detection using low-resolution satellite images is an important topic in digital agriculture. However, it has not received enough attention compared to high-resolution images. Although high-resolution images are more efficient for the detection of land cover components, the analysis of low-resolution images is still important due to the low-resolution repositories of past satellite images used for time-series analysis, free availability, and economic concerns. The current paper addresses the problem of farm detection using low-resolution satellite images. In digital agriculture, farm detection has a significant role for key applications such as crop yield monitoring. Two main categories of object detection strategies are studied and compared in this paper. First, a two-step semi-supervised methodology is developed using traditional manual feature extraction and modelling techniques; the developed methodology uses the Normalized Difference Moisture Index (NDMI), Grey Level Co-occurrence Matrix (GLCM), 2-D Discrete Cosine Transform (DCT) and morphological features, and Support Vector Machine (SVM) for classifier modelling. In the second strategy, high-level features learnt from the massive filter banks of deep Convolutional Neural Networks (CNNs) are utilised. Transfer learning strategies are employed for pretrained Visual Geometry Group (VGG-16) networks. Results show the superiority of the high-level features for the classification of farm regions.
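A minimal sketch of the first (manual-feature) strategy, assuming hypothetical per-region feature vectors such as NDMI statistics; the NDMI formula uses the usual (NIR, SWIR) band definition, and the training data below are random placeholders rather than real satellite measurements.

```python
import numpy as np
from sklearn.svm import SVC

def ndmi(nir, swir):
    """Normalized Difference Moisture Index per pixel (assumed band order)."""
    return (nir - swir) / (nir + swir + 1e-9)

# Hypothetical training data: per-region feature vectors (e.g., mean NDMI,
# NDMI variance, simple texture statistics) with farm / non-farm labels.
rng = np.random.default_rng(2)
X = rng.random((200, 4))
y = rng.integers(0, 2, 200)

clf = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y)
print(clf.predict(X[:5]))
```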
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The performance of physical-layer security of the classic Wyner's wiretap model over Fisher-Snedecor F composite fading channels is considered in this work. Specifically, the main channel (i.e., between the source and the legitimate destination) and the eavesdropper's channel (i.e., between the source and the illegitimate destination) are assumed to experience independent quasi-static Fisher-Snedecor F fading conditions, which have been shown to be encountered in realistic wireless transmission scenarios in conventional and emerging communication systems. In this context, exact closed-form expressions for the average secrecy capacity (ASC) and the probability of non-zero secrecy capacity (PNSC) are derived. Additionally, an asymptotic analytical expression for the ASC is presented. The impact of shadowing and multipath fading on the secrecy performance is investigated. Our results show that increasing the fading parameter of the main channel and/or the shadowing parameter of the eavesdropper's channel improves the secrecy performance. The analytical results are compared with Monte-Carlo simulations to validate the analysis.
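The Monte Carlo sketch below mirrors the validation step in spirit: per-link instantaneous SNRs are drawn as crudely scaled Fisher-Snedecor F variates via scipy, and the average secrecy capacity and probability of non-zero secrecy capacity are estimated empirically. The shape parameters and mean SNRs are illustrative, not values from the paper.

```python
import numpy as np
from scipy.stats import f as fisher_f

rng = np.random.default_rng(3)
N = 200_000

def snr_samples(m, ms, mean_snr):
    """Crude stand-in: scaled Fisher-Snedecor F variates (m ~ multipath
    severity, ms ~ shadowing severity), normalized to the requested mean SNR."""
    x = fisher_f.rvs(2 * m, 2 * ms, size=N, random_state=rng)
    return mean_snr * x / x.mean()

g_main = snr_samples(m=3.0, ms=4.0, mean_snr=10.0)   # legitimate link
g_eve  = snr_samples(m=2.0, ms=2.0, mean_snr=3.0)    # eavesdropper link

cs = np.maximum(np.log2(1 + g_main) - np.log2(1 + g_eve), 0.0)
print("ASC  ~", cs.mean(), "bits/s/Hz")
print("PNSC ~", (g_main > g_eve).mean())
```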
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The Fisher-Snedecor F distribution was recently proposed as an accurate and tractable composite fading model in the context of device-to-device communications. The present work analyzes the product of Fisher-Snedecor F composite fading variates, which is useful in characterizing fading effects in numerous realistic communication scenarios. To this end, novel analytic expressions are first derived for the probability density function, the cumulative distribution function and the moments of the product of N statistically independent, but not necessarily identically distributed, Fisher-Snedecor F random variables. Capitalizing on these expressions, we derive tractable closed-form expressions for channel quality estimation of the proposed model as well as the corresponding outage probability and average bit error probability for binary modulations. The offered results are corroborated by extensive Monte-Carlo simulation results, which verify the validity of the derived expressions. It is shown that the number of cascaded channels considerably affects the corresponding performance, as a variation of over an order of magnitude is observed across all signal-to-noise ratio regimes.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Markov Decision Processes (MDPs) provide important capabilities for facilitating the dynamic adaptation of hardware and software configurations to the environments in which they operate. However, the use of MDPs in embedded signal processing systems is limited because of the large computational demands for solving this class of system models. This paper presents Sparse Parallel Value Iteration (SPVI), a new algorithm for solving large MDPs on resource-constrained embedded systems that are equipped with mobile GPUs. SPVI leverages recent advances in parallel solving of MDPs and adds sparse linear algebra techniques to significantly outperform the state-of-the-art. The method and its application are described in detail, and demonstrated with case studies that are implemented on an NVIDIA Tegra K1 System On Chip (SoC). The experimental results show execution time improvements in the range of 65%-78% for several applications. SPVI also lifts restrictions required by other MDP solver approaches, making it more widely compatible with large classes of optimization problems.
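For reference, here is a plain CPU sketch of the sparse value-iteration core (SPVI itself parallelizes this on a mobile GPU); the tiny MDP transition matrices and rewards below are arbitrary illustrative values.

```python
import numpy as np
import scipy.sparse as sp

def sparse_value_iteration(P, R, gamma=0.95, tol=1e-6, max_iter=10_000):
    """Value iteration with per-action sparse transition matrices.
    P: list of (S x S) scipy.sparse CSR matrices, R: (A x S) array of
    expected immediate rewards; returns the value function and greedy policy."""
    n_actions = len(P)
    V = np.zeros(P[0].shape[0])
    for _ in range(max_iter):
        Q = np.vstack([R[a] + gamma * P[a].dot(V) for a in range(n_actions)])
        V_new = Q.max(axis=0)
        converged = np.max(np.abs(V_new - V)) < tol
        V = V_new
        if converged:
            break
    return V, Q.argmax(axis=0)

# Tiny illustrative 3-state, 2-action MDP (transition/reward values arbitrary).
P0 = sp.csr_matrix([[0.9, 0.1, 0.0], [0.0, 0.9, 0.1], [0.0, 0.0, 1.0]])
P1 = sp.csr_matrix([[0.2, 0.8, 0.0], [0.1, 0.0, 0.9], [0.0, 0.1, 0.9]])
R = np.array([[0.0, 0.0, 1.0], [0.1, 0.0, 1.0]])
V, policy = sparse_value_iteration([P0, P1], R)
print(V, policy)
```

Keeping the transition matrices in CSR form means each Bellman backup costs work proportional to the number of nonzero transitions rather than the full S x S size, which is the property SPVI exploits at scale.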
jufoid=71852
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This article presents an integrated self-aware computing model in a Heterogeneous Multicore Architecture (HMA) to mitigate the power dissipation of an Orthogonal Frequency-Division Multiplexing (OFDM) receiver. The proposed platform consists of template-based Coarse-Grained Reconfigurable Array (CGRA) devices connected through a Network-on-Chip (NoC) around a few Reduced Instruction-Set Computing (RISC) cores. The self-aware computing model exploits a Feedback Control System (FCS) which constantly monitors the execution time of each core and dynamically scales the operating frequency of each node of the NoC depending on the worst execution time. Therefore, the performance of the overall system is equalized towards a desired level besides mitigating the power dissipation. Measurement results obtained from Field-Programmable Gate Array (FPGA) synthesis show up to 20.2% dynamic power dissipation and 16.8% total power dissipation savings. Since the FCS technique can be employed for scaling both the frequency and the voltage, but the supply voltage cannot be scaled on the FPGA-based prototype platform, the implementation is also estimated in 28 nm Ultra-Thin Body and Buried oxide (UTBB) Fully-Depleted Silicon-On-Insulator (FD-SOI) Application-Specific Integrated Circuit (ASIC) technology to scale the voltage in addition to the frequency and obtain further dynamic power dissipation reduction. After synthesizing the whole platform on ASIC and scaling the voltage and frequency simultaneously as a Dynamic Voltage and Frequency Scaling (DVFS) method, significant dynamic power dissipation savings of 5.97X over the Dynamic Frequency Scaling (DFS) method were obtained.
Research output: Contribution to journal › Article › Scientific › peer-review
The increasing number of cores in Systems on Chip (SoC) has introduced challenges in software parallelization. As an answer to this, the dataflow programming model offers a concurrent and reusability-promoting approach for describing applications. In this work, a runtime for executing Dataflow Process Networks (DPN) on multicore platforms is proposed. The main difference between this work and existing methods is letting the operating system perform Central Processing Unit (CPU) load balancing freely, instead of limiting thread migration between processing cores through CPU affinity. The proposed runtime is benchmarked on desktop and server multicore platforms using five different applications from the video coding and telecommunication domains. The results show that the proposed method offers significant improvements over the state-of-the-art in terms of performance and reliability.
Research output: Contribution to journal › Article › Scientific › peer-review
Data compression is a common requirement for displaying large amounts of information, with the goal of reducing visual clutter. The approach given in this paper uses an analysis of a data set to construct a visual representation. The visualization is compressed using the address ranges of the memory structure. This method produces a compressed version of the initial visualization, retaining the same information as the original. The presented method has been implemented as a Memory Designer tool for ASIC, FPGA and embedded systems using IP-XACT. The Memory Designer is a user-friendly tool for model-based embedded system design, providing access to and adjustment of the memory layout from a single view, complementing the 'programmer's view' of the system.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The aim was to study whether odors evaporated by an olfactory display prototype can be used to affect participants' cognitive and emotion-related responses to audio-visual stimuli, and whether the display can benefit from objective measurement of the odors. The results showed that odors and videos had significant effects on participants' responses. For instance, odors increased pleasantness ratings especially when the odor was authentic and the video was congruent with the odors. The objective measurement of the odors was shown to be useful. The measurement data was classified with 100% accuracy, removing the need to speculate whether the odor presentation apparatus is working properly.
INT=tut-bmt,"Nieminen, Ville"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
We describe three distinct approaches to visualization for multiscale materials modelling research. These have been developed within the framework of the SimPhoNy FP7 EU project, and complement each other in their requirements and possibilities. All have been integrated via wrappers to one or more of the simulation approaches within the SimPhoNy project. In this manuscript we describe and contrast their features. Together they cover visualization needs from electronic to macroscopic scales and are suited to simulations made on personal computers, workstations or advanced high-performance parallel computers. Examples as well as recommendations for future calculations are presented.
EXT="Kulju, Sampo"
Research output: Contribution to journal › Article › Scientific › peer-review
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Contribution to journal › Editorial › Scientific
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Data collection requires homogenization of the data prior to processing it. This can create a challenge for companies since data come in various formats and schemas. This paper discusses the implementation of a data collection framework using the PLANTCockpit open source platform, which is an integration platform for business processes. The framework is extended to fetch data from heterogeneous sources and then allow the user to select the relevant data that matches his/her needs. In addition to this, the extendable framework also makes it possible to select the output data structure based on the user's requirements. Defining such a framework can reduce a company's efforts in reshaping and modifying their architectures to handle new challenges posed by the ever-changing data. The framework is not restricted to collection and transformation; it also provides an option to apply available processing techniques to the transformed data structure. The proposed framework has been tested on a cloud-based platform provided by the Cloud Collaborative Manufacturing Networks (C2NET) project.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
There is a trend towards the adoption of Knowledge Representation and Reasoning formalisms, such as ontologies, for industrial automation. For example, semantic models are used as knowledge bases that encapsulate different types of information about manufacturing systems, e.g., the statuses and capabilities of their cyber and physical resources. Moreover, these models can be updated and accessed during runtime. In this context, models are becoming a critical part of the system infrastructure for both controlling and monitoring activities. However, models tend to be designed for specific purposes and are not standardized. This is an issue because the employed formalisms, such as ontologies, emerged in order to provide an engineering tool for commonly classifying, defining, and sharing information. This article proposes the development of modular ontologies based on different parts of the ISA-95 standard for describing the product, process, and resource information of manufacturing systems. In addition, this research work demonstrates a set of semantic rules that may be used to infer implicit knowledge from the ontology, permitting the automatic checking of the machines required to manufacture different product variants.
INT=aut,"Seyedamir, Ahmadi"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Industry 4.0 is about the interconnectivity and digitalisation of industrial systems that need to be integrated in order to improve the efficiency of resources and, in turn, processes. Both the research and commercial sectors are working towards addressing specific challenges, such as data modelling, collection and processing. A correct manipulation and interpretation of data is critical and, now, more difficult than ever due to the dramatic increase in the amount of data generated at different levels of enterprises. Ultimately, this research work presents a solution, integrated with an existing cloud-based platform, for collecting and processing real-time factory shop floor streams of data. The solution is an IoT-based development, which consists of both an IoT hub and a gateway that permit the consumption and communication of device information. The required message exchange is done with state-of-the-art technologies and protocols, e.g., the MQTT protocol and a REST-based interface. The implementation of the solution is demonstrated through an industrial-based scenario.
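A minimal sketch of the kind of device-to-hub message exchange described above, using the paho-mqtt helper to publish one JSON-encoded shop-floor reading; the broker address, topic, and payload fields are hypothetical.

```python
import json
import time
import paho.mqtt.publish as publish

# Hypothetical broker and topic; the payload fields are illustrative only.
BROKER = "broker.example.com"
TOPIC = "factory/line1/sensor42"

reading = {"device": "sensor42", "temperature_c": 21.7, "ts": time.time()}

# One-shot publish of a JSON-encoded shop-floor reading over MQTT (QoS 1).
publish.single(TOPIC, payload=json.dumps(reading), qos=1, hostname=BROKER, port=1883)
```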
INT=aut,"Iftikhar, Umer"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Cyber-Physical Systems (CPS) in industrial manufacturing facilities demand continuous interaction with a large number of diverse, distributed and networked computing nodes, devices and human operators. These systems are critical to ensure the quality of production and the safety of persons working at the shop floor level. Furthermore, the situation is similar in other domains, such as logistics, which, in turn, are connected to and affect the overall production efficiency. In this context, this article presents some key steps for integrating three pillars of CPS (production line, logistics and facilities) into current smart manufacturing environments in order to adopt an industrial Cyber-Physical Systems of Systems (CPSoS) paradigm. The approach focuses on integrating several digital functionalities into a cloud-based platform to allow real-time interaction among multiple devices, data analytics and sharing, and machine learning-based global reconfiguration, thereby increasing the management and optimization capabilities and improving the quality of facility services, safety, energy efficiency and industrial productivity. Conceptually, isolated systems may enhance their capabilities by accessing information from other systems. The approach introduces the particular vision, main components, potential and challenges of the envisioned CPSoS. In addition, the description of one scenario for realizing the CPSoS vision is presented. The results herein presented will pave the way for the adoption of CPSoS and can be used as a pilot for further research on this emerging topic.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
How to measure and train for adaptability has emerged as a priority in military contexts in response to emergent threats and technologies associated with asymmetric warfare. While much research effort has attempted to characterize adaptability in terms of accuracy and response time using traditional executive function cognitive tests, it remains unclear and undefined how adaptability should be measured and thus how simulation-based training should be designed to instigate and modulate adaptable behavior and skills. Adaptable reasoning is well exemplified in the rescue effort of Apollo 13 by NASA engineers, who repurposed materials available in the spacecraft to retrieve the astronauts safely back to Earth. Military leaders have anecdotally referred to adaptability as 'improvised thinking' that repurposes 'blocks of knowledge' to devise alternative solutions in response to changes in conditions affecting original tasks while maintaining the end-state commander's intent. We review a previous feasibility study that explored the specification of Reusable Modeling Primitives for models and simulation systems, building on the Dimensional Analysis and Design Structure Matrix for Complexity Management formal methods. This Dimensional Analysis Conceptual Modeling (DACM) paradigm is rooted in science and engineering critical thinking and is consistent with the stated anecdotal premises, as it facilitates the objective dimensional decomposition of a problem space to guide the corresponding dimensional composition of possible solutions. Arguably, adaptability also concerns the capability to overcome contradictions through their detection and reduction, which we present in an exemplar addressing the contradiction of increased drag due to increased velocity inherent to torpedoes. We propose that the DACM paradigm may be repurposed as a critical thinking framework for teaching the identification of relevant components in a theater of military operations and how the properties of those components may be repurposed to fashion alternative solutions to tasks involving navigation, call-for-fires, line-of-sight cover, weather and atmospheric effect responses, and others.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper describes the core functionality and a proof-of-concept demonstration setup for remote 360-degree stereo virtual reality (VR) gaming. In this end-to-end scheme, the execution of a VR game is off-loaded from an end-user device to a cloud edge server, in which the executed game is rendered based on the user's field of view (FoV) and control actions. Headset and controller feedback is transmitted over the network to the server, from which the rendered views of the game are streamed to the user in real time as encoded HEVC video frames. This approach saves energy and computation load on the end terminals by making use of the latest advancements in network connection speed and quality. In the showcased demonstration, a VR game is run in Unity on a laptop powered by an i7 7820HK processor and a GTX 1070 GPU. The 360-degree spherical view of the game is rendered and converted to a rectangular frame using equirectangular projection (ERP). The ERP video is sliced vertically and only the FoV is encoded with the Kvazaar HEVC encoder in real time and sent over the network in UDP packets. Another laptop is used for playback with an HTC Vive VR headset. Our system can reach an end-to-end latency of 30 ms and a bit rate of 20 Mbps for the stereo 1080p30 format.
jufoid=58061e
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Due to their unconstrained mobility and capability to carry goods or equipment, unmanned aerial vehicles (UAVs), or drones, are considered part of the fifth-generation (5G) wireless networks and have become attractive candidates to carry a base station (BS). As 5G requirements apply to a broad range of use cases, it is of particular importance to satisfy them during spontaneous and temporary events, such as a marathon or a rural fair. To be able to support these scenarios, mobile operators need to deploy significant radio access resources quickly and on demand. Accordingly, focusing on 5G cellular networks, we investigate the use of drone-assisted communication, where a drone is equipped with a millimeter-wave (mmWave) BS. Being a key technology for 5G, mmWave is able to facilitate the provisioning of the desired per-user data rates as drones arrive at the service area whenever needed. Therefore, in order to maximize the benefits of mmWave-drone-BS utilization, this paper proposes a methodology for its optimized deployment, which delivers the optimal height, coordinates, and coverage radius of the drone-BS by taking into account human body blockage effects over a mmWave-specific channel model. Moreover, our methodology is able to maximize the number of offloaded users by satisfying the target signal quality at the cell edge and considering the maximum service capacity of the drone-BS. It was observed that the mmWave-specific features are extremely important to consider when targeting efficient drone-BS utilization and thus should be carefully incorporated into the analysis.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper presents a novel digital self-interference (SI) canceller for inband full-duplex radio transceivers. The proposed digital canceller utilizes a Volterra series with sparse memory to model the residual SI signal, and it can thereby accurately reconstruct the self-interference even under a heavily nonlinear transmitter power amplifier. To the best of our knowledge, this is the first time such a sparse-memory Volterra series has been used to model the self-interference within an inband full-duplex device. The performance of the Volterra-based canceller is evaluated with real-life measurements that also incorporate an active analog canceller. The results show that the novel digital canceller suppresses the SI by 34 dB in the digital domain, outperforming the state-of-the-art memory polynomial-based solution by a margin of 5 dB. The total amount of cancellation is nearly 110 dB with a transmit power of +30 dBm, even though a shared transmit/receive antenna is used. To the best of our knowledge, this is the highest reported cancellation performance for a shared-antenna full-duplex device with such a high transmit power level.
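For orientation, the sketch below shows the conventional memory-polynomial baseline that the proposed canceller is compared against (not the sparse-memory Volterra canceller itself): a least-squares fit of nonlinear basis functions of the known transmit signal to the received self-interference, run here on purely synthetic data with an arbitrary toy nonlinearity.

```python
import numpy as np

def memory_polynomial_basis(x, order=5, memory=4):
    """Odd-order memory-polynomial regressors:
    phi[n, :] collects x[n-m] * |x[n-m]|^(p-1) for m = 0..memory-1, p = 1,3,5."""
    n = len(x)
    cols = []
    for m in range(memory):
        xd = np.concatenate([np.zeros(m, dtype=complex), x[:n - m]])
        for p in range(1, order + 1, 2):
            cols.append(xd * np.abs(xd) ** (p - 1))
    return np.column_stack(cols)

rng = np.random.default_rng(4)
tx = (rng.standard_normal(4000) + 1j * rng.standard_normal(4000)) / np.sqrt(2)

# Synthetic received self-interference: a mildly nonlinear channel plus noise.
rx = 0.8 * tx + 0.05 * tx * np.abs(tx) ** 2 + 0.01 * rng.standard_normal(4000)

Phi = memory_polynomial_basis(tx)
w, *_ = np.linalg.lstsq(Phi, rx, rcond=None)   # least-squares coefficients
residual = rx - Phi @ w
gain = np.mean(np.abs(rx) ** 2) / np.mean(np.abs(residual) ** 2)
print("cancellation ~", 10 * np.log10(gain), "dB")
```

The Volterra approach in the paper generalizes this regression by adding (sparsely selected) cross-memory terms, which is what buys the extra suppression under a strongly nonlinear power amplifier.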
INT=elt,"Turunen, Matias"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
To overcome the limited coverage of traditional wireless sensor networks, mobile crowd sensing (MCS) has emerged as a new sensing paradigm. To achieve longer battery lives of user devices and incentivize human involvement, this paper presents a novel approach that seamlessly integrates MCS with wireless power transfer, named wirelessly powered crowd sensing (WPCS), in which wireless power transfer supports the energy consumption of crowd sensing and rewards are offered as incentives. An optimization problem is formulated to simultaneously maximize the data utility and minimize the energy consumption for the service operator, by jointly controlling the wireless-power allocation at the access point (AP) as well as the sensing-data size, compression ratio, and sensor transmission duration at the mobile sensor (MS). Given fixed compression ratios, the optimal power allocation policy is shown to have a threshold-based structure with respect to a defined crowd-sensing priority function for each MS. Given fixed sensing-data utilities, the compression policy achieves the optimal compression ratio. Extensive simulations are also presented to verify the efficiency of the contributed mechanisms.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
By 2020, unmanned ships such as remotely controlled boats and autonomous vessels are expected to become operational, marking a technological revolution for the maritime industry. Such ships are expected to serve needs ranging from coastal ferries to open-sea cargo handling. In this paper we detail the security vulnerabilities of such unmanned ships. The attack surface as well as motivations for attack attempts are also discussed to provide a perspective on how and why attacks are undertaken. Finally, defence strategies are proposed as countermeasures.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The Internet is evolving into a ubiquitous computing environment in which massive numbers of devices will be connected, sharing, receiving and acting upon data, and this has brought about a problem of security. There are as many firmware and software update procedures as there are manufacturers, so it would be good if a common solution could be found. We surveyed suitable mechanisms from the past three years to be used in Internet of Things networks, as well as upcoming research and standardization work. Our findings show that there indeed are good options for firmware update mechanisms that use state-of-the-art technologies to deliver updates in a secure manner. While not all the mechanisms specifically target deployment scenarios found in the Internet of Things, we still believe the concept of such an update mechanism is suitable for IoT use and thus can be adapted trivially to IoT networks and devices. We also propose a generic four-element model for secure firmware updates.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Close to 100% employment of students and easy access to an abundance of information on the Internet have essentially changed students' learning practices and their prior knowledge, especially in the rapidly progressing field of Software Engineering. At the workplace, students have to use the technologies employed in the practice of their enterprise, but often do not understand the scientific and/or technological principles on which these technologies are based. They seek explanations on the Internet, but information on the Internet is often low quality, one-sided and presented with business targets in mind, namely to attract more users to technologies developed and sold by a business enterprise. Thus the university has to explain the basic principles behind technologies that students already know and have used, and correct some popular beliefs that are supported by software vendors on the basis of their business interests. Non-formal sources of knowledge, such as workplace training and the Internet, do not reduce the teacher's task but force teachers to constantly study everything new that appears in this field, thus increasing their workload. Students' increasing use of non-formal sources of knowledge implies a need for flipping the process: instead of being taught, students are set to learn from detailed tutorials provided to them. The use of the Internet and work has made self-study and seeking information from Internet sources very customary for current students; such flipping therefore worked very well in a game programming course provided by the first author.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
We have focused our paper on the aspects important in adapting an Information System (IS) to the user's cultural background. We are interested both in the factors related to IS development and in the use of IS. Increasingly, ISs are being developed and used in a global context. We have perceived differences in expectations of functionalities, architecture, structural properties, information search practices, web-based system properties, and user interfaces. One conclusion would be that a high-quality IS reflects user behavior in its use context. In that case, the system has to model its user one way or another. Until now, the topic has been handled without meaningful effort to model user behavior. Current publications cover a wide variety of rules on how to take into account cultural differences in the IS context. In this paper, our aim is to study the current state of the art of user modeling, that is, modeling the human being as an IS user. We start with general aspects related to the role of the user in IS development and alternatives for adaptable systems. The findings are applicable in the educational context as well. More and more, the use of computers and ISs is becoming an essential part of studies: the use of MOOCs (Massively Open Online Courses) as a part of or replacement for traditional face-to-face classes; flipped learning methodology emphasizing the significance of self-learning; and blended learning, which quite often includes computerized study content. Our focus is on the global context, in which students represent different cultures and the IS is globally available.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Performance measurement tools and techniques have become very significant in today's industries for increasing the efficiency of their processes in order to face the competitive market. The first step towards performance measurement is the real-time monitoring and gathering of data from the manufacturing system. Applying these performance measurement techniques to real-world industry in a way that is more general and efficient is the next challenge. This paper presents a methodology for implementing the key performance indicators defined in the ISO 22400 standard, Automation systems and integration - Key performance indicators (KPIs) for manufacturing operations management. The proposed methodology is implemented on a multi-robot line simulator for measuring its performance at runtime. The approach implements a knowledge-based system within an ontology model which describes the environment, the system and the KPIs. In fact, the KPIs' semantic descriptions are based on the data models presented in the Key Performance Indicators Markup Language (KPIML), which is an XML implementation of models developed by the Manufacturing Enterprise Solutions Association (MESA) international organization.
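As a simplified illustration of runtime KPI computation in the spirit of ISO 22400, the snippet below evaluates availability-, effectiveness- and quality-style ratios from hypothetical time elements and quantities; the exact KPI definitions should be taken from the standard and from the KPIML models used in the paper.

```python
# Hypothetical time elements and quantities for one simulated work unit.
APT = 6.5   # actual production time [h]      (assumed value)
PBT = 8.0   # planned busy time [h]           (assumed value)
PRI = 0.02  # planned run time per item [h]   (assumed value)
PQ  = 300   # produced quantity [items]
GQ  = 285   # good quantity [items]

availability = APT / PBT           # availability-style ratio
effectiveness = (PRI * PQ) / APT   # effectiveness-style ratio
quality_ratio = GQ / PQ            # quality-style ratio

print(f"availability    = {availability:.2f}")
print(f"effectiveness   = {effectiveness:.2f}")
print(f"quality ratio   = {quality_ratio:.2f}")
print(f"OEE-style index = {availability * effectiveness * quality_ratio:.2f}")
```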
EXT="Muhammad, Usman"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The evolution of industries and their needs towards the implementation of Industry 4.0-based systems has brought both new technological challenges and opportunities. This article proposes the adoption and deployment of cloud robotics at factories to enhance the control and monitoring of processes, such as handling materials for multiple assemblies in single cells. The ultimate objective of this research is offloading computation and integrating cloud robotics into an industrial scenario. However, the investigation of state-of-the-art techniques, tools and technologies, and the development of functional prototypes, is required beforehand. Thus, this article presents a small-scale system as a prototype that employs the Google Cloud Vision API as a resource that, in turn, is used by networked agents for supporting decision-making in the process of handling material commodities at the factory shop floor. The overall concept as well as the interaction between the main actors of the prototype is detailed. Finally, further research directions are discussed.
INT=aut,"Hussnain, Ali"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Transformation of synchronous data flow graphs (SDF) into equivalent homogeneous SDF representations has been extensively applied as a pre-processing stage when mapping signal processing algorithms onto parallel platforms. While this transformation helps fully expose task and data parallelism, it also presents several limitations such as an exponential increase in the number of actors and excessive communication overhead. Partial expansion graphs were introduced to address these limitations for multi-core platforms. However, existing solutions are not well-suited to achieve efficient scheduling on many-core architectures. In this article, we develop a new approach that employs cyclo-static data flow techniques to provide a simple but efficient method of coordinating the data production and consumption in the expanded graphs. We demonstrate the advantage of our approach through experiments on real application models.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Differential modulation has largely re-attracted the attention of academia and industry due to its advantages relating to simple implementation and no need for knowledge of channel state information. The present work analyzes the average bit error rate performance of dual-hop cooperative systems over generalized multipath fading conditions. The considered system is differentially modulated and is assumed to operate based on the amplify-and-forward relaying protocol. Therefore, the main advantage of the considered setup is that it does not require any channel state information at either the relay or the destination nodes. Novel closed-form expressions are derived for the end-to-end error rate under asymmetric generalized multipath fading conditions, which are encountered in realistic wireless communication scenarios. These expressions are subsequently employed in quantifying the effect of generalized fading conditions on the achieved bit error rate performance. It is shown that the impact of multipath fading and shadowing effects is detrimental at both high and low signal-to-noise ratio regimes, as the corresponding deviations are often close to an order of magnitude. The incurred difference is also significantly different from that under conventional Rayleigh fading conditions, which verifies that accurate channel characterization is of paramount importance in the effective design of conventional and emerging wireless technologies. In addition, it indicates that differential modulation can be a suitable modulation scheme for relay systems, under certain conditions, since it can provide adequate performance at a reduced implementation complexity.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Novel composite fading models were recently proposed based on inverse gamma distributed shadowing conditions. These models were extensively shown to provide remarkable modeling of the simultaneous occurrence of multipath fading and shadowing phenomena in emerging wireless scenarios such as cellular, off-body and vehicle-to-vehicle communications. Furthermore, the algebraic representation of these models is rather tractable, which renders them convenient to handle both analytically and numerically. The present contribution presents the major theoretical and practical characteristics of the η-μ / inverse gamma composite fading model, followed by a thorough ergodic capacity analysis. To this end, novel analytic expressions are derived, which are subsequently used in the evaluation of the corresponding system performance. In this context, the offered results are compared with respective results from cases assuming conventional fading conditions, which leads to the development of numerous insights on the effect of the multipath fading and shadowing severity on the achieved capacity levels. It is expected that these results will be useful in the design of timely and highly demanding wireless technologies, such as wearable, cellular and inter-vehicular communications, as well as in wireless power transfer-based applications in the context of the Internet of Things.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
An optimum packet length selection scheme to maximize the throughput of a smart utility network (SUN) under wireless local area network (WLAN) interference is introduced. The traditional and the investigated segmented packet collision models (PCM) are compared in terms of packet error rate (PER) and maximum achievable throughput. Furthermore, we quantify the impact of minimum mean square error (MMSE) interference mitigation for the SUN in the coexistence of WLAN interfering packets over a multipath Rayleigh fading channel. The effect of the distance between the WLAN transmitter and the SUN receiver on the probability of error is also investigated.
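As background for the packet-length trade-off, here is a generic (non-segmented) goodput model, not the paper's segmented PCM: longer packets amortize the header overhead but are more likely to be hit by at least one bit error, so an intermediate payload length maximizes goodput. The header size and BER values are illustrative.

```python
import numpy as np

def goodput(payload_bits, header_bits, ber):
    """Normalized goodput for a simple packet error model in which a packet
    is lost if any of its payload or header bits is in error."""
    per = 1.0 - (1.0 - ber) ** (payload_bits + header_bits)
    return (payload_bits / (payload_bits + header_bits)) * (1.0 - per)

header = 128
lengths = np.arange(64, 4097, 8)
for ber in (1e-4, 1e-3):
    tp = goodput(lengths, header, ber)
    best = lengths[np.argmax(tp)]
    print(f"BER={ber:g}: optimal payload ~ {best} bits, goodput {tp.max():.3f}")
```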
EXT="Hamila, Ridha"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Non-orthogonal multiple access (NOMA) has been recently proposed as a viable technology that can potentially improve the spectral efficiency of fifth generation (5G) wireless networks and beyond. However, in practical communication scenarios, transceiver architectures inevitably suffer from radio-frequency (RF) front-end related impairments that can lead to degradation of the overall system performance, with in-phase/quadrature-phase imbalance (IQI) constituting a major impairment in direct-conversion transceivers. In the present work, we quantify the effects of joint transmitter/receiver IQI on the performance of NOMA based multi-carrier (MC) systems under multipath fading conditions. Furthermore, we derive the asymptotic diversity order of the considered MC NOMA set up. Capitalizing on these results, we demonstrate that the effects of IQI differ considerably among NOMA users and depend on the underlying system parameters. For example, it is shown that the first sorted user appears more robust to IQI, which indicates that higher order users are more sensitive to the considered non-negligible impairment.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Contribution to journal › Editorial › Scientific
Design and implementation of smart vision systems often involve the mapping of complex image processing algorithms into efficient, real-time implementations on multicore platforms. In this paper, we describe a novel design tool that is developed to address this important challenge. A key component of the tool is a new approach to hierarchical dataflow scheduling that integrates a global scheduler and multiple local schedulers. The local schedulers are lightweight modules that work independently. The global scheduler interacts with the local schedulers to optimize overall memory usage and execution time. The proposed design tool is demonstrated through a case study involving an image stitching application for large scale microscopy images.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The increasing use of heterogeneous embedded systems with multi-core CPUs and Graphics Processing Units (GPUs) presents important challenges in effectively exploiting pipeline, task, and data-level parallelism to meet the throughput requirements of digital signal processing applications. Moreover, in the presence of system-level memory constraints, hand optimization of code to satisfy these requirements is inefficient and error-prone, and can therefore greatly slow down development time or result in highly underutilized processing resources. In this article, we present vectorization and scheduling methods to effectively exploit multiple forms of parallelism for throughput optimization on hybrid CPU-GPU platforms, while conforming to system-level memory constraints. The methods operate on synchronous dataflow representations, which are widely used in the design of embedded systems for signal and information processing. We show that our novel methods can significantly improve system throughput compared to previous vectorization and scheduling approaches under the same memory constraints. In addition, we present a practical case study of applying our methods to significantly improve the throughput of an orthogonal frequency division multiplexing receiver system for wireless communications.
Research output: Contribution to journal › Article › Scientific › peer-review
The directional deafness problem is one of the most important challenges in beamforming-based channel access at mmWave frequencies, and is believed to have detrimental effects on system performance in the form of excessive delays and significant packet drops. In this paper, we contribute a quantitative analysis of deafness in directional random access systems operating in unlicensed bands by relying on stochastic geometry formulations. We derive a general numerical approach that captures the behavior of the deafness probability as well as provide a closed-form solution for a typical sector-shaped antenna model, which may then be extended to a more realistic two-sector pattern. Finally, employing contemporary IEEE 802.11ad modeling numerology, we illustrate our analysis, revealing the importance of deafness-related considerations and their system-level impact.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The use of extremely high frequency (EHF) bands, known as millimeter-wave (mmWave) frequencies, requires densification of cells to maintain system performance at required levels. This may lead to a potential increase of interference in practical mmWave networks, thus making it the limiting factor. On the other hand, attractive utilization of dual-polarized antennas may improve this situation by mitigating some of the interfering components, which can be employed as part of interference control techniques. In this paper, an accurate two-stage ray-based characterization is conducted that models interference-related metrics while taking into account a detailed dual-polarized antenna model. In particular, we confirm that narrower pencil-beam antennas (HPBW = 13°) have significant advantages as compared to antennas with relatively narrow beams (HPBW = 20° and HPBW = 50°) in environments with high levels of interference. Additionally, we demonstrate that in the Manhattan grid deployment a transition from the interference- to the noise-limited regime and back occurs at cell inter-site distances of under 90 m and over 180 m, respectively.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Recently, new opportunities for utilizing extremely high frequencies have become instrumental in developing fifth-generation (5G) mobile technology. The use of highly directional antennas in millimeter-wave (mmWave) bands poses an important question of whether two-dimensional modeling suffices to capture the resulting system performance. Accounting for the effects of human body blockage on mmWave transmissions, in this work we compare the performance of the conventional two-dimensional and the proposed three-dimensional modeling. With our stochastic geometry based approach, we consider the aggregate interference and signal-to-interference ratio (SIR) to be the main metrics of interest. Both counterpart models attempt to capture the inherent behavior of 5G mmWave systems by incorporating the effects of human body blockage and antenna directivity. We thus deliver a realistic numerical assessment by comparing the three-dimensional modeling with its two-dimensional projection to reveal the resulting discrepancy.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Current forecasts predict that the Industrial Internet of Things (IIoT) will comprise about 10 billion devices by 2020. Because of its unique and novel demands, the emerging concepts of Industry 4.0 and SmartGrid networks have recently been coined and are now well established in technical discourse. In this work, we propose a newly developed multi-platform software tool aimed at testing the capabilities of Wireless M-Bus (WM-Bus) networks by simulating sensor-like behavior in uni-directional communication with a remote data concentrator. Building on our previously developed machine-type communication gateway (MTCG) able to receive WM-Bus data, we extend the set of features and introduce a new machine-type communication device (MTCD) capable of emulating the corresponding data transmissions. As utility companies lack these features, our software implementation and hardware design open the door to initial verification of WM-Bus-based data transmissions without the need to invest in expensive development and certification of smart meters, where WM-Bus is utilized for data transmissions.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The Internet of Things (IoT) ecosystem is evolving towards the deployment of integrated environments, wherein heterogeneous devices pool their capacities together to match wide-ranging user and service requirements. As a consequence, solutions for efficient and synergistic cooperation among objects acquire great relevance. Along this line, this paper focuses on the adoption of the promising MIFaaS (Mobile-IoT-Federation-as-a-Service) paradigm to support delay-sensitive applications for high-end IoT devices in next-to-come fifth generation (5G) environments. MIFaaS fosters the provisioning of IoT services and applications with low-latency requirements by leveraging cooperation among private/public clouds of IoT objects at the edge of the network. A performance assessment of the MIFaaS paradigm in a cellular 5G environment based on both Long Term Evolution (LTE) and the recent Narrowband IoT (NB-IoT) is presented. Obtained results demonstrate that the proposed solution outperforms classic approaches, highlighting significant benefits derived from the joint use of LTE and NB-IoT bandwidths in terms of increased number of successfully delivered IoT services.
INT=ELT, "Orsino, A."
Research output: Contribution to journal › Article › Scientific › peer-review
This paper presents a model-based design method and a corresponding new software tool, the HTGS Model-Based Engine (HMBE), for designing and implementing dataflow-based signal processing applications on multi-core architectures. HMBE provides complementary capabilities to HTGS (Hybrid Task Graph Scheduler), a recently-introduced software tool for implementing scalable workflows for high performance computing applications on compute nodes with high core counts and multiple GPUs. HMBE integrates model-based design approaches, founded on dataflow principles, with advanced design optimization techniques provided in HTGS. This integration contributes to (a) making the application of HTGS more systematic and less time consuming, (b) incorporating additional dataflow-based optimization capabilities with HTGS optimizations, and (c) automating significant parts of the HTGS-based design process using a principled approach. In this paper, we present HMBE with an emphasis on the model-based design approaches and the novel dynamic scheduling techniques that are developed as part of the tool. We demonstrate the utility of HMBE via two case studies: an image stitching application for large microscopy images and a background subtraction application for multispectral video streams.
Research output: Contribution to journal › Article › Scientific › peer-review
Outliers are samples that are generated by different mechanisms than other, normal data samples. Graphs, in particular social network graphs, may contain nodes and edges that are made by scammers, malicious programs, or mistakenly by normal users. Detecting outlier nodes and edges is important for data mining and graph analytics. However, previous research in the field has merely focused on detecting outlier nodes. In this article, we study the properties of edges and propose effective outlier edge detection algorithms. The proposed algorithms are inspired by community structures that are very common in social networks. We found that the graph structure around an edge holds critical information for determining the authenticity of the edge. We evaluated the proposed algorithms by injecting outlier edges into real-world graph data. Experiment results show that the proposed algorithms can effectively detect outlier edges. In particular, the algorithm based on the Preferential Attachment Random Graph Generation model consistently gives good performance regardless of the test graph data. More importantly, by analyzing the authenticity of the edges in a graph, we are able to reveal underlying structure and properties of a graph. Thus, the proposed algorithms are not limited to the area of outlier edge detection. We demonstrate three different applications that benefit from the proposed algorithms: (1) a preprocessing tool that improves the performance of graph clustering algorithms; (2) an outlier node detection algorithm; and (3) a novel noisy data clustering algorithm. These applications show the great potential of the proposed outlier edge detection techniques. They also address the importance of analyzing the edges in graph mining, a topic that has been mostly neglected by researchers.
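A minimal networkx sketch in the same spirit, using a simple common-neighbour overlap score rather than the paper's algorithms: an edge whose endpoints share no neighbourhood (such as a single link bridging two cliques) receives the highest outlier score.

```python
import networkx as nx

def edge_outlier_scores(G):
    """Score each edge by how weakly its endpoints' neighbourhoods overlap;
    scores near 1 indicate outlier-edge candidates (simple stand-in method)."""
    scores = {}
    for u, v in G.edges():
        common = len(list(nx.common_neighbors(G, u, v)))
        union = len((set(G[u]) | set(G[v])) - {u, v})
        scores[(u, v)] = 1.0 - common / union if union else 1.0
    return scores

# Two 5-cliques joined by a single suspicious edge "a0"-"b0".
G = nx.union(nx.complete_graph(5), nx.complete_graph(5), rename=("a", "b"))
G.add_edge("a0", "b0")
scores = edge_outlier_scores(G)
worst = max(scores, key=scores.get)
print(worst, scores[worst])   # the bridging edge gets the top score
```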
EXT="Kiranyaz, Serkan"
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, we present a new software tool, called HTGS Model-based Engine (HMBE), for the design and implementation of multicore signal processing applications. HMBE provides complementary capabilities to HTGS (Hybrid Task Graph Scheduler), which is a recently-introduced software tool for implementing scalable workflows for high performance computing applications. HMBE integrates advanced design optimization techniques provided in HTGS with model-based approaches that are founded on dataflow principles. Such integration contributes to (a) making the application of HTGS more systematic and less time consuming, (b) incorporating additional dataflow-based optimization capabilities with HTGS optimizations, and (c) automating significant parts of the HTGS-based design process. In this paper, we present HMBE with an emphasis on novel dynamic scheduling techniques that are developed as part of the tool. We demonstrate the utility of HMBE through a case study involving an image stitching application for large scale microscopy images.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Contribution to journal › Article › Scientific › peer-review
Two parallel phenomena are gaining attention in human–computer interaction research: gamification and crowdsourcing. Because crowdsourcing's success depends on a mass of motivated crowdsourcees, crowdsourcing platforms have increasingly been imbued with motivational design features borrowed from games; a practice often called gamification. While the body of literature and knowledge of the phenomenon have begun to accumulate, we still lack a comprehensive and systematic understanding of conceptual foundations, knowledge of how gamification is used in crowdsourcing, and whether it is effective. We first provide a conceptual framework for gamified crowdsourcing systems in order to understand and conceptualize the key aspects of the phenomenon. The paper's main contributions are derived through a systematic literature review that investigates how gamification has been examined in different types of crowdsourcing in a variety of domains. This meticulous mapping, which focuses on all aspects in our framework, enables us to infer what kinds of gamification efforts are effective in different crowdsourcing approaches as well as to point to a number of research gaps and lay out future research directions for gamified crowdsourcing systems. Overall, the results indicate that gamification has been an effective approach for increasing crowdsourcing participation and the quality of the crowdsourced work; however, differences exist between different types of crowdsourcing: the research conducted in the context of crowdsourcing of homogenous tasks has most commonly used simple gamification implementations, such as points and leaderboards, whereas crowdsourcing implementations that seek diverse and creative contributions employ gamification with a richer set of mechanics.
Research output: Contribution to journal › Article › Scientific › peer-review
Dataflow models of computation are capable of providing high-level descriptions for hardware and software components and systems, facilitating efficient processes for system-level design. The modularity and parallelism of dataflow representations make them suitable for key aspects of design exploration and optimization, such as efficient scheduling, task synchronization, memory and power management. The lightweight dataflow (LWDF) programming methodology provides an abstract programming model that supports dataflow-based design of signal processing hardware and software components and systems. Due to its formulation in terms of abstract application programming interfaces, the LWDF methodology can be integrated with a wide variety of simulation- and implementation-oriented languages, and can be targeted across different platforms, which allows engineers to integrate dataflow modeling approaches relatively easily into existing design processes. Previous work on LWDF techniques has emphasized their application to DSP software implementation (e.g., through integration with C and CUDA). In this paper, we efficiently integrate the LWDF methodology with hardware description languages (HDLs), and we apply this HDL-integrated form of the methodology to develop efficient methods for low power DSP hardware implementation. The effectiveness of the proposed LWDF-based hardware design methodology is demonstrated through a case study of a deep neural network application for vehicle classification.
INT=tie,"Xie, Renjie"
Research output: Contribution to journal › Article › Scientific › peer-review
In this paper, we propose a novel framework, called Hierarchical MDP framework for Compact System-level Modeling (HMCSM), for design and implementation of adaptive embedded signal processing systems. The HMCSM framework applies Markov decision processes (MDPs) to enable autonomous adaptation of embedded signal processing under multidimensional constraints and optimization objectives. The framework integrates automated, MDP-based generation of optimal reconfiguration policies, dataflow-based application modeling, and implementation of embedded control software that carries out the generated reconfiguration policies. HMCSM systematically decomposes a complex, monolithic MDP into a set of separate MDPs that are connected hierarchically, and that operate more efficiently through such a modularized structure. We demonstrate the effectiveness of our new MDP-based system design framework through experiments with an adaptive wireless communications receiver.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The main target of this paper is to perform a multidimensional analysis of multipath propagation at higher frequencies, i.e., 15 GHz and 28 GHz, using 'sAGA', a 3D ray tracing tool. A real-world outdoor Line of Sight (LOS) microcellular environment in the city of Yokosuka, Japan, is considered for the analysis. The simulation data acquired from the 3D ray tracing tool includes the received signal strength, power angular spectrum and the power delay profile. The different propagation mechanisms were closely analyzed. The simulation results show the difference in propagation at the two frequencies, 15 GHz and 28 GHz, and draw special attention to the impact of diffuse scattering at 28 GHz. In a simple outdoor microcellular environment with a valid LOS link between the transmitter and a receiver, a path loss difference of around 5.7 dB was found between operation at 15 GHz and at 28 GHz. However, the propagation loss at the higher frequency can be compensated for by using antennas with narrower beamwidth and larger gain.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The main target of this research work is to study the provision of indoor service (coverage) using outdoor base stations at higher frequencies, i.e., 10 GHz, in the context of a single building scenario. In outdoor-to-indoor propagation, an angular wall loss model is used in the General Building Penetration (GBP) model for estimating the additional loss at the intercept point of the building exterior wall. A novel angular wall loss model based on separate incidence angles in the azimuth and elevation planes is proposed in this paper. In the second part of this study, an Extended Building Penetration (EBP) model is proposed, and the performance of the EBP model is compared with the GBP model. In the EBP model, an additional fifth path, known as the 'Direct path', is proposed to be included in the GBP model. Based on the evaluation results, the impact of the direct path is found to be significant for indoor users located at the same height as, or close to the height of, the transmitter.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper investigates the feasibility of a radio access system with a self-backhauling access node under full-duplex and half-duplex operation modes. In particular, after making certain simplifying assumptions, closed-form solutions for the feasibility conditions of such a radio access system are derived for both of the considered operation modes. Furthermore, the analysis incorporates given quality of service (QoS) constraints for the system, defined in terms of minimum data rates. The numerical results show that the full-duplex scheme outperforms the corresponding half-duplex scheme under most circumstances, both in terms of the highest achievable rates and the highest tolerable path losses, when given the same QoS target. However, this requires a certain amount of self-interference attenuation in the access node. Performing a similar feasibility analysis without any simplifications in the system model is an important future work item.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In-band full-duplex (FD) operation can be regarded as one of the greatest discoveries in civilian/commercial wireless communications so far in this century. The concept is significant because it can as much as double the spectral efficiency of wireless data transmission by exploiting the new-found capability for simultaneous transmission and reception (STAR) that is facilitated by advanced self-interference cancellation (SIC) techniques. As the first of its kind, this paper surveys the prospects of exploiting the emerging FD radio technology in military communication applications as well. In addition to spectrally efficient two-way data transmission, the STAR capability could give a major technical advantage for armed forces by allowing their radio transceivers to conduct electronic warfare at the same time when they are also receiving or transmitting information signals at the same frequency band. After providing a detailed introduction to FD transceiver architectures and SIC requirements in military communications, this paper outlines and analyzes some potential defensive and offensive applications of the STAR capability.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Contribution to journal › Editorial › Scientific
This paper presents an integrated self-aware computing model mitigating the power dissipation of a heterogeneous reconfigurable multicore architecture by dynamically scaling the operating frequency of each core. The power mitigation is achieved by equalizing the performance of all the cores for an uninterrupted exchange of data. The multicore platform consists of heterogeneous Coarse-Grained Reconfigurable Arrays (CGRAs) of application-specific sizes and a Reduced Instruction-Set Computing (RISC) core. The CGRAs and the RISC core are integrated with each other over a Network-on-Chip (NoC) of six nodes arranged in a topology of two rows and three columns. The RISC core constantly monitors and controls the performance of each CGRA accelerator by adjusting the operating frequencies until the performance of all the CGRAs is optimally balanced over the platform. The CGRA cores on the platform process some of the most computationally intensive signal processing algorithms, while the RISC core establishes packet-based synchronization between the cores for computation and communication. All the cores can access each other’s computational and memory resources while processing the kernels simultaneously and independently of each other. Besides general-purpose processing and overall platform supervision, the RISC processor manages performance equalization among all the cores, which mitigates the overall dynamic power dissipation by 20.7% for a proof-of-concept test.
Research output: Contribution to journal › Article › Scientific › peer-review
Depending on one's viewpoint, a generic standards-compatible web browser supports three, four or five built-in application rendering and programming models. In this paper, we provide an overview of the built-in client-side web application architectures. While the dominance of the base HTML/CSS/JS technologies cannot be ignored, we foresee Web Components and WebGL gaining popularity as the world moves towards more complex and even richer web applications, including systems supporting virtual and augmented reality.
EXT="Taivalsaari, Antero"
EXT="Mikkonen, Tommi"
jufoid=69204
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Dataflow modeling techniques facilitate many aspects of design exploration and optimization for signal processing systems, such as efficient scheduling, memory management, and task synchronization. The lightweight dataflow (LWDF) programming methodology provides an abstract programming model that supports dataflow-based design and implementation of signal processing hardware and software components and systems. Previous work on LWDF techniques has emphasized their application to DSP software implementation. In this paper, we present new extensions of the LWDF methodology for effective integration with hardware description languages (HDLs), and we apply these extensions to develop efficient methods for low power DSP hardware implementation. Through a case study of a deep neural network application for vehicle classification, we demonstrate our proposed LWDF-based hardware design methodology, and its effectiveness in low power implementation of complex signal processing systems.
INT=tie,"Xie, Renjie"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
As next-generation mobile networks are rapidly taking shape driven by the target standardization requirements and initial trial implementations, a range of accompanying technologies prepare to support them with more reliable wireless access and improved service provisioning. Among these are more advanced spectrum sharing options enabled by the emerging Licensed Shared Access (LSA) regulatory framework, which aims to efficiently employ the capacity of underutilized frequency bands in a controlled manner. The concept of LSA promises to equip network operators with the much needed additional spectrum on a secondary basis and thus brings changes to existing cellular network management. Hence, additional research is urgently needed to determine the required levels of Quality of Service (QoS) and service provisioning reliability, especially in cases of dynamic geographical and temporal LSA sharing. Motivated by this recent urge and having at our disposal a fully functional 3GPP LTE cellular deployment, we have committed to implement and trial the principles of dynamic LSA-compatible spectrum management. This paper is our first disclosure of the comprehensive experimental evaluation of this promising technology. We expect that these unprecedented practical results together with the key lessons learned will become a valuable reference point for the subsequent integration of flexible LSA-based services, suitable for inter-operator and multi-tenant spectrum sharing.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This publication discusses how automatic verification of concurrent systems can be made more efficient by focusing on always may-terminating systems. First, making a system always may-terminating is a method for meeting a modelling need that exists independently of this publication. It is illustrated that without doing so, non-progress errors may be lost. Second, state explosion is often alleviated with stubborn, ample, and persistent set methods. They use expensive cycle or terminal strong component conditions in many cases. It is proven that for many important classes of properties, if the systems are always may-terminating, then these conditions can be left out.
Research output: Contribution to journal › Article › Scientific › peer-review
Designing applications for scalability is key to improving their performance in hybrid and cluster computing. Scheduling code to utilize parallelism is difficult, particularly when dealing with data dependencies, memory management, data motion, and processor occupancy. The Hybrid Task Graph Scheduler (HTGS) is an abstract execution model, framework, and API that improves programmer productivity when implementing hybrid workflows for multi-core and multi-GPU systems. HTGS manages dependencies between tasks, represents CPU and GPU memories independently, overlaps computations with disk I/O and memory transfers, keeps multiple GPUs occupied, and uses all available compute resources. Through these abstractions, data motion and memory are explicit; this makes data locality decisions more accessible. To demonstrate the HTGS application program interface (API), we present implementations of two example algorithms: (1) a matrix multiplication that shows how easily task graphs can be used; and (2) a hybrid implementation of microscopy image stitching that reduces code size by ≈ 43% compared to a manually coded hybrid workflow implementation and showcases the minimal overhead of task graphs in HTGS. Both of the HTGS-based implementations show good performance. In image stitching the HTGS implementation achieves similar performance to the hybrid workflow implementation. Matrix multiplication with HTGS achieves 1.3x and 1.8x speedup over the multi-threaded OpenBLAS library for 16k × 16k and 32k × 32k size matrices, respectively.
Research output: Contribution to journal › Article › Scientific › peer-review
Full use of the parallel computation capabilities of present and expected CPUs and GPUs requires the use of vector extensions. Yet many actors in dataflow systems for digital signal processing have internal state (or, equivalently, an edge that loops from the actor back to itself), which imposes serial dependencies between actor invocations and makes vectorizing across actor invocations impossible. Ideally, issues of inter-thread coordination required by serial data dependencies should be handled by code written by parallel programming experts that is separate from code specifying signal processing operations. The purpose of this paper is to present one approach for doing so in the case of actors that maintain state. We propose a methodology for using the parallel scan (also known as prefix sum) pattern to create algorithms for multiple simultaneous invocations of such an actor that result in vectorizable code. Two examples of applying this methodology are given: (1) infinite impulse response filters and (2) finite state machines. The correctness and performance of the resulting IIR filters and one class of FSMs are studied.
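As a small illustration of the prefix-sum formulation the abstract refers to, a first-order IIR recursion y[n] = a·y[n-1] + x[n] can be rewritten as a scan over affine maps with an associative combine operator. The sketch below applies the operator serially only to show equivalence with the direct recursion; the point of the method is that the same operator admits a parallel or vectorized scan. The higher-order filters and the FSM case treated in the paper are more involved.

```python
# Sketch: a first-order IIR  y[n] = a*y[n-1] + x[n]  expressed as a scan over
# affine maps (m, b): y -> m*y + b.  The combine operator below is associative,
# which is what allows a parallel prefix-scan implementation; here it is applied
# serially only to check that the formulation matches the direct recursion.
import numpy as np

def combine(f, g):
    # apply f first, then g:  g(f(y)) = (g_m*f_m)*y + (g_m*f_b + g_b)
    fm, fb = f
    gm, gb = g
    return (gm * fm, gm * fb + gb)

def iir_via_scan(x, a, y0=0.0):
    acc = (1.0, 0.0)                      # identity affine map
    out = np.empty_like(x, dtype=float)
    for n, xn in enumerate(x):
        acc = combine(acc, (a, xn))       # prefix "sum" of affine maps
        out[n] = acc[0] * y0 + acc[1]
    return out

x = np.random.randn(16)
a = 0.7
direct = np.empty_like(x)
y = 0.0
for n, xn in enumerate(x):
    y = a * y + xn
    direct[n] = y
assert np.allclose(iir_via_scan(x, a), direct)
```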
Research output: Contribution to journal › Article › Scientific › peer-review
Dataflow programming has received increasing attention in the age of multicore and heterogeneous computing. Modular and concurrent dataflow program descriptions enable highly automated approaches for design space exploration, optimization and deployment of applications. A great advance in dataflow programming has been the recent introduction of the RVC-CAL language. Having been standardized by the ISO, the RVC-CAL dataflow language provides a solid basis for the development of tools, design methodologies and design flows. This paper proposes a novel design flow for mapping RVC-CAL dataflow programs to parallel and heterogeneous execution platforms. Through the proposed design flow the programmer can describe an application in the RVC-CAL language and map it to multi- and many-core platforms, as well as GPUs, for efficient execution. The functionality and efficiency of the proposed approach is demonstrated by a parallel implementation of a video processing application and a run-time reconfigurable filter for telecommunications. Experiments are performed on GPU and multicore platforms with up to 16 cores, and the results show that for high-performance applications the proposed design flow provides up to 4 × higher throughput than the state-of-the-art approach in multicore execution of RVC-CAL programs.
Research output: Contribution to journal › Article › Scientific › peer-review
The advent of the Internet of Things (IoT) is boosting a wide range of new multimedia applications driven by an ecosystem of "smart" and highly heterogeneous devices. This introduces new challenges for industries and network operators in the design of upcoming fifth generation (5G) wireless systems, where stringent performance requirements in terms of high data rate, improved reliability, and ultra-low latency are to be met. A viable way of development is the integration of local clouds at the edge of the network as part of the recent Edge Computing paradigm. However, poor channel conditions experienced by the devices towards the serving edge node may impede the effectiveness of managing mission-critical IoT applications. In this context, Device-to-Device (D2D) communications represent a key enabling technology, which offers decisive benefits for future mobile 5G scenarios. As proposed in this paper, edge-based IoT applications may rely on D2D transmissions between the IoT devices also in the presence of mobility. In particular, a forwarding scheme is proposed showing that whenever collaborating IoT devices fall under the coverage of neighboring cellular edge nodes, D2D communications can guarantee a significant reduction in delay and traffic load across the network. The proposed solution is validated through simulations that indicate significant improvements in terms of latency, percentage of served tasks, energy efficiency, and traffic load w.r.t. the case where all communications are forwarded over the edge nodes.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Efficient sample rate conversion is of widespread importance in modern communication and signal processing systems. Although many efficient kinds of polyphase filterbank structures exist for this purpose, they are mainly geared toward serial, custom, dedicated hardware implementation for a single task. There is, therefore, a need for more flexible sample rate conversion systems that are resource-efficient, and provide high performance. To address these challenges, we present in this paper an all-software-based, fully parallel, multirate resampling method based on graphics processing units (GPUs). The proposed approach is well-suited for wireless communication systems that have simultaneous requirements on high throughput and low latency. Utilizing the multidimensional architecture of GPUs, our design allows efficient parallel processing across multiple channels and frequency bands at baseband. The resulting architecture provides flexible sample rate conversion that is designed to address modern communication requirements, including real-time processing of multiple carriers simultaneously.
Research output: Contribution to journal › Article › Scientific › peer-review
Digital predistortion (DPD) is a widely adopted baseband processing technique in current radio transmitters. While DPD can effectively suppress unwanted spurious spectrum emissions stemming from imperfections of analog RF and baseband electronics, it also introduces extra processing complexity and poses challenges on efficient and flexible implementations, especially for mobile cellular transmitters, considering their limited computing power compared to basestations. In this paper, we present high data rate implementations of broadband DPD on modern embedded processors, such as mobile GPU and multicore CPU, by taking advantage of emerging parallel computing techniques for exploiting their computing resources. We further verify the suppression effect of DPD experimentally on real radio hardware platforms. Performance evaluation results of our DPD design demonstrate the high efficacy of modern general purpose mobile processors on accelerating DPD processing for a mobile transmitter.
Research output: Contribution to journal › Article › Scientific › peer-review
3D urban maps with semantic labels and metric information are not only essential for next-generation robots such as autonomous vehicles and city drones, but also help to visualize and augment the local environment in mobile user applications. The machine vision challenge is to generate accurate urban maps from existing data with minimal manual annotation. In this work, we propose a novel methodology that takes GPS-registered LiDAR (Light Detection And Ranging) point clouds and street view images as inputs and creates semantic labels for the 3D point clouds using a hybrid of rule-based parsing and learning-based labelling that combines point cloud and photometric features. The rule-based parsing boosts segmentation of simple and large structures such as street surfaces and building facades that span almost 75% of the point cloud data. For more complex structures, such as cars, trees and pedestrians, we adopt boosted decision trees that exploit both structure (LiDAR) and photometric (street view) features. We provide qualitative examples of our methodology in 3D visualization, where we construct parametric graphical models from labelled data, and in 2D image segmentation, where 3D labels are back-projected to the street view images. In a quantitative evaluation we report classification accuracy and computing times and compare results to competing methods with three popular databases: NAVTEQ True, Paris-Rue-Madame and TLS (terrestrial laser scanned) Velodyne.
EXT="Babahajiani, Pouria"
Research output: Contribution to journal › Article › Scientific › peer-review
During the past 15 years, the Internet revolution has redefined the industry landscape. The advent of the Internet of Things (IoT) is changing our lives by provisioning a wide range of novel applications that leverage the ecosystem of "smart" and highly heterogeneous devices. This is expected to dramatically transform manufacturing, energy, agriculture, transportation, and other industrial sectors. The Industrial Internet of Things (IIoT) brings along a new wave of Internet evolution and will offer unprecedented opportunities in Machine Type Communications (MTC) - intelligent industrial products, processes, and services that communicate with each other and with people over the global network. This paper delivers a technology overview of the Wireless M-Bus communication protocol as currently utilized within the IIoT landscape, together with a description of the development of a demonstration prototype. In our trial implementation, IQRF modules are utilized for compatibility with the protocol of interest. The constructed WM-Bus receiver is further integrated as part of a complex MTC Gateway, which receives the MTC data via a secure communication channel from various types of smart-metering devices.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The current trend in high performance and embedded signal processing consists of designing increasingly complex heterogeneous hardware architectures with non-uniform communication resources. In order to take hardware and software design decisions, early evaluations of the system non-functional properties are needed. These evaluations of system efficiency require high-level information on both the algorithms and the architecture. In this paper, we define the notion of Model of Architecture (MoA) and study the combination of a Model of Computation (MoC) and an MoA to provide a design space exploration environment for the study of algorithmic and architectural choices. A cost is computed from the mapping of an application, represented by a model conforming to a MoC, onto an architecture represented by a model conforming to an MoA. The cost is composed of a processing-related part and a communication-related part. It is an abstract scalar value to be minimized and can represent any non-functional requirement of a system such as memory, energy, throughput or latency.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Tracking algorithms have important applications in detection of humans and vehicles for border security and other areas. For large-scale deployment of such algorithms, it is critical to provide methods for their cost- and energy-efficient realization. To this end, commodity mobile devices have significant potential for use as prototyping and testing platforms due to their low cost, widespread availability, and integration of advanced communications, sensing, and processing features. Prototypes developed on mobile platforms can be tested, fine-tuned, and demonstrated in the field and then provide reference implementations for application-specific disposable sensor node implementations that are targeted for deployment. In this paper, we develop a novel, adaptive tracking system that is optimized for energy-efficient, real-time operation on off-the-shelf mobile platforms. Our tracking system applies principles of dynamic data-driven application systems (DDDAS) to periodically monitor system operating characteristics and apply these measurements to dynamically adapt the specific classifier configurations that the system employs. Our resulting adaptive approach enables powerful optimization of trade-offs among energy consumption, real-time performance, and tracking accuracy based on time-varying changes in operational characteristics. Through experiments employing an Android-based tablet platform, we demonstrate the efficiency of our proposed tracking system design for multimode detection of human and vehicle targets.
Research output: Contribution to journal › Article › Scientific › peer-review
The standard median filter based on a symmetric moving window has only one tuning parameter: the window width. Despite this limitation, this filter has proven extremely useful and has motivated a number of extensions: weighted median filters, recursive median filters, and various cascade structures. The Hampel filter is a member of the class of decision filters that replaces the central value in the data window with the median if it lies far enough from the median to be deemed an outlier. This filter depends on both the window width and an additional tuning parameter t, reducing to the median filter when t=0, so it may be regarded as another median filter extension. This paper adopts this view, defining and exploring the class of generalized Hampel filters obtained by applying the median filter extensions listed above: weighted Hampel filters, recursive Hampel filters, and their cascades. An important concept introduced here is that of an implosion sequence, a signal for which generalized Hampel filter performance is independent of the threshold parameter t. These sequences are important because the added flexibility of the generalized Hampel filters offers no practical advantage for implosion sequences. Partial characterization results are presented for these sequences, as are useful relationships between root sequences for generalized Hampel filters and their median-based counterparts. To illustrate the performance of this filter class, two examples are considered: one is simulation-based, providing a basis for quantitative evaluation of signal recovery performance as a function of t, while the other is a sequence of monthly Italian industrial production index values that exhibits glaring outliers.
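For reference, a minimal sketch of the standard (non-weighted, non-recursive) Hampel filter described above, using the usual MAD-based scale estimate; the weighted, recursive, and cascaded generalizations studied in the paper build on this basic form.

```python
# Minimal sketch of the standard Hampel filter: the centre sample is replaced by
# the window median when it deviates from the median by more than t times the
# MAD-based scale estimate.  t = 0 reduces to the ordinary median filter, as
# noted in the abstract.
import numpy as np

def hampel(x, half_width=3, t=3.0):
    x = np.asarray(x, dtype=float)
    y = x.copy()
    k = 1.4826  # makes the MAD a consistent sigma estimate for Gaussian data
    for i in range(half_width, len(x) - half_width):
        window = x[i - half_width:i + half_width + 1]
        med = np.median(window)
        scale = k * np.median(np.abs(window - med))
        if np.abs(x[i] - med) > t * scale:
            y[i] = med
    return y

signal = np.sin(np.linspace(0, 4 * np.pi, 200))
signal[[40, 90, 150]] += 5.0          # inject glaring outliers
cleaned = hampel(signal, half_width=5, t=3.0)
```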
Research output: Contribution to journal › Article › Scientific › peer-review
Multirate filter banks can be implemented efficiently using fast-convolution (FC) processing. The main advantage of the FC filter banks (FC-FB) compared with the conventional polyphase implementations is their increased flexibility, that is, the number of channels, their bandwidths, and the center frequencies can be independently selected. In this paper, an approach to optimize the FC-FBs is proposed. First, a subband representation of the FC-FB is derived. Then, the optimization problems are formulated with the aid of the subband model. Finally, these problems are conveniently solved with the aid of a general nonlinear optimization algorithm. Several examples are included to demonstrate the proposed overall design scheme as well as to illustrate the efficiency and the flexibility of the resulting FC-FB.
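As background only (not the FC-FB structure or the optimization procedure of the paper), fast-convolution processing amounts to block-wise FFT-domain filtering; a minimal overlap-save sketch is shown below, with the block length and example filter chosen arbitrarily for illustration.

```python
# Background sketch: basic fast-convolution filtering with the overlap-save
# method, i.e. block-wise FFT-domain multiplication with the filter response.
# This is the generic processing that FC filter banks build on, not the
# optimized FC-FB subband structure proposed in the paper.
import numpy as np

def overlap_save(x, h, nfft=256):
    M = len(h)
    hop = nfft - (M - 1)                      # new samples produced per block
    H = np.fft.fft(h, nfft)
    x_pad = np.concatenate([np.zeros(M - 1), x])
    out = []
    for start in range(0, len(x), hop):
        block = x_pad[start:start + nfft]
        if len(block) < nfft:
            block = np.pad(block, (0, nfft - len(block)))
        y = np.fft.ifft(np.fft.fft(block) * H).real
        out.append(y[M - 1:])                 # discard circularly wrapped samples
    return np.concatenate(out)[:len(x)]

x = np.random.randn(1000)
h = np.ones(16) / 16.0                        # toy FIR filter
assert np.allclose(overlap_save(x, h), np.convolve(x, h)[:len(x)], atol=1e-9)
```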
Research output: Contribution to journal › Article › Scientific › peer-review
Underwater communication systems have drawn the attention of the research community in the last 15 years. This growing interest can largely be attributed to new civil and military applications enabled by large-scale networks of underwater devices (e.g., underwater static sensors, unmanned autonomous vehicles (AUVs), and autonomous robots), which can retrieve information from the aquatic and marine environment, perform in-network processing on the extracted data, and transmit the collected information to remote locations. Current underwater communication systems are inherently hardware-based and rely on closed and inflexible architectural designs. This imposes significant challenges on adopting new underwater communication and networking technologies, prevents the provision of truly differentiated services to highly diverse underwater applications, and creates great barriers to integrating heterogeneous underwater devices. Software-defined networking (SDN), recognized as the next-generation networking paradigm, relies on a highly flexible, programmable, and virtualizable network architecture to dramatically improve network resource utilization, simplify network management, reduce operating cost, and promote innovation and evolution. In this paper, a software-defined architecture, namely SoftWater, is first introduced to facilitate the development of the next-generation underwater communication systems. More specifically, by exploiting the network function virtualization (NFV) and network virtualization concepts, the SoftWater architecture can easily incorporate new underwater communication solutions, maximize the network capacity, achieve network robustness and energy efficiency, and provide truly differentiated and scalable networking services. Consequently, the SoftWater architecture can simultaneously support a variety of different underwater applications, and can enable the interoperability of underwater devices from different manufacturers that operate on different underwater communication technologies based on acoustic, optical, or radio waves. Moreover, the essential network management tools of SoftWater are discussed, including reconfigurable multi-controller placement, hybrid in-band and out-of-band control traffic balancing, and utility-optimal network virtualization. Furthermore, the major benefits of the SoftWater architecture are demonstrated by introducing software-defined underwater networking solutions, including throughput-optimal underwater routing, SDN-enhanced fault recovery, and software-defined underwater mobility management. The research challenges to realize SoftWater are also discussed in detail.
Research output: Contribution to journal › Article › Scientific › peer-review
The liquid metaphor refers to software that operates seamlessly across multiple devices owned by one or multiple users. Liquid software architectures can dynamically deploy and redeploy stateful software components and transparently adapt them to the capabilities of heterogeneous target devices. The key design goal in liquid software development is to minimize the efforts that are related to multiple device ownership (e.g., installation, synchronization and general maintenance of personal computers, smartphones, tablets, home displays, cars and wearable devices), while keeping the users in full control of their devices, applications and data. In this paper we present a design space for liquid software, categorizing and discussing the most important architectural issues and alternatives. These alternatives represent relevant capabilities offered by emerging technologies and deployment platforms that are then positioned and compared within the design space presented in the paper.
EXT="Taivalsaari, Antero"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Smart spaces with multiple interactive devices and motion tracking capabilities are becoming more common. However, there is little research on how interaction with one device affects the usage of other devices in the space. We investigate the effects of mobile devices and physical interactive devices on gestural interaction in motion-tracked environments. For our user study, we built a smart space consisting of a gesture-controlled large display, an NFC reader and a mobile device, to simulate a system in which users can transfer information between the space and personal devices. The study with 13 participants revealed that (1) the mobile device affects gesturing as well as passive stance; (2) users may stop moving completely when they intend to stop interacting with a display; (3) interactive devices with overlapping interaction space make unintentional interaction significantly more frequent. Our findings give implications for gestural interaction design as well as design of motion-tracked smart spaces.
INT=tie,"Järvi, Antti"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Heterogeneous computing platforms with multicore central processing units (CPUs) and graphics processing units (GPUs) are of increasing interest to designers of embedded signal processing systems since they offer the potential for significant performance boost while maintaining the flexibility of software-based design flows. Developing optimized implementations for CPU-GPU platforms is challenging due to complex, inter-related design issues, including task scheduling, interprocessor communication, memory management, and modeling and exploitation of different forms of parallelism. In this paper, we present an automated, dataflow based, design framework called DIF-GPU for application mapping and software synthesis on heterogeneous CPU-GPU platforms. DIF-GPU is based on novel extensions to the dataflow interchange format (DIF) package, which is a software environment for developing and experimenting with dataflow-based design methods and synthesis techniques for embedded signal processing systems. DIF-GPU exploits multiple forms of parallelism by deeply incorporating efficient vectorization and scheduling techniques for synchronous dataflow specifications, and incorporating techniques for streamlining interprocessor communication. DIF-GPU also provides software synthesis capabilities to help accelerate the process of moving from high-level application models to optimized implementations.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
We present a technique to improve the accuracy and to reduce the computational labor in the calculation of long-range interactions in systems with periodic boundary conditions. We extend the well-known Ewald method by using a linear combination of screening Gaussian charge distributions instead of only one. This enables us to find faster converging real-space and reciprocal space summations. The combined simplicity and efficiency of our method is demonstrated, and the scheme is readily applicable to large-scale periodic simulations, classical as well as quantum. Moreover, apart from the required a priori optimization the method is straightforward to include in most routines based on the Ewald method within, e.g., density-functional, molecular dynamics, and quantum Monte Carlo calculations.
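For orientation, the classical single-Gaussian Ewald splitting that the paper generalizes is sketched below; as stated in the abstract, the method replaces the single screening Gaussian by a linear combination of Gaussians to obtain faster converging real-space and reciprocal-space summations.

```latex
\frac{1}{r} \;=\; \frac{\operatorname{erfc}(\alpha r)}{r} \;+\; \frac{\operatorname{erf}(\alpha r)}{r},
\qquad
\rho_{\mathrm{screen}}(r) \;=\; \frac{\alpha^{3}}{\pi^{3/2}}\, e^{-\alpha^{2} r^{2}}
```

Here the erfc term is summed in real space, while the erf term, which is the potential of the Gaussian screening charge density ρ_screen, is summed in reciprocal space.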
Research output: Contribution to journal › Article › Scientific › peer-review
The paper proposes a method for the detection of bubble-like transparent objects in a liquid. The detection problem is non-trivial since bubble appearance varies considerably due to different lighting conditions causing contrast reversal and multiple interreflections. We formulate the problem as the detection of concentric circular arrangements (CCA). The CCAs are recovered in a hypothesize-optimize-verify framework. The hypothesis generation is based on sampling from the partially linked components of the non-maximum suppressed responses of oriented ridge filters, and is followed by the CCA parameter estimation. Parameter optimization is carried out by minimizing a novel cost function. The performance was tested on gas dispersion images of pulp suspension and oil dispersion images. The mean error of gas/oil volume estimation was used as a performance criterion because the main goal of the applications driving the research was bubble volume estimation. The method achieved gas and oil volume estimation errors of 28% and 13%, respectively, outperforming the OpenCV Circular Hough Transform in both cases and the WaldBoost detector in gas volume estimation.
Research output: Contribution to journal › Article › Scientific › peer-review
With the increasing amount of data being published on the Web, it is difficult to analyze its content within a short time. Topic modeling techniques can summarize textual data that contains several topics. Both the label (such as a category or tag) and word co-occurrence play a significant role in understanding textual data. However, many conventional topic modeling techniques are limited by the bag-of-words assumption. In this paper, we develop a probabilistic model called Bigram Labeled Latent Dirichlet Allocation (BL-LDA) to address the limitation of the bag-of-words assumption. The proposed BL-LDA incorporates bigrams into the Labeled LDA (L-LDA) technique. Extensive experiments on Yelp data show that the proposed scheme is better than L-LDA in terms of accuracy.
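BL-LDA itself is not part of standard toolkits; purely as an illustration of why bigrams matter, the sketch below feeds unigram-plus-bigram counts into an ordinary (unlabeled) LDA model with scikit-learn. It is not the Labeled-LDA extension proposed in the paper, and the example documents are made up.

```python
# Rough illustration only: standard LDA over unigram+bigram counts.  This is not
# the BL-LDA model from the paper (which extends Labeled LDA), but it shows how
# bigram features relax the plain bag-of-words assumption.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

docs = [
    "the pizza was great and the service was friendly",
    "terrible service and cold pizza",
    "the oil change was quick and the staff was friendly",
]
vec = CountVectorizer(ngram_range=(1, 2), stop_words="english")
X = vec.fit_transform(docs)
lda = LatentDirichletAllocation(n_components=2, random_state=0).fit(X)

terms = vec.get_feature_names_out()
for k, topic in enumerate(lda.components_):
    top = [terms[i] for i in topic.argsort()[-5:][::-1]]
    print(f"topic {k}: {top}")
```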
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Research output: Contribution to journal › Article › Scientific
The medium access is regarded as being one of the most challenging issues to solve in order to provide deterministic wireless communications in vehicular networks. It has been demonstrated that standard protocols fail to properly address this issue. The implementation of deterministic Medium Access Control (MAC) protocols is hampered by the fact that commercial devices do not allow modifications to the standard MAC mechanism, and the development of a device from scratch to implement one MAC scheme is an extremely laborious endeavor. However, over the last few years, the IT2S platform for vehicular communications has been developed and is now in a stage that allows implementation and testing of new solutions for the vehicular communications environment. This paper presents an overview of MAC mechanisms capable of providing deterministic real-time access and assesses the features a communications device should include in order to allow the implementation of these mechanisms. It then proposes an implementation of such features taking advantage of the white box access to the IT2S platform, which is not usually available in COTS devices.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Having a large number of applications in the marketplace is considered a critical success factor for software ecosystems. The number of applications has been claimed to determine which ecosystem holds the greatest competitive advantage and will eventually dominate the market. This paper investigates the influence of developer multi-homing (i.e., participating in more than one ecosystem) in three leading mobile application ecosystems. Our results show that when regarded as a whole, mobile application ecosystems are single-homing markets. The results further show that 3% of all developers generate more than 80% of installed applications and that multi-homing is common among these developers. Finally, we demonstrate that the most installed content actually comprises only a small number of the potential value propositions. The results thus imply that attracting and maintaining developers of superstar applications is more critical for the survival of a mobile application ecosystem than the overall number of developers and applications. Hence, the mobile ecosystem is unlikely to become a monopoly. Since exclusive contracts between application developers and mobile application ecosystems are rare, multi-homing is a viable component of risk management and a publishing strategy. The study advances the theoretical understanding of the influence of multi-homing on competition in software ecosystems.
Research output: Contribution to journal › Article › Scientific › peer-review
The Intelligent Transportation Systems concept provides the ground to enable a wide range of applications to improve traffic safety and efficiency. Innovative communication systems must be proposed taking into account, on the one hand, the unstable characteristics of vehicular communications and, on the other hand, the different requirements of applications. In this paper a reliable (geo-)broadcasting scheme for vehicular ad-hoc networks is proposed and analyzed. This receiver-based technique aims at ensuring received message integrity while keeping the overhead at a reasonably low level. The results are compared to simulation studies carried out in the Network Simulator-3 (NS-3) simulation environment, demonstrating good agreement with each other. The analysis shows that in a single-hop scenario, receiver-based reliable broadcasting can provide good reliability, while giving very little overhead for a high number of receivers.
Research output: Contribution to journal › Article › Scientific › peer-review
Fog Computing is a new paradigm that has been proposed by CISCO to take full advantage of the ever-growing computational capacity of near-user or edge devices (e.g., wireless gateways and sensors). The paradigm proposes an architecture that enables the devices to host the functionality of various user-centric services. While the prospects of Fog Computing promise numerous advantages, the development of Fog Services remains under-investigated. This article considers an opportunity for Fog implementation of Alert Services on top of Wireless Sensor Network (WSN) technology. In particular, we focus on targeted WSN-alert delivery based on spontaneous interaction between a WSN and the hand-held devices of its users. For the alert delivery, we propose a Gravity Routing concept that prioritizes the areas of high user presence within the network. Based on the concept, we develop a routing protocol, namely Gradient Gravity Routing (GGR), which combines targeted delivery and resilience to potential sensor-load heterogeneity within the network. The protocol has been compared against a set of state-of-the-art solutions via a series of simulations. The evaluation has shown the ability of GGR to match the performance of the compared solutions in terms of alert delivery ratio, while minimizing the overall energy consumption of the network.
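The protocol details are beyond the abstract, but the core "gravity" intuition of forwarding towards areas of high user presence can be sketched as greedy next-hop selection on a presence potential. The graph, potentials, and stopping rule below are purely illustrative assumptions; GGR itself additionally handles gradients and sensor-load heterogeneity.

```python
# Simplified sketch of gravity-style forwarding: each node holds a "user presence"
# potential and a packet is greedily forwarded to the neighbour with the highest
# potential.  This is an illustration of the general idea only, not the GGR protocol.
def forward(graph, potential, source, max_hops=20):
    """graph: node -> list of neighbours; potential: node -> user-presence score."""
    path, node = [source], source
    for _ in range(max_hops):
        best = max(graph[node], key=lambda nbr: potential[nbr], default=None)
        if best is None or potential[best] <= potential[node]:
            break                     # local maximum of user presence reached
        path.append(best)
        node = best
    return path

graph = {"a": ["b"], "b": ["a", "c", "d"], "c": ["b"], "d": ["b", "e"], "e": ["d"]}
potential = {"a": 0.1, "b": 0.3, "c": 0.2, "d": 0.7, "e": 0.9}
print(forward(graph, potential, "a"))   # -> ['a', 'b', 'd', 'e']
```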
Research output: Contribution to journal › Article › Scientific › peer-review
Wireless standards are evolving rapidly due to the exponential growth in the number of portable devices along with applications with high data rate requirements. Adaptable software-based signal processing implementations for these devices can make the deployment of the constantly evolving standards faster and less expensive. The flagship technology from the IEEE WLAN family, IEEE 802.11ac, aims at achieving very high throughputs in local area connectivity scenarios. This article presents a software-based implementation of the Multiple Input and Multiple Output (MIMO) transmitter and receiver baseband processing conforming to the IEEE 802.11ac standard, which can achieve transmission bit rates beyond 1 Gbps. This work focuses on the physical layer frequency domain processing. Various configurations, including 2×2 and 4×4 MIMO, are considered for the implementation. To utilize the available data and instruction level parallelism, a DSP core with vector extensions is selected as the implementation platform. Then, the feasibility of the presented software-based solution is assessed by studying the number of clock cycles and the power consumption of the different scenarios implemented on this core. Such Software Defined Radio based approaches can potentially offer more flexibility, high energy efficiency, reduced design efforts and thus shorter time-to-market cycles in comparison with conventional fixed-function hardware methods.
ORG=elt,0.5
ORG=tie,0.5
Research output: Contribution to journal › Article › Scientific › peer-review
Dataflow modeling offers a myriad of tools for designing and optimizing signal processing systems. A designer is able to take advantage of dataflow properties to effectively tune the system in connection with functionality and different performance metrics. However, a disparity in the specification of dataflow properties and the final implementation can lead to incorrect behavior that is difficult to detect. This motivates the problem of ensuring consistency between dataflow properties that are declared or otherwise assumed as part of dataflow-based application models, and the dataflow behavior that is exhibited by implementations that are derived from the models. In this paper, we address this problem by introducing a novel dataflow validation framework (DVF) that is able to identify disparities between an application’s formal dataflow representation and its implementation. DVF works by instrumenting the implementation of an application and monitoring the instrumentation data as the application executes. This monitoring process is streamlined so that DVF achieves validation without major overhead. We demonstrate the utility of our DVF through design and implementation case studies involving an automatic speech recognition application, a JPEG encoder, and an acoustic tracking application.
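As a toy sketch of the kind of consistency check such a framework automates (not the DVF implementation itself), one can instrument a dataflow edge so that every firing logs the number of tokens it produces or consumes, and then compare the log against the declared synchronous dataflow rates. The class and firing identifiers below are hypothetical.

```python
# Minimal sketch of rate validation for a synchronous dataflow (SDF) edge:
# instrument firings, count tokens, and compare against declared rates.
class InstrumentedFifo:
    def __init__(self, declared_prod_rate, declared_cons_rate):
        self.declared = (declared_prod_rate, declared_cons_rate)
        self.buffer = []
        self.log = []          # (firing_id, role, token_count)

    def produce(self, firing_id, tokens):
        self.buffer.extend(tokens)
        self.log.append((firing_id, "produce", len(tokens)))

    def consume(self, firing_id, n):
        taken, self.buffer = self.buffer[:n], self.buffer[n:]
        self.log.append((firing_id, "consume", len(taken)))
        return taken

    def validate(self):
        prod, cons = self.declared
        return [entry for entry in self.log
                if (entry[1] == "produce" and entry[2] != prod)
                or (entry[1] == "consume" and entry[2] != cons)]

fifo = InstrumentedFifo(declared_prod_rate=2, declared_cons_rate=2)
fifo.produce("src#0", [1, 2])
fifo.produce("src#1", [3])      # rate mismatch: implementation produced 1 token
fifo.consume("snk#0", 2)
print(fifo.validate())          # -> [('src#1', 'produce', 1)]
```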
Research output: Contribution to journal › Article › Scientific › peer-review
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The problem of how to automatically provide a desired (required) visual quality in lossy compression of still images and video frames is considered in this paper. The quality can be measured based on different conventional and visual quality metrics. In this paper, we mainly employ human visual system (HVS) based metrics PSNR-HVS-M and MSSIM since both of them take into account several important peculiarities of HVS. To provide a desired visual quality with high accuracy, iterative image compression procedures are proposed and analyzed. An experimental study is performed for a large number of grayscale test images. We demonstrate that there exist several coders for which the number of iterations can be essentially decreased using a reasonable selection of the starting value and the variation interval for the parameter controlling compression (PCC). PCC values attained at the end of the iterative procedure may heavily depend upon the coder used and the complexity of the image. Similarly, the compression ratio also considerably depends on the above factors. We show that for some modern coders that take HVS into consideration it is possible to give practical recommendations on setting a fixed PCC to provide a desired visual quality in a non-iterative manner. The case when original images are corrupted by visible noise is also briefly studied.
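The control loop behind such an iterative procedure can be sketched as a bisection over the parameter controlling compression (PCC). In the toy example below a uniform quantizer and plain PSNR stand in for a real coder and an HVS-based metric such as PSNR-HVS-M, so only the iteration logic, not the quality model, reflects the approach described above.

```python
# Sketch of the iterative idea: bisect the parameter controlling compression (PCC)
# until the measured quality reaches a desired value.  The "coder" here is a toy
# uniform quantizer and the metric is plain PSNR; both are stand-ins.
import numpy as np

def toy_compress(image, step):
    return np.round(image / step) * step          # uniform quantization "coder"

def psnr(ref, test):
    mse = np.mean((ref.astype(float) - test.astype(float)) ** 2)
    return np.inf if mse == 0 else 10 * np.log10(255.0 ** 2 / mse)

def tune_pcc(image, target_q, pcc_lo=0.5, pcc_hi=64.0, tol=0.5, max_iter=20):
    pcc = q = None
    for _ in range(max_iter):
        pcc = 0.5 * (pcc_lo + pcc_hi)
        q = psnr(image, toy_compress(image, pcc))
        if abs(q - target_q) <= tol:
            break
        if q > target_q:
            pcc_lo = pcc        # quality above target: compress harder
        else:
            pcc_hi = pcc        # quality below target: compress less
    return pcc, q

rng = np.random.default_rng(0)
image = rng.integers(0, 256, size=(64, 64))
print(tune_pcc(image, target_q=38.0))
```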
Research output: Contribution to journal › Article › Scientific › peer-review
Noise reduction is often performed at an early stage of the image processing path. In order to keep the processing delays small in different computing platforms, it is important that the noise reduction is performed swiftly. In this paper, the block-matching and three-dimensional filtering (BM3D) denoising algorithm is implemented on heterogeneous computing platforms using OpenCL and CUDA frameworks. To our knowledge, these implementations are the first successful open source attempts to use GPU computation for BM3D denoising. The presented GPU implementations are up to 7.5 times faster than their respective CPU implementations. At the same time, the experiments illustrate general design challenges in using massively parallel processing platforms for the calculation of complex imaging algorithms.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Mobile devices have been identified as promising platforms for interactive vision-based applications. However, these applications still pose significant challenges in terms of latency, throughput and energy-efficiency. In this context, the integration of reconfigurable architectures on mobile devices allows dynamic reconfiguration to match the computation and data flow of interactive applications, demonstrating significant performance benefits compared to general purpose architectures. This paper presents concepts based on platform-level adaptability, exploring the acceleration of vision-based interactive applications through the utilization of three reconfigurable architectures: a low-power EnCore processor with a Configurable Flow Accelerator co-processor, a hybrid reconfigurable SIMD/MIMD platform, and Transport-Triggered Architecture-based processors. The architectures are evaluated and compared with current processors, analyzing their advantages and weaknesses in terms of performance and energy-efficiency when implementing highly interactive vision-based applications. The results show that the inclusion of reconfigurable platforms on mobile devices can enable the computation of several computationally heavy tasks with high performance and low energy consumption while providing enough flexibility.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this work, we present a novel method for approximating a normal distribution with a weighted sum of normal distributions. The approximation is used for splitting normally distributed components in a Gaussian mixture filter, such that the components have smaller covariances and cause smaller linearization errors when nonlinear measurements are used for the state update. Our splitting method uses weights from the binomial distribution as component weights. The method preserves the mean and covariance of the original normal distribution, and in addition, the resulting probability density and cumulative distribution functions converge to those of the original normal distribution when the number of components is increased. Furthermore, an algorithm is presented to perform the splitting so as to keep the linearization error below a given threshold with a minimum number of components. The accuracy of the estimate provided by the proposed method is evaluated in four simulated single-update cases and one time series tracking case. In these tests, it is found that the proposed method is more accurate than other Gaussian mixture filters found in the literature when the same number of components is used, and that the proposed method is faster and more accurate than particle filters.
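A simplified one-dimensional sketch of the splitting idea is given below: binomial weights, equally spaced component means, and a component variance chosen so that the mixture mean and variance match the original Gaussian. The multivariate construction and the error-bound-driven choice of the number of components in the paper are more involved, and the spacing parameter below is an illustrative assumption.

```python
# 1-D sketch of binomial splitting: approximate N(mu, sigma^2) by n+1 components
# with binomial weights.  Component means lie on a regular grid and the component
# variance is shrunk so that the mixture mean and variance equal those of the
# original Gaussian.
import numpy as np
from math import comb

def split_gaussian_1d(mu, sigma, n=4, spread=0.5):
    d = spread * sigma                            # spacing between component means
    weights = np.array([comb(n, i) for i in range(n + 1)], dtype=float) / 2 ** n
    means = mu + d * (np.arange(n + 1) - n / 2.0)
    var_of_means = d ** 2 * n / 4.0               # variance of d * Binomial(n, 1/2)
    comp_var = sigma ** 2 - var_of_means
    if comp_var <= 0:
        raise ValueError("spread too large for the requested sigma")
    return weights, means, np.full(n + 1, comp_var)

w, m, v = split_gaussian_1d(mu=1.0, sigma=2.0, n=4, spread=0.5)
mix_mean = np.sum(w * m)
mix_var = np.sum(w * (v + m ** 2)) - mix_mean ** 2
assert np.isclose(mix_mean, 1.0) and np.isclose(mix_var, 4.0)
```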
ORG=ase,0.75
ORG=mat,0.25
Research output: Contribution to journal › Article › Scientific › peer-review
The interest in programming streaming applications using dataflow models of computation has been increasing steadily in recent years. Among the numerous dataflow formalisms, the ISO-standardized RVC-CAL dataflow language has offered a solid basis for programming tool development and research. To date, RVC-CAL programming tools have enabled transforming dataflow programs into concurrent executables for multicore processors, as well as generating synthesizable hardware descriptions. In this paper it is shown how the RVC-CAL dataflow language can be used for programming graphics processing units (GPUs) with high efficiency. Considering the processing architectures of recent mobile and desktop computing devices, this advance is of high importance, as most consumer devices nowadays contain a graphics processing unit. To evaluate the proposed solution, the paper presents a video processing application case study. At best, the solution is shown to provide a speedup of 42× over single-threaded CPU execution.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Indoor positioning based on wireless local area network (WLAN) signals is often enhanced using pedestrian dead reckoning (PDR) based on an inertial measurement unit. The state evolution model in PDR is usually nonlinear. We present a new linear state evolution model for PDR. In simulated-data and real-data tests of tightly coupled WLAN-PDR positioning, the positioning accuracy with this linear model is better than with the traditional models when the initial heading is not known, which is a common situation. The proposed method is computationally light and is also suitable for smoothing. Furthermore, we present modifications to WLAN positioning based on Gaussian coverage areas and show how a Kalman filter using the proposed model can be used for integrity monitoring and (re)initialization of a particle filter.
Research output: Contribution to journal › Article › Scientific › peer-review
Wearable devices including smart eyewear require new interaction methods between the device and the user. In this paper, we describe our work on the combined use of eye tracking for input and haptic (touch) stimulation for output with eyewear. Input with eyes can be achieved by utilizing gaze gestures which are predefined patterns of gaze movements identified as commands. The frame of the eyeglasses offers three natural contact points with the wearer's skin for haptic stimulation. The results of two user studies reported in this paper showed that stimulation moving between the contact points was easy for users to localize, and that the stimulation has potential to make the use of gaze gestures more efficient.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Iterative methods have for decades been among the most popular approaches to solving the phase retrieval problem. Unfortunately, the iterative methods often stagnate. This also happens in the case of the 1-D Discrete Phase Retrieval (1-D DPhR) problem. Recently it has been shown that certain requirements on the input magnitude data might be one of the reasons why the direct method cannot solve the 1-D DPhR problem. In this work we present some difficulties that can be encountered when implementing the iterative method for finding a solution to the 1-D DPhR problem. We also formulate the extended form of the 1-D DPhR problem. Simulations indicate that the conjecture holds.
EXT="Rusu, Corneliu"
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The stability of software is a classical topic in software engineering. This research investigates stability of software architectures in terms of an object-oriented design principle presented by Robert C. Martin. The research approach is statistical: the design principle is evaluated with a time-series cross-sectional (TSCS) regression model. The empirical sample covers a release history from the Java library Vaadin. The empirical results establish that the design principle cannot be used to characterize the library. Besides delivering this negative empirical result, the research provides the necessary methodological background that is required to understand TSCS modeling.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Traditionally, collaborative coding has been practiced in open source communities, where cooperation has mostly taken place at the coordination level. Nowadays, web technology is sufficiently advanced to enable collaborative coding in real time as group work, which eases communication in software development. In this paper, this phenomenon is studied from a knowledge transfer and learning perspective. With the aid of two example cases (code camps), we examine the possibilities and challenges of learning during real-time group work. Additionally, we evaluate the effect of the structure of the log data created during software development. The research frame for this study is the use of log data visualization to evaluate group work, and the further improvement of that visualization to support software development.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Long-Term Evolution (LTE) and multiple-input multiple-output (MIMO) have earned reputations as cutting-edge technologies that can significantly boost wireless communication performance. This paper characterizes LTE MIMO performance in indoor environments so that guidelines for network operators can be proposed. Medium access control throughput (MAC TP) and LTE system parameters linked with MAC TP, such as the Channel Quality Indicator (CQI), Modulation and Coding Scheme (MCS), Rank Indicator (RI) and Precoding Matrix Indicator (PMI), as well as MIMO utilization, are analysed. The effects of indoor propagation, Line of Sight (LoS), Non-Line of Sight (NLoS), and strong and weak signal levels on Signal-to-Noise Ratio (SNR) and MIMO utilization are clarified. The paper also analyses and compares the performance of the MIMO transmission mode against transmit diversity (TxDiv, Multiple Input Single Output, MISO) and single-antenna (Single Input Multiple Output, SIMO) modes, both overall and per channel.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Molecular communication holds the promise of enabling communication between nanomachines with a view to increasing their functionalities and opening up new possible applications. Due to some of their biological properties, bacteria have been proposed as a possible information carrier for molecular communication, and the corresponding communication networks are known as bacterial nanonetworks. These properties include the ability of bacteria to move between locations and to carry information encoded in deoxyribonucleic acid molecules. However, similar to most organisms, bacteria have complex social properties that govern their colony. These social characteristics enable the bacteria to adapt to fluctuating environmental conditions by utilizing cooperative and non-cooperative behaviors. This article provides an overview of the different types of cooperative and non-cooperative social behavior of bacteria. The challenges (due to non-cooperation) and the opportunities (due to cooperation) these behaviors can bring to the reliability of communication in bacterial nanonetworks are also discussed. Finally, simulation results on the impact of bacterial cooperative social behavior on the end-to-end reliability of a single-link bacterial nanonetwork are presented. The article concludes by highlighting potential future research opportunities in this emerging field.
Research output: Contribution to journal › Article › Scientific › peer-review
As the adoption of eHealth solutions advances, new computing paradigms, such as cloud computing, bring the potential to improve efficiency in managing medical health records and help reduce costs. However, these opportunities introduce new security risks which cannot be ignored. In this paper, we present a forward-looking design for a privacy-preserving eHealth cloud system. The proposed solution is based on a Symmetric Searchable Encryption scheme that allows patients of an electronic healthcare system to securely store encrypted versions of their medical data and search directly over them without having to decrypt them first. As a result, the proposed protocol offers better protection than currently available solutions and paves the way for the next generation of eHealth systems.
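To make the idea of searching directly over encrypted records concrete, the toy sketch below builds an inverted index keyed by keyword PRF tokens, so a server can answer keyword queries without ever seeing the keywords or the plaintext. It is only an illustration of the general symmetric searchable encryption idea; a real scheme, including the one the paper builds on, adds proper document encryption, padding and access-pattern protections.

```python
import hmac, hashlib, os

class ToySSE:
    """Minimal searchable-index sketch: keyword -> PRF token -> document ids.
    Illustrative only; names and structure are not the paper's scheme."""
    def __init__(self):
        self.key = os.urandom(32)
        self.index = {}                       # PRF(keyword) -> list of document ids

    def _token(self, word):
        # Deterministic keyword token under a secret key (HMAC as a PRF).
        return hmac.new(self.key, word.encode(), hashlib.sha256).hexdigest()

    def add(self, doc_id, words):
        for w in set(words):
            self.index.setdefault(self._token(w), []).append(doc_id)

    def search(self, word):
        # The server side only ever sees the token, never the keyword itself.
        return self.index.get(self._token(word), [])

sse = ToySSE()
sse.add("record-1", ["diabetes", "insulin"])
sse.add("record-2", ["insulin"])
print(sse.search("insulin"))   # ['record-1', 'record-2']
```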
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
A novel adaptive compensation architecture for the frequency response mismatch of a 2-channel time-interleaved ADC (TI-ADC) is proposed for developing high-performance self-adaptive systems. The proposed approach improves on existing methods in the sense that the TI-ADC mismatch identification can be performed without allocating a region where only the TI-ADC mismatch spurs are present. This is accomplished by mapping the TI-ADC problem onto an I/Q mismatch problem, which allows complex statistical signal processing to be deployed. As proof of concept, the compensation architecture is demonstrated and tested on measured hardware data from a 16-bit TI-ADC.
Research output: Contribution to journal › Article › Scientific › peer-review
Print interpreting supports people with a hearing disability by giving them access to spoken language. In print interpreting, the interpreter types the spoken text in real time for the hard-of-hearing client to read. This results in dynamic text presentation. An eye movement study was conducted to compare two types of dynamic text presentation formats in print interpreting: letter-by-letter and word-by-word. Gaze path analysis with 20 hearing participants showed different types of reading behaviour during reading of two pieces of text in these two presentation formats. Our analysis revealed that the text presentation format has a significant effect on reading behaviour. Rereading and regressions occurred significantly more often with the word-by-word format than with the letter-by-letter format. We also found a significant difference between the number of regressions starting at the words that end a sentence and that of regressions starting at all other words. The frequency of rereading was significantly higher for incorrectly typed or abbreviated words than for the other words. Analysis of the post-test questionnaire found almost equal acceptance of the word-by-word and letter-by-letter formats by the participants. A follow-up study with 18 hard-of-hearing participants showed a similar trend in results. The findings of this study highlight the importance of developing print interpreting tools that allow the interpreter and the client to choose the options that best facilitate the communication. They also bring up the need to develop new eye movement metrics for analysing the reading of dynamic text, and provide first results on a new dynamic presentation context.
Research output: Contribution to journal › Article › Scientific › peer-review
In order to support the anywhere, anytime services of Beyond-4G networks, new deployment solutions will be required that can cost-effectively address future capacity demand while offering consistently high bit rates and decent quality of service throughout the network coverage area. In this article we look into an advanced outdoor distributed antenna system (DAS) concept, dynamic DAS, that offers on-demand outdoor capacity in urban areas by dynamically configuring the remote antenna units to act either as individual small cells or as distributed nodes of a common central cell. The performance of the investigated DAS solution is evaluated and compared with a legacy macrocellular deployment in a dense urban environment, mainly from an outdoor perspective. The obtained results indicate superior performance of the dynamic DAS concept in terms of coverage and SINR, network capacity and cost-efficiency as compared to legacy macrocellular network deployments.
EXT="Niemelä, Jarno"
Research output: Contribution to journal › Article › Scientific › peer-review
Typically, single-coil wireless power transfer (WPT) systems that are potentially utilized for in-vivo device powering are limited by the total available power at the antenna. To overcome this limitation, a multi-coil system is presented which can greatly increase the power available to the receiver coil, by as much as 80%, while still remaining within the regulatory limits for total available power from an individual antenna. The effects of matching based on coil separation are presented, demonstrating how the self-resonant frequency of the WPT system depends on coil separation, and a dynamic matching solution is proposed which allows maximum power transfer efficiency independent of coil separation or changes in body impedance.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The development of schedule control for the precast concrete supply chain has been studied. The main idea was to use the BIM model created by the structural engineer as a user interface for schedule control, saving real-time status information on the progress of structural design, element manufacture, delivery and site erection directly to the BIM model through a cloud-based networked service. Some of the missing software applications were programmed by the software companies that participated in the project. Experiments were carried out in a real construction project in Finland, where the information from the design, prefabrication, delivery and erection phases was synchronized between the stakeholders using the cloud service. The most important observations and results are presented and analyzed. Finally, a future model for an intelligent BIM-based schedule control concept is outlined.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
A channelizer is used to separate users or channels in communication systems. A polyphase channelizer is a type of channelizer that uses polyphase filtering to filter, downsample, and downconvert simultaneously. Using graphics processing unit (GPU) technology, we propose a novel GPU-based polyphase channelizer architecture that delivers high throughput. This architecture has the advantages of reduced complexity and optimized parallel processing of many channels, while being configurable via software. This makes our approach and implementation particularly attractive for using GPUs as DSP accelerators in communication systems.
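The basic structure of such a channelizer, polyphase decomposition of a prototype lowpass filter, per-branch filtering at the decimated rate, and an M-point IFFT across the branches, can be sketched as follows. Commutator and channel-ordering conventions vary, and this CPU/NumPy sketch only illustrates the signal flow that the paper maps onto a GPU; it is not the proposed architecture.

```python
import numpy as np
from scipy.signal import firwin

def polyphase_channelizer(x, M, taps_per_branch=8):
    """Maximally decimated M-channel polyphase channelizer (illustrative sketch)."""
    h = firwin(M * taps_per_branch, 1.0 / M)        # prototype lowpass filter
    E = h.reshape(taps_per_branch, M)               # polyphase components E[k, p] = h[k*M + p]
    n_blocks = len(x) // M
    x = np.asarray(x)[:n_blocks * M].reshape(n_blocks, M)
    x = x[:, ::-1]                                  # one common commutator ordering of branch inputs
    branch_out = np.empty((n_blocks, M), dtype=complex)
    for p in range(M):
        # Each branch filters its decimated stream with its own polyphase component.
        branch_out[:, p] = np.convolve(x[:, p], E[:, p])[:n_blocks]
    # The M-point IFFT across branches performs the simultaneous downconversion,
    # producing M channels, each at 1/M of the input sample rate.
    return np.fft.ifft(branch_out, axis=1).T        # shape: (M channels, n_blocks samples)
```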
Research output: Contribution to journal › Article › Scientific › peer-review
As it has evolved, the Internet has had to support a broadening range of networking technologies, business models and user interaction modes. Researchers and industry practitioners have realised that this trend necessitates a fundamental rethinking of approaches to network and service management. This has spurred significant research efforts towards developing autonomic network management solutions incorporating distributed self-management processes inspired by biological systems. Whilst significant advances have been made, most solutions focus on the management of single network domains and the optimisation of specific management or control processes therein. In this paper we argue that a networking infrastructure providing a myriad of loosely coupled services must inherently support federation of network domains and facilitate coordination of the operation of various management processes for mutual benefit. To this end, we outline a framework for federated management that facilitates the coordination of the behaviour of bio-inspired management processes. Using a case study relating to the distribution of IPTV content, we describe how Federal Relationship Managers realising our layered model of management federations can communicate to manage service provision across multiple application/storage/network providers. We outline an illustrative example in which storage providers are dynamically added to a federation to accommodate demand spikes, with appropriate content being migrated to those providers' servers under the control of a bio-inspired replication process.
Research output: Contribution to journal › Article › Scientific › peer-review
Dataflow languages enable describing signal processing applications in a platform-independent fashion, which makes them attractive in today's multiprocessing era. RVC-CAL is a dynamic dataflow language that enables describing complex data-dependent programs such as video decoders. To date, design automation toolchains for RVC-CAL have enabled creating workstation software, dedicated hardware and embedded application-specific multiprocessor implementations out of RVC-CAL programs. However, no solution has been presented for executing RVC-CAL applications on generic embedded multiprocessing platforms. This paper presents a dataflow-based multiprocessor communication model, an architecture prototype that uses it, and an automated toolchain for instantiating such a platform and the software for it. The complexity of the platform increases linearly as the number of processors is increased. The experiments in this paper use several instances of the proposed platform with different numbers of processors. An MPEG-4 video decoder is mapped to the platform and executed on it. Benchmarks are performed on an FPGA board.
Research output: Contribution to journal › Article › Scientific › peer-review
When implementing digital signal processing (DSP) applications on multiprocessor systems, one significant problem from the performance viewpoint is the memory wall. In this paper, to help alleviate the memory wall problem, we propose a novel, high-performance buffer mapping policy for SDF-represented DSP applications on bus-based multiprocessor systems that support the shared-memory programming model. The proposed policy exploits the bank concurrency of the DRAM main memory system according to an analysis of hierarchical parallelism. Energy consumption is also a critical parameter, especially in battery-powered embedded computing systems. In this paper, we apply a synchronization back-off scheme on top of the proposed high-performance buffer mapping policy to reduce energy consumption. The energy saving is attained by minimizing the number of non-essential synchronization transactions. We measure throughput and energy consumption on both synthetic and real benchmarks. The simulation results show that the proposed buffer mapping policy is very useful in terms of performance, especially in memory-intensive applications where the total execution time of computational tasks is relatively small compared to that of memory operations. In addition, the proposed synchronization back-off scheme reduces the number of synchronization transactions without degrading performance, which results in system-level energy savings.
Research output: Contribution to journal › Article › Scientific › peer-review
As the variety of off-the-shelf processors expands, traditional implementation methods of systems for digital signal processing and communication are no longer adequate to achieve design objectives in a timely manner. Designers need to be able to easily track changes in computing platforms and apply them efficiently, while reusing legacy code and optimized libraries that target specialized features in single processing units. In this context, we propose an integration workflow to schedule and implement Software Defined Radio (SDR) protocols that are developed using the GNU Radio environment on heterogeneous multiprocessor platforms. We show how to utilize Single Instruction Multiple Data (SIMD) units provided in Graphics Processing Units (GPUs) along with vector accelerators implemented in General Purpose Processors (GPPs). We augment a popular SDR framework (i.e., GNU Radio) with a library that seamlessly allows offloading of algorithm kernels mapped to the GPU without changing the original protocol description. Experimental results show how our approach can be used to efficiently explore design spaces for SDR system implementation, and examine the overhead of the integrated backend (software component) library.
Research output: Contribution to journal › Article › Scientific › peer-review
A channelizer is a part of a receiver front-end subsystem, commonly found in various communication systems, that separates different users or channels. A modern channelizer exploits polyphase filter banks to process multiple channels at the same time, performing downconversion, downsampling, and filtering all at once. However, conventional, heavily hardware-based implementations are inflexible, which limits the use of channelizers and poses significant challenges. With advances in graphics processing unit (GPU) technology, we now have the potential to deliver high computational throughput along with the flexibility of a software-based implementation. In this paper, we demonstrate how this potential can be exploited by presenting a novel GPU-based channelizer implementation. Our implementation incorporates methods for eliminating complex buffer management and performing arbitrary resampling on all channels simultaneously. We also introduce the notion of simultaneously processing many channels as a high-data-rate parallel receiver system using blocks of threads on the GPU. The multi-channel, flexible, high-throughput, and arbitrary-resampling characteristics of our GPU-based channelizer make it attractive for a variety of communication receiver applications.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
An innovative wearable, partially self-powered, health monitoring and indoor localization shoe-mounted sensor module is presented. The system's novel shoe sole serves the double role of (i) a medical-grade temperature probe for human body monitoring and (ii) a renewable energy scavenger that transforms human motion into electrical energy. Mounted on the shoe is also an NFC reader for proximity-based localization purposes. An Adidas™-logo-shaped dual-band communication antenna is fabricated that exhibits very good performance despite its close proximity to the highly lossy human body. The proposed platform can be extended to other sensor applications, for example by embedding normal and/or shear force sensors into the sole in order to monitor the sports performance of athletes as well as to improve rehabilitation techniques.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Currently, Multicore Digital Signal Processor (DSP) platforms are commonly used in telecommunications baseband processing. In the next few years, high-performance DSPs are likely to combine many more DSP cores for signal processing with some General-Purpose Processor (GPP) cores for application control. As the number of cores increases in new DSP platform designs, scheduling of applications is becoming a complex operation. Meanwhile, the variability of the scheduled applications also tends to increase as applications become more sophisticated. Such variations require runtime adaptivity of application scheduling. This paper extends previous work on adaptive scheduling by using the Hybrid Flow-Shop (HFS) scheduling method, which enables the device architecture to be modeled as a pipeline of Processing Elements (PEs) with multiple alternate PEs for each pipeline stage. HFS scheduling is applied to the Uplink Physical Layer data processing (PUSCH) of the 3rd Generation Partnership Project (3GPP) Long Term Evolution (LTE) telecommunication standard. The experiments, conducted on an ARM Cortex-A9 GPP, show that the HFS scheduling algorithm has an overhead that increases very slowly with the number of PEs. This makes the method suitable for executing the adaptive scheduling in less than 1 ms for the 501 actors of an LTE PUSCH dataflow description executed on a 256-core architecture.
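A hybrid flow shop models the platform as a pipeline of stages, each offering several alternate processing elements. The sketch below shows one simple greedy HFS heuristic, assigning each job, stage by stage, to the PE that frees up first; it is only meant to convey the problem structure, and all names are illustrative rather than the scheduling algorithm evaluated in the paper.

```python
def hfs_greedy(jobs, stage_pes, proc_time):
    """Greedy hybrid flow-shop schedule sketch.
    jobs:      iterable of job ids, processed in the given order
    stage_pes: dict stage -> number of alternate PEs available at that stage
    proc_time: dict stage -> dict job -> processing time on that stage"""
    pe_free = {s: [0.0] * n for s, n in stage_pes.items()}   # next-free time of each PE
    job_ready = {j: 0.0 for j in jobs}                       # time a job leaves the previous stage
    schedule = []
    for s in sorted(stage_pes):                              # jobs traverse the stages in order
        for j in jobs:
            pe = min(range(len(pe_free[s])), key=lambda p: pe_free[s][p])
            start = max(job_ready[j], pe_free[s][pe])
            end = start + proc_time[s][j]
            pe_free[s][pe] = end
            job_ready[j] = end
            schedule.append((j, s, pe, start, end))
    return schedule

# Example: 3 jobs, 2 stages with 2 and 1 PEs respectively.
print(hfs_greedy(["a", "b", "c"],
                 {0: 2, 1: 1},
                 {0: {"a": 2, "b": 3, "c": 1}, 1: {"a": 1, "b": 2, "c": 2}}))
```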
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
RVC-CAL is an actor-based dataflow language that enables concurrent, modular and portable description of signal processing algorithms. RVC-CAL programs can be compiled to implementation languages such as C/C++ and VHDL for producing software or hardware implementations. This paper presents a methodology for automatic discovery of piecewise-deterministic (quasi-static) execution schedules for RVC-CAL program software implementations. Quasi-static scheduling moves computational burden from the implementable run-time system to design-time compilation and thus enables making signal processing systems more efficient. The presented methodology divides the RVC-CAL program into segments and hierarchically detects quasi-static behavior from each segment: first at the level of actors and later at the level of the whole segment. Finally, a code generator creates a quasi-statically scheduled version of the program. The impact of segment based quasi-static scheduling is demonstrated by applying the methodology to several RVC-CAL programs that execute up to 58 % faster after applying the presented methodology.
Research output: Contribution to journal › Article › Scientific › peer-review
While research on the design of heterogeneous concurrent systems has a long and rich history, a unified design methodology and tool support have not emerged so far, and thus the creation of such systems remains a difficult, time-consuming and error-prone process. The absence of principled support for system evaluation and optimization at high abstraction levels makes the quality of the resulting implementation highly dependent on the experience or prejudices of the designer. In this work we present TURNUS, a unified dataflow design space exploration framework for heterogeneous parallel systems. It provides high-level modelling and simulation methods and tools for system-level performance estimation and optimization. TURNUS represents the outcome of several years of research in the area of co-design exploration for multimedia stream applications. We demonstrate how the initial high-level abstraction of the design facilitates the use of different analysis and optimization heuristics, which guide the designer during validation and optimization stages without requiring low-level implementations of parts of the application. Our framework currently yields exploration and optimization results in terms of algorithmic optimization, rapid performance estimation, application throughput, buffer size dimensioning, and power optimization.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
RVC-CAL is a dataflow language that has acquired an ecosystem of sophisticated design tools. Previous works have shown that RVC-CAL-based applications can be automatically deployed to multiprocessor platforms, as well as to hardware descriptions, with high efficiency. However, as RVC-CAL is a concurrent language, code generation for a single processor core requires careful application analysis and scheduling. Although much work has been done in this area, to date no publication has reported that programs generated from RVC-CAL could rival handwritten programs on single-core processors. This paper proposes performance optimization of RVC-CAL applications by actor merging at the source code level. The proposed methodology is demonstrated with an IEEE 802.15.4 (ZigBee) transmitter case study. The transmitter baseband software, previously written in C, is rewritten in RVC-CAL and optimized with the proposed methodology. Experiments show that on a VLIW-flavored processor the RVC-CAL-based program achieves the performance of the manually written software.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Due to the increased complexity of dynamics in modern DSP applications, dataflow-based design methodologies require significant enhancements in modeling and scheduling techniques to provide for efficient and flexible handling of dynamic behavior. In this paper, we address this problem through a new framework that is based on integrating two complementary modeling techniques, core functional dataflow (CFDF) and parameterized synchronous dataflow (PSDF). We apply, in a systematically integrated way, the structured mode-based dynamic dataflow modeling capability of CFDF together with the features of PSDF for dynamic parameter reconfiguration and quasi-static scheduling. We refer to this integrated methodology for mode- and dynamic-parameter-based modeling and scheduling as core functional parameterized synchronous dataflow (CF-PSDF). Through a wireless communication case study involving MIMO detection, we demonstrate the utility of design and implementation using CF-PSDF graphs. Experimental results on this case study demonstrate the efficiency and flexibility of our proposed new CF-PSDF based design methodology.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In recent work, a graphical modeling construct called "topological patterns" has been shown to enable concise representation and direct analysis of repetitive dataflow graph sub-structures in the context of design methods and tools for digital signal processing systems (Sane et al. 2010). In this paper, we present a formal design method for specifying topological patterns and deriving parameterized schedules from such patterns based on a novel schedule model called the scalable schedule tree. The approach represents an important class of parameterized schedule structures in a form that is intuitive for representation and efficient for code generation. Through application case studies involving image processing and wireless communications, we demonstrate our methods for topological pattern representation, scalable schedule tree derivation, and associated dataflow graph code generation.
Research output: Contribution to journal › Article › Scientific › peer-review
Dataflow models of computation are widely used for the specification, analysis, and optimization of Digital Signal Processing (DSP) applications. In this paper a new meta-model called PiMM is introduced to address the important challenge of managing dynamics in DSP-oriented representations. PiMM extends a dataflow model by introducing an explicit parameter dependency tree and an interface-based hierarchical compositionality mechanism. PiMM favors the design of highly-efficient heterogeneous multicore systems, specifying algorithms with customizable trade-offs among predictability and exploitation of both static and adaptive task, data and pipeline parallelism. PiMM fosters design space exploration and reconfigurable resource allocation in a flexible dynamic dataflow context.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
The zero-intermediate-frequency zero-crossing demodulator (ZIFZCD) is extensively used for demodulating continuous phase frequency shift keying (CPFSK) signals in low-power and low-cost devices. The ZIFZCD has previously been implemented as hardwired circuits. Many variations of the ZIFZCD algorithm have been suggested for different modulation methods and channel conditions, and supporting all these variants calls for a programmable processor based implementation. This paper describes a programmable software implementation of the ZIFZCD on an application-specific processor (ASP). The ASP is based on the transport triggered architecture (TTA) and provides an ideal low-power platform for the ZIFZCD implementation due to its simplicity. The designed processor operates at a maximum clock frequency of 250 MHz and has a gate count of 134 kGE for a 32-bit TTA processor and 76 kGE for a 16-bit processor. The demodulator has been developed as part of an open source radio implementation for wireless sensor nodes.
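The sketch below illustrates one common zero-IF zero-crossing demodulation rule for binary CPFSK: at each zero crossing of the I (or Q) branch, the sign of the other branch indicates the direction of rotation, i.e. which FSK tone is present, and per-symbol votes decide the bit. The sampling arrangement, symbol alignment and voting details here are illustrative assumptions, not the exact algorithm programmed on the TTA processor.

```python
import numpy as np

def zifzc_demod(iq, sps):
    """Zero-IF zero-crossing demodulation of binary CPFSK (illustrative sketch).
    iq:  complex baseband samples, sps: samples per symbol."""
    I, Q = iq.real, iq.imag
    vote = np.zeros(len(iq))
    # Rising/falling zero crossings of I: rotation sign is -sign(Q) / +sign(Q).
    up_i = (I[:-1] < 0) & (I[1:] >= 0)
    dn_i = (I[:-1] >= 0) & (I[1:] < 0)
    vote[1:][up_i] -= np.sign(Q[1:][up_i])
    vote[1:][dn_i] += np.sign(Q[1:][dn_i])
    # Rising/falling zero crossings of Q: rotation sign is +sign(I) / -sign(I).
    up_q = (Q[:-1] < 0) & (Q[1:] >= 0)
    dn_q = (Q[:-1] >= 0) & (Q[1:] < 0)
    vote[1:][up_q] += np.sign(I[1:][up_q])
    vote[1:][dn_q] -= np.sign(I[1:][dn_q])
    # Accumulate votes over each symbol period and decide the bit.
    n_sym = len(iq) // sps
    per_symbol = vote[:n_sym * sps].reshape(n_sym, sps).sum(axis=1)
    return (per_symbol > 0).astype(int)      # 1 <-> positive frequency deviation tone
```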
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Parallelization of Digital Signal Processing (DSP) software is an important trend for Multiprocessor System-on-Chip (MPSoC) implementation. The performance of DSP systems composed of parallelized computations depends on the scheduling technique, which must in general allocate computation and communication resources for competing tasks and ensure that data dependencies are satisfied. In this paper, we formulate a new type of parallel task scheduling problem called Parallel Actor Scheduling (PAS) for MPSoC mapping of DSP systems that are represented as Synchronous Dataflow (SDF) graphs. In contrast to traditional SDF-based scheduling techniques, which focus on exploiting graph-level (inter-actor) parallelism, the PAS problem targets the integrated exploitation of both intra- and inter-actor parallelism for platforms in which individual actors can be parallelized across multiple processing units. We address a special case of the PAS problem in which all of the actors in the DSP application or subsystem being optimized can be parallelized. For this special case, we develop and experimentally evaluate a two-phase scheduling framework with two workflows: particle swarm optimization with a mixed integer programming formulation, and particle swarm optimization with a fast heuristic based on list scheduling. We demonstrate that our PAS-targeted scheduling framework provides a useful range of trade-offs between synthesis time requirements and the quality of the derived solutions.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Remote communication between people typically relies on audio and vision, although current mobile devices are increasingly based on detecting different touch gestures such as swiping. These gestures could be adapted to interpersonal communication by using tactile technology capable of producing touch stimulation to a user's hand. It has been suggested that such mediated social touch would allow for new forms of emotional communication. The aim was to study whether vibrotactile stimulation that imitates human touch can convey intended emotions from one person to another. For this purpose, devices were used that converted touch gestures of squeeze and finger touch to vibrotactile stimulation. When one user squeezed his device or touched it with finger(s), another user felt corresponding vibrotactile stimulation on her device via four vibrating actuators. In an experiment, participant dyads comprising a sender and receiver were to communicate variations in the affective dimensions of valence and arousal using the devices. The sender's task was to create stimulation that would convey an unpleasant, pleasant, relaxed, or aroused emotional intention to the receiver. Both the sender and receiver rated the stimulation using scales for valence and arousal so that the match between the sender's intended emotions and the receiver's interpretations could be measured. The results showed that squeeze was better at communicating unpleasant and aroused emotional intention, while finger touch was better at communicating pleasant and relaxed emotional intention. The results can be used in developing technology that enables people to communicate via touch by choosing a touch gesture that matches the desired emotion.
Research output: Contribution to journal › Article › Scientific › peer-review
Autonomous wireless sensors are a key technology for Ambient Intelligence. For several years, chipless electromagnetic sensors have been studied to overcome the limitations of other passive sensors, such as low interrogation distance. In this paper we present a new concept of passive temperature sensor based on the electromagnetic coupling between an RF capacitor and a dielectric liquid moving inside an SU-8 micro-channel. The concept is validated using water as the dielectric liquid, with a full-scale variation of S11 versus temperature of around 8 dB at 29.75 GHz.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Local Binary Pattern (LBP) is a texture operator used in preprocessing for object detection, tracking, face recognition and fingerprint matching. Many of these applications run on embedded devices, which poses limitations on implementation complexity and power consumption. As LBP features are computed pixelwise, high performance is required for real-time extraction of LBP features from high-resolution video. This paper presents an application-specific instruction processor for LBP extraction. The compact yet powerful processor is capable of extracting LBP features from 1280 × 720p (30 fps) video at a reasonable 304 MHz clock rate. With low power consumption and an area of less than 16k gates, the processor is suitable for embedded devices. Experiments present resource and power consumption measured on an FPGA board, along with processor synthesis results. In terms of latency, our processor requires 17.5× fewer clock cycles per LBP feature than a workstation implementation and only 2.0× more than a hardwired ASIC.
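For reference, the basic 8-neighbour LBP operator computed per pixel by such a processor can be expressed in a few lines. This NumPy sketch is a functional reference of the operator only and says nothing about the instruction set or datapath of the proposed ASP.

```python
import numpy as np

def lbp_8neighbour(img):
    """Basic 8-neighbour Local Binary Pattern codes for a grayscale image (borders skipped)."""
    img = img.astype(np.int16)
    c = img[1:-1, 1:-1]                                  # centre pixels
    # Clockwise neighbour offsets starting from the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros_like(c, dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy: img.shape[0] - 1 + dy,
                    1 + dx: img.shape[1] - 1 + dx]
        # Set the bit when the neighbour is at least as bright as the centre pixel.
        codes |= ((neigh >= c).astype(np.uint8) << bit)
    return codes

# Example: LBP codes of a random 8-bit test image.
print(lbp_8neighbour(np.random.randint(0, 256, (6, 6))))
```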
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper proposes an improved version of a fully distributed routing protocol that is applicable to cloud computing infrastructure. Simulation results show that the protocol is ideal for discovering cloud services in a scalable manner with minimum latency.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
This paper presents a novel graphics processing unit (GPU) based implementation of symbol timing recovery that uses polyphase interpolators to detect symbol timing error. Symbol timing recovery is a compute-intensive procedure that detects and corrects the timing error in a coherent receiver. We provide optimal sample-time timing recovery using a maximum likelihood (ML) estimator to minimize the timing error. Because this is an iterative and adaptive system that relies on feedback, we present an accelerated implementation design that uses a GPU for timing error detection (TED), enabling fast error detection by exploiting the 2D filter structure found in the polyphase interpolator. We present this hybrid/heterogeneous CPU and GPU architecture, computing a low-complexity, low-noise matched filter (MF) while simultaneously performing TED. We then compare the performance of CPU- and GPU-based timing recovery for different interpolation rates, minimizing the error and improving the detection by up to a factor of 35. We further improve the process by applying GPU optimizations and block processing to increase throughput, all while maintaining the lowest possible sampling rate.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In this paper, a monopole antenna backed by an inkjet-printed electromagnetic band gap (EBG) ground plane on a paper substrate is proposed for wearable applications with a drastically enhanced communication range. This novel design approach for WBAN and wearable biomonitoring applications alleviates the on-body antenna's performance degradation, which could otherwise significantly degrade the performance of the wireless system as a whole. The improvement in communication range compared to a conventional antenna is demonstrated using a benchmark commercial wireless temperature sensor module. In addition, the advantages and the integrability of the proposed wearable antenna topology into mobile wireless on-body health care systems are discussed in detail.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Dataflow modeling offers a myriad of tools to improve the optimization and analysis of signal processing applications, and is often used by designers to help design, implement, and maintain systems on chip for signal processing. However, maintaining and upgrading legacy systems that were not originally designed using dataflow modeling can be challenging. To facilitate maintenance, designers often convert legacy code to dataflow graphs, a process that can be difficult and time-consuming. We propose a method to facilitate this conversion process by automatically detecting the dataflow models of the core functions. The contribution of this work is twofold. First, we introduce a generic method for instrumenting dataflow graphs that can be used to measure various statistics and extract run-time information. Second, we use this instrumentation technique to demonstrate a method that facilitates the conversion of legacy code to dataflow-based implementations. This method operates by automatically detecting the dataflow model of the core functions being converted.
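The core of the detection idea, observing per-firing token consumption and production and checking whether the rates are constant, can be illustrated with a deliberately simplified sketch. The wrapper, the names, and the SDF-versus-dynamic decision rule below are hypothetical stand-ins for the instrumentation the paper describes, not its actual method.

```python
def classify_actor(fire, input_batches):
    """Toy instrumentation sketch: run a wrapped actor function on recorded input
    batches, log how many tokens it consumes and produces per firing, and report
    whether the observed rates are constant (consistent with synchronous dataflow)."""
    consumed, produced = [], []
    for tokens in input_batches:
        out = fire(tokens)              # one firing of the legacy core function
        consumed.append(len(tokens))
        produced.append(len(out))
    static = len(set(consumed)) == 1 and len(set(produced)) == 1
    return "SDF-like (constant rates)" if static else "dynamic rates"

# Example: a downsampler that always consumes 2 tokens and produces 1 looks SDF-like.
print(classify_actor(lambda t: t[:1], [[1, 2], [3, 4], [5, 6]]))
```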
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
In recent years, parameterized dataflow has evolved as a useful framework for modeling synchronous and cyclo-static graphs in which arbitrary parameters can be changed dynamically. Parameterized dataflow has proven to have significant expressive power for managing dynamics of DSP applications in important ways. However, efficient hardware synthesis techniques for parameterized dataflow representations are lacking. This paper addresses this void; specifically, the paper investigates efficient field programmable gate array (FPGA)-based implementation of parameterized cyclo-static dataflow (PCSDF) graphs. We develop a scheduling technique for throughput-constrained minimization of dataflow buffering requirements when mapping PCSDF representations of DSP applications onto FPGAs. The proposed scheduling technique is integrated with an existing formal schedule model, called the generalized schedule tree, to reduce schedule cost. To demonstrate our new, hardware-oriented PCSDF scheduling technique, we have designed a real-time base station emulator prototype based on a subset of long-term evolution (LTE), which is a key cellular standard.
Research output: Contribution to journal › Article › Scientific › peer-review
Multidimensional synchronous dataflow (MDSDF) provides an effective model of computation for a variety of multidimensional DSP systems that have static dataflow structures. In this paper, we develop new methods for optimized implementation of MDSDF graphs on embedded platforms that employ multiple levels of parallelism to enhance performance at different levels of granularity. Our approach allows designers to systematically represent and transform multi-level parallelism specifications from a common, MDSDF-based application level model. We demonstrate our methods with a case study of image histogram implementation on a graphics processing unit (GPU). Experimental results from this study show that our approach can be used to derive fast GPU implementations, and enhance trade-off analysis during design space exploration.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Web crawlers are essential to many Web applications, such as Web search engines, Web archives, and Web directories, which maintain Web pages in their local repositories. In this paper, we study the problem of crawl scheduling that biases crawl ordering toward important pages. We propose a set of crawling algorithms for effective and efficient crawl ordering by prioritizing important pages with the well-known PageRank as the importance metric. In order to score URLs, the proposed algorithms utilize various features, including partial link structure, inter-host links, page titles, and topic relevance. We conduct a large-scale experiment using publicly available data sets to examine the effect of each feature on crawl ordering and evaluate the performance of many algorithms. The experimental results verify the efficacy of our schemes. In particular, compared with the representative RankMass crawler, the FPR-title-host algorithm reduces computational overhead by a factor as great as three in running time while improving effectiveness by 5% in cumulative PageRank.
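The general shape of importance-driven crawl ordering is a priority queue over the frontier: always fetch the URL with the highest current importance estimate and score newly discovered links as they appear. The sketch below shows this skeleton with a pluggable `score` function standing in for the PageRank-based features studied in the paper; `fetch`, `score` and the other names are illustrative assumptions, not the paper's algorithms.

```python
import heapq

def prioritized_crawl(seeds, fetch, score, budget=1000):
    """Greedy importance-first crawl ordering sketch.
    fetch(url) -> iterable of outlink URLs; score(url) -> importance estimate."""
    frontier = [(-score(u), u) for u in seeds]       # max-heap via negated scores
    heapq.heapify(frontier)
    seen, order = set(seeds), []
    while frontier and len(order) < budget:
        _, url = heapq.heappop(frontier)             # fetch the most promising URL next
        order.append(url)
        for link in fetch(url):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score(link), link))
    return order

# Example with a toy two-page web and a trivial score function.
web = {"a": ["b", "c"], "b": ["c"], "c": []}
print(prioritized_crawl(["a"], lambda u: web.get(u, []), lambda u: len(web.get(u, []))))
```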
Research output: Contribution to journal › Article › Scientific › peer-review
Emerging Digital Signal Processing (DSP) algorithms and wireless communications protocols require dynamic adaptation and online reconfiguration for the implemented systems at runtime. In this paper, we introduce the concept of Partial Expansion Graphs (PEGs) as an implementation model and associated class of scheduling strategies. PEGs are designed to help realize DSP systems in terms of forms and granularities of parallelism that are well matched to the given applications and targeted platforms. PEGs also facilitate derivation of both static and dynamic scheduling techniques, depending on the amount of variability in task execution times and other operating conditions. We show how to implement efficient PEG-based scheduling methods using real-time operating systems, and to re-use pre-optimized libraries of DSP components within such implementations. Empirical results show that the PEG strategy can 1) achieve significant speedups on a state-of-the-art multicore signal processor platform for static dataflow applications with predictable execution times, and 2) exceed classical scheduling speedups for applications having execution times that can vary dynamically. This ability to handle variable execution times is especially useful as DSP applications and platforms increase in complexity and adaptive behavior, thereby reducing execution time predictability.
Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › Scientific › peer-review
Cooperative communications have been recommended as a way to exploit the inherent spatial diversity gains in multiuser wireless systems without the need for multiple transceivers at each node. This is achieved when wireless nodes help each other to create multiple independent transmission paths to the destination. The advantage of cooperation can be exploited significantly by allocating the power of the system optimally. Thus, in this paper we first derive the approximate symbol error rate (SER) for multi-node cooperative networks employing the decode-and-forward (DF) protocol with maximum ratio combining (MRC) at the receiving terminals in Rician fading channels. Using the approximate SER expression, an optimal power allocation (OPA) scheme under different line-of-sight (LOS) scenarios is investigated. Numerical and simulation results are presented to illustrate the performance improvement due to OPA in cooperative networks.
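A Monte-Carlo baseline for this kind of analysis can be sketched as follows: BPSK over a single selective decode-and-forward relay link with Rician fading on every hop and MRC at the destination, assuming an equal power split and a relay that stays silent when it decodes incorrectly. It is only a toy reference of the setting, not the paper's analytical SER approximation or its optimal power allocation.

```python
import numpy as np

rng = np.random.default_rng(0)

def rician(n, K):
    """Unit-power Rician fading coefficients with K-factor K (illustrative)."""
    los = np.sqrt(K / (K + 1))
    nlos = np.sqrt(1 / (K + 1)) * (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
    return los + nlos

def df_relay_ser(snr_db, K=3.0, n=200_000):
    """Monte-Carlo SER of BPSK over a selective DF relay link with MRC at the destination."""
    snr = 10 ** (snr_db / 10)
    sigma = np.sqrt(1 / (2 * snr))                     # per-dimension noise std, unit-energy symbols
    s = rng.integers(0, 2, n) * 2 - 1                  # BPSK symbols in {-1, +1}
    h_sd, h_sr, h_rd = rician(n, K), rician(n, K), rician(n, K)
    noise = lambda: sigma * (rng.standard_normal(n) + 1j * rng.standard_normal(n))
    y_sd = h_sd * s + noise()                          # direct source-to-destination hop
    y_sr = h_sr * s + noise()                          # source-to-relay hop
    relay_ok = np.real(np.conj(h_sr) * y_sr) * s > 0   # relay decoded correctly
    y_rd = h_rd * (s * relay_ok) + noise()             # relay forwards only when correct
    y = np.real(np.conj(h_sd) * y_sd + np.conj(h_rd) * y_rd * relay_ok)   # MRC combining
    return np.mean(np.sign(y) != s)

print(df_relay_ser(10.0))
```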