Heuristics for Arc Routing Problems and Their Applications

Arc Routing Problems (ARPs) are routing problems that seek routes of minimum total cost covering the edges or arcs of a graph representing a street or road network. They find application in many essential services such as residential waste collection and winter gritting. Because ARPs are NP-hard, solutions are usually found using heuristic methods. This dissertation contributes to heuristics for ARPs, with a focus on the Capacitated Arc Routing Problem (CARP) with additional constraints. In operations such as residential waste collection, vehicle breakdowns occur frequently. A new variant, the Capacitated Arc Re-routing Problem for Vehicle Break-down (CARP-VB), is introduced to address the need to re-route using only the remaining vehicles to avoid missing services. A new heuristic, Probe, is developed to solve CARP-VB. Experiments on benchmark instances show that Probe performs better at reducing the makespan and is hence effective in reducing delays and avoiding missed services. In addition to total cost, operators are also interested in solutions that are attractive, that is, routes that are contiguous, compact, and non-overlapping, which makes the work easier to manage. Operators may not adopt a solution that is not attractive even if it is optimal. They are also interested in solutions with balanced workloads to meet equity requirements. A new multi-objective memetic algorithm, MA-ABC, is developed that optimizes three objectives: attractiveness, makespan, and total cost. In tests on benchmark instances, MA-ABC was found to be effective in providing attractive and balanced route solutions without affecting the total cost. Changes in the problem specification, such as demand and topology, occur frequently in business operations. Machine learning can be applied to learn the distribution behind these changes and generate solutions quickly at inference time.
Splice is a machine learning framework for CARP that quickly generates near-optimal solutions using a graph neural network and deep Q-learning. Splice can solve several variants of node and arc routing problems using the same architecture without any modification. Splice was trained and tested using randomly generated instances. Splice generated solutions faster than popular metaheuristics, and those solutions were also better.
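
The difference between the total-cost and makespan objectives named in this abstract can be illustrated with a minimal sketch. The route costs below are hypothetical numbers chosen for illustration, not results from the dissertation, and the function names are assumptions.

```python
def total_cost(route_costs):
    """Sum of the costs of all vehicle routes."""
    return sum(route_costs)


def makespan(route_costs):
    """Cost of the longest single route (the last vehicle to finish)."""
    return max(route_costs)


# Two hypothetical three-vehicle solutions for the same instance.
balanced = [40, 42, 41]     # similar workload on every vehicle
unbalanced = [20, 25, 70]   # cheaper overall, but one very long route

for name, costs in [("balanced", balanced), ("unbalanced", unbalanced)]:
    print(name, "total:", total_cost(costs), "makespan:", makespan(costs))
```

In this toy case the unbalanced solution has the lower total cost but a much larger makespan, which is the kind of trade-off a multi-objective method has to weigh.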
Time Sensitive Networking in Multimedia and Industrial Control Applications

Ethernet-based technologies are emerging as the ubiquitous de facto form of communication due to their interoperability, capacity, cost, and reliability. Traditional Ethernet is designed with the goal of delivering best-effort services. However, several real-time and control applications have more precise deterministic requirements and need Ultra Low Latency (ULL), which traditional Ethernet cannot provide. Current Industrial Automation and Control Systems (IACS) applications use semi-proprietary technologies that provide deterministic communication behavior for sporadic and periodic traffic, but can lead to closed systems that do not interoperate effectively. The convergence between the informational and operational technologies in modern industrial control networks cannot be achieved using traditional Ethernet. Time Sensitive Networking (TSN) is a suite of IEEE standards that augments traditional Ethernet with real-time deterministic properties ideal for Digital Signal Processing (DSP) applications. Similarly, Deterministic Networking (DetNet) is an Internet Engineering Task Force (IETF) standardization effort that enhances the network layer with the deterministic properties needed for IACS applications. This dissertation provides an in-depth survey and literature review of both standards and the related research, along with 5G-related material on ULL. Recognizing the limitations of several features of the standards, this dissertation provides an empirical evaluation of these approaches and presents novel enhancements to the shapers and schedulers involved in TSN. More specifically, this dissertation investigates the Time Aware Shaper (TAS), Asynchronous Traffic Shaper (ATS), and Cyclic Queuing and Forwarding (CQF) schedulers. Moreover, IEEE 802.1Qcc (centralized management and control) and IEEE 802.1Qbv can be used to manage and control scheduled traffic streams with periodic properties along with best-effort traffic on the same network infrastructure.
Both the centralized network/distributed user model (hybrid model) and the fully-distributed (decentralized) IEEE 802.1Qcc model are examined on a typical industrial control network with the goal of maximizing scheduled traffic streams. Finally, since industrial applications and cyber-physical systems require timely delivery, any channel or node faults can cause severe disruption to the operational continuity of the application. Therefore, the IEEE 802.1CB, Frame Replication and Elimination for Reliability (FRER), is examined and tested using machine learning models to predict faulty scenarios and issue remedies seamlessly.
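
The basic idea behind frame replication and elimination can be sketched as follows: each frame carries a sequence number, a copy is sent on every path, and the receiver keeps the first copy it sees and drops later duplicates. This is a simplified illustration only; the class name, the bounded history size, and the way loss is modeled are assumptions, and it does not reproduce the recovery algorithms specified in IEEE 802.1CB or the machine learning models studied in the dissertation.

```python
from collections import deque


class DuplicateEliminator:
    """Accept the first copy of each sequence number, drop later copies."""

    def __init__(self, history=64):
        # Hypothetical bounded memory of recently accepted sequence numbers.
        self.recent = deque(maxlen=history)

    def accept(self, seq):
        if seq in self.recent:
            return False
        self.recent.append(seq)
        return True


def replicate(seq_numbers, paths=2):
    """Create one copy of every frame for each disjoint path."""
    return [(seq, path) for seq in seq_numbers for path in range(paths)]


def deliver(frames, lost=frozenset()):
    """Drop the copies listed in `lost`, then eliminate duplicates."""
    eliminator = DuplicateEliminator()
    return [seq for seq, path in frames
            if (seq, path) not in lost and eliminator.accept(seq)]


# Frame 1 is lost on path 0 but still arrives via path 1.
print(deliver(replicate([0, 1, 2]), lost={(1, 0)}))
```

A frame is only missing at the receiver when every one of its copies is lost, which is why redundant paths help against single channel or node faults.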
Improving on 802.11: Streaming Audio and Quality of Service

Ad hoc wireless networks present several interesting problems, one of which is Medium Access Control (MAC). MAC is the fundamental problem of deciding which node gets to transmit next. MAC protocols for ad hoc wireless networks must also be distributed, because the network is multi-hop. The 802.11 Wi-Fi protocol is often used in ad hoc networking. An alternative protocol, REACT, uses the metaphor of an auction to compute airtime allocations for each node, then realizes those allocations by tuning the contention window parameter using a tuning protocol called SALT. 802.11 is inherently unfair because it returns the contention window to its minimum size after a successful transmission, while REACT’s distributed auction allows nodes to negotiate an allocation in which all nodes get a fair portion of the airtime. A common application in such networks is audio streaming. Because of their real-time nature, audio streams depend on good Quality of Service (QoS) metrics, such as delay and jitter.
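
The contention window behavior described above can be sketched in a few lines: the window roughly doubles after a failed transmission and snaps back to its minimum after a success. The bounds 15 and 1023 are assumed typical values used here for illustration, and the function names are hypothetical. This sketch does not model REACT or SALT.

```python
CW_MIN, CW_MAX = 15, 1023  # assumed bounds, for illustration only


def next_cw(cw, success):
    """Binary exponential backoff: reset on success, double on failure."""
    if success:
        return CW_MIN
    return min(2 * (cw + 1) - 1, CW_MAX)


def trace(outcomes, cw=CW_MIN):
    """Contention window after each transmission outcome in turn."""
    history = []
    for ok in outcomes:
        cw = next_cw(cw, ok)
        history.append(cw)
    return history


# Two failures, a success, then another failure.
print(trace([False, False, True, False]))
```

A node that has just succeeded contends with the smallest window, so it is more likely to win the channel again than a neighbor that is still backing off.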

Experiments were conducted to determine the performance of REACT/SALT compared to 802.11 in a streaming audio application on a physical wireless testbed, w-iLab.t. Four experiments were designed, using four different wireless node topologies, and QoS metrics were collected using Qosium. REACT performs better in these topologies when the mean value is calculated across each run. For the butterfly and star topologies, the variance was higher for REACT even though the mean was lower. In the hidden terminal and exposed node topologies, the performance of REACT was much better than that of 802.11 and converged more tightly, but it occasionally had drops in quality.
Interaction Testing, Fault Location, and Anonymous Attribute-Based Authorization

This dissertation studies three classes of combinatorial arrays with practical applications in testing, measurement, and security. Covering arrays are widely studied in software and hardware testing to indicate the presence of faulty interactions. Locating arrays extend covering arrays to achieve identification of the interactions causing a fault by requiring additional conditions on how interactions are covered in rows. This dissertation introduces a new class, the anonymizing arrays, to guarantee a degree of anonymity by bounding the probability a particular row is identified by the interaction presented. Similarities among these arrays lead to common algorithmic techniques for their construction which this dissertation explores. Differences arising from their application domains lead to the unique features of each class, requiring tailoring the techniques to the specifics of each problem.

One contribution of this work is a conditional expectation algorithm to build covering arrays via an intermediate combinatorial object. Conditional expectation efficiently finds intermediate-sized arrays that are particularly useful as ingredients for additional recursive algorithms. A cut-and-paste method creates large arrays from small ingredients. Performing transformations on the copies makes further improvements by reducing redundancy in the composed arrays and leads to fewer rows.

This work contains the first algorithm for constructing locating arrays for general values of $d$ and $t$. A randomized computational search algorithmic framework verifies if a candidate array is $(\bar{d},t)$-locating by partitioning the search space and performs random resampling if a candidate fails. Algorithmic parameters determine which columns to resample and when to add additional rows to the candidate array. Additionally, analysis is conducted on the performance of the algorithmic parameters to provide guidance on how to tune parameters to prioritize speed, accuracy, or a combination of both.

This work proposes anonymizing arrays as a class related to covering arrays with a higher coverage requirement and constraints. The algorithms for covering and locating arrays are tailored to anonymizing array construction. An additional property, homogeneity, is introduced to meet the needs of attribute-based authorization. Two metrics, local and global homogeneity, are designed to compare anonymizing arrays with the same parameters. Finally, a post-optimization approach reduces the homogeneity of an anonymizing array.
Locating Arrays: Construction, Analysis, and Robustness

Modern computer systems are complex engineered systems involving a large collection of individual parts, each with many parameters, or factors, affecting system performance. One way to understand these complex systems and their performance is through experimentation. However, most modern computer systems involve such a large number of factors that thorough experimentation on all of them is impossible. An initial screening step is thus necessary to determine which factors are relevant to the system's performance and which factors can be eliminated from experimentation.

Factors may impact system performance in different ways. A factor at a specific level may significantly affect performance as a main effect, or in combination with other main effects as an interaction. For screening, it is necessary both to identify the presence of these effects and to locate the factors responsible for them. A locating array is a relatively new experimental design that causes every main effect and interaction to occur and distinguishes all sets of d main effects and interactions from each other in the tests where they occur. This design is therefore helpful in screening complex systems.
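
The locating property can be made concrete for the simplest case, a single effect (d = 1) and interactions of exactly t factors. In this sketch, an array locates a single interaction if every t-way interaction occurs in at least one test and no two interactions occur in exactly the same set of tests. The function names and the restriction to d = 1 are assumptions made for illustration; general locating arrays require comparing sets of interactions.

```python
from itertools import combinations, product


def rows_of_interactions(array, t, v):
    """Map each t-way interaction to the set of tests (rows) containing it."""
    k = len(array[0])
    rows = {}
    for cols in combinations(range(k), t):
        for levels in product(range(v), repeat=t):
            rows[(cols, levels)] = frozenset(
                i for i, row in enumerate(array)
                if all(row[c] == level for c, level in zip(cols, levels)))
    return rows


def locates_single_interaction(array, t, v):
    """True if every interaction occurs and all row sets are distinct."""
    row_sets = list(rows_of_interactions(array, t, v).values())
    return all(row_sets) and len(set(row_sets)) == len(row_sets)


full = [[0, 0], [0, 1], [1, 0], [1, 1]]
small = [[0, 0], [1, 1]]  # every level occurs, but effects are confounded
print(locates_single_interaction(full, t=1, v=2))
print(locates_single_interaction(small, t=1, v=2))
```

In the two-row array both factors always change together, so a failing test cannot be attributed to one factor rather than the other, even though every level appears.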

The process of screening using locating arrays involves multiple steps. First, a locating array is constructed for all possibly significant factors. Next, the system is executed for all tests indicated by the locating array and a response is observed. Finally, the response is analyzed to identify the significant system factors for future experimentation. However, simply constructing a reasonably sized locating array for a large system is no easy task and analyzing the response of the tests presents additional difficulties due to the large number of possible predictors and the inherent imbalance in the experimental design itself. Further complications can arise from noise in the system or errors in testing.

This thesis has three contributions. First, it provides an algorithm to construct locating arrays using the Lovász Local Lemma with Moser-Tardos resampling. Second, it gives an algorithm to analyze the system response efficiently. Finally, it studies the robustness of the analysis to the heavy-hitters assumption underlying the approach as well as to varying amounts of system noise.
Maximizing Routing Throughput with Applications to Delay Tolerant Networks

Many applications, such as providing healthcare to remote communities and spreading related information in Mobile Social Networks (MSNs), require efficient data routing and dissemination in Delay Tolerant Networks (DTNs) in order to maximize the throughput of data in the network. In this thesis, the feasibility of using boats in the Amazon Delta Riverine region as data mule nodes is investigated, and a robust data routing algorithm based on a fountain code approach is designed to ensure fast and timely data delivery in the presence of unpredictable boat delays, break-downs, and high transmission failures. Then, the scenario of providing healthcare in the Amazon Delta region is extended to a general All-or-Nothing (Splittable) Multicommodity Flow (ANF) problem, and a polynomial time constant approximation algorithm is designed for the maximum throughput routing problem based on a randomized rounding scheme with applications to DTNs. In an MSN, message content is closely related to users’ preferences, and can be used to significantly impact the performance of data dissemination. An interest- and content-based algorithm is developed in which the contents of the messages, along with the network structural information, are taken into consideration when making message relay decisions in order to maximize data throughput in an MSN. Extensive experiments show the effectiveness of the proposed data dissemination algorithm by comparing it with state-of-the-art techniques.
Covering arrays: algorithms and asymptotics

Modern software and hardware systems are composed of a large number of components. Often different components of a system interact with each other in unforeseen and undesired ways to cause failures. Covering arrays are a useful mathematical tool for testing all possible t-way interactions among the components of a system.
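
As a minimal sketch of what covering all t-way interactions means, the check below verifies that every choice of t columns exhibits all v^t combinations of symbols in some row. It assumes every factor takes values in 0..v-1; the function name is hypothetical.

```python
from itertools import combinations


def is_covering_array(array, t, v):
    """True if every t columns contain all v**t symbol combinations."""
    k = len(array[0])
    for cols in combinations(range(k), t):
        seen = {tuple(row[c] for c in cols) for row in array}
        if len(seen) < v ** t:
            return False
    return True


# Four tests cover all pairwise interactions of three binary factors.
tests = [[0, 0, 0],
         [0, 1, 1],
         [1, 0, 1],
         [1, 1, 0]]
print(is_covering_array(tests, t=2, v=2))
print(is_covering_array(tests, t=3, v=2))
```

Exhaustive testing of three binary factors needs eight tests, while the four rows above already cover every pair of settings.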

The two major issues concerning covering arrays are explicit construction of a covering array, and exact or approximate determination of the covering array number---the minimum size of a covering array. Although these problems have been investigated extensively for the last couple of decades, in this thesis we present significant improvements on both of these questions using tools from the probabilistic method and randomized algorithms.

First, a series of improvements is developed on the previously known upper bounds on covering array numbers. An estimate for the discrete Stein-Lovász-Johnson bound is derived and the Stein-Lovász-Johnson bound is improved upon using an alteration strategy. Then group actions on the set of symbols are explored to establish two asymptotic upper bounds on covering array numbers that are tighter than any of the presently known bounds.
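
For orientation, the elementary first-moment argument behind probabilistic upper bounds of this kind can be computed directly. In a uniformly random N x k array over v symbols, the expected number of uncovered t-way interactions is C(k,t) v^t (1 - 1/v^t)^N, and whenever this is below 1 a covering array with N rows exists. The sketch below finds the smallest such N using exact integer arithmetic. It illustrates only this basic argument, not the improved bounds derived in the thesis, and the function name is an assumption.

```python
from math import comb


def first_moment_bound(t, k, v):
    """Smallest N with comb(k, t) * v**t * (1 - v**-t)**N < 1."""
    interactions = comb(k, t) * v ** t
    n = 1
    # Exact integer form of: interactions * ((v**t - 1) / v**t)**n >= 1
    while interactions * (v ** t - 1) ** n >= (v ** t) ** n:
        n += 1
    return n


print(first_moment_bound(t=2, k=10, v=2))
```

The bound grows logarithmically in k for fixed t and v, which is why covering arrays remain small even for systems with many factors.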

Second, an algorithmic paradigm, called the two-stage framework, is introduced for covering array construction. A number of concrete algorithms from this framework are analyzed, and it is shown that they outperform current methods in the range of parameter values that are of practical relevance. In some cases, a reduction in the number of tests by more than 50% is achieved.
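
One simple way to picture a two-stage construction is sketched below: a first stage draws a random array that covers most interactions, and a second stage appends rows to cover whatever is left. The naive second stage used here, one extra row per uncovered interaction, is an assumption chosen for brevity and is not claimed to be one of the algorithms analyzed in the thesis; all names are hypothetical.

```python
import random
from itertools import combinations, product


def uncovered(array, t, v):
    """List the t-way interactions that no row of the array contains."""
    k = len(array[0])
    missing = []
    for cols in combinations(range(k), t):
        seen = {tuple(row[c] for c in cols) for row in array}
        for levels in product(range(v), repeat=t):
            if levels not in seen:
                missing.append((cols, levels))
    return missing


def two_stage(n, k, t, v, seed=0):
    rng = random.Random(seed)
    # Stage 1: a random n x k array covers most interactions.
    array = [[rng.randrange(v) for _ in range(k)] for _ in range(n)]
    # Stage 2: add one row for each interaction still uncovered.
    for cols, levels in uncovered(array, t, v):
        row = [rng.randrange(v) for _ in range(k)]
        for c, level in zip(cols, levels):
            row[c] = level
        array.append(row)
    return array


result = two_stage(n=6, k=5, t=2, v=2)
print(len(result), uncovered(result, 2, 2))
```

The design question in such a scheme is how many rows to spend in the first stage: too few leaves many interactions for the second stage, too many wastes rows on interactions already covered.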

Third, the Lovász local lemma is applied on covering perfect hash families to obtain an upper bound on covering array numbers that is tightest of all known bounds. This bound leads to a Moser-Tardos type algorithm that employs linear algebraic computation over finite fields to construct covering arrays. In some cases, this algorithm outperforms currently used methods by more than an 80% margin.
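
The resampling idea can be sketched on a plain array: start from a random array, and while some set of t columns fails to cover all symbol combinations, redraw just those columns. This is a simplified illustration of Moser-Tardos style resampling applied directly to array entries. It does not use covering perfect hash families or finite-field computation as the thesis does, and the function names, parameters, and round limit are assumptions.

```python
import random
from itertools import combinations


def first_bad_columns(array, t, v):
    """Return the first t columns missing some combination, else None."""
    k = len(array[0])
    for cols in combinations(range(k), t):
        seen = {tuple(row[c] for c in cols) for row in array}
        if len(seen) < v ** t:
            return cols
    return None


def resample_until_covering(n, k, t, v, seed=0, max_rounds=10_000):
    rng = random.Random(seed)
    array = [[rng.randrange(v) for _ in range(k)] for _ in range(n)]
    for _ in range(max_rounds):
        bad = first_bad_columns(array, t, v)
        if bad is None:
            return array
        # Redraw only the entries in the offending columns.
        for row in array:
            for c in bad:
                row[c] = rng.randrange(v)
    raise RuntimeError("no covering array found within max_rounds")


array = resample_until_covering(n=12, k=5, t=2, v=2)
print(first_bad_columns(array, 2, 2))
```

The number of rows n is fixed in advance, so the quality of the result depends on choosing n from a bound that guarantees the resampling terminates quickly.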

Finally, partial covering arrays are introduced to investigate a few practically relevant relaxations of the covering requirement. Using probabilistic methods, bounds are obtained on partial covering arrays that are significantly smaller than for covering arrays. Also, randomized algorithms are provided that construct such arrays in expected polynomial time.
Fixed verse generation using neural word embeddings

For the past three decades, the design of an effective strategy for generating poetry that matches a human’s creative capabilities and complexities has been an elusive goal in artificial intelligence (AI) and natural language generation (NLG) research, and among linguistic creativity researchers in particular. This thesis presents a novel approach to fixed verse poetry generation using neural word embeddings. During the course of generation, a two-layered poetry classifier is developed. The first layer uses a lexicon-based method to classify poems into types based on form and structure, and the second layer uses a supervised classification method to classify poems into subtypes based on content with an accuracy of 92%. The system then uses a two-layer neural network to generate poetry based on word similarities and word movements in a 50-dimensional vector space.

The verses generated by the system are evaluated using rhyme, rhythm, syllable counts and stress patterns. These computational features of language are considered for generating haikus, limericks and iambic pentameter verses. The generated poems are evaluated using a Turing test on both experts and non-experts. The user study finds that only 38% of the computer-generated poems were correctly identified by non-experts, while 65% of the computer-generated poems were correctly identified by experts. Although the system does not pass the Turing test, the results suggest an improvement of over 17% when compared to previous methods that use Turing tests to evaluate poetry generators.
Analysis and visualization of OpenFlow rule conflicts

In traditional networks the control and data planes are tightly coupled, hindering development. With Software Defined Networking (SDN), the two planes are separated, allowing innovations on either one independently of the other. Here, the control plane is formed by the applications that specify an organization's policy and the data plane contains the forwarding logic. The application sends all commands to an SDN controller, which then performs the requested action on behalf of the application. Generally, the requested action is a modification to the flow tables present in the switches to reflect a change in the organization's policy. There are a number of ways to control the network using SDN principles, but the most widely used approach is OpenFlow.

With the applications now having direct access to the flow table entries, it is easy for inconsistencies to arise in the flow table rules. Since the flow rules are structured similarly to firewall rules, the research done in analyzing and identifying firewall rule conflicts can be adapted to work with OpenFlow rules.
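
A minimal sketch of the kind of check borrowed from firewall analysis: two rules conflict if some packet can match both and their actions differ. The rule representation below (dictionaries with exact values or a wildcard) is a hypothetical simplification made for illustration. Real OpenFlow matches also involve priorities, masks, and prefixes, and this is not the detection logic implemented in the thesis.

```python
from itertools import combinations

WILDCARD = "*"


def fields_overlap(a, b):
    """Two field values overlap if either is a wildcard or they are equal."""
    return a == WILDCARD or b == WILDCARD or a == b


def rules_overlap(r1, r2):
    """True if some packet could match both rules (absent field = wildcard)."""
    fields = set(r1["match"]) | set(r2["match"])
    return all(fields_overlap(r1["match"].get(f, WILDCARD),
                              r2["match"].get(f, WILDCARD)) for f in fields)


def find_conflicts(rules):
    """Pairs of overlapping rules whose actions differ."""
    return [(a["name"], b["name"]) for a, b in combinations(rules, 2)
            if rules_overlap(a, b) and a["action"] != b["action"]]


rules = [
    {"name": "r1", "match": {"in_port": 1, "ip_dst": "10.0.0.2"}, "action": "forward"},
    {"name": "r2", "match": {"ip_dst": "10.0.0.2"}, "action": "drop"},
    {"name": "r3", "match": {"ip_dst": "10.0.0.3"}, "action": "drop"},
]
print(find_conflicts(rules))
```

Here r1 and r2 can match the same packet but disagree on what to do with it, while r3 matches a different destination and conflicts with neither.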

The main work of this thesis is to implement flow conflict detection logic in OpenDaylight and inspect the applicability of techniques in visualizing the conflicts. A hierarchical edge-bundling technique coupled with a Reingold-Tilford tree is employed to present the relationship between the conflicting rules. Additionally, a table-driven approach is also implemented to display the details of each flow.

Both types of visualization were then tested for correctness by providing them with flows known to have conflicts. The conflicts were identified properly and displayed by the views.
An evaluation of SDN based network virtualization techniques

With the software-defined networking trend growing, several network virtualization controllers have been developed in recent years. These controllers, also called network hypervisors, attempt to manage physical SDN-based networks so that multiple tenants can safely share the same forwarding plane hardware without risk of being affected by or affecting other tenants. However, many areas remain unexplored by current network hypervisor implementations. This thesis presents and evaluates some of the features offered by network hypervisors, such as full header space availability, isolation, and transparent traffic forwarding capabilities for tenants. Flow setup time and throughput are also measured and compared among different network hypervisors. Three different network hypervisors are evaluated: FlowVisor, VeRTIGO and OpenVirteX. These virtualization tools are assessed with experiments conducted on three different testbeds: an emulated Mininet scenario, a physical single-switch testbed, and a remote GENI testbed. The results indicate that network hypervisors bring SDN flexibility to network virtualization, making it easier for network administrators to define with precision how the network is sliced and divided among tenants. This increased flexibility, however, may come at the cost of decreased performance, and also brings additional interoperability risks due to a lack of standardization of virtualization methods.
