A minimal model for gene expression, consisting of a switchable promoter together with the resulting messenger RNA, is equivalent to a Poisson channel with a binary Markovian input process. Determining its capacity is an optimization problem with respect to two parameters: the average sojourn times of the promoter's active (ON) and inactive (OFF) states. An expression for the mutual information is found by exploiting the link with filtering theory. For fixed peak power, three bandwidth-like constraints are imposed by lower-bounding (i) the average sojourn times, (ii) the autocorrelation time, and (iii) the average time until a transition. OFF-favoring optima are found for all three constraints, as commonly encountered for the Poisson channel. In addition, constraint (i) exhibits a region that favors the ON state, and (iii) shows ON-favoring local optima.
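As a minimal illustration of the channel model (not of the capacity computation itself), the following Python sketch simulates a two-state Markovian promoter driving Poisson-distributed transcription events; all parameter values and names are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters (not taken from the paper): average sojourn times
# of the promoter's ON and OFF states, peak transcription rate, and horizon.
tau_on, tau_off = 2.0, 5.0   # mean ON / OFF sojourn times
lam_peak = 3.0               # Poisson rate while the promoter is ON (peak power)
T = 100.0                    # total simulated time

# Sample the binary Markovian input (random telegraph signal) by alternating
# exponentially distributed sojourns, and generate Poisson events while ON.
t, state, events = 0.0, 0, []   # start in the OFF state (state 0)
while t < T:
    sojourn = rng.exponential(tau_on if state else tau_off)
    if state:  # promoter ON: mRNA production is a Poisson process at rate lam_peak
        duration = min(sojourn, T - t)
        n = rng.poisson(lam_peak * duration)
        events.extend(np.sort(rng.uniform(t, t + duration, n)))
    t += sojourn
    state ^= 1  # toggle ON/OFF

print(f"{len(events)} transcription events in {T} time units")
```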
Recently, we developed a systematic framework for defining and inferring flows of information about a specific message in neural circuits. We defined a computational model of a neural circuit consisting of computational nodes and transmissions being sent between these nodes over time. We then gave a formal definition of information flow pertaining to a specific message, which was capable of identifying paths along which information flowed in such a system. However, this definition also had some non-intuitive properties, such as the existence of "orphans"---nodes from which information flowed out, even though no information flowed in. In part, these non-intuitive properties arose because we restricted our attention to measures that were functions of transmissions at a single time instant, and measures that were observational rather than counterfactual. In this paper, we consider alternative definitions, including one that is a function of transmissions at multiple time instants, one that is counterfactual, and a new observational definition. We show that a definition of information flow based on counterfactual causal influence (CCI) guarantees the existence of information paths while also having no orphans. We also prove that no observational definition of information flow that satisfies the information path property can match CCI in every instance. Furthermore, each of the definitions we examine (including CCI) is shown to have examples in which the information flow can take a non-intuitive path. Nevertheless, we believe our framework remains more amenable to drawing clear interpretations than classical tools used in neuroscience, such as Granger Causality.
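A toy sketch of the counterfactual idea, not the paper's formal definition: in a small node-and-transmission circuit, a transmission is deemed influential if overriding it (while holding the noise realization fixed) changes the output. All node names and the circuit itself are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy three-node circuit: node A observes the message M and sends X1 to B;
# B sends X2 to C; C outputs Y.  Transmissions are single bits with noise.
def run_circuit(M, noise, x1_override=None):
    X1 = (M ^ noise[0]) if x1_override is None else x1_override
    X2 = X1 ^ noise[1]
    return X2  # output Y

# Counterfactual-style check: on the same noise realization, does forcing the
# transmission X1 to a different value change the output?
def counterfactual_influence(trials=1000):
    changed = 0
    for _ in range(trials):
        M = rng.integers(0, 2)
        noise = rng.integers(0, 2, size=2)
        y_factual = run_circuit(M, noise)
        y_counter = run_circuit(M, noise, x1_override=1 - (M ^ noise[0]))
        changed += int(y_factual != y_counter)
    return changed / trials

print("fraction of trials where overriding X1 changes the output:",
      counterfactual_influence())
```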
For DNA data storage to become a feasible technology, all aspects of the encoding and decoding pipeline must be optimized. Writing the data into DNA, known as DNA synthesis, is currently the most costly part of existing storage systems. As a step toward more efficient synthesis, we study the design of codes that minimize the time and the amount of material needed to produce the DNA strands. We consider a popular synthesis process that builds many strands in parallel in a step-by-step fashion using a fixed supersequence S. The machine iterates through S one nucleotide at a time, and in each cycle, it adds the next nucleotide to a subset of the strands. The synthesis time is determined by the length of S. We show that by introducing redundancy into the synthesized strands, we can significantly decrease the number of synthesis cycles. We derive the maximum amount of information per synthesis cycle assuming S is an arbitrary periodic sequence. To prove our results, we exhibit new connections to cost-constrained codes.
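A minimal sketch of the synthesis model described above, assuming the common period ACGT for the supersequence S (the paper allows an arbitrary periodic S); the function name and example strands are illustrative.

```python
from itertools import cycle

# Count synthesis cycles when strands are built in parallel from a periodic
# supersequence S.  Each cycle offers one nucleotide of S, and a strand is
# extended exactly when the offered nucleotide matches its next symbol.
def cycles_needed(strands, period="ACGT"):
    pointers = [0] * len(strands)      # index of the next needed symbol per strand
    n_cycles = 0
    for nt in cycle(period):           # iterate through S one nucleotide at a time
        if all(p == len(s) for p, s in zip(pointers, strands)):
            return n_cycles            # every strand fully synthesized
        n_cycles += 1
        for i, s in enumerate(strands):
            if pointers[i] < len(s) and s[pointers[i]] == nt:
                pointers[i] += 1

print(cycles_needed(["ACCA", "TTAG"]))   # 11 cycles with period ACGT
```

The synthesis time is then the length of the prefix of S consumed, i.e., the maximum over strands of the minimal prefix of S containing that strand as a subsequence; coding schemes that restrict which strands may appear can shorten this prefix.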
The ``bee-identification problem'' was formally defined by Tandon, Tan and Varshney [IEEE Trans. Commun., vol. 67, 2019], and the error exponent was studied. This work extends those results to the ``absentee bees'' scenario, where a small fraction of the bees are absent from the beehive image used for identification. For this setting, we present an exact characterization of the bee-identification error exponent, and show that independent barcode decoding is optimal, i.e., joint decoding of the bee barcodes does not result in a better error exponent relative to independent decoding of each noisy barcode. This is in contrast to the result without absentee bees, where joint barcode decoding results in a significantly higher error exponent than independent barcode decoding. We also define and characterize the ``capacity'' for the bee-identification problem with absentee bees, and prove the corresponding strong converse.
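A toy illustration of independent barcode decoding over a binary symmetric channel; the barcode length, channel parameter, and absentee fraction are assumptions, and random codebooks stand in for whatever barcode code is actually used.

```python
import numpy as np

rng = np.random.default_rng(2)

# Toy setup: each bee carries a random binary barcode; the image yields a
# noisy copy of each present bee's barcode through a BSC(p), and each noisy
# barcode is decoded independently by minimum Hamming distance to the codebook.
m, n, p, absent_frac = 50, 20, 0.05, 0.1   # bees, barcode length, noise, absentees

codebook = rng.integers(0, 2, size=(m, n))           # m barcodes of length n
present = rng.random(m) > absent_frac                 # a small fraction of bees is absent
noisy = codebook[present] ^ (rng.random((present.sum(), n)) < p)

# Independent decoding: nearest codeword in Hamming distance, per noisy barcode.
dists = (noisy[:, None, :] != codebook[None, :, :]).sum(axis=2)
decoded = dists.argmin(axis=1)

error_rate = (decoded != np.flatnonzero(present)).mean()
print(f"independent-decoding error rate: {error_rate:.3f}")
```

Joint decoding would instead pick a matching between noisy barcodes and codewords (e.g., a minimum-cost assignment over the distance matrix); the abstract's result says that, with absentee bees, this extra effort does not improve the error exponent.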