In this paper we consider the problem of actively learning the mean values of distributions associated with a finite number of options (arms). The algorithms can select which option to generate the next sample from, in order to produce estimates of equally good precision for all the distributions. When an algorithm uses sample means to estimate the unknown values, the optimal solution, assuming full knowledge of the distributions, is to sample each option in proportion to its variance. In this paper we propose an incremental algorithm that asymptotically achieves the same loss as this optimal rule. We prove that the excess loss suffered by the algorithm, apart from logarithmic factors, scales as n^{-3/2}, which we conjecture to be the optimal rate. The performance of the algorithm is illustrated on a simple problem.
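As a rough illustration of the variance-proportional allocation principle, here is a minimal Python sketch. It is not the paper's exact algorithm; in particular, the 1/count confidence inflation below is a placeholder assumption.

    import numpy as np

    def allocate(arms, n, warmup=2):
        """Pull n samples in total, always choosing the arm whose inflated
        empirical variance is largest relative to its current pull count.
        `arms` is a list of zero-argument callables returning one sample."""
        K = len(arms)
        samples = [[arm() for _ in range(warmup)] for arm in arms]
        for _ in range(n - K * warmup):
            counts = np.array([len(s) for s in samples], dtype=float)
            # Inflate the variance estimates so under-explored arms are not
            # starved; the paper's exact confidence term differs from this
            # simple 1/count placeholder.
            var = np.array([np.var(s, ddof=1) for s in samples]) + 1.0 / counts
            k = int(np.argmax(var / counts))
            samples[k].append(arms[k]())
        return [float(np.mean(s)) for s in samples]

    rng = np.random.default_rng(0)
    means = allocate([lambda: rng.normal(0.0, 1.0),
                      lambda: rng.normal(0.0, 3.0)], n=1000)

Under this rule the second arm, whose variance is nine times larger, ends up with roughly nine times as many pulls, mirroring the optimal proportional allocation.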
We consider continuous-state, continuous-action batch reinforcement learning, where the goal is to learn a good policy from a sufficiently rich trajectory generated by some policy. We study a variant of fitted Q-iteration where the greedy action selection is replaced by searching for a policy in a restricted set of candidate policies, chosen to maximize the average of the action-values. We provide a rigorous analysis of this algorithm, proving what we believe to be the first finite-time bound for value-function-based algorithms for continuous-state and continuous-action problems.
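A schematic rendering of the studied variant, with a generic scikit-learn regressor standing in for the function class and a hypothetical finite candidate policy set; both are assumptions of this sketch, not the paper's setup.

    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor  # any regressor would do

    def fitted_q_policy_search(S, A, R, S2, policies, gamma=0.95, iters=30):
        """Fitted Q-iteration where the greedy step is replaced by picking,
        from the finite candidate set `policies`, the policy maximizing the
        average of the current action-value estimates.
        S, A, R, S2: 1-d arrays of states, actions, rewards and next states
        from one behaviour trajectory; each policy maps states to actions."""
        X = np.column_stack([S, A])
        Q, pi = None, policies[0]
        for _ in range(iters):
            y = R if Q is None else R + gamma * Q.predict(
                np.column_stack([S2, pi(S2)]))
            Q = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)
            # Policy search step: maximize average action-values over the
            # restricted candidate set.
            scores = [Q.predict(np.column_stack([S, p(S)])).mean()
                      for p in policies]
            pi = policies[int(np.argmax(scores))]
        return Q, pi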
We consider batch reinforcement learning problems in continuous-space, expected total discounted-reward Markovian Decision Problems when the training data is composed of the trajectory of some fixed behaviour policy. The algorithm studied is fitted policy iteration, in which the action-value functions of the successive intermediate policies are obtained by means of approximate value iteration. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance. The bounds depend on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian Decision Problem, and the approximation power and capacity of the function set used. One of the main novelties of the paper is that new smoothness constraints are introduced, allowing us to significantly extend the scope of previous results.
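A sketch of the corresponding loop structure: policy evaluation by approximate value iteration inside a policy iteration loop. An off-the-shelf regressor and a finite action grid stand in for the function class and the continuous action space; both are assumptions of the sketch.

    import numpy as np
    from sklearn.ensemble import ExtraTreesRegressor

    def fitted_policy_iteration(S, A, R, S2, action_grid, gamma=0.95,
                                outer=10, inner=20):
        """Outer loop: policy improvement. Inner loop: the action-value
        function of the current policy, obtained by approximate value
        iteration on the trajectory data."""
        X = np.column_stack([S, A])
        pi_a = np.full(len(S2), action_grid[0], dtype=float)  # initial actions
        Q = None
        for _ in range(outer):
            for _ in range(inner):  # approximate policy evaluation
                y = R if Q is None else R + gamma * Q.predict(
                    np.column_stack([S2, pi_a]))
                Q = ExtraTreesRegressor(n_estimators=50, random_state=0).fit(X, y)
            # Greedy improvement over the action grid at the next states.
            qvals = np.stack([Q.predict(np.column_stack(
                [S2, np.full(len(S2), a, dtype=float)])) for a in action_grid])
            pi_a = np.asarray(action_grid, dtype=float)[np.argmax(qvals, axis=0)]
        return Q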
We consider batch reinforcement learning problems in continuous-space, expected total discounted-reward Markovian Decision Problems. As opposed to previous theoretical work, we consider the case when the training data consists of a single sample path (trajectory) of some behaviour policy. In particular, we do not assume access to a generative model of the environment. The algorithm studied is policy iteration, in which the Q-functions of the successive intermediate policies are obtained by minimizing a novel Bellman-residual-type error. PAC-style polynomial bounds are derived on the number of samples needed to guarantee near-optimal performance, where the bound depends on the mixing rate of the trajectory, the smoothness properties of the underlying Markovian Decision Problem, and the approximation power and capacity of the function set used.
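For orientation (this is not the paper's novel error, only the standard quantity it modifies), the plain empirical Bellman-residual error for evaluating a policy \pi from trajectory data (X_t, A_t, R_t), t = 1, ..., n, is

    \hat{L}_n(Q; \pi) = \frac{1}{n} \sum_{t=1}^{n}
        \left( Q(X_t, A_t) - R_t - \gamma\, Q\bigl(X_{t+1}, \pi(X_{t+1})\bigr) \right)^2 .

A modification is needed because this naive version is a biased estimate of the true Bellman residual when the transitions are stochastic.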
Earlier results show that the minimax expected (test) distortion redundancy of empirical vector quantizers with three or more levels, designed from n independent and identically distributed data points, is \Omega(1/\sqrt{n}) for the class of distributions on a bounded set. In this paper a much simpler construction and proof are given, with much better constants. Similar bounds hold for the training distortion of the empirically optimal vector quantizer with three or more levels. These rates, however, do not hold for one-level quantizers. Here the two-level quantizer case is clarified: it is shown to already share the behavior of the general case. Since the minimax bounds are proved using a construction involving discrete distributions, one might suspect that for the class of distributions with uniformly bounded continuous densities the expected distortion redundancy decreases as o(1/\sqrt{n}) uniformly. It is shown that this is not so: the lower bound on the expected test distortion remains valid for these subclasses as well.
We consider the rate of convergence of the expected distortion redundancy of empirically optimal vector quantizers. Earlier results show that the mean-squared distortion of an empirically optimal quantizer designed from n independent and identically distributed source samples converges uniformly to the optimum at rate O(1/\sqrt{n}), and that this rate is sharp in the minimax sense. We prove that for any fixed source distribution supported on a given finite set, the convergence rate is O(1/n) (faster than the minimax lower bound), where the corresponding constant depends on the distribution. For more general source distributions, we provide conditions implying a slightly worse O(\log n / n) rate of convergence. In particular, scalar distributions with strictly log-concave densities of bounded support (such as the truncated Gaussian distribution) satisfy these conditions.
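A quick empirical sketch of the quantity studied, assuming scikit-learn's KMeans as the empirical design and a quantizer fitted on a large held-out sample as a proxy for the true optimum; these choices are assumptions of the sketch, not the papers' setup.

    import numpy as np
    from sklearn.cluster import KMeans

    def distortion(points, centers):
        """Mean squared distance to the nearest codepoint."""
        d2 = ((points[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        return d2.min(axis=1).mean()

    rng = np.random.default_rng(0)
    k, n, n_test = 3, 500, 100_000
    # Stand-in source (the theory assumes, e.g., bounded support such as a
    # truncated Gaussian; a plain Gaussian is used here only for illustration).
    draw = lambda m: rng.normal(size=(m, 1))
    train, held_out = draw(n), draw(n_test)

    empirical = KMeans(n_clusters=k, n_init=10).fit(train).cluster_centers_
    # Proxy for the truly optimal k-level quantizer: design on a huge sample.
    optimal = KMeans(n_clusters=k, n_init=10).fit(held_out).cluster_centers_

    redundancy = distortion(held_out, empirical) - distortion(held_out, optimal)
    print(f"estimated distortion redundancy at n={n}: {redundancy:.5f}")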
Given an i.i.d. sample (X1,...,Xn) drawn from an unknown discrete distribution P on a countably infinite set, we consider the problem of estimating the entropy of P. We show that the plug-in estimate is universally consistent and that, without further assumptions, no rate-of-convergence results can be obtained for any sequence of entropy estimates. Under additional conditions we get convergence rates for the plug-in estimate and for an estimate based on match-lengths. The behavior of the expected error of the plug-in estimate is shown to be in sharp contrast to the finite-alphabet case.
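The plug-in estimate itself is simple: it is the entropy of the empirical distribution of the sample. A minimal sketch; the Geometric(1/2) source below is only an illustration.

    import math
    import random
    from collections import Counter

    def plug_in_entropy(sample):
        """Entropy (in nats) of the empirical distribution of the sample."""
        n = len(sample)
        return -sum((c / n) * math.log(c / n)
                    for c in Counter(sample).values())

    # Illustration with a Geometric(1/2) source on {1, 2, ...}, whose true
    # entropy is 2 ln 2 ~= 1.386 nats.
    random.seed(0)
    sample = [1 + int(math.log(1.0 - random.random()) / math.log(0.5))
              for _ in range(10_000)]
    print(plug_in_entropy(sample))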
Let X be a real-valued random variable with finite expectation denoted by m, and let (X1,...,Xn) be an i.i.d. sample drawn from the distribution of X. By the strong law of large numbers, the sample average converges to m almost surely.
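In symbols, the strong law of large numbers states that

    \bar{X}_n = \frac{1}{n} \sum_{i=1}^{n} X_i \xrightarrow{\ \mathrm{a.s.}\ } m \qquad (n \to \infty).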
We show that there exist individual lower bounds, matching the known upper bounds on the rate of convergence of nonparametric pattern recognition, that are arbitrarily close to Yang's minimax lower bounds when the a posteriori probability function lies in the classes used by Stone and others. The rates equal those of the corresponding regression estimation problem. Thus, for these classes, classification is not easier than regression estimation in the individual sense either.
Binary search trees are important for data storage, and the running time of several algorithms is related to the height of the tree. The data can be modeled by a random sequence of numbers. On the other hand, random binary search trees can be used in certain computer graphics applications. The case when the random input sequence consists of independent identically distributed (i.i.d.) variables has been heavily studied. In this case the height is of order log n, asymptotically, where n is the number of nodes. We study the case when the input sequence is an ordinary random walk, that is, the partial sums of an i.i.d. sequence. We show that if the step distribution satisfies some regularity conditions, then the height is of order \sqrt{n}. We give an exact limit law, where the limit distribution is the Erdős-Kac-Rényi distribution L(z). As a by-product, we prove a limit theorem for random walks.
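A quick empirical check of the \sqrt{n} claim, assuming standard Gaussian steps (used here only as a plausible example of a step distribution):

    import numpy as np

    def bst_height(keys):
        """Height of the binary search tree obtained by inserting `keys`
        in the given order (a single node has height 0)."""
        root, height = None, 0
        for x in keys:
            if root is None:
                root = [x, None, None]  # node = [key, left, right]
                continue
            node, depth = root, 0
            while True:
                side = 1 if x < node[0] else 2
                depth += 1
                if node[side] is None:
                    node[side] = [x, None, None]
                    height = max(height, depth)
                    break
                node = node[side]
        return height

    rng = np.random.default_rng(0)
    for n in (1_000, 4_000, 16_000):
        walk = np.cumsum(rng.normal(size=n))     # partial sums of i.i.d. steps
        print(n, bst_height(walk) / np.sqrt(n))  # ratio stays roughly stable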
Known minimax lower bounds for learning state that for each sample size n and each learning rule g_n, there exists a distribution of the observation X and a concept C to be learnt such that the expected error of g_n is at least a constant times V/n, where V is the VC dimension of the concept class. However, these bounds do not tell us anything about the rate of decrease of the error for a fixed distribution and a fixed concept.
In this paper we investigate minimax lower bounds in this much stronger sense. We show that for several natural d-parameter concept classes and for any sequence of learning rules {g_n}, there exists a fixed distribution of X and a fixed concept C such that the expected error is larger than a constant times d/n for infinitely many n.
We also show that it is neither the VC dimension nor the rate of increase of the shatter coefficients of the class that determines the asymptotic behavior of the error for the class.