In this article we investigate the influence of a Pareto-like noise model on the performance of an artificial neural network used to predict a nonlinear time series. In contrast to a Gaussian noise model, a Pareto-like noise model is based on a power-law distribution, whose tails are long compared to those of a Gaussian distribution. This allows for larger fluctuations in the deviation between predicted and observed values of the time series. We define an optimization procedure that minimizes the mean squared error of the predicted time series by maximizing the likelihood function based on the Pareto-like noise model. Numerical results for an artificial time series show that this noise model gives better results than a model based on Gaussian noise, demonstrating that, by allowing larger fluctuations, the parameter space of the likelihood function can be searched more efficiently. As a consequence, our results may indicate a more generic characteristic of optimization problems, one not restricted to time series prediction.
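To illustrate why a long-tailed noise model tolerates larger deviations, the following minimal sketch compares the negative log-likelihood of a set of prediction residuals under a Gaussian density and under a symmetric power-law density. The specific functional form, the function names, and the parameters `alpha` and `scale` are assumptions made for illustration only; they are not the noise model defined in this article.

```python
import math


def gaussian_nll(residuals, sigma=1.0):
    """Negative log-likelihood of residuals under zero-mean Gaussian noise."""
    n = len(residuals)
    return (n * math.log(sigma * math.sqrt(2.0 * math.pi))
            + sum(r * r for r in residuals) / (2.0 * sigma ** 2))


def pareto_like_nll(residuals, alpha=2.0, scale=1.0):
    """Negative log-likelihood under an assumed symmetric power-law density.

    Illustrative density (an assumption, not taken from the article):
        p(r) = alpha / (2 * scale) * (1 + |r| / scale) ** -(alpha + 1)
    which integrates to one over the real line for alpha > 0.
    """
    n = len(residuals)
    log_norm = math.log(alpha / (2.0 * scale))
    tail_term = sum(math.log1p(abs(r) / scale) for r in residuals)
    return -n * log_norm + (alpha + 1.0) * tail_term


def outlier_cost(nll, size):
    """Extra negative log-likelihood incurred by one residual of given size."""
    return nll([size]) - nll([0.0])


if __name__ == "__main__":
    for size in (1.0, 3.0, 10.0):
        print(size,
              outlier_cost(gaussian_nll, size),
              outlier_cost(pareto_like_nll, size))
```

Under these assumptions the Gaussian penalty for a single residual grows quadratically with its size, whereas the power-law penalty grows only logarithmically, so occasional large deviations between predicted and observed values are penalized far less severely.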