publications
publications & preprints in reverse chronological order. * denotes equal contribution.
2024
- [arXiv] Chronos: Learning the language of time series. Abdul Fatir Ansari, Lorenzo Stella, Caner Turkmen, and 8 more authors. 2024.
We introduce Chronos, a simple yet effective framework for pretrained probabilistic time series models. Chronos tokenizes time series values using scaling and quantization into a fixed vocabulary and trains existing transformer-based language model architectures on these tokenized time series via the cross-entropy loss. We pretrained Chronos models based on the T5 family (ranging from 20M to 710M parameters) on a large collection of publicly available datasets, complemented by a synthetic dataset that we generated via Gaussian processes to improve generalization. In a comprehensive benchmark consisting of 42 datasets, and comprising both classical local models and deep learning methods, we show that Chronos models: (a) significantly outperform other methods on datasets that were part of the training corpus; and (b) have comparable and occasionally superior zero-shot performance on new datasets, relative to methods that were trained specifically on them. Our results demonstrate that Chronos models can leverage time series data from diverse domains to improve zero-shot accuracy on unseen forecasting tasks, positioning pretrained models as a viable tool to greatly simplify forecasting pipelines.
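As a rough illustration of the tokenization step described above, the sketch below mean-scales a series and quantizes it into a fixed vocabulary; the bin count, value range, and function names are illustrative choices, not the exact settings used by Chronos.

```python
import numpy as np

def tokenize(series, n_bins=1024, low=-15.0, high=15.0):
    """Mean-scale a series, then quantize it into a fixed token vocabulary."""
    scale = np.mean(np.abs(series))
    scale = scale if scale > 0 else 1.0
    scaled = series / scale
    # Uniform bin edges over a fixed range; out-of-range values are clipped.
    edges = np.linspace(low, high, n_bins - 1)
    tokens = np.digitize(np.clip(scaled, low, high), edges)
    return tokens, scale  # scale is kept to de-quantize the model's output

tokens, scale = tokenize(np.array([10.0, 12.0, 11.0, 15.0]))
print(tokens, scale)
```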
2023
- [NeurIPS] Add and thin: Diffusion for temporal point processes. David Lüdke, Marin Biloš, Oleksandr Shchur, and 2 more authors. Advances in Neural Information Processing Systems, 2023.
Autoregressive neural networks within the temporal point process (TPP) framework have become the standard for modeling continuous-time event data. Even though these models can expressively capture event sequences in a one-step-ahead fashion, they are inherently limited for long-term forecasting applications due to the accumulation of errors caused by their sequential nature. To overcome these limitations, we derive ADD-THIN, a principled probabilistic denoising diffusion model for TPPs that operates on entire event sequences. Unlike existing diffusion approaches, ADD-THIN naturally handles data with discrete and continuous components. In experiments on synthetic and real-world datasets, our model matches the state-of-the-art TPP models in density estimation and strongly outperforms them in forecasting.
- [GRL] Using deep learning for flexible and scalable earthquake forecasting. Kelian Dascher-Cousineau, Oleksandr Shchur, Emily E. Brodsky, and 1 more author. Geophysical Research Letters, 2023.
Seismology is witnessing explosive growth in the diversity and scale of earthquake catalogs. A key motivation for this community effort is that more data should translate into better earthquake forecasts. Such improvements are yet to be seen. Here, we introduce the Recurrent Earthquake foreCAST (RECAST), a deep-learning model based on recent developments in neural temporal point processes. The model enables access to a greater volume and diversity of earthquake observations, overcoming the theoretical and computational limitations of traditional approaches. We benchmark against a temporal Epidemic Type Aftershock Sequence model. Tests on synthetic data suggest that with a modest-sized data set, RECAST accurately models earthquake-like point processes directly from cataloged data. Tests on earthquake catalogs in Southern California indicate improved fit and forecast accuracy compared to our benchmark when the training set is sufficiently long (>10^4 events). The basic components in RECAST add flexibility and scalability for earthquake forecasting without sacrificing performance.
- [AutoML] AutoGluon-TimeSeries: AutoML for probabilistic time series forecasting. Oleksandr Shchur, Ali Caner Turkmen, Nick Erickson, and 4 more authors. In International Conference on Automated Machine Learning, 2023.
We introduce AutoGluon-TimeSeries, an open-source AutoML library for probabilistic time series forecasting. Focused on ease of use and robustness, AutoGluon-TimeSeries enables users to generate accurate point and quantile forecasts with just 3 lines of Python code. Built on the design philosophy of AutoGluon, AutoGluon-TimeSeries leverages ensembles of diverse forecasting models to deliver high accuracy within a short training time. AutoGluon-TimeSeries combines conventional statistical models, machine-learning-based forecasting approaches, and ensembling techniques. In our evaluation on 29 benchmark datasets, AutoGluon-TimeSeries demonstrates strong empirical performance, outperforming a range of forecasting methods in terms of both point and quantile forecast accuracy, and often even improving upon the best-in-hindsight combination of prior methods.
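The quick-start workflow behind the "3 lines of Python code" claim looks roughly as follows (the file path and column names are placeholders; exact signatures may differ between AutoGluon versions):

```python
import pandas as pd
from autogluon.timeseries import TimeSeriesDataFrame, TimeSeriesPredictor

# Long-format data: one row per (item_id, timestamp) observation.
df = pd.read_csv("train.csv")  # placeholder path
train_data = TimeSeriesDataFrame.from_data_frame(
    df, id_column="item_id", timestamp_column="timestamp"
)
predictor = TimeSeriesPredictor(prediction_length=48, target="target").fit(train_data)
predictions = predictor.predict(train_data)  # point and quantile forecasts
```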
2021
- [NeurIPS] Detecting Anomalous Event Sequences with Temporal Point Processes. Oleksandr Shchur, Ali Caner Türkmen, Tim Januschowski, and 2 more authors. In Neural Information Processing Systems, NeurIPS, 2021.
Automatically detecting anomalies in event data can provide substantial value in domains such as healthcare, DevOps, and information security. In this paper, we frame the problem of detecting anomalous continuous-time event sequences as out-of-distribution (OoD) detection for temporal point processes (TPPs). First, we show how this problem can be approached using goodness-of-fit (GoF) tests. We then demonstrate the limitations of popular GoF statistics for TPPs and propose a new test that addresses these shortcomings. The proposed method can be combined with various TPP models, such as neural TPPs, and is easy to implement. In our experiments, we show that the proposed statistic excels both at traditional GoF testing and at detecting anomalies in simulated and real-world data.
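For context, the classical GoF baseline that the paper starts from rests on the time-rescaling theorem: if the model is correct, compensator-transformed arrival times form a unit-rate Poisson process, so the rescaled inter-event times should be i.i.d. Exponential(1). The sketch below implements that baseline KS check, not the paper's proposed statistic:

```python
import numpy as np
from scipy import stats

def rescaling_gof_test(event_times, compensator):
    """KS test of compensator-transformed inter-event times against Exp(1)."""
    transformed = np.array([compensator(t) for t in event_times])
    inter_times = np.diff(np.concatenate(([0.0], transformed)))
    # Under the null model, the rescaled inter-event times are i.i.d. Exp(1).
    return stats.kstest(inter_times, "expon")

# Example: a homogeneous Poisson process with rate 2 has compensator 2 * t.
rng = np.random.default_rng(0)
times = np.cumsum(rng.exponential(scale=0.5, size=100))
print(rescaling_gof_test(times, lambda t: 2.0 * t))
```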
- [IJCAI] Neural Temporal Point Processes: A Review. Oleksandr Shchur, Ali Caner Türkmen, Tim Januschowski, and 1 more author. In International Joint Conference on Artificial Intelligence, IJCAI, 2021.
Temporal point processes (TPPs) are probabilistic generative models for continuous-time event sequences. Neural TPPs combine fundamental ideas from the point process literature with deep learning approaches, enabling the construction of flexible and efficient models. The topic of neural TPPs has attracted significant attention in recent years, leading to the development of numerous new architectures and applications for this class of models. In this review paper we aim to consolidate the existing body of knowledge on neural TPPs. Specifically, we focus on important design choices and general principles for defining neural TPP models. Next, we provide an overview of application areas commonly considered in the literature. We conclude this survey with a list of open challenges and important directions for future work in the field of neural TPPs.
2020
- [NeurIPS] Fast and Flexible Temporal Point Processes with Triangular Maps. Oleksandr Shchur, Nicholas Gao, Marin Biloš, and 1 more author. In Neural Information Processing Systems, NeurIPS, 2020. (Oral)
Temporal point process (TPP) models combined with recurrent neural networks provide a powerful framework for modeling continuous-time event data. While such models are flexible, they are inherently sequential and therefore cannot benefit from the parallelism of modern hardware. By exploiting recent developments in the field of normalizing flows, we design TriTPP, a new class of non-recurrent TPP models where both sampling and likelihood computation can be done in parallel. TriTPP matches the flexibility of RNN-based methods but permits orders of magnitude faster sampling. This enables us to use the new model for variational inference in continuous-time discrete-state systems. We demonstrate the advantages of the proposed framework on synthetic and real-world datasets.
- [ICLR] Intensity-free Learning of Temporal Point Processes. Oleksandr Shchur*, Marin Biloš*, and Stephan Günnemann. In International Conference on Learning Representations, ICLR, 2020. (Spotlight)
Temporal point processes are the dominant paradigm for modeling sequences of events happening at irregular intervals. The standard way of learning in such models is by estimating the conditional intensity function. However, parameterizing the intensity function usually incurs several trade-offs. We show how to overcome the limitations of intensity-based approaches by directly modeling the conditional distribution of inter-event times. We draw on the literature on normalizing flows to design models that are flexible and efficient. We additionally propose a simple mixture model that matches the flexibility of flow-based models, but also permits sampling and computing moments in closed form. The proposed models achieve state-of-the-art performance in standard prediction tasks and are suitable for novel applications, such as learning sequence embeddings and imputing missing data.
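The mixture idea can be sketched in isolation: below is a minimal log-normal mixture log-density over inter-event times (PyTorch), with the history-conditioning RNN of the full model omitted and parameter values chosen arbitrarily:

```python
import math

import torch
import torch.nn.functional as F

def lognormal_mixture_log_prob(tau, logits, means, log_scales):
    """Log-density of inter-event times tau under a K-component
    log-normal mixture with the given (unnormalized) weights."""
    log_w = F.log_softmax(logits, dim=-1)   # [K] mixture weights
    log_tau = torch.log(tau).unsqueeze(-1)  # [N, 1]
    z = (log_tau - means) / log_scales.exp()
    # Per-component log-normal log-density.
    log_comp = -0.5 * z.pow(2) - log_scales - log_tau - 0.5 * math.log(2 * math.pi)
    return torch.logsumexp(log_w + log_comp, dim=-1)  # [N]

# Toy check with fixed parameters; in the paper these would come from an RNN.
tau = torch.tensor([0.5, 1.2, 3.0])
K = 4
print(lognormal_mixture_log_prob(
    tau, torch.zeros(K), torch.linspace(-1.0, 1.0, K), torch.zeros(K)))
```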
- [DLG] Overlapping Community Detection with Graph Neural Networks. Oleksandr Shchur and Stephan Günnemann. In Deep Learning for Graphs Workshop, DLG, 2020.
Community detection is a fundamental problem in machine learning. While deep learning has shown great promise in many graph-related tasks, developing neural models for community detection has received surprisingly little attention. The few existing approaches focus on detecting disjoint communities, even though communities in real graphs are well known to be overlapping. We address this shortcoming and propose a graph neural network (GNN) based model for overlapping community detection. Despite its simplicity, our model outperforms the existing baselines by a large margin in the task of community recovery. We establish through an extensive experimental evaluation that the proposed model is effective, scalable, and robust to hyperparameter settings. We also perform an ablation study that confirms that the GNN is the key ingredient to the power of the proposed model.
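One standard way to connect nonnegative community affiliations to observed edges in overlapping-community models is a Bernoulli-Poisson link; the sketch below shows only such a decoder, leaving out the GNN encoder that the ablation identifies as the key ingredient:

```python
import torch

def edge_prob(F_aff, u, v):
    """Bernoulli-Poisson link: P(edge u-v) = 1 - exp(-F_u . F_v),
    where rows of F_aff are nonnegative community affiliation vectors."""
    return 1.0 - torch.exp(-(F_aff[u] * F_aff[v]).sum())

# Toy affiliations: node 1 belongs to both communities (overlap).
F_aff = torch.tensor([[1.5, 0.0],
                      [1.0, 1.0],
                      [0.0, 2.0]])
print(edge_prob(F_aff, 0, 1))  # high: shared community 0
print(edge_prob(F_aff, 0, 2))  # zero: no shared community
```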
2019
- [GEM] Dual-primal Graph Convolutional Networks. Federico Monti, Oleksandr Shchur, Aleksandar Bojchevski, and 3 more authors. In Graph Embedding and Mining Workshop, GEM, 2019.
In recent years, there has been a surge of interest in developing deep learning methods for non-Euclidean structured data such as graphs. In this paper, we propose Dual-Primal Graph CNN, a graph convolutional architecture that alternates convolution-like operations on the graph and its dual. Our approach allows learning both vertex and edge features and generalizes the previous graph attention (GAT) model. We provide extensive experimental validation showing state-of-the-art results on a variety of tasks tested on established graph benchmarks, including the CORA and Citeseer citation networks as well as the MovieLens, Flixster, Douban, and Yahoo Music graph-guided recommender systems.
2018
- [ICML] NetGAN: Generating Graphs via Random Walks. Aleksandar Bojchevski*, Oleksandr Shchur*, Daniel Zügner*, and 1 more author. In International Conference on Machine Learning, ICML, 2018. (Long oral)
We propose NetGAN – the first implicit generative model for graphs able to mimic real-world networks. We pose the problem of graph generation as learning the distribution of biased random walks over the input graph. The proposed model is based on a stochastic neural network that generates discrete output samples and is trained using the Wasserstein GAN objective. NetGAN is able to produce graphs that exhibit well-known network patterns without explicitly specifying them in the model definition. At the same time, our model exhibits strong generalization properties, as highlighted by its competitive link prediction performance, despite not being trained specifically for this task. Being the first approach to combine both of these desirable properties, NetGAN opens exciting avenues for further research.
- [R2L] Pitfalls of Graph Neural Network Evaluation. Oleksandr Shchur*, Maximilian Mumme*, Aleksandar Bojchevski, and 1 more author. In Relational Representation Learning Workshop, R2L, 2018.
Semi-supervised node classification in graphs is a fundamental problem in graph mining, and the recently proposed graph neural networks (GNNs) have achieved unparalleled results on this task. Due to their massive success, GNNs have attracted a lot of attention, and many novel architectures have been put forward. In this paper we show that existing evaluation strategies for GNN models have serious shortcomings: relying on a single train/validation/test split per dataset, combined with significant differences in training procedures (e.g., early stopping criteria), precludes a fair comparison of different architectures. We perform a thorough empirical evaluation of four prominent GNN models and show that considering different splits of the data leads to dramatically different rankings of models. Even more importantly, our findings suggest that simpler GNN architectures are able to outperform the more sophisticated ones if the hyperparameters and the training procedure are tuned fairly for all models.
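The paper's prescription translates into a simple evaluation loop: aggregate accuracy over many random splits rather than a single fixed one. A minimal sketch, where `train_and_eval` is a placeholder for any GNN training routine:

```python
import numpy as np

def evaluate_over_splits(train_and_eval, n_nodes, n_splits=10,
                         train_frac=0.1, val_frac=0.1, seed=0):
    """Report mean and std of test accuracy over random train/val/test splits."""
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_splits):
        perm = rng.permutation(n_nodes)
        n_train = int(train_frac * n_nodes)
        n_val = int(val_frac * n_nodes)
        train, val, test = np.split(perm, [n_train, n_train + n_val])
        scores.append(train_and_eval(train, val, test))
    return float(np.mean(scores)), float(np.std(scores))
```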
- [ICDMW] Anomaly Detection in Car-Booking Graphs. Oleksandr Shchur, Aleksandar Bojchevski, Mohamed Farghal, and 2 more authors. In International Conference on Data Mining Workshops, ICDMW, 2018.
The use of car-booking services has gained massive popularity in recent years, which has led to an increasing number of fraudsters trying to game these systems. In this paper we describe a framework for fraud detection in car-booking systems. Our core idea lies in casting this problem as an instance of anomaly detection in temporal graphs. Specifically, we use unsupervised techniques, such as dense subblock discovery, to detect suspicious activity. The proposed framework is able to adapt to the variations in the data inherent to the car-booking setting and detects fraud with high precision. This work was performed in collaboration with Careem, where the described framework is currently being deployed in production.