Contact Info:
Saharon Rosset
School of Mathematical Sciences
Tel Aviv University, Tel Aviv, Israel
Phone: +972 3 6408820 (remove numbers)

I am an Associate Professor in the Statistics department at Tel Aviv University, which I joined in 2007.
Prior to that, I received my Ph.D. in Statistics from Stanford University in summer 2003, where I worked with Jerry Friedman and Trevor Hastie.

My thesis: Topics in Regularization and Boosting.
I spent four years at IBM Research, in the DAR group.
My research interests are in Statistical Genetics, Statistical Learning (theory and applications), and Data Mining.


Statistical Learning: Fall 2007/8; Spring 2009; Spring 2011; Spring 2013; Fall 2014/5; Spring 2017

Introduction to Statistics: Spring 2008; Spring 2009  (Course page on Moodle)

Statistics for Computer Science: Fall 2008/9; Fall 2009/10; Fall 2010/11; Fall 2011/12; Fall 2012/13; Fall 2013/14; Fall 2014/15; Fall 2015/16 (Course page on Moodle)

Regression: Fall 2010/11

Topics in Statistical Genetics: Spring 2010; Spring 2012; Spring 2014; Spring 2016

Statistics M.Sc. seminar: Fall 2009/10; Fall 2011/12; Spring 2017

Statistics B.Sc. seminar: Spring 2012; Spring 2013; Spring 2014

Bootstrap and Resampling Methods: Fall 2012/13

Statistics of Big Data: Spring 2015

Introduction to Statistical Learning: Spring 2016



Group members

Current PhD+ students: Keren Levinstein Hallak (PhD), Assaf Rabinowicz (PhD), Amit Moscovich-Eiger (postdoc)

Past: Ronny Luss (Post-Doc, 2009-2011), David Golan (PhD 2014), Shlomi Lifshits (PhD 2015, joint with Yaniv Assaf), Amichai Painsky (PhD 2016, joint with Meir Feder), Omer Weissbrod (PhD 2017, joint with Dan Geiger of Technion), Giora Simchoni (MSc 2011), Adi Sarid (MSc 2011), Shachar Kaufman (MSc 2012), Amichai Painsky (MSc 2012), Slava Borodovski (MSc 2012), Lital Bridavsky (MSc 2013), Roee Eilat (MSc 2016), Ayala Neudorfer (MSc 2017), Avner Abrami (summer student 2014)


Publications (in bold – members of our research group)


Amichai Painsky, Saharon Rosset. (2017).

Cross-Validated Variable Selection in Tree-Based Methods Improves Predictive Performance.
IEEE Transactions on Pattern Analysis and Machine Intelligence, to appear (accepted 12/2016)

Shay Tzur, Saharon Rosset. (2017).

Strictly conserved tri-nucleotide motif ‘CAT’ is associated with TAS DNA protein binding sites in human mitochondrial DNA control region.
Mitochondrial DNA, Vol 28, Issue 2, 250–253 (2017).



Sagi Shporer, Benny Chor, Saharon Rosset, David Horn. (2016).

Inversion symmetry of DNA k-mer counts: validity and deviations.
BMC Genomics, 17:696.

Yaron Granot, Omri Tal, Saharon Rosset, Karl Skorecki. (2016).

On the Apportionment of Population Structure.
PLOS One, DOI:10.1371/journal.pone.0160413 August 9, 2016.

Omer Weissbrod, Dan Geiger, Saharon Rosset. (2016).

Multikernel linear mixed models for complex phenotype prediction.
Genome Research, 26 (7), 969–979 (July 2016).

Regev Schweiger, Shachar Kaufman, Reijo Laaksonen, Marcus E. Kleber, Winfried März, Eleazar Eskin, Saharon Rosset, Eran Halperin. (2016).

Fast and Accurate Construction of Confidence Intervals for Heritability.
American Journal of Human Genetics, Vol. 98, No. 6, p1181–1192, 2 June 2016.

Amichai Painsky, Saharon Rosset, Meir Feder. (2016).

Generalized Independent Component Analysis Over Finite Alphabets.
IEEE Transactions on Information Theory, Vol. 62, No. 2, Feb. 2016.

Amichai Painsky, Saharon Rosset. (2016).

Isotonic Modeling with Non-differentiable Loss Functions with Application to Lasso Regularization.
IEEE Transactions on Pattern Analysis and Machine Intelligence, Vol. 38, No. 2, Feb. 2016.


David Golan, Eric Lander, Saharon Rosset. (2014).

Measuring Missing Heritability: Inferring the Contribution of Common Variants.
Proceedings of the National Academy of Sciences, Vol. 111, No. 49, E5272–E5281, 2014.

David Golan, Saharon Rosset. (2014).

Effective Genetic Risk Prediction Using Mixed Models.
American Journal of Human Genetics, Volume 95, Issue 4, p383–393, 2 October 2014.

Shachar Kaufman, Saharon Rosset. (2014).

When Does More Regularization Imply Fewer Degrees of Freedom? Sufficient Conditions and Counterexamples.
Biometrika, 101 (4): 771–784, Dec. 2014.

Saharon Rosset, Ehud Aharoni, Hani Neuvirth. (2014).

Novel Statistical Tools for Management of Public Databases Facilitate Community-Wide Replicability and Control of False Discovery.
Genetic Epidemiology, Vol. 38, Issue 5, pages 477–481, July 2014.

Shachar Kaufman, Saharon Rosset. (2014).

Exploiting Population Samples to Enhance Genome Wide Association Studies of Disease.
Genetics, Vol. 197, 337–349, May 2014.

(Press release from Genetics Society of America)

(Blog post from GSA website)

Ronny Luss, Saharon Rosset. (2014).

Generalized Isotonic Regression.
Journal of Computational and Graphical Statistics, Vol. 23, No. 1, 192-210.

Ehud Aharoni, Saharon Rosset. (2014).

Generalized Alpha Investing: Definitions, Optimality Results, and Application to Public Databases.
Journal of the Royal Statistical Society, Series B, Volume 76, Issue 4, 771–794, September 2014.

Amichai Painsky, Saharon Rosset. (2014).

Optimal Set Cover Formulation for Exclusive Row Biclustering of Gene Expression.
Journal of Computer Science and Technology (JCST), 29(3): 423–435, May 2014.
(long version of SDM-2012 paper)


David Golan, Saharon Rosset. (2013).

Statistical Modeling of Coverage in High-Throughput Data.
Chapter 4 in Deep Sequencing Data Analysis, N. Shomron (ed.), Springer, 2013.

Saharon Rosset. (2013).

Practical Sparse Modeling: an Overview and Two Examples from Genetics.
Chapter 3 in Practical Applications of Sparse Modeling, I. Rish et al. (eds.), MIT Press, 2013, to appear (accepted 5/13).




Amichai Painsky, Saharon Rosset. (2012).

Exclusive Row Biclustering for Gene Expression Using a Combinatorial Auction Approach.
IEEE Conference on Data Mining (ICDM-2012).

Shachar Kaufman, Saharon Rosset, Claudia Perlich, Ori Stitelman. (2012).

Leakage in Data Mining: Formulation, Detection, and Avoidance.
ACM Transactions on Knowledge Discovery from Data, Vol. 6 No. 4, December 2012.

(long version of KDD-2011 paper by same name)

David Golan, Saharon Rosset. (2012).

Comment on “The Predictive Capacity of Personal Genome Sequencing”.
Science Translational Medicine 4, 135le4 (2012).

David Golan, Yaniv Erlich, Saharon Rosset. (2012).

Weighted Pooling - Practical and Cost Effective Techniques for Pooled High Throughput Sequencing.
Bioinformatics, Vol. 28, pages i197–i206 (Proceedings ISMB 2012).

Doron M. Behar, Mannis van Oven, Saharon Rosset, Mait Metspalu, Eva-Liis Loogväli, Nuno M. Silva, Toomas Kivisild, Antonio Torroni, Richard Villems. (2012).

A “Copernican” Reassessment of the Human Mitochondrial DNA Tree from its Root.
The American Journal of Human Genetics, Volume 90, Issue 4, 675-684, 6 April 2012.

Shay Tzur, Saharon Rosset, Walter Wasser, Doron Behar, Karl Skorecki. (2012).

APOL1 allelic variants are associated with lower age of dialysis initiation and thereby increased dialysis vintage in African and Hispanic Americans with non-diabetic end-stage kidney disease.
Nephrology, Dialysis and Transplantation, 27(4):1498-505, 2012.

Melissa Gymrek, David Golan, Saharon Rosset, Yaniv Erlich. (2012).

lobSTR: A short tandem repeat profiler for personal genomes.
Genome Research, 22(6): 1154-62, June 2012, doi:10.1101/gr.135780.111.

Giles Hooker, Saharon Rosset. (2012).
Prediction-Based Regularization Using Data Augmented Regression.
Statistics and Computing, Volume 22, Issue 1, pp 237-249.

Ronny Luss, Saharon Rosset and Moni Shahar. (2012).
Efficient Regularized Isotonic Regression with Application to Gene-Gene Interaction Search.
Annals of Applied Statistics, Volume 6, Number 1, pp 253-283.
Matlab code implementing the IRP algorithm.




Sijian Wang, Bin Nan, Saharon Rosset and Ji Zhu. (2011).
Random Lasso.
Annals of Applied Statistics, Vol. 5, No. 1, 468-485.

Ehud Aharoni, Hani Neuvirth and Saharon Rosset. (2011).
The Quality Preserving Database: A Computational Framework for Encouraging Collaboration, Enhancing Power and Controlling False Discovery.
IEEE Transactions on Computational Biology and Bioinformatics, Sep-Oct;8(5):1431-7.

Saharon Rosset^a, Shay Tzur^a, Walter Wasser, Doron Behar and Karl Skorecki. (2011).
The population genetics of chronic kidney disease: insights from the MYH9–APOL1 locus.
Nature Reviews Nephrology, 7, 313-326 (June 2011) | doi:10.1038/nrneph.2011.52.

(^a equal contribution)

David Golan, Saharon Rosset. (2011).
Accurate Estimation of Heritability in Genome Wide Studies using Random Effects Models.
Bioinformatics (Proceedings of ISMB-ECCB11), Volume 27, Issue 13, pp. i317-i323.

Shachar Kaufman, Saharon Rosset and Claudia Perlich. (2011).
Leakage in Data Mining: Formulation, Detection, and Avoidance.
Best paper award winner at KDD-2011.

Doron M. Behar, Einat Kedem, Saharon Rosset, Yonas Haileselassie, Shay Tzur, Zipi Kra-Oz, Walter G. Wasser, Yotam Shenhar, Eduardo Shahar, Gamal Hassoun, Carcom Maor, Dawit Wolday, Shimon Pollack, Karl Skorecki. (2011).
Absence of APOL1 Risk Variants Protects against HIV-Associated Nephropathy in the Ethiopian Population.
American Journal of Nephrology, 2011, 34:452-459.


Doron M. Behar^a, Saharon Rosset^a, Shay Tzur^a, Sara Selig, Guennady Yudkovsky, Sivan Bercovici, Jeffrey B. Kopp, Cheryl A. Winkler, George W. Nelson, Walter G. Wasser and Karl Skorecki. (2010).
African ancestry allelic variation at the MYH9 gene contributes to increased susceptibility to non-diabetic end-stage kidney disease in Hispanic Americans.
Human Molecular Genetics, Vol. 19, No. 9, 1816–1827. doi:10.1093/hmg/ddq040

(^a equal contribution)

Doron M. Behar, Bayazit Yunusbayev, Mait Metspalu, Ene Metspalu, Saharon Rosset, Jüri Parik, Siiri Rootsi, Gyaneshwer Chaubey, Ildus Kutuev, Guennady Yudkovsky, Elza K. Khusnutdinova, Oleg Balanovsky, Ornella Semino, Luisa Pereira, David Comas, David Gurwitz, Batsheva Bonne-Tamir, Tudor Parfitt, Michael F. Hammer, Karl Skorecki and Richard Villems. (2010).
The genome-wide structure of the Jewish people.
Nature, Vol. 466, 238–242. doi:10.1038/nature09103

Saharon Rosset, Claudia Perlich, Grzegorz Swirszcz, Yan Liu and Prem Melville. (2010).
Medical Data Mining: Lessons from Winning Two Competitions.
Data Mining and Knowledge Discovery Journal, Vol. 20, Num. 3, 439–468.

Osnat Ravid-Amir and Saharon Rosset. (2010).
Maximum Likelihood Estimation of Locus-Specific Mutation Rates in Y-chromosome Short Tandem Repeats.
Bioinformatics (proceedings of ECCB10) 26(18): i440-i445.

Shay Tzur^a, Saharon Rosset^a, Revital Shemer, Guennady Yudkovsky, Sara Selig, Ayele Tarekegn, Endashaw Bekele, Neil Bradman, Walter G. Wasser, Doron M. Behar and Karl Skorecki. (2010).
Missense mutations in the APOL1 gene are highly associated with end stage kidney disease risk previously attributed to the MYH9 gene.
Human Genetics, Vol. 128, Issue 3, 345–350.
(^a equal contribution)

Ronny Luss, Saharon Rosset and Moni Shahar. (2010).
Decomposing Isotonic Regression for Efficiently Solving Large Problems.
NIPS 2010.

Richard Lawrence, Claudia Perlich, Saharon Rosset, et al. (10 authors). (2010).
Operations Research Improves Sales Force Productivity at IBM.
INFORMS Interfaces, Vol. 40, No. 1, January-February 2010, pp. 33-46.


Aurelie Lozano, Naoki Abe, Yan Liu and Saharon Rosset. (2009).
Grouped graphical Granger modeling for gene expression regulatory networks discovery.
Bioinformatics 25(12):i110-i118 (proceedings of ISMB09); doi:10.1093/bioinformatics/btp199.

Aurelie Lozano, Naoki Abe, Yan Liu and Saharon Rosset. (2009).
Grouped graphical Granger modeling methods for temporal causal modeling.

Michael F. Hammer, Doron M. Behar, Tatiana M. Karafet, Fernando L. Mendez, Brian Hallmark, Tamar Erez, Lev A. Zhivotovsky, Saharon Rosset, Karl Skorecki. (2009).
Extended Y chromosome haplotypes resolve multiple and unique lineages of the Jewish priesthood.
Human Genetics, Volume 126, Number 5, November 2009. DOI 10.1007/s00439-009-0727-5

Ji Zhu, Hui Zou, Saharon Rosset and Trevor Hastie. (2009).
Multi-class AdaBoost.
Statistics and its Interface, volume 2, issue 3. 

Saharon Rosset. (2009).
Bi-Level Path Following for Cross Validated Solution of Kernel Quantile Regression.
Journal of Machine Learning Research, 10(Nov):2473−2505, 2009
(short version from ICML-08).



Saharon Rosset, Spencer Wells, David Soria-Hernanz, Chris Tyler-Smith, Ajay Royyuru, Doron Behar. (2008). Maximum Likelihood Estimation of Site-Specific Mutation Rates in Human Mitochondrial DNA from Partial Phylogenetic Classification. Genetics 180 : 1511–1524. DOI: 10.1534/genetics.108.091116.


Claudia Perlich, Prem Melville, Yan Liu, Grzegorz Swirszcz, Richard Lawrence, Saharon Rosset. (2008). Breast Cancer Identification: KDD CUP Winner's Report. SIGKDD Explorations, vol. 10, issue 2, 39-42


Prem Melville, Saharon Rosset, Richard Lawrence. (2008). Customer Targeting Models Using Actively-Selected Web Content. KDD-08.


Doron M Behar, Ene Metspalu, Toomas Kivisild, Saharon Rosset, Shay Tzur, Yarin Hadid, Guennady Yudkovsky, Dror Rosengarten, Luisa Pereira, Antonio Amorim, Ildus Kutuev, David Gurwitz, Batsheva Bonne-Tamir, Richard Villems and Karl Skorecki. (2008). Counting the Founders: The Matrilineal Genetic Ancestry of the Jewish Diaspora. PLoS ONE 3(4): e2062. DOI:10.1371/journal.pone.0002062.

Doron M Behar, Richard Villems, Himla Soodyall, Jason Blue-Smith, Luisa Pereira, Ene Metspalu, Rosaria Scozzari, Heeran Makkan, Shay Tzur, David Comas, Jaume Bertranpetit, Lluis Quintana-Murci, Chris Tyler-Smith, R. Spencer Wells and Saharon Rosset. (2008). The Dawn of Human Matrilineal Diversity. American Journal of Human Genetics, 82(5) : 1130-1140. DOI:10.1016/j.ajhg.2008.04.002.


Saharon Rosset, Claudia Perlich and Yan Liu. (2007).
Making the Most of Your Data: KDD Cup 2007 "How Many Ratings" Winner's Report. SIGKDD Explorations, vol. 9, issue 2.

Claudia Perlich, Saharon Rosset, Rick Lawrence, Bianca Zadrozny. (2007). High Quantile Modeling for Customer Wallet Estimation with Other Applications. KDD-07.

Saharon Rosset, Grzegorz Swirszcz, Nathan Srebro, Ji Zhu. (2007). l1 Regularization in Infinite Dimensional Feature Spaces. COLT-07.

Doron M Behar, Saharon Rosset, Jason Blue-Smith, Oleg Balanovsky, Shay Tzur, David Comas, R. John Mitchell, Lluis Quintana-Murci, Chris Tyler-Smith, R. Spencer Wells and The Genographic Consortium. (2007). The Genographic Project Public Participation Mitochondrial DNA Database. PLoS Genetics Vol. 3, No. 6, e104.

Saharon Rosset, Ji Zhu. (2007). Piecewise Linear Regularized Solution Paths. Annals of Statistics, 35(3). (Earlier longer versions 1, 2).

Claudia Perlich, Saharon Rosset. (2007). Identifying Bundles of Product Options using Mutual Information Clustering. SIAM Data Mining 07 (SDM-07).

Saharon Rosset. (2007). Efficient Inference on Known Phylogenetic Trees Using Poisson Regression. Proc. of the 5th European Conference on Computational Biology (ECCB-2006), Bioinformatics 23: e142-e147.

Saharon Rosset, Claudia Perlich, Bianca Zadrozny. (2007). Ranking-Based Evaluation of Regression Models. Knowledge and Information Systems, Vol. 12, No. 3. (short version from ICDM-05)


Srujana Merugu, Saharon Rosset, Claudia Perlich. (2006). A New Multi-View Regression Method with an Application to Customer Wallet Estimation. KDD-06.

Saharon Rosset, Rick Lawrence. (2006). Data Enhanced Predictive Modeling for Sales Targeting. SIAM Data Mining 06 (SDM-06).


Rob Tibshirani, Michael Saunders, Saharon Rosset, Ji Zhu, Keith Knight. (2005). Sparsity and Smoothness via the Fused Lasso. Journal of the Royal Statistical Society Series B, Vol. 67 No. 1.

Saharon Rosset. (2005). Robust Boosting and Its Relation to Bagging. KDD-05.

Sofus Macskassy, Foster Provost, Saharon Rosset. (2005). ROC Confidence Bands: An Empirical Evaluation. ICML-05.

Saharon Rosset, Claudia Perlich, Bianca Zadrozny, Srujana Merugu, Sholom Weiss, Rick Lawrence. (2005). Customer Wallet Estimation. NYU workshop on CRM and Data Mining.


Saharon Rosset, Ji Zhu, Trevor Hastie. (2004). Boosting as a Regularized Path to A Maximum Margin Classifier. Journal of Machine Learning Research, 5(Aug):941-973.

Saharon Rosset, Ji Zhu. (2004). Discussion of "Least Angle Regression" by Efron et al. Annals of Statistics, April 2004.

Jerry Friedman, Trevor Hastie, Saharon Rosset, Rob Tibshirani, Ji Zhu. (2004). Discussion of three boosting papers. Annals of Statistics, February 2004.



Saharon Rosset, Ji Zhu. (2004). Corrected Proof of the Result of "A Prediction Error Property of the Lasso" by Huang (2003). Australia and New Zealand Journal of Statistics, 46(3):505-510.

Trevor Hastie, Saharon Rosset, Rob Tibshirani, Ji Zhu. (2004). The Entire Regularization Path for the Support Vector Machine. Journal of Machine Learning Research, 5(Oct):1391-1415. R package. (short version presented at NIPS 2004)

Saharon Rosset, Ji Zhu, Hui Zou, Trevor Hastie. (2004). A Method for Inferring Label Sampling Mechanisms in Semi-Supervised Learning. NIPS 2004.

Saharon Rosset. (2004). Tracking Curved Regularized Optimization Solution Paths. NIPS 2004.

Saharon Rosset. (2004). Model Selection via the AUC. ICML-04.


Saharon Rosset, Ji Zhu, Trevor Hastie. (2003). Margin Maximizing Loss Functions. NIPS 2003.

Ji Zhu, Saharon Rosset, Trevor Hastie, Rob Tibshirani. (2003). 1-norm Support Vector Machines. NIPS 2003.

Saharon Rosset, Einat Neumann. (2003). Integrating Customer Value Considerations into Predictive Modeling. ICDM-03.

Saharon Rosset, Einat Neumann, Uri Eick, Nurit Vatnik. (2003). Lifetime Value Models for Decision Support. Data Mining and Knowledge Discovery Journal, Vol. 7, 321-339.

    2002 and earlier

Saharon Rosset, Einat Neumann, Uri Eick, Nurit Vatnik, Shuki Idan. (2002). Lifetime Value Modeling and Its Use for Customer Retention Planning. KDD-02.
Winner of best application paper award.

Saharon Rosset, Eran Segal. (2002). Boosting Density Estimation. NIPS-2002.

Saharon Rosset. (2002). Value Weighted Analysis: Building Prediction Models for Data with Observation Weights. Uncompleted technical report. Draft available.

Saharon Rosset, Einat Neumann, Uri Eick, Nurit Vatnik, Shuki Idan. (2001). Evaluation of Prediction Models for Campaign Planning. KDD-01.

Saharon Rosset, Aron Inger. (2000). KDD-Cup 99: Knowledge Discovery In a Charitable Organization's Donor Database. SIGKDD Explorations 1(2): 85-90 (2000)

Saharon Rosset, Uzi Murad, Einat Neumann, Yizhak Idan, and Gadi Pinkas. (1999). Discovery of Fraud Rules for Telecommunications: Challenges and Solutions, KDD-99: 409-413.

Saharon Rosset. (1998).
Ranking: Methods for Flexible Evaluation and Efficient Comparison of Classification Performance. KDD-98.


