This bug tracks whether to convert the analysis code that produces the aggregates for the gathered data from R to Python. Our pipeline makes extensive use of Python, and introducing an R kernel might bring additional problems that we don't want to face within the scope of this project.

The stripped-down RAPPOR code is available on Alejandro's GitHub at https://github.com/Alexrs95/rappor with some updated documentation. The documentation on how to run the simulation lives at https://docs.google.com/document/d/1xi-3liU7wWOUaL_QEOA8vvNFCc4ULjLelth1QWNwSPk/edit?ts=595547c8#heading=h.jt46muo7hp9y

We should get an understanding of:
- Does the R code require exotic libraries that are only available in R?
- Is there any speed concern in using the Python libraries compared to the R ones?
Assignee: nobody → fhartmann
Priority: -- → P1
We have finished evaluating the R code base and started working on the reimplementation in Python. In terms of performance, the Python version generally performs slightly better than the original R implementation.

We were able to find equivalent Python libraries for most of the libraries used in R. The only exception is limSolve, which provides lsei, a least squares implementation that allows the user to encode additional constraints. We evaluated several possible replacements. A nonnegative least squares solver allows us to encode the first constraint (non-negative coefficients) and generally finds very similar coefficients. With this replacement, we get results that are closely comparable to those of the original implementation.

https://cran.r-project.org/web/packages/limSolve/limSolve.pdf
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.nnls.html
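
For illustration, here is a minimal sketch of how the nonnegative least squares replacement could look with scipy.optimize.nnls. The function name fit_nonnegative, the matrix sizes, and the synthetic data are assumptions made up for this example; they are not taken from the RAPPOR code. Only the non-negativity constraint is enforced here; any further constraints that lsei can express are not handled by this sketch.

```python
import numpy as np
from scipy.optimize import nnls


def fit_nonnegative(design, target):
    """Solve min ||design @ x - target||_2 subject to x >= 0.

    Hypothetical helper for illustration; returns the coefficient
    vector and the norm of the residual, as reported by nnls.
    """
    coeffs, residual_norm = nnls(design, target)
    return coeffs, residual_norm


# Synthetic example data (assumption: not real RAPPOR reports).
rng = np.random.default_rng(0)
design = rng.random((20, 4))                 # 20 observations, 4 candidates
true_coeffs = np.array([0.5, 0.0, 0.3, 0.2])  # nonnegative by construction
target = design @ true_coeffs                # noise-free observations

coeffs, residual_norm = fit_nonnegative(design, target)
print("estimated coefficients:", coeffs)
print("residual norm:", residual_norm)
```

On real data, the coefficients could be compared against the output of the R implementation to check how close the two solvers are.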
Status: NEW → RESOLVED
Last Resolved: 10 months ago
Resolution: --- → FIXED