Closed
Bug 1386554
Opened 7 years ago
Closed 7 years ago
Consider converting the R analysis code for RAPPOR to python
Categories
(Data Platform and Tools :: General, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Dexter, Assigned: fhartmann)
References
Details
(Whiteboard: [measurement:client:tracking])
This bug is about evaluating whether to convert the analysis code that produces the aggregates for the gathered data from R to Python. Our pipeline makes extensive use of Python, and introducing an R kernel might bring additional problems that we don't want to deal with within the scope of this project.

The slimmed-down RAPPOR code is available on Alejandro's GitHub at https://github.com/Alexrs95/rappor, along with some updated documentation. The documentation on how to run the simulation lives at https://docs.google.com/document/d/1xi-3liU7wWOUaL_QEOA8vvNFCc4ULjLelth1QWNwSPk/edit?ts=595547c8#heading=h.jt46muo7hp9y

We should get an understanding of:
- does the R code require exotic libraries that are only available in R?
- is there any speed concern in using the Python libraries compared to the R ones?
Reporter
Updated • 7 years ago
Assignee: nobody → fhartmann
Priority: -- → P1
Assignee
Comment 1 • 7 years ago
We have now finished evaluating the R code base and have started working on the reimplementation in Python. In terms of performance, the Python version generally performs slightly better than the original R implementation.

We were able to find equivalent Python libraries for nearly all of the libraries used in the R code. The only exception is limSolve[1], which provides lsei, a least squares solver that allows the user to encode additional constraints. We evaluated several possible replacements. A nonnegative least squares solver[2] lets us encode the first constraint (non-negativity of the coefficients), and it generally finds very similar coefficients. With this replacement, we get results that are very close to those of the original implementation.

[1] https://cran.r-project.org/web/packages/limSolve/limSolve.pdf
[2] https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.nnls.html
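To illustrate the lsei → NNLS swap described above, here is a minimal sketch using scipy.optimize.nnls, which solves min ||A x - b||_2 subject to x >= 0. The design matrix, true coefficients and noise below are hypothetical toy data, not the RAPPOR decoding inputs, and fit_coefficients is an illustrative helper name. Note that lsei can also enforce general equality (E x = F) and inequality (G x >= H) constraints; NNLS only expresses non-negativity, so those extra constraints are not applied here.

```python
import numpy as np
from scipy.optimize import nnls


def fit_coefficients(A, b):
    """Hypothetical helper: solve min ||A x - b||_2 subject to x >= 0.

    Unlike limSolve::lsei, this only encodes the non-negativity
    constraint; equality constraints (E x = F) and general inequality
    constraints (G x >= H) are not expressible with NNLS.
    """
    x, rnorm = nnls(A, b)  # rnorm is the residual norm ||A x - b||_2
    return x, rnorm


# Toy data (assumption, for illustration only): random design matrix,
# sparse non-negative true coefficients, small Gaussian noise.
rng = np.random.default_rng(0)
A = rng.random((20, 5))
x_true = np.array([0.5, 0.3, 0.2, 0.0, 0.0])
b = A @ x_true + rng.normal(scale=0.05, size=20)

# Constrained fit: all coefficients are guaranteed to be >= 0.
x_nn, rnorm = fit_coefficients(A, b)

# Unconstrained least squares for comparison: may yield negative values.
x_ls, *_ = np.linalg.lstsq(A, b, rcond=None)

print("nnls: ", np.round(x_nn, 3))
print("lstsq:", np.round(x_ls, 3))
```

Because the unconstrained solution minimizes the same objective over a larger set, its residual can never be larger than the NNLS residual; when the true coefficients are non-negative, the two fits tend to be close, which is consistent with the comparable results reported above.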
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED