Closed Bug 1480438 Opened 6 years ago Closed 5 years ago

Consider alternatives to pastebin.mozilla.org

Categories

(Cloud Services :: General, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mhoye, Unassigned)

With the pending exit from SCL3, Pastebin.mozilla.org is being decommissioned on September 1st, and any dependencies on it (./mach pastebin) will shortly fail. I would like to understand our use cases for Pastebin, demand on its services, and explore possible replacements.

To frame this discussion early: I don't believe that an unauthenticated, public-facing web service on Mozilla's server will be a viable replacement. The ability to understand and audit who is using our systems, if we decide to host this replacement ourselves, should be a requirement.
Blocks: 1480362
Phabricator has a Paste app. Perhaps we could enable that on https://phabricator.services.mozilla.com/.

You can see it on the Phabricator project's own site at https://secure.phabricator.com/paste/

Pasting is available from the web interface or from the cli via `arc paste`.
Using the additional functions of phabricator will require careful examination and testing prior to enabling them.

For this reason, I don't think Paste is a good substitute for pastebin in the deadline required.
If I understand the situation, the primary (only?) use case for pastebin is "I would like a zero-friction way to create a URL that points to a block of text" where "zero friction" means both "developers can do so programmatically via mach" and "humans can do so with minimal barriers to participation".

Is that a fair way to describe how people use it?
I tend to use pastebin in the way that :mhoye notes in comment #3. Generally, though, it's exceptionally short term as it's usually related to some IRC or Slack conversation, and seldom needs to persist longer than 24 hours. GitHub "gists" are a potential candidate, but there's no expiry for those. I suppose I could craft my own solution out of AWS services, but it would not be my first preference.

I'm ok with having some form of semi-persistent login associated with whatever the pastebin-like tool might be. (Logging in once per month would be ok, logging in every time would not be.)
Likewise. I do, however, on many occasions need a pastebin-like service for non-text data, and things like gist are annoying because they require authentication, which is inconvenient to set up on loaners (most of the time, when I do need it for non-text, it's to exfiltrate data from a loaner).
Yeah, I use pastebin a lot for "here's a fragment of code for you to look at" and occasionally for "I'm on some AWS machine and mach pastebin is the easiest way to get some small amount of data off the machine". Gists are poor for the latter use case because having to authenticate with GitHub would be a pain. OTOH having a super-short expiry time (like 1 hour) would be fine because the data is typically re-downloaded almost immediately.
Just so we're clear, is "exfiltrating from a loaner" a real thing that occurs often enough that we should support it, or is this a polite euphemism for something more unsavory?
I don't know what the "unsavoury" thing you have in mind is, but "I have written a small bit of code on a loaner and want to get it off that machine" is a real use case I have. This is obviously a hacky solution to that, so if there's a better one I'd be happy to use it instead. But it is certainly a real thing that actually happens.
(In reply to Mike Hoye [:mhoye] from comment #7)
> Just so we're clear, is "exfiltrating from a loaner" a real thing that
> occurs often enough that we should support it, or is this a polite euphemism
> for something more unsavory?

It is definitely a real thing that occurs often enough to be worth supporting. Anyone with level 1 credentials can create a copy of a (at least linux, level 1) CI job with an interactive shell, to be able to debug things in the exact environment that CI runs in. These machines don't have any credentials, and it would be inappropriate to put them there, since anyone with the appropriate access can connect to them. The main way to interact with them is also a shell in a browser, so there isn't an easy way to copy&paste larger amounts of text or any kind of binary data off of them, which may be needed to use better tools to interact with the generated data.
I have definitely exfiltrated data from loaners on numerous occasions, though I hadn't thought of using mach pastebin for it -- good idea! Or it was... (I've relied on arduous cut & paste, or funky curl commands to POST to a server.)
My main use case with pastebin is for the mrgiggles IRC bot https://bitbucket.org/sfink/mrgiggles that frequently needs to display large amounts of text and I don't want to spam the IRC channel. Currently, he posts to pastebin and puts that link in the IRC channel. He uses a bash script to talk to pastebin.

GitHub gists will probably work as a replacement, though I haven't tried working out the authentication yet. And I'd probably prefer some sort of expiration on them, but it doesn't really matter much.

Something where I could upload snippets of HTML would enable some nice features, actually. Some of mrgiggles' output (long mangled function names, or even longer unmangled ones) is excessively verbose and it would be very useful to hyperlink to searchfox. Hm... now that I'm looking at it, gist + rawgit.com looks like just the right thing.
(In reply to Mike Hoye [:mhoye] from comment #0)
> With the pending exit from SCL3, Pastebin.mozilla.org is being
> decommissioned on September 1st

Is there a public discussion that explains the rationale for shutting down with short notice a tool that lots of us use?

> The ability to understand and audit who is using our systems, if we decide
> to host this replacement ourselves, should be a requirement.

Why? It's very convenient when trying to help a user over IRC to give them an easy way to share error messages from the console, or an about:memory report.
Flags: needinfo?(mhoye)
I use pastebin extensively in the same way florian indicates - quick, frictionless here's-a-block-of-text for comment on (stack traces, patch snippets, pseudocode, log snippets, etc). Typically on IRC, because I don't tend to discuss those sorts of things on Slack. 90% of them are "keep for a day"; 9% are "keep for a month"; rarely I select "keep forever" so that I could go back to IRC archives and still see them if I needed to. (Though I now know that that didn't work, apparently.... did it break at some point? Was it purposely disabled while leaving the option there???)

Mostly I don't *want* this stuff saved in github or elsewhere - it's transitory.  Saving it there (where searches may find it forever I presume) or having to manually delete it is a problem....
> Is there a public discussion that explains the rationale for shutting down with short notice a tool that lots of us use?

Unfortunately not; this has been an internal communications failure on our part. The machine Pastebin lives on is in the SCL3 data center that we're vacating, and has been formally unsupported for a very long time. In particular, somehow the existence of "mach pastebin" never got communicated to that team, nor the fact that they were supporting a well-used developer tool, and not just keeping an unused and under-loved legacy system alive until it finally kicked the bucket.

While we've been looking into this, we've also learned that nothing in Pastebin is older than 30 days - the "forever" option doesn't and apparently has never worked, and we've never noticed until now. As far as I know it's never had an official owner, much less a formal SLA. It's just a thing we've always had and never thought that much about.

So there's a lot of not-great historical reasons and not-great moving parts contributing to our current not-great situation, but we are where we are. At this point:

- SCL3 is going away September 1, full stop. In its current incarnation Pastebin is going away with it.
- We don't really know how much Pastebin gets used - we have no historical data here, but anecdotes do seem to be piling up.

>> The ability to understand and audit who is using our systems, if we decide
>> to host this replacement ourselves, should be a requirement.
>
> Why? 

The biggest reason is that the combination of "modern media is driven by ad-revenue rageclicks" and "some people who dislike us can buy large-scale media attention" means that if we are running a service that lets some rando with a grudge dump whatever horrific shit they like on hardware we operate, we're effectively hosting our own attack surface.
Flags: needinfo?(mhoye)
(In reply to Mike Hoye [:mhoye] from comment #14)
> > Is there a public discussion that explains the rationale for shutting down with short notice a tool that lots of us use?

> While we've been looking into this, we've also learned that nothing in
> Pastebin is older than 30 days - the "forever" option doesn't and apparently
> has never worked, and we've never noticed until now.

Probably because the main use case is effortless temporary sharing for the length of an IRC discussion. Due to different timezones and people sometimes being off for a day, 24h may not be enough, but keeping the data for a week would be enough for almost all cases I remember.

> - SCL3 is going away September 1, full stop. In its current incarnation
> Pastebin is going away with it.

SCL3 is going away, sure. That doesn't explain what makes migration of pastebin to any other place impossible, especially as there are still a couple of weeks left.

> The biggest reason is that the combination of "modern media is driven by
> ad-revenue rageclicks" and "some people who dislike us can buy large-scale
> media attention" means that if we are running a service that lets some rando
> with a grudge dump whatever horrific shit they like on hardware we operate,
> we're effectively hosting our own attack surface.

The same argument could be used to justify closing wiki.mozilla.org or bugzilla.mozilla.org, so I don't find this point convincing.
(In reply to Florian Quèze [:florian] from comment #15)
> > The biggest reason is that the combination of "modern media is driven by
> > ad-revenue rageclicks" and "some people who dislike us can buy large-scale
> > media attention" means that if we are running a service that lets some rando
> > with a grudge dump whatever horrific **** they like on hardware we operate,
> > we're effectively hosting our own attack surface.
> 
> The same argument could be used to justify closing wiki.mozilla.org or
> bugzilla.mozilla.org, so I don't find this point convincing.

Needing to go through the process of creating an account at both of those sites is actually a deterrent to rageposting (this surprised me, but it appears to be true). We also have active moderation of both of those sites, which is harder on a site like pastebin that, by its very nature, has less monitoring of its use (not to mention mhoye's observation that it is essentially unowned and unmaintained).
Have we had any significant problems with pastebin usage?  I'm not expecting there have been no problems, but enough to justify removing it?  

I use pastebin every few days, and when I use it I typically post 1-10 items that day.

Can we add a small amount of friction to first-time-use if that's what's needed?  ("what is 1 plus 10?"  "Who is the chairperson of Mozilla?")  If it's needed...  or when we find it's needed.
It's not a question of justifying its removal; that is non-negotiably happening. In its current incarnation Pastebin.m.o is unsupported abandonware that lives in a DC that we're vacating, per plans (and contracts, and an enormous amount of effort) that have been underway for most of a year now. We just aren't in a situation where we can punt on this.

What we have to justify is what, if anything, we replace it with. With that in mind, our most important question is, "is the functionality pastebin provides important, and used frequently enough for us to justify figuring out what a replacement might look like"?

Given the state of this bug, I think the answer to this is probably yes. 

If so, the next questions are:

- What are our functional requirements? 
- What if any are our security requirements?
- Should we be providing this service ourselves, in-house, or should we buy something off some shelf somewhere?
I think we can tease apart our use cases a bit here, and that we have three slightly different things we want:

- A minimal-friction way for humans to share textual data with developers
- A zero-auth way to extract data from systems into which we do not want to put any credentials.
- A command-line tool for turning a text file into a URL

Does that seem reasonable?
(In reply to Mike Hoye [:mhoye] from comment #19)
> - A minimal-friction way for humans to share textual data with developers
> - A zero-auth way to extract data from systems into which we do not want to
> put any credentials.
> - A command-line tool for turning a text file into a URL
> 
> Does that seem reasonable?

Sounds right to me, as long as "humans" is understood to include both developers and random non-spammer people showing up on IRC. (Which is probably why you said "humans".)
(In reply to Mike Hoye [:mhoye] from comment #19)
> I think we can tease apart our use cases a bit here, and that we have three
> slightly different things we want:
> 
> - A minimal-friction way for humans to share textual data with developers
> - A zero-auth way to extract data from systems into which we do not want to
> put any credentials.
> - A command-line tool for turning a text file into a URL

I use all three of these, and in fact will use base64 and split with the second use-case to extract smallish files from loaners/containers/untrusted systems also.  I don't know if you want to consider that a distant fourth or just a plea for a larger upload limit on number 2. =)
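
For illustration, the base64-and-split approach mentioned above can be sketched in Python roughly as follows. This is a minimal sketch and not anyone's actual tooling: the function names and the 60,000-character chunk size are assumptions, and the real limit would depend on whatever paste service is in use.

```python
import base64
from typing import List


def encode_chunks(data: bytes, chunk_size: int = 60_000) -> List[str]:
    """Base64-encode data and split the text into paste-sized pieces.

    chunk_size is a hypothetical per-paste limit, in characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    encoded = base64.b64encode(data).decode("ascii")
    return [encoded[i:i + chunk_size] for i in range(0, len(encoded), chunk_size)]


def decode_chunks(chunks: List[str]) -> bytes:
    """Rejoin the pieces in their original order and decode them."""
    return base64.b64decode("".join(chunks))
```

Each piece would be pasted separately on the loaner and the pieces rejoined, in order, on the receiving machine. A larger upload limit would make the splitting step unnecessary.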
I remain deeply opposed to zero-auth systems, but I don't see a way to provide "a zero-auth way to extract data from systems" that doesn't involve a zero-auth system. So here we are I guess. 

I would like to propose:

- That we evaluate and stand up a current, maintained Pastebin equivalent, whose requirements are:
  - an API to support "mach pastebin"
  - a modest if not five-nines SLA ("the service is very likely to be there!") and
  - a single, very short expiration time (60 minutes? 90? Not a day, not forever) to mitigate risks.

- That we implement something like "mach gist", with github authentication, for anything that we would like to be longer-lived. 

There are a few good candidates for "current, maintained pastebin equivalent", as well as python "gist" commands that wouldn't introduce any new dependencies to our dev environment.
We have Firefox Notes and Firefox Screenshots. Instead of building new infra, can we build on what we have?
I read everything and still don't understand what you are scared of with a simple paste service: you can't be held responsible for what people paste on it; you just host the service. By the same logic, comment sections on articles, or platforms such as Facebook or Twitter, wouldn't let anyone use them, because people could say whatever they want there.
I would also add that it wasn't an issue for… well, quite some years up until now.

Which is why I suggest you take a look at PrivateBin: https://github.com/PrivateBin/PrivateBin

I'm a regular user of an instance on a friend's server. It allows simple pastes, encrypted client-side with the key in the URL, which means the server operator cannot know what the server is storing. It lets you pick the expiration time, supports burning after the first read, and can set a password on a paste. I won't list everything, but there's also syntax highlighting.

Although I have never used it, a CLI also seems to exist: https://github.com/PrivateBin/PrivateBin-Cli
(In reply to Mike Hoye [:mhoye] from comment #22)
> I remain deeply opposed to zero-auth systems, but I don't see a way to
> provide "a zero-auth way to extract data from systems" that doesn't involve
> a zero-auth system. So here we are I guess. 
> 
> I would like to propose:
> 
> - That we evaluate and stand up a current, maintained Pastebin equivalent,
> whose requirements are:
>   - an API to support "mach pastebin"
>   - a modest if not five-nines SLA ("the service is very likely to be
> there!") and
>   - a single, very short expiration time (60 minutes? 90? Not a day, not
> forever) to mitigate risks.

IIUC the source code behind https://dpaste.de (code: https://github.com/bartTC/dpaste docs: http://dpaste.readthedocs.io/) has been considered a few times by ops/IT (and even sec reviewed at some point?) and matches those requirements.
(In reply to Jannis Leidel [:jezdez] from comment #25)
> IIUC the source code behind https://dpaste.de (code:
> https://github.com/bartTC/dpaste docs: http://dpaste.readthedocs.io/) has
> been considered a few times by ops/IT (and even sec reviewed at some point?)
> and matches those requirements.

Any reason to not use https://dpaste.de as is? i.e. don't host something that's already running if we aren't adding anything to it
Who runs dpaste.de? Building tools that send traffic to their service, free or not, without contacting them ahead of time seems like it's in poor faith. If we're going to depend on somebody's random internet service, we should give them a heads up and maybe offer them some sort of support.
A review of the dpaste.de instance was initiated in bug 1484295, which I'm working on now.
I can't access bug 1484295 (access restricted), but is there any reason not to consider PrivateBin, which I mentioned in comment 24? It seems to be much more full-featured from what I can see and read.
I have made bug 1484295 public for the sake of better transparency into dpaste sec eval as well as other considerations discussed there.

TLDR: Perhaps we could use Firefox::Send (https://send.firefox.com/), which has strong security properties, is actively managed, and is already the de facto standard for sync log support.
(In reply to Jonathan Claudius [:claudijd] (use NEEDINFO) from comment #30)
> I have made bug 1484295 public for the sake of better transparency into
> dpaste sec eval as well as other considerations discussed there.
> 
> TLDR: Perhaps we could use Firefox::Send (https://send.firefox.com/), which
> has strong security properties, is actively managed, and is already the
> defacto standard for sync log support.

I'll answer here, as the other bug seems to be strongly focused on dpaste.
Isn't the sync log you're talking about just a single use case among others?

I see several flaws in using send.firefox.com as a paste service, starting with the fact that it isn't a paste service. Unless I'm mistaken because it has changed, you can't view the content directly online and need to download the file, so you lose the quickness that is the advantage of such a service. In addition, the expiration time seems to be short only, and content can be viewed only once; again, that doesn't fit many use cases, including the support case you mention, where more than one person will likely want to look at it.

About the "fear" of what could be hosted on such a service, I still think it's not an issue, as you can't be held responsible for what's hosted if it's open to everyone and you take down forbidden/illegal content once it's reported.
Better still is to have zero knowledge of what's hosted, with client-side encryption.

Perhaps we all might find PrivateBin - https://privatebin.info an adequately authenticated and secure PasteBin replacement. The source for the project is open, on GitHub, and it could be 'self hosted' by Mozilla so the trust should not be too complicated to work out.

I never came back here to resolve this, I'm sorry. We've had a conversation with the DPaste author, and it looks like we'll be standing up a Pastebin replacement in the next few weeks (exact schedule tbd) based on DPaste.

Among the findings in the linked secreview that are compelling are:

  • we can choose arbitrarily long, not-easily-guessable URLs as the default behavior of the tool,
  • we can set and enforce short-term expiration (I'm pushing for 90 minutes), and
  • while the dpaste client is presently written in Ruby, the author has told us that he'd be willing to accept a python tool into the main repo as well, offering us the possibility of bringing back a replacement for "mach pastebin" without adding any additional dependencies to m-c.
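
As a rough illustration of the first two findings, hard-to-guess URLs and enforced short expiry might look like the sketch below. This is not dpaste's implementation: the function names are hypothetical, the 24-byte slug length is an arbitrary assumption, and the 90-minute window is only the figure proposed above.

```python
import secrets
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: the 90-minute expiry proposed in this comment.
EXPIRY = timedelta(minutes=90)


def new_slug(nbytes: int = 24) -> str:
    """Return a URL-safe random identifier drawn from a CSPRNG.

    24 random bytes encode to 32 URL-safe characters.
    """
    return secrets.token_urlsafe(nbytes)


def is_expired(created_at: datetime, now: Optional[datetime] = None) -> bool:
    """Report whether a paste created at created_at has outlived EXPIRY."""
    if now is None:
        now = datetime.now(timezone.utc)
    return now - created_at >= EXPIRY
```

Using `secrets` rather than `random` matters here because the slug is the only thing standing between a paste and anyone trying to enumerate URLs.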

I'm leaving this open pending the new service actually going to prod, but I think that most of the decisions here have been made.

fwiw, I've been using github gists via a simple script I wrote https://github.com/hotsphink/sfink-tools/blob/master/bin/mkgist. But it definitely does not meet the expiration requirements here; the gists are pretty much permanent. (The urls are long and ugly too.)

Sadly, I suspect everyone has now settled on a different solution at this point, and it'll be some time before we all reconverge on a single solution. But it's still good to have a blessed one.

As an aside, we removed mach pastebin in bug 1480362.

FWIW, I've been using paste.rs and others, and I liked that the client is as simple as a curl command.

https://privatebin.net/ looks really nice and seems to have a better UX than dpaste. Did we reject it for some reason, or just not consider it?

paste.mozilla.org is up and running, so I'm closing this up as solved.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED