Closed Bug 793376 - Make the profiler symbols *_EXPORT
Opened 8 years ago; closed 7 years ago
Component: Core :: Gecko Profiler (defect)
Status: RESOLVED WONTFIX
Reporter: cjones
Assignee: dhylands
I don't know which macro is currently in fashion for __declspec(dllimport) / __attribute__((visibility("default"))), but we should use that for the profiler hooks.
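Roughly the kind of annotation I mean, as a sketch only (the macro and hook names here are made up; the real macro is whatever mfbt currently provides, e.g. MOZ_EXPORT):

  // Sketch: the export macro would expand roughly like this.
  #include <cstddef>

  #if defined(_MSC_VER)
  #  define PROFILER_EXPORT __declspec(dllexport)
  #else
  #  define PROFILER_EXPORT __attribute__((visibility("default")))
  #endif

  // Hypothetical hook; the real entry point would be whichever profiler
  // function hands out the raw sample data.
  extern "C" PROFILER_EXPORT const char* profiler_get_raw_samples(size_t* aLen);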

The gecko profiler is an incredibly useful tool.

Problem is, the design of the profiler environment (controller and ui) is rapidly diverging from what's needed on multi-process and memory-constrained builds of gecko.  Fine, the profiler authors have no reason to care about that configuration (it's 0% of current shipping marketshare excluding b2g dogfooders, as of today), but we have projects in flight that need the profiler to work in this configuration, as do business partners (who want just *a* profiler).

So let's diverge.  We make the useful information MOZ_EXPORT, and any tool can hook in to consume it.  b2g forks the sampling code and LD_PRELOADs its lib, and everyone succeeds without having to ask the other users to, e.g., limit allocation of gigantic JS objects.
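To illustrate, the preloaded consumer could be as small as this sketch (the exported symbol name is made up; the lookup is dynamic so the lib still loads against builds without the export):

  // Hypothetical consumer living in libexternalprofiledataconsumer.so,
  // loaded via LD_PRELOAD.
  #include <dlfcn.h>
  #include <cstddef>
  #include <cstdio>

  extern "C" void consume_profile_data() {
    using GetSamplesFn = const char* (*)(size_t*);
    void* self = dlopen(nullptr, RTLD_LAZY);  // search the already-loaded image
    auto getSamples =
        reinterpret_cast<GetSamplesFn>(dlsym(self, "profiler_get_raw_samples"));
    if (!getSamples) {
      return;  // profiler hooks not exported in this build
    }
    size_t len = 0;
    const char* data = getSamples(&len);
    if (data) {
      fwrite(data, 1, len, stderr);  // or write to a file / socket for the b2g tooling
    }
  }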

Objections?
I don't get what added value you'd have with the profiler symbols being exported. What are you after?
LD_PRELOAD="libexternalprofiledataconsumer.so"
I'm working on addressing bug 789667 by streaming the profile data over the remote debugging protocol, which would remove the memory cost of building JS objects. Would that be enough for you, instead of jumping through hoops to get the profiling data with an LD_PRELOAD?
Does the remote debug protocol support subprocesses?
(In reply to Chris Jones [:cjones] [:warhammer] from comment #4)
> Does the remote debug protocol support subprocesses?

Not directly at the moment, but eventually it will. In the meantime, you can connect directly to a subprocess remote debug server, provided you set it to start on a different port than the main process.
This is exactly my point in comment 0.
Assignee: nobody → dhylands
(In reply to Chris Jones [:cjones] [:warhammer] from comment #6)
> This is exactly my point in comment 0.

I still think exposing the profiler internals to a LD_PRELOADed lib is overkill.
Overkill of what?  The patch required here will be a small refactor of one C++ method, and the addition of a MOZ_EXPORT label.

Overkill is taking binary sample data that occupies a few MB of memory, using it to inflate a JS object that's probably a 10x less efficient representation, then serializing that in memory to what's probably a 100x less efficient representation.  Then doing that for all gecko processes in the system (10+).  *Then* having each of those processes send their 100x inefficient serialized data to *one* process, like heat-seeking missiles, and having that one process repeat those steps for each heat-seeking missile it receives.  (Although to be fair, that support isn't on the roadmap.)

On a machine with 256MB RAM.

You understand that this approach has no hope of working for b2g, right?  I'm not asking you guys to change the design of this.  All I'm asking is that we land one small patch that will allow b2g to take advantage of this fantastically useful tool, which partners also want to use.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #8)
> Overkill of what?  The patch required here will be a small refactor of one
> C++ method, and the addition of a MOZ_EXPORT label.
> 
> Overkill is taking binary sample data that occupies a few MB of memory,
> using it to inflate a JS object that's probably a 10x less efficient
> representation, then serializing that in memory to what's probably a 100x
> less efficient representation.

I'm telling you this is going away.

> Then doing that for all gecko processes in
> the system (10+).

I doubt it's a good idea to profile that many processes at the same time anyways.

>  *Then* having each of those processes send their 100x
> inefficient serialized data to *one* process, like heat-seeking missiles,
> and having that one process repeat those steps for each heat-seeking missile
> it receives.  (Although to be fair, that support isn't on the roadmap.)

The details of multi-process support aren't even clear at this point.

> On a machine with 256MB RAM.
> 
> You understand that this approach has no hope of working for b2g, right? 

You understand that I'm working on making remote profiling work for b2g, right?
(In reply to Mike Hommey [:glandium] from comment #9)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #8)
> > Overkill of what?  The patch required here will be a small refactor of one
> > C++ method, and the addition of a MOZ_EXPORT label.
> > 
> > Overkill is taking binary sample data that occupies a few MB of memory,
> > using it to inflate a JS object that's probably a 10x less efficient
> > representation, then serializing that in memory to what's probably a 100x
> > less efficient representation.
> 
> I'm telling you this is going away.
> 

That wasn't my understanding.  My understanding was that we were going to continue inflating the huge JS object but then somehow serialize it to JSON in a streaming fashion.  And then the main b2g process will read all those bytes into memory, from all processes being profiled, and then hand them off to a JSON parser.  Then we'll hold onto all the JS objects in the main b2g process while they're being streamed out again.

How does that help?

> > Then doing that for all gecko processes in
> > the system (10+).
> 
> I doubt it's a good idea to profile that many processes at the same time
> anyways.
> 

It's useful, and there's no support for selective profiling, so ...

> >  *Then* having each of those processes send their 100x
> > inefficient serialized data to *one* process, like heat-seeking missiles,
> > and having that one process repeat those steps for each heat-seeking missile
> > it receives.  (Although to be fair, that support isn't on the roadmap.)
> 
> The details of multi-process support aren't even clear at this point.
> 

We have a 100% concrete plan to make this tool useful for b2g, with minimal changes to upstream code.  It's easy, there's no uncertainty.  We can continue to profile all subprocesses.  It'll be far cheaper than even the current implementation.

> > On a machine with 256MB RAM.
> > 
> > You understand that this approach has no hope of working for b2g, right? 
> 
> You understand that I'm working on making remote profiling work for b2g,
> right?

Nope, but I *really* appreciate it! :)  We should definitely not duplicate work.

However, I suggest that we make this problem much easier on ourselves.

Look at it from the perspective of a user of the tool.  I'm developer X; I want to get profile data for my app.  I don't care how the tool works, or how the data is sent around; you could encode the data into pig latin and then use carrier pigeons to transport it, for all I care.  What I *do* care about is a tool that works and gives me useful data.

JSON and the JS debugging protocol just aren't the right tools for the job of transmitting regular, binary data on a memory-constrained system.
> JSON and the JS debugging protocol just aren't the right tools for the job
> of transmitting regular, binary data on a memory-constrained system.

The point is, the receiver of that data is a desktop browser. Not any of the b2g processes. As long as we can make transmission of that data low on memory consumption (and we can), there is no problem.
In fact, before even reaching the point of being able to use the thing with the remote debugging protocol, my current work is going to make saving the profile to a file a low-memory operation.
But that involves a nightmare of socket management.  What do we gain by doing that?

Why make the problem harder than it is?
Sorry, mid-aired.  Comment 13 is in response to comment 11.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #13)
> But that involves a nightmare of socket management.

It doesn't have to be.

> What do we gain by doing that?

A tool that can actually be used by people, unlike what we have now, which works when it wants to, if at all, or requires manual work; I won't qualify that as usable.
(In reply to Mike Hommey [:glandium] from comment #15)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #13)
> > But that involves a nightmare of socket management.
> 
> It doesn't have to be.
> 

You said in comment 9 that the details of multi-process support aren't even clear.  They're not to me either.

You either have one debug socket that content processes multiplex through via on-device IPC mechanisms, riding the coattails of existing on-device process-management code, which would be a memory hog.  Or you have a debug socket per content process being profiled.  I don't even know if we have support for socket-per-process.  With socket-per-process, you then have to manage nontrivial state in the code driving the profiling, and handle processes dying at absolutely any point during profiling operations.

Compare this to
 $ adb shell kill -12 `pidof b2g`
 $ adb pull /data/local/tmp/profile-data /tmp/profile-data
 $ adb shell rm -r /data/local/tmp/profile-data

> > What do we gain by doing that?
> 
> A tool that can actually be used by people, unlike what we have now,
> which works when it wants to, if at all, or requires manual work; I
> won't qualify that as usable.

What does that have to do with the debugging protocol?  I was specifically asking what the debugging protocol gains us technically over the vastly simpler approach of dumping to file.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #16)
> Compare this to
>  $ adb shell kill -12 `pidof b2g`

You forgot the most important step: wait an arbitrary amount of time.

>  $ adb pull /data/local/tmp/profile-data /tmp/profile-data
>  $ adb shell rm -r /data/local/tmp/profile-data

That's not really appealing for people who just want something that works, is it?
 
> > > What do we gain by doing that?
> > 
> > A tool that can actually be used by people, unlike what we have now,
> > which works when it wants to, if at all, or requires manual work; I
> > won't qualify that as usable.
> 
> What does that have to do with the debugging protocol?  I was specifically
> asking what the debugging protocol gains us technically over the vastly
> simpler approach of dumping to file.

It gains us that we can know what state the profiler is in and what configuration it uses; it lets us get the profile data without relying on an arbitrarily long or short sleep() waiting for the dump to be there and complete; it allows changing the profiler configuration; it actually allows starting and stopping the profiler at arbitrary times instead of having to restart b2g with an environment variable; etc.

Sure, you can do some of those without the debugging protocol, but you won't get a great user experience out of it.
(In reply to Mike Hommey [:glandium] from comment #17)
> (In reply to Chris Jones [:cjones] [:warhammer] from comment #16)
> > Compare this to
> >  $ adb shell kill -12 `pidof b2g`
> 
> You forgot the most important step: wait an arbitrary amount of time.
> 

That's not a hard problem to solve.

> >  $ adb pull /data/local/tmp/profile-data /tmp/profile-data
> >  $ adb shell rm -r /data/local/tmp/profile-data
> 
> That's not really appealing for people who just want something that works,
> is it?
>  

It's all happening inside some tool, so those people don't care one way or another.

> > > > What do we gain by doing that?
> > > 
> > > A tool that can actually be used by people, unlike what we have now,
> > > which works when it wants to, if at all, or requires manual work; I
> > > won't qualify that as usable.
> > 
> > What does that have to do with the debugging protocol?  I was specifically
> > asking what the debugging protocol gains us technically over the vastly
> > simpler approach of dumping to file.
> 
> it actually allows starting and stopping the profiler at arbitrary
> times instead of having to restart b2g with an environment variable;
> etc.

I seem to recall that this wasn't possible the last time it was discussed, with the current code.  I think it had to do with missing sample labels that are pushed on the stack before we spin the event loop.

> Sure, you can do some of those without the debugging protocol, but you won't
> get a great user experience out of it.

You can do all of this without the debug protocol.  I'm not arguing that it's not *cleaner* to use the debug protocol; I'm sure it's much cleaner.  It's a matter of which is technically more feasible.  This is extremely important given that partners need a working profiler asap.

This implementation decision has nothing to do with user experience.  The tool hides all this.
Depends on: 794206
A bit late to the bug:

(In reply to Chris Jones [:cjones] [:warhammer] from comment #0)
> Fine, the profiler authors have no reason to care about
> that configuration (it's 0% of current shipping marketshare excluding b2g
> dogfooders, as of today)

I certainly care about this configuration.

I think there's still one important use case for dumping profiles using signals: being able to dump profiles when the application is hanging. I typically think people should use a debugger, but people are asking for profiles instead because they provide a good execution trace.
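Roughly what I mean by the signal path, as a sketch (not the actual Gecko code; the dump function name is made up):

  // The handler only sets a flag; the sampler thread notices it and writes
  // the profile out, so nothing unsafe happens in signal context.
  #include <signal.h>

  static volatile sig_atomic_t gDumpRequested = 0;

  static void DumpSignalHandler(int) {
    gDumpRequested = 1;  // async-signal-safe: just record the request
  }

  void InstallProfileDumpSignal() {
    struct sigaction sa = {};
    sa.sa_handler = DumpSignalHandler;
    sigaction(SIGUSR2, &sa, nullptr);  // SIGUSR2 == 12, matching `kill -12`
  }

  // On the sampler thread's loop (WriteProfileToFile is hypothetical):
  //   if (gDumpRequested) {
  //     gDumpRequested = 0;
  //     WriteProfileToFile("/data/local/tmp/profile-data");
  //   }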

I would love to make the profiler work as a preloaded library. It could expose symbols that applications could use weakly. This would let us use the library in other applications. I've grown quite fond of using this tool for understanding the execution of applications. But this isn't the use case you're asking for.
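Something like this is what I mean by weak use of exposed symbols (names hypothetical):

  // Sketch: the application declares the exported profiler hook weakly, so
  // it links and runs whether or not the profiler library is loaded.
  extern "C" void profiler_register_thread(const char* aName) __attribute__((weak));

  void OnThreadStart(const char* aName) {
    if (profiler_register_thread) {  // null when the profiler isn't present
      profiler_register_thread(aName);
    }
  }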

As far as profile size goes, I originally wanted to create a file format that was more packed, but Jeff argued that this is a job better suited for a compression library, and I think that's a good point. If we stream the output then we certainly won't have any problems with OOM. I'm still not sold on the JSON format, since it is much bigger than the original circular binary buffer, but it is trivial to read and parse and I'm not in a rush to deal with binary headaches and endianness.

For supporting multiprocess configurations with the remote debugging protocol, from discussion with glandium and panos it seems like we want to converge on exposing only one socket and multiplexing requests to the right process/thread. In the short term this may be far enough away that we want to consider another alternative, such as having a debug server in each process. Perhaps we could recommend that the OS/application keep a small range of ports free for debugging/profiling.

For controlling the profiler from the command line we can use either signals (the interface is limited; it can only use default features and is start/dump only, but we could extend that), or soon we will be able to use the debug protocol from the command line using xpcshell (not ideal, requiring a ~40MB dependency for a simple script, but it works and shares code).

If you really need symbols exposed then we should at least have a vague idea of what we're exposing and its use case. If you just want to work around OOM saving and control issues I would recommend rushing the issues above instead.
(In reply to Benoit Girard (:BenWa) from comment #19)
> If you just want to work around OOM saving and control issues I would
> recommend rushing the issues above instead.

If that is really a pressing issue, I have a patch that saves the same json data to file without using js objects. It's an intermediate step in my work to streaming json over the debugging protocol, but if it's needed for yesterday, that can certainly land as-is.
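For reference, the shape of that approach is roughly this (a sketch with made-up types and no string escaping; the real patch streams the actual profiler buffer):

  // Stream samples straight to an ostream as JSON instead of building a JS
  // object graph first; peak memory stays bounded by one sample at a time.
  #include <ostream>
  #include <string>
  #include <vector>

  struct Sample { double time; std::vector<std::string> frames; };

  void StreamProfileAsJSON(const std::vector<Sample>& aSamples, std::ostream& aOut) {
    aOut << "{\"samples\":[";
    for (size_t i = 0; i < aSamples.size(); ++i) {
      const Sample& s = aSamples[i];
      aOut << (i ? "," : "") << "{\"time\":" << s.time << ",\"frames\":[";
      for (size_t j = 0; j < s.frames.size(); ++j) {
        aOut << (j ? "," : "") << "\"" << s.frames[j] << "\"";
      }
      aOut << "]}";
    }
    aOut << "]}";
  }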
(In reply to Mike Hommey [:glandium] from comment #20)
> (In reply to Benoit Girard (:BenWa) from comment #19)
> > If you just want to work around OOM saving and control issues I would
> > recommend rushing the issues above instead.
> 
> If that is really a pressing issue, I have a patch that saves the same json
> data to file without using js objects. It's an intermediate step in my work
> to streaming json over the debugging protocol, but if it's needed for
> yesterday, that can certainly land as-is.

I don't see much harm in landing that, except that it's going to conflict with Julian's changes. If the patch is small and simple, let's land it now.
(In reply to Benoit Girard (:BenWa) from comment #21)
> I don't see much harm in landing that, except that it's going to conflict
> with Julian's changes. If the patch is small and simple, let's land it now.

Why would Julian's changes touch the get_profile part? That being said, if it should be landed, I'd rather land it after I'm done making it send to an nsIOutputStream, which will be more like the final implementation (currently, it uses a std::ostream).
Because Julian's changes bring some much-needed refactoring (code moves) and are very invasive in order to add support for the unwinding thread, which changes how we handle saving, because another thread owns the buffer.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX