Closed Bug 907280 Opened 11 years ago Closed 11 years ago

Module list cron output broken

Categories

(Socorro :: Backend, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ted, Assigned: rhelmer)

Details

(Whiteboard: [qa-])

Looks like it broke at the beginning of July. The last good file is:
https://crash-analysis.mozilla.com/crash_analysis/modulelist/20130629-modulelist.txt

Then this one and all the following ones are very broken:
https://crash-analysis.mozilla.com/crash_analysis/modulelist/20130702-modulelist.txt

The ones in between those two have a permissions issue so I can't load them.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #0)
> Looks like it broke at the beginning of July. The last good file is:
> https://crash-analysis.mozilla.com/crash_analysis/modulelist/20130629-modulelist.txt
> 
> Then this one and all the following ones are very broken:
> https://crash-analysis.mozilla.com/crash_analysis/modulelist/20130702-modulelist.txt


Investigating, thanks!


> The ones in between those two have a permissions issue so I can't load them.

This should be fixed, please let me know:
$ chmod o+r /mnt/crashanalysis/crash_analysis/modulelist/*.txt
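A minimal sketch for confirming that the chmod above took effect: it lists files in a directory that still lack the world-readable bit. The function name and the "*.txt" default are illustrative assumptions, not part of Socorro.

```python
import stat
from pathlib import Path


def not_world_readable(directory, pattern="*.txt"):
    """Return sorted names of matching files that lack the o+r permission bit."""
    return sorted(
        path.name
        for path in Path(directory).glob(pattern)
        if not path.stat().st_mode & stat.S_IROTH
    )
```

An empty list for the modulelist directory would mean every matching file is readable by "other", which is what `chmod o+r` is meant to guarantee.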
Assignee: nobody → rhelmer
Status: NEW → ASSIGNED
The ones I couldn't access were broken too, so this is the first bad one:
https://crash-analysis.mozilla.com/crash_analysis/modulelist/20130630-modulelist.txt

The regression range here is between 2013-06-29 and 2013-06-30 then (the mtime on that 20130630 file is 02-Jul-2013 though, so presumably the code or configuration change happened before then).
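The manual search for the first bad file can be sketched as a scan over the dated files, oldest first. This assumes the YYYYMMDD-modulelist.txt naming seen in the URLs above and uses a rough heuristic for "broken" (NUL bytes or invalid UTF-8 in the first few KB). The function names and the heuristic are assumptions for illustration, not how the files were actually checked.

```python
import codecs
import re
from pathlib import Path

# Matches names such as 20130630-modulelist.txt and captures the date part.
NAME_RE = re.compile(r"^(\d{8})-modulelist\.txt$")


def looks_like_text(sample: bytes) -> bool:
    """Heuristic: no NUL bytes, and the sample decodes as UTF-8."""
    if b"\x00" in sample:
        return False
    try:
        # final=False tolerates a multi-byte character cut off at the sample edge.
        codecs.getincrementaldecoder("utf-8")().decode(sample, final=False)
    except UnicodeDecodeError:
        return False
    return True


def first_bad_date(directory, sample_size=4096):
    """Return the date of the earliest file that fails the text check, or None."""
    dated = []
    for path in Path(directory).iterdir():
        match = NAME_RE.match(path.name)
        if match:
            dated.append((match.group(1), path))
    for date, path in sorted(dated):
        with open(path, "rb") as handle:
            if not looks_like_text(handle.read(sample_size)):
                return date
    return None
```

Because YYYYMMDD strings sort chronologically, a plain sort is enough to walk the files in date order. The scan reports file dates only; as noted above, the mtime can be later than the date in the name.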
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #2)
> The ones I couldn't access were broken too, so this is the first bad one:
> https://crash-analysis.mozilla.com/crash_analysis/modulelist/20130630-modulelist.txt
> 
> The regression range here is between 2013-06-29 and 2013-06-30 then (the
> mtime on that 20130630 file is 02-Jul-2013 though, so presumably the code or
> configuration change happened before then).

Thanks for finding the regression range. I ran this by hand on the Hadoop cluster, and the problem is definitely happening there (not in some later intermediate step). I'll check the Socorro releases around that time, but I don't believe the code here has changed recently. It's possible that the Hadoop cluster was upgraded, in which case we may need to rebuild some of our custom JARs.

I am also exploring the possibility that there's a problem in the data itself, and am adding some debugging to the Pig job.
Target Milestone: --- → 58
tmary has been looking through this with me, here's some more detail.

The pig job in comment 4 depends on:

1) the socorro-toolbox JAR from https://github.com/mozilla-metrics/socorro-toolbox/
2) the akela JAR from https://github.com/mozilla-metrics/akela

We're still using a bunch of old Pig and HBase dependencies in #1. #2 has been upgraded, but it's throwing a JSON exception when I attempt to use it.

I am slowly slogging through this. I'm not sure who knows about tools like socorro-toolbox and akela and can help, but any help would be much appreciated!
(In reply to Robert Helmer [:rhelmer] from comment #5)
> tmary has been looking through this with me, here's some more detail.
> 
> The pig job in comment 4 depends on:
> 
> 1) the socorro-toolbox JAR from
> https://github.com/mozilla-metrics/socorro-toolbox/
> 2) the akela JAR from https://github.com/mozilla-metrics/akela
> 
> We're still using a bunch of old pig and hbase dependencies in #1, #2 has
> been upgraded but it's throwing a JSON exception when I attempt to use it.

Specifically the exception is:

Pig Stack Trace
---------------
ERROR 2998: Unhandled internal error. com/fasterxml/jackson/core/JsonParseException

java.lang.NoClassDefFoundError: com/fasterxml/jackson/core/JsonParseException
	at java.lang.Class.forName0(Native Method)
	at java.lang.Class.forName(Class.java:247)
	at org.apache.pig.impl.PigContext.resolveClassName(PigContext.java:505)
	at org.apache.pig.impl.PigContext.getClassForAlias(PigContext.java:639)
	at org.apache.pig.parser.LogicalPlanBuilder.buildUDF(LogicalPlanBuilder.java:1402)
	at org.apache.pig.parser.LogicalPlanGenerator.func_eval(LogicalPlanGenerator.java:8381)
	at org.apache.pig.parser.LogicalPlanGenerator.projectable_expr(LogicalPlanGenerator.java:9926)
	at org.apache.pig.parser.LogicalPlanGenerator.var_expr(LogicalPlanGenerator.java:9700)
	at org.apache.pig.parser.LogicalPlanGenerator.expr(LogicalPlanGenerator.java:9051)
	at org.apache.pig.parser.LogicalPlanGenerator.flatten_generated_item(LogicalPlanGenerator.java:6973)
	at org.apache.pig.parser.LogicalPlanGenerator.generate_clause(LogicalPlanGenerator.java:15920)
	at org.apache.pig.parser.LogicalPlanGenerator.foreach_plan(LogicalPlanGenerator.java:14312)
	at org.apache.pig.parser.LogicalPlanGenerator.foreach_clause(LogicalPlanGenerator.java:14179)
	at org.apache.pig.parser.LogicalPlanGenerator.op_clause(LogicalPlanGenerator.java:1623)
	at org.apache.pig.parser.LogicalPlanGenerator.general_statement(LogicalPlanGenerator.java:799)
	at org.apache.pig.parser.LogicalPlanGenerator.statement(LogicalPlanGenerator.java:517)
	at org.apache.pig.parser.LogicalPlanGenerator.query(LogicalPlanGenerator.java:392)
	at org.apache.pig.parser.QueryParserDriver.parse(QueryParserDriver.java:184)
	at org.apache.pig.PigServer$Graph.parseQuery(PigServer.java:1600)
	at org.apache.pig.PigServer$Graph.registerQuery(PigServer.java:1547)
	at org.apache.pig.PigServer.registerQuery(PigServer.java:518)
	at org.apache.pig.tools.grunt.GruntParser.processPig(GruntParser.java:991)
	at org.apache.pig.tools.pigscript.parser.PigScriptParser.parse(PigScriptParser.java:412)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:194)
	at org.apache.pig.tools.grunt.GruntParser.parseStopOnError(GruntParser.java:170)
	at org.apache.pig.tools.grunt.Grunt.exec(Grunt.java:84)
	at org.apache.pig.Main.run(Main.java:604)
	at org.apache.pig.Main.main(Main.java:157)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.apache.hadoop.util.RunJar.main(RunJar.java:208)
Caused by: java.lang.ClassNotFoundException: com.fasterxml.jackson.core.JsonParseException
	at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
	at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
	... 33 more
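The ClassNotFoundException above indicates that com.fasterxml.jackson.core.JsonParseException was not on the classpath Pig used when it resolved the UDF. A minimal sketch for checking which JARs actually bundle a given class is below; JAR files are zip archives, so the standard zipfile module is enough. The function name is illustrative, and the JAR paths passed in are whatever files you want to inspect.

```python
import zipfile


def jars_containing(class_name, jar_paths):
    """Return the subset of jar_paths whose archive contains class_name."""
    # A class a.b.C is stored inside a JAR as the entry a/b/C.class.
    entry = class_name.replace(".", "/") + ".class"
    hits = []
    for jar_path in jar_paths:
        with zipfile.ZipFile(jar_path) as jar:
            if entry in jar.namelist():
                hits.append(jar_path)
    return hits
```

If none of the JARs registered by the Pig script contains the class, the dependency has to be added to the classpath or bundled into one of them.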
OK, harsha helped me track this down: we're getting (Hadoop-specific) Snappy-compressed data in the output. I suspected this before and tried uncompressing with https://code.google.com/p/snappy/ but didn't know that Hadoop's Snappy implementation is incompatible :/

Workaround is pretty simple, at least for the moment:

SET mapred.output.compress false;

This is now fixed in place. I am going to start backfilling now and will file some bugs to fix this all up in the meantime.
Ran a backfill yesterday using one of the cleaned up module lists: "Uploaded 12000 symbol files".
Target Milestone: 58 → 59
OK, this is fixed up, but it is running out of my homedir and crontab on sp-admin01. Also, we need to come up with a sane way to deploy code to the cherry-gw hadoop server.

These things should have happened in bug 880048, so I'm going to follow up there.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [qa-]