tarasglek: All Posts
Posted on 2012-04-30
Done:
- bug 748417: python script to generate metrics-friendly json from our histogram definitions
- lots of reviews on Yoric's File API
- bug 743877: poked at tab delays due to settimeouts
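A minimal sketch of what a definitions-to-JSON script for bug 748417 could look like. The post does not show the real definition format or the JSON the metrics team wanted, so the field names (`kind`, `low`, `high`, `n_buckets`), the probe name, and the evenly spaced bucket scheme below are all assumptions made for illustration.

```python
import json

# Hypothetical histogram definitions; field names are illustrative only.
HISTOGRAMS = {
    "EXAMPLE_STARTUP_MS": {
        "kind": "linear",
        "low": 0,
        "high": 100,
        "n_buckets": 5,
        "description": "Example probe, not a real one",
    },
}


def linear_bucket_bounds(low, high, n_buckets):
    """Return n_buckets evenly spaced lower bounds from low to high inclusive."""
    step = (high - low) / (n_buckets - 1)
    return [round(low + i * step) for i in range(n_buckets)]


def to_metrics_json(definitions):
    """Flatten each definition into a record with explicit bucket bounds."""
    out = {}
    for name, d in sorted(definitions.items()):
        out[name] = {
            "kind": d["kind"],
            "description": d["description"],
            "buckets": linear_bucket_bounds(d["low"], d["high"], d["n_buckets"]),
        }
    return json.dumps(out, indent=2, sort_keys=True)


if __name__ == "__main__":
    print(to_metrics_json(HISTOGRAMS))
```

Only linear buckets are handled here; other histogram kinds would need their own bound calculation.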
Posted on 2011-11-07
Done:
- bug 699942: made Yes into a bigger button
- bug 697860: simplified tri-state, got rid of [x]
- Requested approval on landing 688223, 697860, 691951 on aurora. But looks like we'll be landing on beta instead. Fun.
- spent a couple days looking at making gold spit out section names for arm trampolines. Failed.
- interviews, 1:1s, etc
Next:
- land 688223, 697860, 691951 on beta. Enjoy awesome telemetry prompting in ff9/10.
- Move on from telemetry prompting to misc telemetry bugs
Posted on 2011-10-31
Done:
- Telemetry ui discussions
- bug 697860: tri-state patch for telemetry
- bug 697724: feedback-ed metrics ping wip
- bug 690585: landed startup interruptions
- checked in a few patches for people
Next:
- telemetry ui(or backout) stuff
- help with incremental decompression
Posted on 2011-10-25
Done:
- bug 690585: land startup-interrupted telemetry
- bug 668392: relanded addons+persona telemetry
- demoed fast-start
- Tried to shepherd telemetry ui into getting fixed(no progress)
Next:
- push telemetry ui stuff
- help with incremental decompression
Posted on 2011-10-10
Done:
- bug 689256: had some friday fun diagnosing this mind-bending orange :)
- bug 688223 landed telemetry reprompt
- bug 691951, bug 691951: filed + discussed our telemetry prompting and how it's broken. Can't get any traction on getting this fixed.
- bug 692979 discussed cache telemetry (and how crap our cache perf is).
- bug 481815, discussed how our updater service can improve startup speed with bbondy. he filed followups on defrag, prefetch nuking
- bug 689142 discussed/reviewed mak's solution to capture places stats in telemetry
Next:
- pray that our faststart prototype is ready for next week (incr decompression, async sqlite, fsync NOP, etc)
- bug 690585: land startup-interrupted telemetry
Posted on 2011-10-03
Done:
- bug 688223 - Telemetry opt-in change r+ed
- bug 296795 - landed for a community contributor
- 1/1s + goals
Next:
- bug 688104 - figure out MOZGFXOPTIMIZE_MOBILE perf review
- wrap up goals
- bug 688223 - land telemetry opt in change
Posted on 2011-09-26
Done:
- Caught up with email + review queue
- a-OKed telemetry for ff7
- poked at DXR contracting
- blogged about ff7 startup + telemetry
Next:
- Bug 688223 Re-prompt user for telemetry opt-in if privacy policy changes
- Figure out MOZGFXOPTIMIZE_MOBILE benchmarking needed if any
Posted on 2011-09-19
Done:
- Plumbers
- Allhands
Next:
- Catch up with email backlog
Posted on 2011-08-22
Done:
- Bug 680197 - link to the wiki from telemetry code
- bug 680508 - disable telemetry gathering for non-chrome
- Work on changing telemetry prompting
- Connected with linaro technical stuff
- Worked with metrics team on narrowing down their requirements
- dxr planning
Next:
- flesh out dxr ui plans
- more telemetry prodding and pushing
Posted on 2011-08-15
Done:
- bug 668392 - landed
- bug 678085 - landed
- Started a thread on dev.platform "Idea: No patch left behind"
Next:
- misc telemetry bugs
Posted on 2011-08-10
Done:
- bug 668392 - Waiting on Mossop's r+
- telemetry ui discussions with metrics people
- android dev rom stuff
Next:
- more of the same
Posted on 2011-08-01
Done:
- bug 666309 - Fixed some casting issues in telemetry that need to be landed
- Met with johnath on Telemetry + IO work.
- bug 673727 - landed tracking thread-io-waits due to sqlite
- Working on setting a mozilla development ROM with some help from community
Next:
- Wrap up sqlite telemetry
- Get mozrom going
Posted on 2011-07-25
Done:
- bug 668378 - Landed sqlite io telemetry!
- Refreshed about:telemetry addon at https://addons.mozilla.org/en-US/firefox/addon/abouttelemetry/
- Discussed startup work with mobile
- Bug 672651 - Landed - Track cache init times via telemetry
Next:
Figure out the next set of telemetry tasks:
- stuff to help metrics team
- stuff for mobile team
Posted on 2011-07-19
Done:
bug 668378 - sqlite io telemetry
Next:
bug 668378 - sqlite io telemetry
Posted on 2011-07-11
Done:
- Spent time doing elastic search and pig queries on Telemetry data
- bug 670008 to deal with invalid data being reported
Next:
- Start on telemetrying io
- wrap up bug 670008
Posted on 2011-07-05
Done:
- Q3 goals
- bug 666309 - Support boolean/double jsvals passed to telemetry
- bug 666707 - Land shutdown success tracking
- bug 661573 - Telemetry private mode
- Bug 653936 - landed leak fix in StartupCache
- bug 668355 - landed plugin enumeration telemetry
- bug 668312 - landed a fix so only registered histograms are reported via telemetry
Next:
3 day week.
- Learn to access telemetry data in metrics hbase
Posted on 2011-06-27
Done:
- Deployed telemetry =D
- bug 661573 - Private mode awaiting review
- bug 664486 - Reviewed page fault reporting work
- bug 653831 - fancy shutdown measurement WIP
- bug 653936 - Fixed memory leak, awaiting review
- bug 666707 - Added a probe to report unsuccessful shutdown rate (similar to bug 653831, but more naive, should be landed sooner)
- bug 665805 - Adjusted telemetry to work with official metrics server
Next:
- Figure out Q3 goals
- bug 666309 - Support boolean/double jsvals passed to telemetry
- bug 666707 - Land shutdown success tracking
Posted on 2011-06-20
Done:
- bug 661574 - landed - Telemetry: Create a histogram directory
- bug 661573 - Finishing up private mode still
- Got security r+ for telemetry
- Got r+ from privacy people
Next:
- bug 664845: Waiting on webdev to update privacy policy :(
- bug 661573: Land telemetry private mode
- bug 652657: Land telemetry ui
- Figure out why metrics telemetry server is rejecting our data
- Write docs, blog posts, etc to promote telemetry usage
Posted on 2011-06-15
Done:
This is late.
- Spent last week in mountain view: telemetry meetings with metrics people, Nathan Froyd, privacy people, sicking + William regarding refactoring, jcranmer on DXR
- bug 661574: histogram telemetry stuff is almost done reviewing
- bug 661573: no telemetry in private mode almost done too
- Telemetry privacy policy ready to go(waiting on liz)
Posted on 2011-06-01
Done:
- PTO
Next:
- Continue tying up telemetry loose ends
Posted on 2011-05-23
Done:
- Telemetry security review, telemetry feature review
- Reviewed glandium's preloading stuff
- Fixed telemetry idle-daily + cleaned API (bug 657411,bug 657480,bug 657709)
- bug 652657: helped glandium push telemetry UI a little
Next:
- Plan out telemetry in FF7
- vacation till May25-June
Posted on 2011-05-16
Done:
- bug 585196: Landed telemetry. Blogged it http://blog.mozilla.com/tglek/2011/05/13/firefox-telemetry/
- bug 627591: bug closed for now. glandium going to land a better version in bug 632404.
Next:
- Shepherd remaining telemetry bits to completion.
- Start integrating about:telemetry in(hopefully land it)
- Land more useful telemetry probes
Posted on 2011-05-10
Done:
- bug 627591: Landed preloading
- bug 585196: Telemetry review got delayed..posted another patch today
Next:
- bug 585196: Land telemetry(pref-ed off)
- bug 627591: Preloading seems to not be functioning correctly. Testing to figure out why
Posted on 2011-05-03
Done:
- Test server up
- bug 649502 landed =D
- bug 585196 client-side Telemetry reporting posted for review
Next:
- bug 585196: Land telemetry(pref-ed off)
- bug 627591: Respin the preloading patch and get it landed.
Posted on 2011-04-25
Done:
- bug 649502: Imported new histogram code from chromium. Got most reviews, one more to land.
- bug 649502: Exported histograms via JS so they can be easily gathered from JS side(and tested). Waiting on review
- Listed Perf goals for Q2
- interview
- met with Alice to discuss talos future
Next:
- Wrap up histograms into a telemetry addon, deploy a test server
- Land 649502(optimistic)
Posted on 2011-04-18
Done:
- workweek.
- bug 585196: Focused on getting telemetry going.
Next:
- bug 585196: Start getting reviews on telemetry
- Set team-perf goals
Posted on 2011-04-01
Done:
- bug 637286: Diagnosing/fixing a freeze in jar code took a while
- Perf-measurement planning: talos stuff + telemetry
- bug 481815: Made use of windows pipes to get ondemand silent updating machinery to work. Now just need to figure out why firefox isn't calling my updater
Next:
- onsite
- Hopefully get basic telemetry going
Posted on 2011-03-28
Done:
- Chatted to various people about memory telemetry. Added to https://wiki.mozilla.org/Performance/MemShrink
- bug 481815: Reached proof of concept stage. Took way too long due to learning Windows + updating details. On my machines Firefox is now silent-updated =D. Currently it updates on windows startup, which totally ruins windows startup.
Next:
- bug 481815: Figure out the fancy windows footwork to allow the updater service to be updated. Get updater to talk to the service so the update can happen ondemand instead of on windows startup.
Posted on 2011-03-21
Done:
- bug 600713: Helped verify mwu's font cache patch
- bug 641691: Filed first bug regarding "delayed init" code not actually delaying until after first paint. bug 612190 will provide us with a ts replacement bug
- some reviews of glandium's preload stuff
- bug 481815: More code reading/planning on windows update logic
Next:
- bug 481815: get basic updating working. Investigate what(if any) changes are needed to make updater run without UI. What kind of data race issues we'll run into if user tries to start firefox during an update, etc.
Posted on 2011-03-14
Done:
- bug 627591: Released "Start Faster" addon on AMO. blogged about it on https://blog.mozilla.com/tglek/2011/03/10/start-faster-addon/ to get some testing.
- bug 633615: Confirmed that microsoft update fixes this. In fact, starting firefox with directwrite backend is now faster than with gdi
- some misc reviews
- bug 481815: Discussed with rstrong how updating would work from a service. Studied the code. I think I have some idea on how to implement it now.
- Spent a few hours on friday porting prbool checker to llvm.
Next:
- bug 641614: bhearsum helped me realize that omni.jar doesn't get ordered on non en-US l10n repacks :(
- investigate why we paint later than expected
- bug 481815: continue with service + updating
Posted on 2011-03-07
Done:
- bug 627591: Finished admin service prototype, integrated it into an addon. Installing the addon does seem to halve startup time for some people. Haven't gotten any negative data yet.
- reviewed bugs 611163, 637341
- Bug 637461: fixed data race in StartupCache.cpp
- Tried chrome's .exe reordering on Firefox
Next:
- bug 627591: wrap up the startup-speedup addon enough for AMO deployment(almost there)
- bug 585196: Hopefully deploy a telemetry server + deploy addon for it on AMO.
Posted on 2011-02-28
Done:
- bug 585196: got a prototype addon + server for telemetry work. Waiting on place to deploy it. Might integrate it with zippity (http://starkravingfinkle.org/blog/2011/02/zippity-using-the-crowd-to-collect-performance-data/)
- bug 627591: Worked on an admin service (bug 481815) that clears out prefetch caches. Should have an addon that does this ready soon.
Next:
more bug 627591 and bug 585196
Posted on 2011-02-22
Done:
bug 627591: Agreed with Rob Strong that the only way to deal with prefetch is to add prefetch cleaning + preloading functionality into a windows service running as administrator(ie bug 481815).
bug 585196: Have a plan on getting telemetry going, got prototype server + clientside addon.
Next:
bug 585196: get telemetry deployed
bug 481815: Look into getting an admin service going
Posted on 2011-02-14
Done:
- bug 627591: Have an approach that is an overall win
- bug 632177: No progress
- bug 632526: Fixed startup time reporting on android
- reviewed elfhack fixes by glandium
Next:
- bug 627591: Work out how(if?) preload could land in Firefox
- bug 632177: Investigate where 25mb of memory went on android
- See about reducing file IO on windows to not confuse prefetch so much
Posted on 2011-02-08
Done:
- bug 627591: Have an approach that should not cause regressions on computers that regressed...still speedup on others.
- bug 618912: experimented with an html chrome on fennec. It's fast. Discovered that currently the full chrome for various reasons causes 50% more memory use (25mb) so filed bug 632177 to track the investigation. So far mimetypes turned out to suck up 6 seconds of startup, almost 10mb of ram. Next need to figure out why fennec is loading the webkit libraries :)
Next:
- bug 627591: still waiting on some data here.
- bug 632177: Reduce firefox memory usage by another 15mb..hopefully :)
Posted on 2011-01-31
Done:
two weeks ago:
- landed bug 586859
- bug 588873 was deemed too risky for ff4 by bsmedberg
- bug 626814: Poked at a few startup cache design issues in fennec
- bug 620931:reviewed xulrunner omnijar bits from glandium
- bug 625612: landed some startup diagnostics for android
- bug 466445: Contacted code-sourcery regarding some gcc issues
- bug 627591: figured out how to halve cold startup time on windows!
last week:
- spent 99% of my time on bug 627591. It's a super-frustrating hack. Works great except for users with ridiculously slow harddrives.
- reviewed a bunch of elfhack fixes(628618,628627,628232)
Next:
Figure out what to do with the preloading hack
Posted on 2011-01-18
Done:
- bug 522375: landed startup-time measurement =D and followup fixups such as bug 625478.
- bug 608042: Reviewed patches, debugged library loading performance on android
- startup comparison vs 3.6/chrome on Windows.
Next:
My todos from previous week, since those got bumped by higher priority stuff:
- Resolve bugs that failed to land completely: bug 588873 and bug 586859
Posted on 2011-01-10
Done:
- bug 586859: landed non-threading part of this. Threading part possibly caused an intermittent shutdown crash, got backed out.
- above made bug 593349 go away. No more extreme fragmentation.
- bug 562406: Landed omnijar startup cache(but didn't enable). This will be useful for fennec
- bug 588873: this bounced pretty badly. Need to spend more quality time with it :(
- Worked with Sheila(she did most of the work) on figuring how to blacklist addons. Sheila brought bug 593743 to my attention as at-risk and needed for addon blocking to proceed.
So I took the good parts of 593743 and reworked them into:
- bug 522375 which gathers core startup info and exposes it to js. Now we just need bug 623950 to have a good picture of startup and addons that influence it.
- misc android startup investigations
Next:
- Try to get startup-measurement into ff4
- Resolve bugs that failed to land completely
Posted on 2011-01-04
Done:
- blogged https://blog.mozilla.com/tglek/2010/12/29/faster-plugin-enumeration-help-wanted/
- bug 585196: Started working on telemetry again. Figured out some good startup hooks. Got derailed by measuring startup time within a process. glandium helped me a lot there. Settled on a pretty precise measurement.
- Setup android toolchain
- tested my startup measuring code on android.
- got distracted by things I found, spent time looking on android startup + mem usage
- bug 622723: use ashmem on android. Made our memory backed by files..ie swap.
- got my reviews =D
Next:
- bug 586859, bug 562406 land various startup cache things that got review
- bug 588873: land fasl fix, hope it doesn't break things again
- wrap up brief android digression
- figure out what to focus on next. ie fennec startup perf, perf tests, ....? lots of huge tasks to choose from.
Posted on 2010-12-27
Done:
- bug 620534: landed mac plugin enumeration slowness fix
- bug 606145: reviewed glandium's elfhack
- bug 621580: confirmed that java plugin causes bonus io
- blogged about the font mess https://blog.mozilla.com/tglek/2010/12/21/rude-surprise-startup-overhead-of-windows-font-apis/
- bug 617048: submitted updated patch for approval
Next:
Waiting on dwitte/bsmedberg/cjones on final reviews. Misc startup investigations in meantime. Something along the lines of https://docs.google.com/document/pub?id=1-Pk-mPFJS5tPdnIXQ78VyU7yfxxJki2eSf_twr8Xl9M
Posted on 2010-12-21
Done:
- bug 616271: tried to hack on urlclassifier vacuum during workweek, got lost in qi
- workweek: l10n/l20n planning, font discussions, etc
- bug 620103 + 620114: Fixed damage caused by plugin-enumeration work in bug 616271
Next:
- bug 620534: finish up plugin enumeration fixups(mac still sucks at it)
- xmas
Posted on 2010-12-13
Done:
- Bug 614423 - plugin enumeration speedups landed.
- bug 618912 - WIP: urlclassifier enumeration
Next:
- bug 588873 - ugly crashyness
- workweek
- maybe follow up on review of startup cache bugs + land em.
Posted on 2010-12-06
Done:
- bug 616271: Debugged plugin enumeration performance + suggested workaround
- bug 602792: Made a dwrite-less build to figure out how a fix will perform(much better)
- bug 588873: Posted new patch, landed it. Ended up fixing the bug AND speeding up startup =D
- bug 616256: Dont stat() every file on directory listing...helps plugin enumeration
- bug 609785: Reworked jsloader startup-cache name mangling patch
- Finally heard from AVG, replied to AVG guy about their plugin perf. Doesn't sound promising.
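The idea behind bug 616256 (avoid a stat() per entry while listing a directory) can be illustrated with a small Python sketch. This is not the Firefox patch, which the post does not show; the function names are hypothetical. It relies on the documented behaviour of `os.scandir`, whose entries can usually answer `is_file()` from the directory read itself instead of issuing a separate stat call.

```python
import os


def list_files_with_stat(directory):
    """Naive listing: os.path.isfile() performs one stat() per entry."""
    return sorted(
        name
        for name in os.listdir(directory)
        if os.path.isfile(os.path.join(directory, name))
    )


def list_files_without_stat(directory):
    """os.scandir() reuses type info from the directory read when the OS provides it."""
    with os.scandir(directory) as entries:
        return sorted(entry.name for entry in entries if entry.is_file())
```

Both functions return the same names; the second typically makes far fewer system calls on large directories.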
Next:
- Got a lot of outstanding patches waiting on review.
- Wrap up plugin perf
- Investigate a few other obvious startup suspects(url-classifier + sync)
Posted on 2010-11-29
Done:
bug 614423: Plugin enumeration. Fixed up functiontimers enough to diagnose this... Figured out the cause.
Next:
bug 614423: fix it or get someone else to fix it
Posted on 2010-11-22
Done:
- bug 595812: Discussed l20n design
- bug 600713: Experimented with workarounds, looked at chrome code
- bug 559964: Reproduced sunspider hang on trunk. Lowering optimization flags did not fix it. Still hoping the problem goes away once bug 609543 is fixed.
- Diagnosed some plugin-related startup perf issues. sgreenlay filed/fixed bug 613679 as a result.
Next:
Fix functiontimer bugs so I can better debug plugin-related startup slowdowns.
Posted on 2010-11-15
Done:
- bug 586859 startup-cache-on-a-thread: got my first feedbacks
- bug 562406 omnijar-startup cache submitted for review
- bug 588873: exceptioninpage catching: Landed! Turned out my failures were random oranges(for some reason these were rather consistent on try for a period)
- bug 559964: GCC 4.5 ended in tears again. This time it tripped up a spidermonkey hang, which also seems to happen on tracemonkey tree:bug 609543. Will wait until that bug is resolved. Need to play with 4.5 flags a bit more.
- bug 611837: Turns out we f*sync() a lot on first startup. Investigating how much of an overhead that is.
Next:
- Address review comments on 586859, 562406
- bug 559964 on try. Try playing with compilers flags some more.
- bug 612131: fix mobile crash
Posted on 2010-11-08
Done:
- bug 586859 startup-cache-on-a-thread: posted patches for review
- Talked to Kev about blacklisting AVG addons, still no word from AVG...grr
- bug 562406 omnijar-startup cache: worked out remaining design issues in bug 609785. mwu measured a 5% improvement(expecting a bigger win desktop..and not just for first startup) in first startup on android. Need to spend a few minutes getting sync-url caching working and submit for review.
- bug 610040 fragmentation avoidance strategy happens to cause an OOM. Posted a fix.
Next:
- bug 588873: figure out crashyness on try... Gonna give this one last try.
- bug 562406: Finish this.
Posted on 2010-11-02
Done:
- Vacation
- bug 588873: Try server reports issues with try/catching memory exceptions. Chasing my tail on this. Spent a few days on it so far, no idea what is causing issues yet..will debug more later
- bug 586859: Moved startup cache writing to a thread...doing a bunch of optimizations while at it too.
- Investigated anti-virus performance degradation some more. I think we need to actively block antivirus crap. Contacted Kev about it. Need to investigate more
Next:
- bug 588873: figure out crashyness on try
- bug 586859: get off-main-thread startup cache reviewed
- bug 562406: ponder whether to push startup cache omnijaring for ff4
Posted on 2010-10-19
Done:
- bug 593614: Confirmed that the slowness was indeed due to fonts. Worked on windows cold startup measurement.
- bug 562406: startup cache omnijaring works on try.
- bug 501563: Created a tracking bug for improvements needed to startup-measuring infrastructure
Next:
- bug 588873: Try server reports issues with try/catching memory exceptions. Need to investigate more
- Look at leftover benh bugs
Posted on 2010-10-11
Done:
- Try server reports issues with try/catching memory exceptions. Need to investigate more bug 588873.
- Investigated xperf logs of slow startup in bug 593614.
- bug 600713: Diagnosed cause
- bug 588607: reviewed mwu's follow library loader patch
- bug 502176: reviewed megapatch
- bug 595924: inherited this from benedict, posted new patch.
- bug 562406: Polished up startup cache omnijaring. It almost works on windows now
- Touched up my parts of gcc paper
Next:
- bug 562406: Put up a startup cache patch for review, produce builds on try.
- bug 588873: Try server reports issues with try/catching memory exceptions. Need to investigate more
I took over a bunch of benedict's bugs. Would be nice to make some progress there.
Posted on 2010-10-04
Done:
- bug 593614: Spent a lot of time studying why some people are suffering from terrible startup. Filed bug 601682(minor), bug 600713(major). Firefox sync contributions are still worth investigating.
- bug 588873: fastload, tried adding try/catch, which confused try-server.
- bug 588607: r-ed mwu's library loader on some minor issues
- Some gcc paper work
Next:
- bug 600713: help fix this
- bug 562406: get some progress on startup cache omnijaring
Posted on 2010-09-28
Done:
I wrote up a big status report yesterday and failed to submit it :(
- bug 559964: Confirmed that gcc 4.5(with our gcc fix + mozilla fix) passes try.
- bug 598416: sprinkle mmap jar code with exception handling on windows. submitted patch, got r+, waiting on blocker status to land it
- bug 562406: omnijaring startup cache, got a reasonable approximation of what the final patch would look like
- bug 592422: reviewed preallocation of individual cache files
- bug 478129: reviewed size limiting of preallocated aggregated cache files
- bug 597702: reviewed nested jar crashfix
- Worked on gcc summit paper
Next:
- Some gcc paper finishing
- bug 562406: finish startup cache omnijaring
- bug 588607: review mwu's library loader hack
- bug 593614: work with community member on diagnosing ridiculously slow startups
- bug 559964: Do social engineering to get gcc 4.5 + higher optimization flags + pgo landed
Posted on 2010-09-20
Done:
- Workweek
- bug 592520 landed, no more massive disk cache fragmentation. Smaller fragmentation issues remain. Jst driving the rest of the issues.
- bug 596429 landed fasl crash fix.
- bug 594172 landed. Seriously reduced startup cache fragmentation.
- Poked various gcc and mozilla people to fix remaining gcc 4.5 issues. It appears that bugs are now fixed in both products. I'm back to waiting on bug 559964, ie turning on gcc 4.5 for trunk.
- bug 562406: got a proof of concept working.
- bug 595473: Triplechecked that some commonly available zip software can cope with optimized jar layout.
- blogged about jar work going into ff4 http://blog.mozilla.com/tglek/2010/09/14/firefox-4-jar-jar-jar/
Next:
- bug 562406: Produce a useful patch for startup cache omnijaring.
- Work on gcc paper
Posted on 2010-09-13
Done:
- bug 589368: Get approval/land l10n repacking
- bug 590242: Land not opening of omnijar 3x
- bug 593349: Figure out a workaround for startup cache fragmentation ....got patch waiting for review in bug 594172
- bug 595473: Briefly researched zip programs that work with optimized jar format.
- bug 594611: Finally did investigations of what fails on gcc 4.5(just ctypes).
Next:
- Workweek
- Play with pre-generating startup cache during pgo
- bug 594172: land it
Posted on 2010-09-07
Done:
- bug 581606: Landed less-sqlite-fragmentation
- Blogged http://blog.mozilla.com/tglek/2010/09/07/fighting-fragmentation-sqlite/
- bug 592520: do not fragment the hell out of cache
- Did a lot of measuring/studying of cache fragmentation. Summarized in http://groups.google.com/group/mozilla.dev.platform/browse_thread/thread/4f0a51d6720f614e
Next:
- bug 589368: Get approval/land l10n repacking
- bug 590242: Land not opening of omnijar 3x
- bug 593349: Figure out a workaround for startup cache fragmentation
Posted on 2010-08-30
Done:
- bug 589368: "Few wrinkles" took a bit longer than I expected. Posted a patch for review.
- bug 581606: Did some more polish on less-sqlite-fragmentation patch. Ready for landing.
- bug 533038: Reviewed mwu's jar changes
- Bug 590242: Do not open omnijar 3x. got r+
Next:
- Land various bits from above as I get approvals.
- Investigate why 4.5 is so busted
- Perhaps finally get to telemetry
Posted on 2010-08-23
Done:
- bug 559964: build team switched on 4.5.1. Things broke even more than with 4.5.
- bug 589368: jar repacking for l10n. Mostly done, few wrinkles left to iron out.
- Landed jar reordering. Still need to make it useful with omnijar+locales(see above)
- startup reports for Damon.
- Briefly fought with Talos. Have no idea why it only occasionally works for me.
- bug 588873: Poked at a fastload crash. The crash seems very real, but it shouldn't happen.
Next:
- Finish l10n repack
- Investigate why 4.5 is so busted
Posted on 2010-08-16
Done:
bug 580407 (static js link) kicked my butt. Spent a lot of time on ironing out last-minute issues and landing this. Most blood/sweat/tears per line of code changed ever. It landed and appears to have stuck. Yay for less pointless pagefaults/library-loading overhead.
bug 559961 Got jars to reorder as part of PGO. works with omnijar/etc. Waiting on review.
Next:
- Some mix of l10n repacks for jars + telemetry work
- Do some LTO benchmarks. Committed to coauthoring a paper for the GCC summit on this work, so better get to it finally.
Posted on 2010-08-10
Done:
bug 559961: jar reordering. Almost finished. Wrote a new jar repacker in python. This appears to reduce jar IO by over 50% on Linux. Should be significant elsewhere too.
Didn't do much LTO work.
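A rough sketch of the jar reordering idea from bug 559961, assuming the goal is to place entries in the order they are read at startup so the reads become sequential. This is not the actual repacker, which the post does not show. The function name and the `access_order` list are hypothetical, and only the standard `zipfile` module is used.

```python
import zipfile


def repack_jar(src_path, dst_path, access_order):
    """Copy src_path to dst_path with entries in access_order written first.

    Entries not named in access_order keep their original relative order
    and are appended afterwards.
    """
    with zipfile.ZipFile(src_path) as src:
        names = src.namelist()
        present = set(names)
        first = [n for n in access_order if n in present]
        placed = set(first)
        rest = [n for n in names if n not in placed]
        with zipfile.ZipFile(dst_path, "w", zipfile.ZIP_DEFLATED) as dst:
            for name in first + rest:
                # Recompresses each entry; per-entry metadata is not preserved.
                dst.writestr(name, src.read(name))
```

A real repacker would likely derive `access_order` from a log of which entries were opened during a startup run.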
Next:
- Wrap up bug 559961, make sure it fits well with omnijar.
- bug 585196: Cook up a telemetry prototype
- Do some LTO benchmarks
Posted on 2010-08-03
Done:
Reviewed SQLite fragmentation work. They didn't take my patch, making an equivalent fix at a lower level. Seems ok. Finding it difficult to communicate with sqlite devs.
Tried to use talos to benchmark stuff, gave up for now
Tried to make use of Windows linker map files to better visualize startup. http://blog.mozilla.com/tglek/2010/07/30/msvc-static-initializers-decent-stuff/
bug 559961: mwu convinced me that jar reordering is a Win. I tried a proof of concept and looks like we can easily get a 30% reduction in jar pagefaults. Can probably push that to 50% or more.
Next:
bug 559961: jar reordering
Still doing GCC LTO work in spare cycles. Haven't done much last week.
Posted on 2010-07-27
Done:
- Diagnosed the fragmentation epidemic: http://blog.mozilla.com/tglek/2010/07/22/file-fragmentation/
- Patched it: bug 581606
- Folded js into libxul: bug 580407
Marked important things as blockers.
Got LTO firefox builds out of GCC =D
Next:
- Convince sqlite people to land fragmentation fix.
- Land it in Mozilla
- Finish up remaining startup bugs
Posted on 2010-07-20
Done:
bug 561842: Prototype: made a single firefox binary that does not dlopen anything, disabled -fPIC. For whatever reason it's slower and crashier.
bug 579579: Prototype: folded gnome components in for a moderate speedup.
bug 577522: getting rid of xpcom.so dep in firefox-bin only postpones xpcom.so from loading since browser components depend on it.
Investigated folding in mozalloc.
Got further along on building Mozilla with gcc 4.6+lto. Feels like Honza is committing at least a bugfix per day as we work through bustage in Mozilla.
Next:
Fold js into libxul
Posted on 2010-07-13
Done:
Whistler is over :(
Minimally coached gcc's Honza on building Firefox. He fixed a good number of LTO bugs that Firefox triggered. Both his speed progress and the initial results(like 30% of functions being eliminated as dead code) sound impressive.
Filed Bug 577312 for B&R team after discussing how to benchmark gcc with mozilla at the summit.
Bug 577522 - Got a patch that gets rid of libxpcom.so/xpcom.dll/... That library is useless for Firefox and not loading it improves our ts scores on all platforms.
Next:
bug 561842. Would be nice if we can end up with a fat libxul or static firefox.exe for Firefox 4.
Trying to replicate Honza's result so I can benchmark it on the way to Bug 577312
Posted on 2010-07-07
Done:
Bug 416330 [suboptimal sqlite pagesize] Landed 32K!
bug 559964 [GCC 4.5 switch] I spent a bunch of time diagnosing 4.5 slowdowns. Reached out to gcc devs, still trying to diagnose what went wrong at -Os. Compiling with -O2 gets us to go faster, but with a more bloated slower starting binary.
I landed a workaround patch that should restore old -Os performance characteristics (ie not be a regression from 4.3).
bug 577312 [Custom compilers on moz build infrastructure]. While discussing our 4.5 problems with gcc devs we realized that the best way to avoid these is to use Mozilla as a benchmark for the C++ compiler. This will ensure that GCC is tuned for our needs and we can make use of new GCC features as soon as they become available.
Next:
Summitry in Whistler.
Posted on 2010-06-25
Done:
Bug 416330 [suboptimal sqlite pagesize] Spent most of my time this week studying why sqlite io sucks so bad. Got a patch landed, backed out. Corrected patch ready for checkin.
Bug 572460 [vacuum sqlite dbs] Filed this bug to get a generic approach to vacuuming sqlite dbs.
bug 559964 [GCC 4.5 switch] This got turned on Jun 23rd, caused massive regressions in everything(back to 4.3 for now). Reached out to GCC guys about this, looks like help is on the way once I help em reproduce this.
Tried to setup standalone talos to experiment with gcc optimization options, failed to get it to run to completion.
Next:
Bug 572460 [vacuum sqlite dbs] Should get a proof of concepts next week, perhaps even something landable.
bug 559964 [GCC 4.5 switch] Get a working version of standalone talos + way to build mozilla for gcc folks.
Posted on 2010-06-18
Done:
Was in MV this week. Lots of talk, little code.
Steve Fink is making progress with bug 558200 [extension badness profiler].
Met with chrome guys to discuss binary ordering on windows.
Icegrind is feature-complete and pretty much production-quality now. Tried to build my own ld.so to see if I could fix the linker to not do io inefficiently, failed to build a non-crashy ld.so, moved on.
Reviewed more hydra patches by Ehren.
bug 572459 [bad io patterns]. Filed a tracking bug to keep track of bad io patterns...Mostly sqlite misbehavior for now. More to follow.
bug 416330 [suboptimal sqlite page size]. Confirmed that it indeed is suboptimal. Changing it from 1-4K to 32K yielded 20-30% startup improvements, 70% less io syscalls, etc.
Met with chrome guys to discuss windows binary ordering, etc. They do good low level windows stuff, http://code.google.com/p/sawbuck/ sounds like a useful logging tool.
bug 569629 [getting rid of static initializers]. Did a platform post about it. Sounds like people are keen to rid us of em.
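For reference, the page size change discussed in bug 416330 can be tried on any SQLite file with a standard pragma. The sketch below uses Python's `sqlite3` module rather than Firefox's storage layer, and the function name is hypothetical. It assumes the database is not in WAL mode, since SQLite only applies a new page size to an existing database during VACUUM in that case.

```python
import sqlite3


def set_page_size(db_path, page_size=32768):
    """Rewrite the database at db_path with the given page size.

    Returns the page size SQLite reports afterwards.
    """
    # isolation_level=None gives autocommit, so VACUUM is not run inside a transaction.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute(f"PRAGMA page_size = {int(page_size)}")
        conn.execute("VACUUM")  # rebuilds the file using the requested page size
        return conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()
```

Measuring the effect would still need an IO trace of a real startup; this only shows how the setting is changed.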
Next:
Figure out some stopgap measure for sqlite io suck.
bug 553721: Try windows ordering again. Didn't realize ms linker has map files too. This should make it easier to debug what the heck is going on.
Figure out the build-foo to integrate icegrind.
Posted on 2010-06-08
Done:
Was at osbridge for 3/5 days last week.
bug 570195: [hasOwnProperty busts Dehydra] Tracked down hasOwnProperty bug in spidermonkey (with jorendorff's help)
bug 549749: [icegrind] Figured out how to generate a .sections file from the linker map. Misc icegrind fixes.
bug 569137: [pgo workaround] One file was triggering gcc pgo bugs, disabled that.
reviewed various Dehydra patches by Ehren, it's good stuff.
Gave a talk at http://www.galois.com/blog/2010/06/03/tech-talk-large-scale-static-analysis-at-mozilla/
Filed bug 569629 to host Mike Hommey's startup patches.
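A note on the icegrind item (bug 549749): the last step of this kind of binary reordering is turning a startup trace into an ordering file for the linker. Here is a minimal sketch of that step. It assumes the binary was built with -ffunction-sections, so each function lives in its own `.text.<symbol>` section. The trace format, the function name and the symbol names are hypothetical, and icegrind's real output may differ.

```python
def sections_in_first_use_order(trace_symbols, prefix=".text."):
    """Return section names ordered by the first time each symbol was touched."""
    seen = set()
    ordered = []
    for sym in trace_symbols:
        if sym not in seen:  # keep only the first access of each symbol
            seen.add(sym)
            ordered.append(prefix + sym)
    return ordered


if __name__ == "__main__":
    # Hypothetical startup trace: symbols in the order they were executed.
    trace = ["main", "init_prefs", "main", "draw_window", "init_prefs"]
    print("\n".join(sections_in_first_use_order(trace)))
```

Sections that never show up in the trace are left out of the list, so the linker is free to place them after the startup-critical ones.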
Next:
Mike Hommey has been playing with icegrind, found some easy to eliminate relocations. Need to learn about that.
bug 531886 [invalidating startup caches] <-- will wait on this till I see how desktop omnijar turns out(bug 556644).
Will aim to wrap up binary-reordering research into patches. Expect to land some sort of linker hacks + some variant of madvise hack in bug 554421.
Posted on 2010-06-01
Done:
bug 559964 [Install GCC 4.5 on linux VMs]: Landed bug 569137, last remaining pgo-blocker. Confirmed Rail's pgo builds, just waiting for the build team to flip the 4.5+pgo switch.
Blogged about the reverse IO caused by constructors. Mike Hommey contributed patches that alleviate the problem of having too many global initializers.
bug 554421 [madvise hack]: combined with reversing the order of constructors, this produces a 30-40% startup speed up on harddrives. I think this is the cure for making bug 561842 fast.
bug 531886 [invalidating startup caches]: Reached consensus on how to resolve this. Need to get cranking on it.
Diagnosed bug 415563 as a gcc 4.3 bug. Switching to 4.5 will fix it =D
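A note on the reverse IO item: backwards access is easy to spot once a trace is reduced to file offsets, because each step that lands at a lower offset than the previous one is a seek that readahead cannot help with. A minimal sketch, assuming a trace that is just a list of offsets in the order they were touched (a hypothetical format, not the actual iologger output):

```python
def count_backward_seeks(offsets):
    """Count accesses that land at a lower file offset than the previous access."""
    backward = 0
    for prev, cur in zip(offsets, offsets[1:]):
        if cur < prev:
            backward += 1
    return backward


if __name__ == "__main__":
    forward = [0, 4096, 8192, 12288]
    reverse = list(reversed(forward))
    print(count_backward_seeks(forward), count_backward_seeks(reverse))
```

Comparing the count before and after a layout change gives a rough number for how much of the access pattern has been straightened out.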
Next:
Measure impact of reducing the number of global initializers(ie Mike Hommey's patches).
Attending opensourcebridge.org.
Prepare galois talk.
bug 531886 [invalidating startup caches], might get to it this week.
Posted on 2010-05-24
Done:
bug 524201 [Move browserconfig.properties to a jar]: Checked in! This was the last remaining easy jar move
bug 559964 [Install GCC 4.5 on linux VMs]: Figured out the right incantation to get -fPIC libstdc++. Waiting on Rail to confirm.
bug 561842 [fold every library into libxul.so]: Still investigating performance, getting close to figuring out why things are slow. Souped up my systemtap iologger with some gdb scripting to automagically identify causes of page faults. Once I did that a couple of causes/fixes of slow startup became really obvious, need to debug em now. Will blog details this week. Will likely have to do some basic gcc/linker hacking for some easy wins on all linux platforms.
bug 566686 [Provide a decompression API in nsZipArchive] mwu needs an efficient api to get at JAR data for bug 552121. After some iteration, I think we have a useful+efficient api. Waiting on Alfred to review.
Did up an abstract for my June 8 talk at a Portland static analysis place: http://www.galois.com/blog/
Next:
bug 561842 [fold every library into libxul.so]: Getting really close to figuring out how to make this configuration as fast as it should be.
Posted on 2010-05-18
Done:
Bug 559964 [Install GCC 4.5 on linux VMs]: Tracking down remaining issues with GCC PGO. For some reason the custom flags on build machines aren't getting GCC stdc++ built with -fPIC. This is bad because it results in .text relocations which makes prelink angry (and likely SELinux).
For some reason GCC pgo does not result in a good binary layout for libxul.so(it seemed to layout a static firefox-bin awesomely). Still investigating.
Next:
Resolve remaining issues in bug 559964.
Posted on 2010-05-10
Done:
bug 553721: Windows code layout: Got this working on Mozilla. Turned out my assembly was causing crashes (as was every other _penter example on the net), got some noncrashy asm going. Unfortunately, the Microsoft linker only loosely adheres to the -order:@ option, resulting in a somewhat-ordered binary. This combined with puny windows readahead means that there is no significant improvement in the startup paging pattern.
bug 418866 [PGO on linux]: Landed!
Bug 563742 [Efficient ctypes API for file handling]: Started drafting an API: https://wiki.mozilla.org/JSFileApi
Next:
bug 553721 [Windows code layout]: Figure out if it is a win worth pursuing.
Bug 561842 [fold all remaining shared libs into libxul]: Test with above work. Perhaps this patch will make above work shine. Figure out why this isn't a win on Linux (if 564511 & 564851 [passing pgo flags] don't help).
Bug 563742 [ctypes file api]: Time permitting, submit a proposal.
Posted on 2010-05-04
Done:
bug 553721: Windows code layout: figured out the assembly+APIs that I need to accomplish this in a testcase. Made some progress in integrating this into the build system by imitating blassey's approach in the wince profiling patch. Unfortunately, xpidl crashes now, will be looking into finishing integration this week.
bug 418866 [PGO on linux]: Submitted patch for ted's review.
Bug 561236 [push_back() causes dependency on libstdc++ from gcc4.5]: Tracked down cause of this. Looks like patching the compiler may be the easiest workaround there :(
Next:
bug 553721 [Windows code layout]: Slowly, but surely getting somewhere.
Bug 561236 [push_back() causes dependency on libstdc++ from gcc4.5]: Close to having a gcc patch.
Posted on 2010-04-26
Done:
bug 418866 [PGO on linux]: Got PGO going on 4.5 and x86_64.
Bug 560897: This was keeping gcov from working on x86_64.
Bug 560095: Helped Mitch a bit with mozilla::services switchover.
bug 553721: Windows code layout is killing me. Spent a while figuring out how to build a simple windows exe to test _penter/callcap approaches. Got stuck on both :( Filed bug 561883 to get that settled.
Cleaned up Treehydra such that it builds on gcc 4.5.0. Misc doc fixes. Will need to fix it completely and switch our static analysis over to 4.5(once gcc 4.5 is deployed).
Next:
bug 553721: Hoping to get further on this with some help.
bug 558200: extension badness profiler. (might work on this while waiting on linux pgo + windows help)
Posted on 2010-04-19
Done:
bug 418866 [PGO on linux]: Spent a while playing with different gcc versions, the build system, etc. Turns out GCC 4.4 is the only compiler that can successfully compile Firefox with PGO. Filed bug 559964 to get 4.4 set up on tinderboxen.
bug 560095: Helped Mitch help me get rid of the top GetService calls :)
Did some xperf-measurements on windows cold startup. Discovered the sad story of windows page-fault-io. Will blog about it. Bought the winternals book which only confirmed how much life sucks on windows.
Ehsan started work on binary reordering on windows, bug 553721
Filed bugs 558200 and 559663. I think it is extremely important to address both of them. The extension profiler would be highest priority for me, but I need to finish binary reordering first. We need to get a good way to profile extensions if we want to make significant improvements in real-world startup. Measuring cold startup via IOPS is also important, it is a much more precise measurement than cold startup time and it gives a much better indication of progress and/or remaining work.
Next:
Blog windows startup. Investigate 553721. Push linux pgo further.
Posted on 2010-04-12
Done:
bug 549749 [Startup-optimized binary layout]: Done R&Ding. Fairly confident that I've explored all of the big remaining wins in binary layout. Ongoing enlightening discussions with kernel devs(LKML), libc maintainer(bugzilla) on why linux is insufficiently efficient at loading binaries(lack of dev attention it seems). It is worth fixing linux issues once we get firefox optimized enough on our end. This should allow for an extra 5-20% speedup. I would be very happy to point another developer at things needing fixing(most of the fixes should be easy).
bug 549749 [icegrind, plugin for producing ordered binary info] Announced, documented on blog.
bug 418866 [PGO on linux]: While finalizing my numbers on bug 549749 realized that PGO will provide most of the same wins.
Bug 516085 [fast getservice replacements] Resolved.
bug 512584 [superfast Cc/Ci]: No progress.
bug 548427 [machine for sixgill]: IT got the machine configured enough to be useful for bhackett.
Landed a couple of Dehydra patches by Mike Hommey.
Filed bug 557319 because it would be interesting to know what effect thumb would have on our arm binaries.
Next:
bug 418866 [PGO on linux]: Investigate what can be done to enable this on a tinderbox
bug 512584 [superfast Cc/Ci]: Abandon this bug for now. The wins here are very small compared to the other bugs I'm working on. On the other hand there is unknown amount of work remaining(to make sure my replacement code imitates xpcom well enough).
Posted on 2010-04-05
Done:
bug 549749 [Startup-optimized binary layout]: Getting close to being finished. Currently at a 30% startup speedup, 2mb memory savings without any hacks.
http://blog.mozilla.com/tglek/2010/04/05/linux-how-to-make-startup-suck-less-and-reduce-memory-usage/ most recent report
http://blog.mozilla.com/tglek/2010/03/29/linux-startup-inefficiency/ summary of toolchain issues that are blocking us from faster startup. I filed toolchain bugs in their appropriate bugzillas.
bug 556446 [Dead code tracking] Blogged about Ehren's adventures http://blog.mozilla.com/tglek/2010/03/31/how-to-get-reviews-fast-delete-code/
Bug 516085 [Efficient replacements for common getService() calls] got 2 r+s ready to land
bug 512584 [superfast Cc/Ci]: Dietrich's test landing confirmed there is a win. Need to look at making this landable.
Next:
bug 549749 [Startup-optimized binary layout]: Get valgrind plugin working with static builds, release it. Move towards integrating this into our builds.
Posted on 2010-03-29
Done:
bug 552121 [omni jar] Once mwu fixed "make package" so I could compare binaries closer to their release form there was "only" a 10% win left vs the 40% improvement I observed when running out of /dist/bin/. This is still worth adopting by Firefox.
bug 549749 [Startup-optimized binary layout]: Nailed it! Graphing my io logs made it painfully obvious how to optimize our startup. The following blog posts summarize my progress http://blog.mozilla.com/tglek/2010/03/23/when-in-trouble-draw-a-picture/ http://blog.mozilla.com/tglek/2010/03/24/linux-why-loading-binaries-from-disk-sucks/ http://blog.mozilla.com/tglek/2010/03/25/madvise-prelink-update/ Shockingly, my guess at fat wins of up to 50% due to binary reordering seems to be realistic. I got around a 40% improvement with the hacks in bug 554421. There are some gotchas here. a) I'm not sure if this would be applicable on Windows(madvise should work similarly on OSX) b) madvise changes should really go into glibc instead of us hacking around glibc's dynamic linker. I got a fair bit of moral support on this from guys in RedHat/suse. Should probably suggest to people who ship firefox on linux to take SuSE's madvise() glibc patch until that is landed in glibc mainline.
Bug 541828 [Firefox 3.6 Crash in Jar stuff]: Patched and got r+ on weekend.
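A note on the madvise item (bug 554421): the idea is to tell the kernel up front that a whole mapped file will be needed, so it can read it in large sequential chunks instead of servicing scattered page faults. A minimal sketch of that hint using Python's mmap module. The function name is hypothetical, `mmap.madvise` and `MADV_WILLNEED` only exist on Python 3.8+ on platforms that provide them, and the real hack lives around glibc's dynamic linker rather than in application code like this.

```python
import mmap
import os


def map_with_willneed(path):
    """Map a file read-only and hint that all of it will be needed soon.

    Returns (mapped_size, advised), where advised says whether the hint was issued.
    """
    with open(path, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        if size == 0:
            return 0, False  # mmap rejects empty files
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as m:
            advised = False
            # Only issue the hint where the platform and Python version support it.
            if hasattr(m, "madvise") and hasattr(mmap, "MADV_WILLNEED"):
                m.madvise(mmap.MADV_WILLNEED)
                advised = True
            return len(m), advised
```

The hint is advisory: the kernel may ignore it, so any speedup has to be measured on a cold cache rather than assumed.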
Next:
bug 549749 [Startup-optimized binary layout]: Finish diagnosing the runtime-linker-induced io. Almost done, just need to test + poke relevant GNU people.
bug 549749 [Valgrind plugin to generate scripts to pass to link to create startup-optimized binaries] Need to change my approach slightly such that the V plugin works in tandem with my libelf utility due to limitations in V's elf parsing.
Posted on 2010-03-22
Done:
bug 549749 [Startup-optimized binary layout]: Discussed my issues with jimb, got debugging advice. Had another conversation with Michael Meeks (who did similar work on OpenOffice/gnome). As a result I figured out how to graph my io log, now I can see that my problems are caused by the data sections (should be easily fixable) and threads (possibly fixable).
bug 552121 [omnijar-ridiculously awesome bug from mwu to cut down on the file clutter (down to a small handful of files)]. I checked the patch, it gets a preliminary 40% cold startup improvement here. The number is preliminary because make dist wasn't working, so I had to test from the build dir on unstripped binaries.
Next:
Would like to nail 549749 this week. Hopefully even start working on 553721 [windows file ordering].
Posted on 2010-03-15
Done:
bug 549749 [Startup-optimized binary layout]: Got a valgrind plugin that outputs a linker script that results in a 10% faster binary (but only 10%, should do better). Unfortunately, the binary is still accessed in an unhelpful manner, so I wrote a libelf tool to annotate io-traces with the symbols within the io ranges. Need to study linkers more :(. This has taken up most of my time since last update.
Bug 516085 [efficient/easy replacement for most getService calls]: Cleaned up patch according to review comments. Going to go through another tryserver+review cycle.
bug 533038 [Extensions kill startup perf] Measured/studied/argued extension effect on ff startup
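A note on the io-trace item (bug 549749): mapping traced file offsets back to symbols is a sorted-table lookup once the symbol table is in hand. A minimal sketch using a hypothetical in-memory symbol table of (start, size, name) tuples. The real tool reads these from the ELF file via libelf, which this sketch does not attempt.

```python
import bisect


def symbols_for_offsets(symbols, offsets):
    """Map each offset to the name of the symbol covering it, or None.

    symbols: iterable of (start, size, name) tuples
    offsets: iterable of integer offsets taken from an io trace
    """
    table = sorted(symbols)
    starts = [start for start, _size, _name in table]
    hits = []
    for off in offsets:
        # Rightmost symbol whose start is <= off.
        i = bisect.bisect_right(starts, off) - 1
        name = None
        if i >= 0:
            start, size, sym = table[i]
            if off < start + size:  # inside the symbol, not in a gap after it
                name = sym
        hits.append(name)
    return hits
```

Offsets that fall in gaps between symbols come back as None, which is itself useful: it flags padding or data the symbol table does not describe.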
Next:
Need closure on bugs 512584 and 516085, going to wrap those up. Going to spend some time with jimb to plug remaining gaps in my understanding on how our binaries get loaded for bug 549749.
Need to develop a plan for dealing with extension startup(bug 533038). Seems that the best thing to do is to debug and document what the current extensions are doing wrong and go from there.
Posted on 2010-03-04
Done:
Had some fun diagnosing extension costs during startup: bug 533038
(Two weeks ago) proof of concept binary reordering did boost startup speed: bug 531406
Had my password reset so I can post updates again
Next:
Working on a valgrind plugin to determine a cold-startup optimized binary layout(getting close): bug 549749
Posted on 2010-01-13
Done:
bug 532771: Got windows static build to link
Next:
Finish talk for http://www.lca2010.org.nz/
Posted on 2009-12-30
Done:
- bug 512584: Completed a faster Ci.* interfaces jsnative. Confirmed that it's a win on n810 ~40ms
- bug 536911: [Hopefully] addressed a jar topcrash
- bug 536879: Fixed a treehydra regression
- bug 532771: Found a few issues with joel's patch, but I'm not much use with build problems. Waiting on joel to address them.
- Failed to come up with a way to reproduce bug 536792.
Next:
- Bug 536792: figure out the fasl topcrash
- Figure out a strategy for bug 512584
Posted on 2009-12-15
Done:
Progress this week:
- Confirmed a robust way of measuring startup on Windows.
- Remeasured CSS overhead
- Disabled Components.* in content for now(bug 512584)
Next:
More bug 512584:
- Move Components.class.* and Components.interfaces.* CID/IIDs into pure js objects, teach xpconnect to deal with them.
- Share Components.* across various xpconnect global objects.
Exploration:
- Investigate sticking linux firefox binaries into squashfs or other readonly fs and mounting it as a loopback
Other:
- Set up windows & try to help with bug 532771