Closed
Bug 983146
Opened 10 years ago
Closed 6 years ago
[GSoC2014] [Week 6] All-At-Once terminology replacement method
Categories
(Intellego Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gueroJeff, Unassigned)
Details
Tracking bug. All-At-Once Replacement Method: Regenerate the DOM with matched target terminology, output content into a new webpage, and render it.
Reporter
Comment 1•10 years ago
Terminology matching takes place over the entire text extracted from the DOM, rather than node by node (or segment by segment). Regenerate the DOM with the matched target terminology, output the content into a new web page, and render it.
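As a rough illustration of matching over the whole extracted text in one pass, here is a minimal Python sketch. It is not the project's code: the `GLOSSARY` entries, the function names, and the regex-based approach are all assumptions made for the example, and the plural handling (re-attaching a trailing "s") is deliberately naive.

```python
import re

# Hypothetical glossary with lowercase source terms; real entries would
# come from the TBX file.
GLOSSARY = {"window": "ventana", "tab": "pestaña"}


def match_case(source, target):
    """Copy the capitalisation pattern of `source` onto `target`."""
    if source.isupper():
        return target.upper()
    if source[:1].isupper():
        return target[:1].upper() + target[1:]
    return target


def replace_terms(text, glossary):
    """Replace every glossary term in `text` in a single regex pass."""
    # Longest terms first, so longer terms win over terms they start with.
    terms = sorted(glossary, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in terms) + r")(s?)\b",
        re.IGNORECASE,
    )

    def substitute(match):
        source, plural = match.group(1), match.group(2)
        target = glossary[source.lower()]
        # Naive assumption: the target pluralises by appending the same "s".
        return match_case(source, target) + plural

    return pattern.sub(substitute, text)


# replace_terms("Open the Window and two tabs.", GLOSSARY)
# -> "Open the Ventana and two pestañas."
```

The word-boundary anchors keep "tab" from matching inside "table". A real implementation would need proper morphology for both languages instead of the trailing-"s" shortcut.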
Summary: [meta] Post-processes website regeneration → [meta] All-At-Once terminology replacement method
Updated•10 years ago
Summary: [meta] All-At-Once terminology replacement method → [GSoC2014] [Week 6] All-At-Once terminology replacement method
Last week I made progress on the project by building a working prototype of the web page translator. My mentor pointed out a few issues to fix to improve the system. The TBX file we used had a few errors: some segment pairs had translated words separated by commas. The string replacement also did not take pluralisation or capitalisation into account when replacing the source string. I also found that the text contents were sent all at once, so I needed to try a different method and check for any difference. A segment-by-segment method seemed like a good approach.

The current approach to translating the text uses JavaScript to replace the text found in the DOM. Through research I came across NLTK for Python, which has many utilities that we could reuse for our project, such as tokenising segments of text. Moving the translation to the server side means that the utilities provided by NLTK can be used to translate the DOM, which is then sent to the browser already translated into the target language.

All hyperlinks on the website have to be changed so that, when clicked, they load within the iframe and go through our proxy. Many sites, including the Mozilla Support sites, send the X-Frame-Options: DENY header, meaning that the website cannot be browsed within an iframe. To get around this issue, we load each link through our translation engine: it fetches the raw HTML, translates the DOM, and sends the result to the client for the iframe to load.
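The hyperlink rewriting described above could look roughly like the following Python sketch. The `PROXY_ENDPOINT` route and the `proxied_href` helper are hypothetical names invented for illustration. The sketch only computes the new target for one link; applying it to every `<a>` element would be done by whatever HTML parser the server uses.

```python
from urllib.parse import quote, urljoin, urlsplit

# Hypothetical route on the translation server; the real path is not
# given in this bug.
PROXY_ENDPOINT = "/translate?url="


def proxied_href(href, page_url):
    """Return the URL a link should point to so it loads via the proxy."""
    # Resolve relative links against the page they were found on.
    absolute = urljoin(page_url, href)
    # Leave non-HTTP links (mailto:, javascript:, ...) unchanged.
    if urlsplit(absolute).scheme not in ("http", "https"):
        return href
    # Percent-encode the whole target so it survives as a query value.
    return PROXY_ENDPOINT + quote(absolute, safe="")


# proxied_href("/kb/tabs", "https://support.mozilla.org/en-US/")
# -> "/translate?url=https%3A%2F%2Fsupport.mozilla.org%2Fkb%2Ftabs"
```

Because the iframe then only ever loads pages served by the proxy, the original site's X-Frame-Options header never reaches the browser for the framed document.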
Reporter
Updated•6 years ago
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED