timApp.document.translation package
Submodules#
timApp.document.translation.deepl module#
Contains implementation of the TranslationService-interface for the DeepL machine translator: https://www.deepl.com/translator.
Covers both the DeepL API Free and the DeepL API Pro versions.
- class timApp.document.translation.deepl.DeeplProTranslationService(values)[source]#
Bases:
timApp.document.translation.deepl.DeeplTranslationService
Translation service using the DeepL API Pro.
- id#
Translation service identifier.
- ignore_tag#
The XML-tag name to use for ignoring pieces of text when XML-handling is used. Should be chosen to be some uncommon string not found in many texts.
- service_name#
Human-readable name of the machine translator. Also used as an identifier.
- service_url#
The url base for the API calls.
- class timApp.document.translation.deepl.DeeplTranslationService(values)[source]#
Bases:
timApp.document.translation.translator.RegisteredTranslationService
Translation service using the DeepL API Free.
- get_languages(source_langs: bool) list[timApp.document.translation.language.Language] [source]#
Fetches the source or target languages from DeepL.
- Parameters
source_langs – Whether source languages must be fetched
- Returns
The list of source or target languages from DeepL.
- headers: dict[str, str]#
Request-headers needed for authentication with the API-key.
- id#
Translation service identifier.
- ignore_tag#
The XML-tag name to use for ignoring pieces of text when XML-handling is used. Should be chosen to be some uncommon string not found in many texts.
- languages() timApp.document.translation.translator.LanguagePairing [source]#
Asks the DeepL API for the list of supported languages and turns the returned language codes to Languages found in the database.
- Returns
Dictionary of source langs to lists of target langs, that are supported by the API and also found in database.
- postprocess(text: str) str [source]#
Remove unnecessary protection tags from the text and change defined aliases back to Markdown syntax.
- Parameters
text – The text returned from DeepL API after translation.
- Returns
Text with the needed operations performed to more closely match the text before passing it to DeepL API.
- preprocess(elem: timApp.document.translation.translationparser.TranslateApproval) None [source]#
Protect the text inside element from mangling in translation by adding XML-tags.
- Parameters
elem – The element to add XML-protection-tags to.
- Returns
None. The tag is added to the input object.
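As a rough illustration of the protection step described for preprocess and postprocess, the sketch below wraps non-translatable text in an ignore tag before the request and strips the tag from the response. The classes are simplified stand-ins for those in translationparser, and the tag name `_` is an assumption made for this example, not necessarily the value of ignore_tag in TIM.

```python
from dataclasses import dataclass


# Hypothetical stand-ins for the classes in translationparser.
@dataclass
class TranslateApproval:
    text: str = ""


class Translate(TranslateApproval):
    pass


class NoTranslate(TranslateApproval):
    pass


IGNORE_TAG = "_"  # assumed tag name, chosen only for this sketch


def preprocess(elem: TranslateApproval) -> None:
    """Wrap protected text in the ignore tag, modifying elem in place."""
    if isinstance(elem, NoTranslate):
        elem.text = f"<{IGNORE_TAG}>{elem.text}</{IGNORE_TAG}>"


def postprocess(text: str) -> str:
    """Remove the protection tags from the translated text."""
    return text.replace(f"<{IGNORE_TAG}>", "").replace(f"</{IGNORE_TAG}>", "")


parts = [Translate("Click "), NoTranslate("[here](www.example.com)")]
for part in parts:
    preprocess(part)
request_text = "".join(part.text for part in parts)
restored_text = postprocess(request_text)
```

The sketch leaves out escaping of characters that are special in XML and the alias handling mentioned for postprocess.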
- register(user_group: timApp.user.usergroup.UserGroup) None [source]#
Set headers to use the user group’s API-key ready for translation calls.
- Parameters
user_group – The user group whose API key will be used.
- Raises
NotExist – If no API key is found.
RouteException – If more than one key is found for the user.
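For orientation, the following sketch shows what the authentication header and the two base URLs could look like. The values follow DeepL's public API documentation; whether the headers and service_url attributes in TIM hold exactly these values is an assumption, and build_headers is a hypothetical helper.

```python
from __future__ import annotations

# Base URLs as given in DeepL's public API documentation (assumed here to
# correspond to service_url of the Free and Pro service classes).
DEEPL_FREE_URL = "https://api-free.deepl.com/v2"
DEEPL_PRO_URL = "https://api.deepl.com/v2"


def build_headers(api_key: str) -> dict[str, str]:
    """Hypothetical helper: build the header DeepL expects for authentication."""
    return {"Authorization": f"DeepL-Auth-Key {api_key}"}


headers = build_headers("example-key")
```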
- service_name#
Human-readable name of the machine translator. Also used as an identifier.
- service_url#
The url base for the API calls.
- source_Language_code: str#
The source language’s code (helps handling regional variants that DeepL doesn’t differentiate).
- supports(source_lang: timApp.document.translation.language.Language, target_lang: timApp.document.translation.language.Language) bool [source]#
Check that the source language can be translated into target language by the translation API.
- Parameters
source_lang – Language to check the translation capability from.
target_lang – Language to check the translation capability into.
- Returns
True, if the pairing is supported.
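The check can be pictured as a lookup in the source-to-targets mapping that languages() returns. The sketch below assumes the mapping is keyed by plain language codes instead of Language objects, which is a simplification for illustration.

```python
from __future__ import annotations


def supports(pairing: dict[str, list[str]], source_code: str, target_code: str) -> bool:
    """Return True if target_code is listed as a target for source_code."""
    return target_code in pairing.get(source_code, [])


# Example mapping with made-up contents.
pairing = {"fi": ["en", "sv"], "en": ["fi"]}
```

With this mapping, `supports(pairing, "fi", "sv")` is True, while a source language missing from the mapping is never supported.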
- supports_tag_handling(tag_type: str) bool [source]#
Check if DeeplTranslationService supports a tag-handling.
- Parameters
tag_type – The tag-type to check handling for.
- Returns
True if the tag-type is supported.
- translate(texts: list[list[timApp.document.translation.translationparser.TranslateApproval]], source_lang: timApp.document.translation.language.Language | None, target_lang: timApp.document.translation.language.Language, tag_handling: str = 'xml') list[str] [source]#
Use the DeepL API to translate text between languages.
- Parameters
texts – Some set of texts to be translated.
source_lang – Language of input text. None value makes DeepL guess it from the text.
target_lang – Language to translate the text into.
tag_handling – See comment in superclass.
- Returns
List of strings in target language with the non-translatable parts intact.
- usage() timApp.document.translation.translator.Usage [source]#
Fetch current API usage of the registered key from DeepL.
- Returns
Usage returned from DeepL.
timApp.document.translation.language module#
Contains implementation of the Language-database model, which is used to unify TIM’s translation-documents’ languages.
- class timApp.document.translation.language.Language(lang_code, lang_name, autonym, flag_uri=None)[source]#
Bases:
sqlalchemy.ext.declarative.api.Model
Represents a standardized language code used for example with translation documents.
NOTE: You should always use the provided class-methods for creating new instances!
- autonym#
Native name for the language.
- classmethod create_from_name(name: str) timApp.document.translation.language.Language [source]#
Create an instance of Language that follows a standard. Note that this should always be used when creating a new Language especially when adding it to database.
- Parameters
name – Natural name of the language
- Returns
A corresponding Language-object newly created.
- Raises
LookupError – if the language is not found from langcodes’ database.
- flag_uri#
Path to a picture representing the language.
- lang_code#
Standardized code of the language.
- lang_name#
IANA’s name for the language.
- classmethod query_all() list['Language'] [source]#
Query the database for all the languages.
- Returns
All the languages found from database.
- classmethod query_by_code(code: str) Optional[timApp.document.translation.language.Language] [source]#
Query the database to find a single match for a language tag.
- Parameters
code – The IETF tag for the language.
- Returns
The corresponding Language-object in database or None if not found.
timApp.document.translation.reversingtranslator module#
Contains the implementation of ReversingTranslationService and its target language, which are used in (NOTE:) unit-tests for translation routes.
- timApp.document.translation.reversingtranslator.REVERSE_LANG = {'autonym': 'esreveR', 'lang_code': 'rev-Erse', 'lang_name': 'Reverse'}#
Language that the ReversingTranslationService translates text into. To use in tests.
- class timApp.document.translation.reversingtranslator.ReversingTranslationService(**kwargs)[source]#
Bases:
timApp.document.translation.translator.TranslationService
Translator to test if the list[list[TranslateApproval]]-structure is generic enough to (easily) use for integrating new machine translators into TIM.
- get_languages(source_langs: bool) list[timApp.document.translation.language.Language] [source]#
Reverse-language is supported as the only target language.
- Parameters
source_langs – See documentation on TranslationService.
- Returns
See documentation on TranslationService.
- id#
Translation service identifier.
- languages() timApp.document.translation.translator.LanguagePairing [source]#
- Returns
Mapping from all languages in database into the reversed language.
- service_name#
Human-readable name of the machine translator. Also used as an identifier.
- supports(source_lang: timApp.document.translation.language.Language, target_lang: timApp.document.translation.language.Language) bool [source]#
Check if language pairing is supported.
- Parameters
source_lang – Language to translate from.
target_lang – Only the REVERSE_LANG -language-code is supported.
- Returns
True, if target_lang is rev-Erse.
- supports_tag_handling(tag_type: str) bool [source]#
Check if the service supports tag handling in translations. For example using XML-tags, some services offer controlling parts of the text, that should be kept as-is and not be affected by the machine translation: “My name is Dr. <protect>Oak</protect>.”
NOTE this is related to the kinda HACKY way of handling Markdown-tables in DeepL-translation.
- Parameters
tag_type – Type of the tag. Some services for example support “xml” or “html”.
- Returns
True, if the tag type is supported.
- translate(texts: list[list[timApp.document.translation.translationparser.TranslateApproval]], src_lang: timApp.document.translation.language.Language, target_lang: timApp.document.translation.language.Language, *, tag_handling: str = '') list[str] [source]#
Reverse the given translatable text. NOTE: The algorithm here for combining translation results back into the original structure might be integrated into the actual TranslationService implementation. NOTE: This implementation does not fully follow the needed interface.
- Parameters
texts – Texts to reverse
src_lang – Any.
target_lang – Only REVERSE_LANG[“lang_code”] is supported.
tag_handling – Tags to handle intelligently during translation. TODO: XML-handling.
- Returns
Texts where translatable ones have been reversed.
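A minimal sketch of the idea follows: each inner list is one block, translatable parts are reversed, and the parts are joined back into a single string per block. The classes are simplified stand-ins, and reverse_translate is a hypothetical name that ignores the language and tag-handling parameters.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical stand-ins for the classes in translationparser.
@dataclass
class TranslateApproval:
    text: str = ""


class Translate(TranslateApproval):
    pass


class NoTranslate(TranslateApproval):
    pass


def reverse_translate(texts: list[list[TranslateApproval]]) -> list[str]:
    """Reverse the Translate parts of each block and keep the rest as-is."""
    results = []
    for block in texts:
        pieces = [
            part.text[::-1] if isinstance(part, Translate) else part.text
            for part in block
        ]
        results.append("".join(pieces))
    return results


blocks = [[Translate("foo"), NoTranslate(" **"), Translate("bar"), NoTranslate("**")]]
reversed_blocks = reverse_translate(blocks)
```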
- usage() timApp.document.translation.translator.Usage [source]#
Infinite quota
timApp.document.translation.routes module#
Contains routes for making operations on translation documents. Mainly translations on whole documents, paragraphs and raw text.
Also contains routes for getting available languages, names of machine translators and queries related to API-keys of these machine translators.
- timApp.document.translation.routes.add_api_key() flask.wrappers.Response [source]#
Add API key to the database for current user.
- Returns
OK response if adding the key was successful.
- timApp.document.translation.routes.create_translation_route(tr_doc_id: int, language: str, translator: str) flask.wrappers.Response [source]#
Create and add a translation version of a whole document. Machine-translate it if this is requested and the user is authorized to do so.
- Parameters
tr_doc_id – ID of a document that the translation can be based on, that is, the original source document or a document linked to it.
language – Language that will be set to the translation document and used in potential machine translation.
translator – Identifying name of the translator to use (machine or manual).
- Returns
The created translation document’s information as JSON.
- timApp.document.translation.routes.get_all_languages() flask.wrappers.Response [source]#
Query the database for all the available languages to be used for documents.
- Returns
JSON response containing all the available languages.
- timApp.document.translation.routes.get_keys() flask.wrappers.Response [source]#
Gets the user’s API keys.
- Returns
The user’s API keys as JSON.
- timApp.document.translation.routes.get_languages(source_languages: bool) flask.wrappers.Response [source]#
Get list of supported languages by machine translator.
- Parameters
source_languages – Flag for getting source-language (True) list instead of target-language (False).
- Returns
List of the supported languages by type (source or target).
- timApp.document.translation.routes.get_my_translators() flask.wrappers.Response [source]#
Gets the names of the translators the user has the API keys for.
- Returns
The JSON-list of the names of the translators the user has the API keys for.
- timApp.document.translation.routes.get_quota()[source]#
Gets the quota info for the user’s API key.
- Returns
The used and available quota for the user’s API key as JSON.
- timApp.document.translation.routes.get_source_languages() flask.wrappers.Response [source]#
Query the database for the possible source languages.
- Returns
JSON response containing the languages.
- timApp.document.translation.routes.get_target_languages() flask.wrappers.Response [source]#
Query the database for the possible target languages.
- Returns
JSON response containing the languages.
- timApp.document.translation.routes.get_translators() flask.wrappers.Response [source]#
Query the database for the possible machine translators.
- Returns
JSON response containing the translators.
- timApp.document.translation.routes.get_valid_status() flask.wrappers.Response [source]#
Check the validity of a given api-key with the chosen translator engine.
- Returns
OK-response if the key is valid, or an Exception.
- timApp.document.translation.routes.is_valid_language_id(lang_id: str) bool [source]#
Check that the ID is recognized by the langcodes library and found in database.
- Parameters
lang_id – Language id (or “tag”) to check for validity.
- Returns
True, if the standardized ID is found in database.
- timApp.document.translation.routes.paragraph_translation_route(tr_doc_id: int, tr_par_id: str, language: str, transl: str) flask.wrappers.Response [source]#
Replace the content of paragraph with requested translation.
- Parameters
tr_doc_id – ID of the document that the paragraph is in.
tr_par_id – ID of the paragraph in the Translation NOTE: NOT the original paragraph!
language – Language to translate into.
transl – Identifying code of the translator to use.
- Returns
OK-response if translation and modification was successful.
- timApp.document.translation.routes.remove_api_key() flask.wrappers.Response [source]#
Remove the current user’s API key from the database.
- Returns
OK-response if removing the key was successful.
- timApp.document.translation.routes.text_translation_route(tr_doc_id: int, language: str, transl: str) flask.wrappers.Response [source]#
Translate raw text between the source document’s language and the one requested.
- Parameters
tr_doc_id – ID of the document that the text is from.
language – Language to translate the text into.
transl – Identifying code of the translator to use.
- Returns
The translated text.
- timApp.document.translation.routes.translate_full_document(tr: timApp.document.translation.translation.Translation, src_doc: timApp.document.document.Document, target_language: timApp.document.translation.language.Language, translator_code: str) None [source]#
Translate matching paragraphs of document based on an original source document.
- Parameters
tr – The metadata of the translation target.
src_doc – The original source document with translatable text.
target_language – The language to translate the document into.
translator_code – Identifier of the translator to use (machine or “Manual” if empty).
- Returns
None. The translation is applied to document based on the tr-parameter.
timApp.document.translation.synchronize_translations module#
- timApp.document.translation.synchronize_translations.synchronize_translations(doc: timApp.document.docinfo.DocInfo, edit_result: timApp.document.editing.documenteditresult.DocumentEditResult)[source]#
Synchronizes the translations of a document by adding missing paragraphs to the translations and deleting non-existing paragraphs.
- Parameters
edit_result – The changes that were made to the document.
doc – The document that was edited and whose translations need to be synchronized.
timApp.document.translation.translation module#
- class timApp.document.translation.translation.Translation(**kwargs)[source]#
Bases:
sqlalchemy.ext.declarative.api.Model, timApp.document.docinfo.DocInfo
A translated document.
Translation objects may be created in two scenarios:
An existing non-translated document is assigned a language.
A new translated document is created (via manage view).
- doc_id#
- docentry#
- property id#
Returns the item id.
- lang_id#
- property path#
Returns the Document path, including the language part in case of a translation.
- property path_without_lang#
Returns the Document path without the language part in case of a translation.
- property public#
- src_docid#
- property translations: list['Translation']#
Returns the translations of the document. NOTE: The list includes the document itself.
- timApp.document.translation.translation.add_tr_entry(doc_id: int, item: timApp.document.docinfo.DocInfo, tr: timApp.document.translation.translation.Translation) timApp.document.translation.translation.Translation [source]#
timApp.document.translation.translationparser module#
This module contains the main functions needed for marking parts of the Markdown used in TIM into translatable text (human-spoken language) and non-translatable text (syntax of Markdown and TIM-plugins for example).
Basically only the get_translate_approvals -function should be called directly by users.
- timApp.document.translation.translationparser.NOTRANSLATE_STYLE_LONG = 'notranslate'#
Longer string used for marking non-translatable text in TIM’s Markdown
- timApp.document.translation.translationparser.NOTRANSLATE_STYLE_SHORT = 'nt'#
Shorter string used for marking non-translatable text in TIM’s Markdown
- class timApp.document.translation.translationparser.NoTranslate(text: str = '')[source]#
Bases:
timApp.document.translation.translationparser.TranslateApproval
Subclass of TranslateApproval, which indicates that the string value of the class will not be translated.
- timApp.document.translation.translationparser.PLUGIN_MD_PREFIX = 'md:'#
Prefix in plugin’s values that can be parsed into Markdown. The prefix does not contain delimiters and is not preceded by spaces.
- class timApp.document.translation.translationparser.Table(text: str = '')[source]#
Bases:
timApp.document.translation.translationparser.TranslateApproval
Hacky way to translate tables by identifying them at translation and setting html-tag handling on.
- class timApp.document.translation.translationparser.Translate(text: str = '')[source]#
Bases:
timApp.document.translation.translationparser.TranslateApproval
Subclass of TranslateApproval, which indicates that the string value of the class will be translated.
- class timApp.document.translation.translationparser.TranslateApproval(text: str = '')[source]#
Bases:
object
Superclass for text that should or should not be passed to a machine translator.
- text: str = ''#
- class timApp.document.translation.translationparser.TranslationParser(quote: str = '"')[source]#
Bases:
object
- add_value_with_prefix(text: str, arr: list[timApp.document.translation.translationparser.TranslateApproval], plugin_quote: str = '"') None [source]#
Separates the contents of a YAML string-prefix and value found in plugins and adds to the list.
The text can possibly start with the “md:” prefix (NoTranslate) for content that is Markdown, and the rest after that is the value (Translate).
- Parameters
text – The text that can be contained with (plugin).
arr – The list that the results will be added to.
plugin_quote – The quote to use inside the potential Markdown.
- Returns
None, the result is inserted into the arr-parameter.
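The split can be sketched as follows. This simplified version only separates the prefix from the value; it ignores plugin_quote and does not parse the Markdown in the value any further, both of which the real method may do.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical stand-ins for the classes in translationparser.
@dataclass
class TranslateApproval:
    text: str = ""


class Translate(TranslateApproval):
    pass


class NoTranslate(TranslateApproval):
    pass


PLUGIN_MD_PREFIX = "md:"


def add_value_with_prefix(text: str, arr: list[TranslateApproval]) -> None:
    """Append the optional prefix as NoTranslate and the value as Translate."""
    if text.startswith(PLUGIN_MD_PREFIX):
        arr.append(NoTranslate(PLUGIN_MD_PREFIX))
        text = text[len(PLUGIN_MD_PREFIX):]
    arr.append(Translate(text))


collected: list[TranslateApproval] = []
add_value_with_prefix("md:Hello *world*", collected)
```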
- attr_collect(content: list) Tuple[list[timApp.document.translation.translationparser.TranslateApproval], bool] [source]#
Collect the parts of Attr into Markdown.
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Inline
- Parameters
content – Pandoc-ASTs JSON form of Attr (attributes): [ str, [str], [(str, str)] ].
- Returns
List of translatable and non-translatable parts and a boolean indicating whether the .notranslate style was found in the element.
- block_collect(top_block: dict, depth: int = 0) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Walks the whole block and appends each translatable and non-translatable string-part into a list in order. Adds newlines to the start of each block and end of some specific blocks, for Markdown syntax. These newlines are required due to Pandoc removing the newlines in formatting.
Based on the pandoc AST-spec at: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Block
- Parameters
top_block – The block to collect strings from.
depth – The depth of the recursion if it is needed for example with list-indentation.
- Returns
List of strings inside the correct approval-type.
- bulletlist_collect(content: dict, depth: int) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect and separate translatable and untranslatable areas within a bullet list element through recursion. Calls to list_collect to handle recursion through block_collect.
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Block
- Parameters
content – Bullet list (attributes and a list of items, each a list of blocks): [ ListAttributes, [[Block]] ].
depth – The current depth of the list, used for indentation.
- Returns
List of translatable and untranslatable areas within a bullet list element.
- cite_collect(content: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect and separate translatable and untranslatable areas within a citation element. Citation element is delimited by citation marks.
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Inline
- Parameters
content – Citation (list of inlines) from Inline element: [ [Citation], [Inline] ].
- Returns
List containing the parsed collection of Citation content.
- code_collect(content: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect everything within an Inline code element as untranslatable, because there is no clear context for deciding whether the text should remain in the original language. Inline Code element is defined through spacing before the string.
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Inline
- Parameters
content – Inline code (literal) from Inline element: [ Attr, Text ].
- Returns
List containing the collection of Inline code content.
- codeblock_collect(content: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Pick translatable and non-translatable parts off of a codeblock.
NOTE/WARNING In regard to plugins:
It is critical that the attributes do not include the TIM-identifier eg. id=”SAs3EK96oQtL” from {plugin=”csPlugin” id=”SAs3EK96oQtL”}, because Pandoc deletes extra identifiers contained in attributes like #btn-tex2 and id=”SAs3EK96oQtL” in {plugin=”csPlugin” #btn-tex2 id=”SAs3EK96oQtL”}. Here, the attributes of a plugin-codeblock are DISCARDED and will not be included in the result when markdown is reconstructed i.e. caller should save the attributes if needed.
- Parameters
content – List with the attributes and text-content of the codeblock.
- Returns
List marking the Markdown representation of the element into translatable and non-translatable parts.
- collect_tim_plugin(attrs: dict, content: str) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Special case to collect translatable and non-translatable parts of a TIM-plugin based on its (YAML) contents.
- Parameters
attrs – Pandoc-AST defined Attr -attributes of the plugin-block for example plugin=”csPlugin”. TODO Add handling for this if necessary.
content – The raw markdown content of the plugin-defined paragraph.
- Returns
List of the translatable and non-translatable parts.
- definitionlist_collect(content: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect definition list areas as untranslatable. Each list item is a pair consisting of a term (a list of inlines) and one or more definitions (each a list of blocks).
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Block
- Parameters
content – Definition list. : [([Inline], [[Block]])].
- Returns
List of single NoTranslate -element containing Markdown representation of definition list.
- div_collect(content: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collects generic block container with attributes as untranslatable.
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Block
- Parameters
content – Generic block container with attributes: [ Attr, [Block] ].
- Returns
List of single NoTranslate -element containing Markdown representation of div element.
- get_translate_approvals(md: str) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
By parsing the input text, identify parts that should and should not be passed to a machine translator.
TODO Does this need to return list of lists, when the function of this is to split markdown into parts that can be translated or not?
- Parameters
md – The input text to eventually translate.
- Returns
Lists containing the translatable parts of each block in a list.
- header_collect(content: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect and separate translatable and untranslatable areas within a header from a block.
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Block
- Parameters
content – Header’s level (integer) and text (inlines): [ int, Attr, [Inline] ].
- Returns
List of translatable and untranslatable areas within a header element.
- image_collect(content: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect and separate translatable and untranslatable areas within an image element.
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Inline
- Parameters
content – Attr, alt text (list of inlines), target: [ Attr, [Inline], Target ].
- Returns
List containing the parsed collection of image content.
- inline_collect(top_inline: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect and separate translatable and untranslatable areas within an Inline element.
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Inline Types are listed as emphasized text in the list, and the values after each type are its content.
- Parameters
top_inline – Made out of type and content. Type defines the case and content is the value of that type.
- Returns
List of translatable and untranslatable areas within an Inline element.
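To make the type/content dispatch concrete, here is a small sketch over Pandoc's JSON form of inlines, where each element is a dict with a type under "t" and, for most types, content under "c". It covers only four inline types, and the Markdown emitted for each is an assumption about what a collector could produce, not TIM's actual output.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical stand-ins for the classes in translationparser.
@dataclass
class TranslateApproval:
    text: str = ""


class Translate(TranslateApproval):
    pass


class NoTranslate(TranslateApproval):
    pass


def inline_collect(top_inline: dict) -> list[TranslateApproval]:
    """Split one inline element into translatable and non-translatable parts."""
    kind = top_inline["t"]
    content = top_inline.get("c")
    if kind == "Str":
        return [Translate(content)]
    if kind == "Space":
        return [Translate(" ")]
    if kind == "Emph":
        # The emphasis markers are syntax; the children are collected recursively.
        inner = [part for child in content for part in inline_collect(child)]
        return [NoTranslate("*"), *inner, NoTranslate("*")]
    if kind == "Code":
        _attr, code = content
        return [NoTranslate(f"`{code}`")]
    raise NotImplementedError(f"Inline type not covered by this sketch: {kind}")


emph = {"t": "Emph", "c": [{"t": "Str", "c": "very"}, {"t": "Space"}, {"t": "Str", "c": "nice"}]}
emph_parts = inline_collect(emph)
code_parts = inline_collect({"t": "Code", "c": [["", [], []], "x = 1"]})
```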
- link_collect(content: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect and separate translatable and untranslatable areas within a link element.
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Inline
- Parameters
content – Attr, alt text (list of inlines), target: [ Attr, [Inline], Target ].
- Returns
List containing the parsed collection of link content.
- link_or_image_collect(content: dict, islink: bool) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect and separate translatable and untranslatable areas within a link or image element. Universal collector for both link and image collect due to them having the same outline in markdown, except for “[” or “![” prepend.
- Parameters
content – Attr, alt text (list of inlines), target: [ Attr, [Inline], Target ].
islink – True-state if content is link-element (true=link, false=image).
- Returns
List containing the parsed collection of link or image content.
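The shared outline of links and images can be sketched like this: the brackets and the target are syntax, and only the text between the brackets is translatable. The sketch handles plain Str and Space inlines only and drops the attributes and the title of the target, which the real collector may preserve.

```python
from __future__ import annotations

from dataclasses import dataclass


# Hypothetical stand-ins for the classes in translationparser.
@dataclass
class TranslateApproval:
    text: str = ""


class Translate(TranslateApproval):
    pass


class NoTranslate(TranslateApproval):
    pass


def link_or_image_collect(content: list, islink: bool) -> list[TranslateApproval]:
    """Collect [ Attr, [Inline], Target ] into Markdown link or image parts."""
    _attr, inlines, (url, _title) = content
    parts: list[TranslateApproval] = [NoTranslate("[" if islink else "![")]
    for inline in inlines:
        if inline["t"] == "Str":
            parts.append(Translate(inline["c"]))
        elif inline["t"] == "Space":
            parts.append(Translate(" "))
    parts.append(NoTranslate(f"]({url})"))
    return parts


link = [["", [], []], [{"t": "Str", "c": "click"}], ["www.example.com", ""]]
link_parts = link_or_image_collect(link, islink=True)
image_parts = link_or_image_collect(link, islink=False)
```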
- list_collect(blocks: list[list[dict]], depth: int, attrs: Optional[Tuple[int, str, str]]) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
General method for handling both bullet- and ordered lists.
- Parameters
blocks – The [[Block]] found in Pandoc definition for the lists.
depth – The depth of recursion with lists (can contain lists of lists of lists …).
attrs – The information related to the style of the OrderedList items.
- Returns
List containing the translatable parts of the list.
- math_collect(content: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect and separate translatable and untranslatable areas within a math element.
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Inline
- Parameters
content – TeX math (literal) from Inline: [ MathType, Text ].
- Returns
List containing the parsed collection of math content.
- merge_consecutive(arr: Iterable[timApp.document.translation.translationparser.TranslateApproval]) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Merge consecutive elements of the same type into each other to reduce length of the list.
The merging is as follows (T = Translate, NT = NoTranslate):
[T("foo"), T(" "), T("bar"), NT("\n"), NT("["), T("click"), NT("](www.example.com)")]
==>
[T("foo bar"), NT("\n["), T("click"), NT("](www.example.com)")]
- Parameters
arr – The list of objects to merge.
- Returns
Merged list.
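One way to express this merging is with itertools.groupby, grouping neighbouring elements by their class and joining their texts. This is a sketch with stand-in classes; the actual method may be implemented differently.

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import groupby
from typing import Iterable


# Hypothetical stand-ins for the classes in translationparser.
@dataclass
class TranslateApproval:
    text: str = ""


class Translate(TranslateApproval):
    pass


class NoTranslate(TranslateApproval):
    pass


def merge_consecutive(arr: Iterable[TranslateApproval]) -> list[TranslateApproval]:
    """Join the texts of neighbouring elements that share the same class."""
    merged = []
    for cls, group in groupby(arr, key=type):
        merged.append(cls("".join(part.text for part in group)))
    return merged


T, NT = Translate, NoTranslate
merged_parts = merge_consecutive(
    [T("foo"), T(" "), T("bar"), NT("\n"), NT("["), T("click"), NT("](www.example.com)")]
)
```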
- notranslate_all(type_: str, content: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Mark the whole element as non-translatable.
TODO NOTE: This function does not seem to produce Markdown consistent with TIM’s practices, and its use should eventually be replaced with the specific *_collect functions!
- Parameters
type_ – Pandoc AST-type of the content.
content – Pandoc AST-content of the type.
- Returns
List of single NoTranslate -element containing Markdown representation of content.
- ordered_list_styling(start_num: int, num_style: str, num_delim: str) str [source]#
Makes the style for the ordered lists.
Different styles for ordered lists:
num_styles - Decimal (1, 2, 3), LowerRoman (i, ii, iii), LowerAlpha (a, b, c), UpperRoman (I, II, III), UpperAlpha (A, B, C), DefaultStyle (#)
num_delims - Period ( . ), OneParen ( ) ), DefaultDelim ( . ), TwoParens ( (#) )
- Parameters
start_num – The number that starts the list.
num_style – The numbering style.
num_delim – The punctuation for list.
- Returns
The list style that needs to be used.
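How the delimiter names map to a list marker can be sketched as below. The sketch covers only the Decimal and DefaultStyle numbering styles, and the exact string returned by the real method is an assumption.

```python
def ordered_list_styling(start_num: int, num_style: str, num_delim: str) -> str:
    """Build a list marker from the start number, style and delimiter names."""
    # DefaultStyle uses the generic "#" marker; other styles are treated as Decimal here.
    number = "#" if num_style == "DefaultStyle" else str(start_num)
    if num_delim == "OneParen":
        return f"{number})"
    if num_delim == "TwoParens":
        return f"({number})"
    # Period and DefaultDelim both end the marker with a period.
    return f"{number}."


decimal_marker = ordered_list_styling(3, "Decimal", "Period")
paren_marker = ordered_list_styling(1, "Decimal", "TwoParens")
default_marker = ordered_list_styling(1, "DefaultStyle", "DefaultDelim")
```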
- orderedlist_collect(content: dict, depth: int) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect and separate translatable and untranslatable areas within an ordered list element through recursion. Calls to list_collect to handle recursion through block_collect.
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Block
- Parameters
content – Ordered list (attributes and a list of items, each a list of blocks): [ ListAttributes, [[Block]] ].
depth – The current depth of the list, used for indentation.
- Returns
List of translatable and untranslatable areas within an ordered list element.
- quote: str = '"'#
- quoted_collect(content: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect and separate translatable and untranslatable areas within quotation marks. Quotation element is delimited by quotation marks.
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Inline
- Parameters
content – The types of quotation marks used and the text (list of inlines) from Inline element: [ QuoteType, [Inline] ].
- Returns
List containing the parsed collection of Quoted content.
- rawblock_collect(content: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Pick translatable and non-translatable parts from a rawblock.
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Block
- Parameters
content – The Raw block [ Format, Text ].
- Returns
List of single NoTranslate -element containing Markdown representation of rawblock element.
- rawinline_collect(content: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect and separate translatable and untranslatable areas within a rawinline element.
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Inline
- Parameters
content – RawInline from Inline: [ Format, Text ].
- Returns
List containing the parsed collection of rawinline content.
- span_collect(content: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect and separate translatable and untranslatable areas within a generic inline container with attributes.
Pandoc: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Inline
- Parameters
content – Generic inline container with attributes: [Attr, [Inline] ].
- Returns
List containing the parsed collection of span area.
- table_collect(content: dict) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect table areas as untranslatable.
Refer to Pandoc definition for tables: https://hackage.haskell.org/package/pandoc-types-1.22.1/docs/Text-Pandoc-Definition.html#t:Block
- Parameters
content – Table content as dict.
- Returns
List containing a single NoTranslate element with the Markdown representation of the table.
- tex_collect(content: str) list[timApp.document.translation.translationparser.TranslateApproval] [source]#
Collect and separate translatable and untranslatable areas within a LaTeX element.
- Parameters
content – String which contains LaTeX area.
- Returns
List containing the parsed collection of LaTeX content.
- timApp.document.translation.translationparser.to_alphabet(num: int) str [source]#
Converts the start number from Pandoc’s alphabet list to the corresponding character.
- Parameters
num – The list’s starting number.
- Returns
The letter corresponding to the starting number.
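A minimal sketch of this kind of conversion, assuming that the starting number 1 maps to `a` and that numbers past 26 continue as `aa`, `ab`, and so on. The name `to_alphabet_sketch`, the lowercase output, and the wrap-around behaviour are all assumptions made for illustration; they are not taken from TIM's implementation.

```python
def to_alphabet_sketch(num: int) -> str:
    """Hypothetical stand-in for to_alphabet: 1 -> 'a', 26 -> 'z', 27 -> 'aa'."""
    if num < 1:
        raise ValueError("list start number must be positive")
    letters = []
    while num > 0:
        # Shift to a 0-based value so that 26 maps to 'z' instead of rolling over.
        num, rem = divmod(num - 1, 26)
        letters.append(chr(ord("a") + rem))
    # Digits were produced least significant first, so reverse them.
    return "".join(reversed(letters))
```

For example, `to_alphabet_sketch(3)` gives the marker for a list that starts at its third letter.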
- timApp.document.translation.translationparser.to_roman_numeral(num: int) str [source]#
Converts the start number from Pandoc’s Roman numeral list to the corresponding Roman numeral. Source: https://stackoverflow.com/questions/28777219/basic-program-to-convert-integer-to-roman-numerals
- Parameters
num – The list’s starting number.
- Returns
The Roman numeral corresponding to the starting number.
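The conversion can be sketched with the usual greedy approach: subtract the largest numeral value that still fits and append its symbol. The name `to_roman_numeral_sketch`, the uppercase output, and the supported range of 1 to 3999 are assumptions for illustration and may differ from the actual function.

```python
# Value/symbol pairs in descending order, including the subtractive forms.
ROMAN_PAIRS = [
    (1000, "M"), (900, "CM"), (500, "D"), (400, "CD"),
    (100, "C"), (90, "XC"), (50, "L"), (40, "XL"),
    (10, "X"), (9, "IX"), (5, "V"), (4, "IV"), (1, "I"),
]


def to_roman_numeral_sketch(num: int) -> str:
    """Hypothetical stand-in for to_roman_numeral (uppercase, 1..3999)."""
    if not 0 < num < 4000:
        raise ValueError("number must be between 1 and 3999")
    parts = []
    for value, symbol in ROMAN_PAIRS:
        # Take as many copies of this symbol as fit, keep the remainder.
        count, num = divmod(num, value)
        parts.append(symbol * count)
    return "".join(parts)
```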
timApp.document.translation.translator module#
This module contains most notably the TranslationService-interface that different machine translators must implement in order to be integrated into TIM’s machine translation feature.
Other notable things include a database model for the API-keys of machine translator services and a processor/wrapper by which the different translators can be used to translate text from one language to another.
- class timApp.document.translation.translator.LanguagePairing(value: dict[str, list[timApp.document.translation.language.Language]])[source]#
Bases:
object
Maps standardized codes of (source) Languages to lists of (target) Language objects.
- value: dict[str, list[timApp.document.translation.language.Language]]#
- class timApp.document.translation.translator.RegisteredTranslationService(**kwargs)[source]#
Bases:
timApp.document.translation.translator.TranslationService
A translation service whose use is constrained by user group.
- id#
Translation service identifier.
- register(user_group: timApp.user.usergroup.UserGroup) None [source]#
Set state on the service object based on the user group.
- Parameters
user_group – The user group to base the service’s state on.
- Returns
None.
- service_name#
Human-readable name of the machine translator. Also used as an identifier.
- timApp.document.translation.translator.TranslateBlock#
Typedef representing logically connected parts of translatable and non-translatable text.
- class timApp.document.translation.translator.TranslateProcessor(translator_code: str, s_lang: str, t_lang: str, user_group: timApp.user.usergroup.UserGroup | None)[source]#
Bases:
object
- translate(pars: list[timApp.document.translation.translator.TranslationTarget]) list[str] [source]#
Translate a list of text-containing items using the TranslationService-instance and languages set at initialization.
- Parameters
pars – TIM-paragraphs containing Markdown to translate.
- Returns
The translatable text contained in input paragraphs translated according to the processor-state (languages and the translator).
- class timApp.document.translation.translator.TranslationService(**kwargs)[source]#
Bases:
sqlalchemy.ext.declarative.api.Model
Represents the information and methods that must be available from all possible machine translators.
- get_languages(source_langs: bool) list[timApp.document.translation.language.Language] [source]#
Return languages supported by the TranslationService.
- Parameters
source_langs – Whether source languages must be returned.
- Returns
The list of supported source or target languages.
- id#
Translation service identifier.
- languages() timApp.document.translation.translator.LanguagePairing [source]#
Get the language-combinations for translations supported with the service.
- Returns
The supported mapping of languages to translate to and from with this TranslationService.
- service_name#
Human-readable name of the machine translator. Also used as an identifier.
- supports(source_lang: timApp.document.translation.language.Language, target_lang: timApp.document.translation.language.Language) bool [source]#
Check if the service supports a language-combination.
- Parameters
source_lang – Language to translate from.
target_lang – Language to translate into.
- Returns
True, if the service can translate from source_lang to target_lang.
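A check like this can be sketched as a lookup in a LanguagePairing-style mapping from source language codes to lists of target languages. The classes `LanguageSketch` and `LanguagePairingSketch`, the `lang_code` attribute, and the function `supports_sketch` are hypothetical stand-ins; the real Language model and the real `supports` method may work differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageSketch:
    """Hypothetical stand-in for Language; only a language code is modelled."""
    lang_code: str


@dataclass
class LanguagePairingSketch:
    """Hypothetical stand-in for LanguagePairing: source code -> target languages."""
    value: dict[str, list[LanguageSketch]]


def supports_sketch(
    pairing: LanguagePairingSketch,
    source_lang: LanguageSketch,
    target_lang: LanguageSketch,
) -> bool:
    # An unknown source language simply has no supported targets.
    targets = pairing.value.get(source_lang.lang_code, [])
    return target_lang in targets


fi, en, sv = LanguageSketch("fi"), LanguageSketch("en"), LanguageSketch("sv")
pairing = LanguagePairingSketch({"fi": [en, sv], "en": [fi]})
```

With this data, Finnish to Swedish is reported as supported while English to Swedish is not.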
- supports_tag_handling(tag_type: str) bool [source]#
Check if the service supports tag handling in translations. For example, some services let XML tags mark parts of the text that should be kept as-is and left untouched by the machine translation: “My name is Dr. <protect>Oak</protect>.”
NOTE: This is related to the hacky way Markdown tables are handled in DeepL translation.
- Parameters
tag_type – Type of the tag. Some services for example support “xml” or “html”.
- Returns
True, if the tag type is supported.
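The protection idea from the example sentence can be sketched as follows. The helper `protect_sketch` and its default tag name `protect` are assumptions for illustration; the tag a real service uses is defined by its own ignore_tag attribute, and the sketch does not call any translation API.

```python
from xml.sax.saxutils import escape


def protect_sketch(text: str, protected: list[str], tag: str = "protect") -> str:
    """Wrap each protected term in an XML tag so a translator can skip it."""
    # Escape XML special characters first so the added tags stay unambiguous.
    result = escape(text)
    for term in protected:
        escaped = escape(term)
        result = result.replace(escaped, f"<{tag}>{escaped}</{tag}>")
    return result
```

Calling `protect_sketch("My name is Dr. Oak.", ["Oak"])` reproduces the sentence from the example above.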
- translate(texts: list[list[timApp.document.translation.translationparser.TranslateApproval]], source_lang: timApp.document.translation.language.Language, target_lang: timApp.document.translation.language.Language, *, tag_handling: str = '') list[str] [source]#
Translate texts from source to target language.
The implementor of this method should return the (translated) texts in the same order as they appear in the input texts parameter.
- Parameters
texts – The texts marked for translation or not. By convention, pass as much of the translatable text as possible in this parameter in order to minimize the number of separate translation calls.
source_lang – Language to translate from.
target_lang – Language to translate into.
tag_handling – Tag representing a way to separate or otherwise control translated text with the translation service. A hacky way to handle the special case of translating (HTML) tables.
- Returns
List of strings found inside the items of texts-parameter, in the same order and translated.
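The ordering and batching conventions described above can be sketched like this. `PartSketch` is a hypothetical stand-in for TranslateApproval, and `translate_fn` stands for whatever call a concrete service makes; neither name comes from TIM.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class PartSketch:
    """Hypothetical stand-in for TranslateApproval: text plus a translate flag."""
    text: str
    translatable: bool


def translate_sketch(
    texts: list[list[PartSketch]],
    translate_fn: Callable[[list[str]], list[str]],
) -> list[str]:
    # Gather every translatable piece so translate_fn is called only once.
    batch = [p.text for block in texts for p in block if p.translatable]
    translated = iter(translate_fn(batch))
    # Rebuild each block in input order, drawing translated pieces as needed.
    return [
        "".join(next(translated) if p.translatable else p.text for p in block)
        for block in texts
    ]


def fake_translate(items: list[str]) -> list[str]:
    # Placeholder "translator" that only upper-cases its input.
    return [item.upper() for item in items]
```

Untranslatable parts pass through unchanged, and the output list has one string per input block in the original order.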
- usage() timApp.document.translation.translator.Usage [source]#
Get the service’s usage status.
- Returns
The current usage of this TranslationService (for example status of an API-key).
- class timApp.document.translation.translator.TranslationServiceKey(**kwargs)[source]#
Bases:
sqlalchemy.ext.declarative.api.Model
Represents an API-key (or any string value) that is needed for using a machine translator and that one or more users are in possession of.
- api_key#
The key needed for using related service.
- static get_by_user_group(user_group: timApp.user.usergroup.UserGroup | None) timApp.document.translation.translator.TranslationServiceKey [source]#
Query a key based on a group that could have access to it.
- Parameters
user_group – The group that wants to use a key.
- Returns
The first matching TranslationServiceKey instance, if one is found.
- group: timApp.user.usergroup.UserGroup#
The group that can use this key.
- group_id#
- id#
Key identifier.
- service: timApp.document.translation.translator.TranslationService#
The service that this key is used in.
- service_id#
- class timApp.document.translation.translator.TranslationTarget(value: str | timApp.document.docparagraph.DocParagraph)[source]#
Bases:
object
Type that can be passed around in translations.
- value: str | timApp.document.docparagraph.DocParagraph#
- class timApp.document.translation.translator.Usage(character_count: int, character_limit: int)[source]#
Bases:
object
Contains information about the usage of a translator service.
- character_count: int#
- character_limit: int#
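A short sketch of how the two fields might be used to decide whether a text still fits within a quota. `UsageSketch` mirrors the documented fields, while `remaining_characters` and `fits` are hypothetical helpers that are not part of the documented API.

```python
from dataclasses import dataclass


@dataclass
class UsageSketch:
    """Stand-in mirroring the documented fields of Usage."""
    character_count: int
    character_limit: int


def remaining_characters(usage: UsageSketch) -> int:
    # Never report a negative quota, even if the count exceeds the limit.
    return max(usage.character_limit - usage.character_count, 0)


def fits(usage: UsageSketch, text: str) -> bool:
    return len(text) <= remaining_characters(usage)
```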
- timApp.document.translation.translator.replace_md_aliases(text: str) str [source]#
Replace the aliases that are used in place of Markdown-syntax-characters.
On some machine translators (tested with DeepL), Markdown syntax characters break more easily than their HTML-style counterparts. The HTML-style aliases are baked into the translation parser, but they must be converted back to Markdown style to follow TIM’s preferences.
- Parameters
text – Text to replace the HTML-tags of.
- Returns
Text with the HTML-tags replaced.
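The replacement step can be sketched as a simple table-driven substitution. The aliases TIM actually uses are not listed in this section, so the mapping below (`<b>` and `<i>` tags standing in for bold and italic markers) is purely hypothetical, as is the name `replace_md_aliases_sketch`.

```python
# Hypothetical alias table: HTML-style tag -> Markdown syntax.
HYPOTHETICAL_ALIASES = {
    "<b>": "**",
    "</b>": "**",
    "<i>": "*",
    "</i>": "*",
}


def replace_md_aliases_sketch(text: str) -> str:
    """Swap each HTML-style alias back to its assumed Markdown counterpart."""
    for alias, markdown in HYPOTHETICAL_ALIASES.items():
        text = text.replace(alias, markdown)
    return text
```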