Custom word files in DefInjections for rule packs

Started by Elevator, November 26, 2017, 03:17:46 PM


Elevator

A lot of languages have different forms of the same word: plural/singular, gender-specific, etc.
RulePackDefs in the current implementation allow only a single form of each word, which is often inappropriate in a given context.
For example, in the English version the NamerFactionPirate rule pack contains these lines:

        <li>name->The [badassadjective] [groupname]</li>
        <li>name->The [badassadjective] [badassanimal]s</li>
        <li>name->The [badassadjective] [badassperson]s</li>

These rules generate faction names in the plural simply by appending "s". This trick works only for English, not for many other languages. Moreover, some languages, Russian for example, also require the plural form of [badassadjective] (because the adjective must agree with the plural noun). This is really annoying and leads to poor-quality translations.
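To make the failure concrete, here is a minimal sketch of how a rule with a hard-coded "s" expands. It is not the game's generator: the `WORDS` table and the `expand_all` helper are hypothetical stand-ins for the word files and the rule engine.

```python
import itertools
import re

# Hypothetical word lists standing in for the game's word files.
WORDS = {
    "badassadjective": ["Bloody", "Iron"],
    "badassanimal": ["Bear", "Mouse"],
}
PLACEHOLDER = re.compile(r"\[(\w+)\]")


def expand_all(rule):
    """Return every name the rule can produce, in word-list order."""
    keywords = PLACEHOLDER.findall(rule)
    names = []
    for combo in itertools.product(*(WORDS[k] for k in keywords)):
        words = iter(combo)
        # Substitute the placeholders left to right with the chosen words.
        names.append(PLACEHOLDER.sub(lambda m: next(words), rule))
    return names


names = expand_all("The [badassadjective] [badassanimal]s")
print(names)
```

The literal "s" is glued onto whatever word is picked, so "Mouse" becomes "Mouses". A translator can change the rule text but cannot make the suffix depend on the word.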

There is a way around this problem: add extra <rulesRaw> entries to the Def files:
  <RulePackDef>
    <defName>NamerFactionPirate</defName>
    <rulePack>
      <rulesStrings>
      ...
      </rulesStrings>
      <rulesRaw>
        <li Class="Rule_File">
          <keyword>badassanimal</keyword>
          <path>Words/Nouns/Animals_Badass</path>
        </li>
        <li Class="Rule_File">
          <keyword>badassadjective</keyword>
          <path>Words/Adjectives/Badass</path>
        </li>
        <li Class="Rule_File">
          <keyword>badassperson</keyword>
          <path>Words/Nouns/People_Badass</path>
        </li>
        <li Class="Rule_File">
          <keyword>badassconcept</keyword>
          <path>Words/Nouns/Concepts_Badass</path>
        </li>
        <li Class="Rule_File">
          <keyword>groupname</keyword>
          <path>Words/Nouns/GroupNames</path>
        </li>
        <li Class="Rule_File">
          <keyword>badasscolor</keyword>
          <path>Words/Nouns/Colors_Badass</path>
        </li>
        <!-- EXTRA FILES: reserved slots for translators to override -->
        <li Class="Rule_File">
          <keyword>extra</keyword>
          <path>Some_Empty_File</path>
        </li>
        <li Class="Rule_File">
          <keyword>extra</keyword>
          <path>Some_Empty_File</path>
        </li>
        <li Class="Rule_File">
          <keyword>extra</keyword>
          <path>Some_Empty_File</path>
        </li>
        <li Class="Rule_File">
          <keyword>extra</keyword>
          <path>Some_Empty_File</path>
        </li>
        <li Class="Rule_File">
          <keyword>extra</keyword>
          <path>Some_Empty_File</path>
        </li>
        <li Class="Rule_File">
          <keyword>extra</keyword>
          <path>Some_Empty_File</path>
        </li>
      </rulesRaw>
    </rulePack>
  </RulePackDef>


This allows translators to add custom files with custom keywords, which could be used in DefInjections for rulesStrings:

<NamerFactionPirate.rulePack.rulesRaw.7.keyword>badassadjective_plural</NamerFactionPirate.rulePack.rulesRaw.7.keyword>
<NamerFactionPirate.rulePack.rulesRaw.7.path>Words/Adjectives/Badass_plural</NamerFactionPirate.rulePack.rulesRaw.7.path>
<NamerFactionPirate.rulePack.rulesRaw.8.keyword>badassperson_plural</NamerFactionPirate.rulePack.rulesRaw.8.keyword>
<NamerFactionPirate.rulePack.rulesRaw.8.path>Words/Nouns/People_Badass_plural</NamerFactionPirate.rulePack.rulesRaw.8.path>


If you add extra rulesRaw entries to the RulePackDef files, it will be a convenient and easy way to improve translation quality for all languages!
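The override mechanism can be sketched roughly as follows. This only illustrates the indexing idea, not how the game applies DefInjections; `apply_injections` is a hypothetical helper, and treating the list indices as zero-based is an assumption.

```python
def apply_injections(rules_raw, injections, def_name):
    """Return a copy of rules_raw with '<def>.rulePack.rulesRaw.<i>.<field>' overrides applied."""
    prefix = def_name + ".rulePack.rulesRaw."
    patched = [dict(entry) for entry in rules_raw]
    for key, value in injections.items():
        if not key.startswith(prefix):
            continue  # injection targets some other Def or field
        index_text, field = key[len(prefix):].split(".", 1)
        patched[int(index_text)][field] = value
    return patched


# Six real entries followed by six reserved placeholder slots, as in the Def above.
real = [
    ("badassanimal", "Words/Nouns/Animals_Badass"),
    ("badassadjective", "Words/Adjectives/Badass"),
    ("badassperson", "Words/Nouns/People_Badass"),
    ("badassconcept", "Words/Nouns/Concepts_Badass"),
    ("groupname", "Words/Nouns/GroupNames"),
    ("badasscolor", "Words/Nouns/Colors_Badass"),
]
rules_raw = [{"keyword": k, "path": p} for k, p in real]
rules_raw += [{"keyword": "extra", "path": "Some_Empty_File"} for _ in range(6)]

injections = {
    "NamerFactionPirate.rulePack.rulesRaw.7.keyword": "badassadjective_plural",
    "NamerFactionPirate.rulePack.rulesRaw.7.path": "Words/Adjectives/Badass_plural",
    "NamerFactionPirate.rulePack.rulesRaw.8.keyword": "badassperson_plural",
    "NamerFactionPirate.rulePack.rulesRaw.8.path": "Words/Nouns/People_Badass_plural",
}
patched = apply_injections(rules_raw, injections, "NamerFactionPirate")
print(patched[7], patched[8])
```

Slots that a language does not override keep pointing at the empty file, so they add no words to any rule.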

fiziologus

Elevator is right, but only partially. And fixing it via "give me lots of rawRules" is a very bad idea: more code, less stability and performance, and more memory usage (each empty file must be loaded).
"A lot of languages have different forms of the same word", but these forms (in most cases) follow strict grammatical rules and can be adapted to the current language string.
"These rules generate faction names in the plural simply by appending "s". This trick works only for English". This trick does not fully work even in English, just like adding "ы" for the plural in Russian. Anyway, this (and plural/gender for adjectives) is easily fixed via a LanguageWorker (strict grammar) or (much less easily) via word tricks.
The truly tricky case is a string like
name->[animal] [tribalword]
Here the animal is used as an adjective. Using a noun as an adjective is very language-specific and cannot be fixed by rules, because there are no strict transformation rules. Nouns, adjectives, and verbs need to be fully separated (explicitly via rawRules, or implicitly by letting the keyword be the path, e.g. Word/Path/Keyword.txt): noun to noun, adjective to adjective, verb to verb, even if it is the same word in English.

Elevator

Quote:
And fixing it via "give me lots of rawRules" is a very bad idea: more code, less stability and performance, and more memory usage (each empty file must be loaded).
What exactly are you talking about? Adding new rawRules to defs cannot reduce stability or performance, because it requires no code modification. Likewise, loading empty files (or the same empty file) requires no extra memory, as there is no data in them.

wwWraith

Quote from: fiziologus on February 08, 2018, 04:04:52 AM
"These rules generate faction names in the plural simply by appending "s". This trick works only for English". This trick does not fully work even in English, just like adding "ы" for the plural in Russian.

No, in Russian there are several ways of forming plurals ("~и", "~а"), and sometimes more or less the whole word changes; exceptions exist even in English: "mouse" -> "mice".
Think about it. Think around it. Perhaps you'll get some new good idea even if it would be completely different from my words.

fiziologus

Quote from: Elevator on February 08, 2018, 06:58:58 PM
What exactly are you talking about? Adding new rawRules to defs cannot reduce stability or performance, because it requires no code modification.
Defs are also code. External code. Moreover, all this external code is loaded into memory (and kept in memory) at startup. All this external code is parsed at startup. This is both the strong side (quick access) and the weak side (memory usage) of the Unity engine.
Plus, patching keyword and path (thanks for the discovery) is a bug (feature), and may be removed at any time.

In the end, performance is only the first step. How many empty files do you need in each RulePack, or how many in total (if the empty-file rules are stored in Global)? How many are needed for another language, more than for yours or fewer?

"Give me lots of rawRules" is the worst way. rawRules do need extending (noun as adjective; a male living thing with a female name, because in non-English languages abstract words are not only neuter, etc.), but extending wisely. Many translation problems can be avoided via word tricks (sometimes a word trick is simply necessary: a pirate group named "Donkeys", say, is very fearsome in some languages) or directly via LanguageWorker.PostProcessed (a good but somewhat hard way).
Strings files are a word database for reducing RulePack size; there is no need to use them as a tool for a simple task. (Anyway, producing perfectly readable autogenerated strings is a very hard task.)

wwWraith, "ы" is the more common (the rule applies to many words) and easier (no need to change the word in most cases) way.

wwWraith

Quote from: fiziologus on February 11, 2018, 05:23:31 AM
wwWraith, "ы" is the more common (the rule applies to many words) and easier (no need to change the word in most cases) way.

That's not true. At least it's rarer than all the others. And we're not holding an election here. And the word does need to change in many cases (probably more often than not). Just a few simple examples:

лисА - лисЫ (fox)
мышЬ - мышИ (mouse)
волк - волкИ (wolf)
заЯц - заЙцЫ (hare)
котЕНОК - котЯТА (kitten)
дом - домА (house)
рекА - рекИ (river)
деревО - деревЬЯ (tree)
куст - кустЫ (bush)
лист - листЬЯ (leaf)
полЕ - полЯ (field)
луг - лугА (meadow)
пулЯ - пулИ (bullet)
ветЕр - ветрА/ветрЫ (wind; both forms are in use)

And there are many other forms. The problem of autogenerating plurals (and some other word forms) in Russian (and in some other languages) is far from trivial. Even many big, "serious" projects have not solved it successfully. I could also suggest some changes to make English "easier", but every speaker would laugh (at best) at the results. Something like using "be" instead of "am, are", "bes" instead of "is", and "beed" instead of "was, were". How would that feel?

jamaicancastle

Even in English there are many situations where -s just won't cover it: *potatos, *childs, *deers, all of which RW will happily throw on art. Dwarf Fortress has a system where it actually has a huge list not just of words, but also their derived forms (plurals, adjective forms, etc.), which captures all of these minute details but is an immense amount of work to set up. (Actually that's a pretty accurate description of most DF systems.)
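A word list with explicit derived forms, in the spirit of the Dwarf Fortress approach described above, might look like the sketch below. The `LEXICON` layout and the `get_form` helper are hypothetical and not taken from either game.

```python
# Hypothetical lexicon: each base word lists the derived forms it needs.
LEXICON = {
    "potato": {"plural": "potatoes"},
    "child": {"plural": "children"},
    "deer": {"plural": "deer"},
    "wolf": {"plural": "wolves", "adjective": "wolfish"},
}


def get_form(word, kind):
    """Return the stored form; for a missing plural, fall back to appending 's'."""
    forms = LEXICON.get(word, {})
    if kind in forms:
        return forms[kind]
    if kind == "plural":
        return word + "s"  # naive fallback, only right for regular nouns
    return word


print(get_form("child", "plural"), get_form("bear", "plural"))
```

Accuracy comes entirely from the stored entries, which is why this approach takes so much work to set up.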

The best short-term solution is to construct a mod for the translation so you can go in and edit the rulepacks directly, and even code-linked translations with a little patching effort.

Elevator

Quote:
Defs are also code. External code.
Really? Maybe you'll say that the backstories file is a piece of "external code" too? Reading that file is also required at startup, and its data is also placed in memory.

Quote:
Moreover, all this external code is loaded into memory (and kept in memory) at startup. All this external code is parsed at startup. This is both the strong side (quick access) and the weak side (memory usage) of the Unity engine.
Ah, yeah, you are absolutely right! Several milliseconds on load and a few kilobytes of memory are a really big deal (sarcasm)!

Quote:
How many empty files do you need in each RulePack
Only one empty file is needed. In the original Defs, the new "reserved" rulesRaw entries can all reference this single file, just so that they can be overridden (or not) in DefInjections for different languages.

Quote:
How many are needed for another language, more than for yours or fewer?
I cannot speak for every language in the world, but for Russian it would be great if the number of "Rule_File" entries in Global were doubled and several "Rule_File" entries were added to some rule packs (e.g. faction name generators require at least 10 such file references to achieve the best result).

It would be much better if the number of file references and rules for specific languages were not limited by the fixed number of original Def entries. I suggest this could be achieved by placing the rules in text files, just as is done for individual words in the Strings folder. For example, some folder, e.g. "Strings/Rules", would contain a file, e.g. "NamerFactionBasePirate.txt", with lines like:
name->[badassanimal] [geography]
name->[badassconcept] [geography]
name->[badassanimal]'s [geography]
name->[badassperson]'s [geography]
geography->[terrainfeature]
geography->[community]

In this case the amount of Def data for rule packs and tales would decrease significantly, and the result would be great flexibility.
What would RimWorld developers say about this approach?
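A loader for such a rules file could be as small as the sketch below. The file format is the one proposed in this post; the `parse_rules` and `external_keywords` helpers are hypothetical, and nothing here reflects how the game actually reads rule packs.

```python
import re

# Contents of the proposed Strings/Rules/NamerFactionBasePirate.txt, inlined for the example.
RULE_TEXT = """\
name->[badassanimal] [geography]
name->[badassconcept] [geography]
name->[badassanimal]'s [geography]
name->[badassperson]'s [geography]
geography->[terrainfeature]
geography->[community]
"""


def parse_rules(text):
    """Group 'keyword->pattern' lines by keyword, preserving file order."""
    rules = {}
    for line in text.splitlines():
        keyword, arrow, pattern = line.strip().partition("->")
        if not arrow:
            continue  # skip blank or malformed lines
        rules.setdefault(keyword, []).append(pattern)
    return rules


def external_keywords(rules):
    """Keywords referenced in patterns but not defined in the file (they need word files)."""
    used = set()
    for patterns in rules.values():
        for pattern in patterns:
            used.update(re.findall(r"\[(\w+)\]", pattern))
    return sorted(used - set(rules))


rules = parse_rules(RULE_TEXT)
print(external_keywords(rules))
```

Because the file is plain text, a translation could add, remove, or reorder rules freely instead of being limited to the entries present in the original Def.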

fiziologus

Quote from: wwWraith on February 12, 2018, 02:32:26 PM
That's not true. At least it's rarer than all the others. And we're not holding an election here. And the word does need to change in many cases (probably more often than not). Just a few simple examples:
All masculine words with a zero ending (except those ending in ь or one of the seven magic letters) take "ы" in the plural. (And masculine words have only these two endings.) This is grammar, strict grammar, so autogenerating the plural is possible and very easy. (There may be a little trouble with fleeting vowels, лев -> львы etc.; in most cases it is enough to mark the vowel in the singular form and delete it in the plural. Unchanging words need a list of exceptions.)
// Hypothetical helper name: naive Russian plural chosen by the word's last letter.
static string NaivePlural(string str)
{
        if (string.IsNullOrEmpty(str))
        {
                return str;
        }
        char c = str[str.Length - 1];
        // If the last letter is 'а', also take the second-to-last one
        char c2 = (str.Length != 1 && c == 'а') ? str[str.Length - 2] : '\0';
        if ("гкхжшщч".IndexOf(c) >= 0)
        {
                // Ends in one of the seven letters: append "и"
                return str + "и";
        }
        else if ((c == 'а' && "гкхжшщч".IndexOf(c2) >= 0) || "йья".IndexOf(c) >= 0)
        {
                // Replace the last letter with "и"
                return str.Substring(0, str.Length - 1) + "и";
        }
        else if (c == 'о')
        {
                return str.Substring(0, str.Length - 1) + "а";
        }
        else if (c == 'е' || c == 'ё')
        {
                return str.Substring(0, str.Length - 1) + "я";
        }
        else if (c == 'а')
        {
                return str.Substring(0, str.Length - 1) + "ы";
        }
        // Default: append "ы"
        else return str + "ы";
}
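For readers who want to try the rule, here is a Python port of the snippet above, run against some of the singular/plural pairs listed earlier in the thread. The function name `naive_plural` is made up for this sketch, and the port assumes lowercase input.

```python
SEVEN_LETTERS = frozenset("гкхжшщч")
SOFT_ENDINGS = frozenset("йья")


def naive_plural(word):
    """Pick a Russian plural ending from the last letter only (port of the C# snippet)."""
    if not word:
        return word
    last = word[-1]
    # Only when the word ends in 'а' do we also look at the letter before it.
    prev = word[-2] if len(word) > 1 and last == "а" else ""
    if last in SEVEN_LETTERS:
        return word + "и"
    if (last == "а" and prev in SEVEN_LETTERS) or last in SOFT_ENDINGS:
        return word[:-1] + "и"
    if last == "о":
        return word[:-1] + "а"
    if last in ("е", "ё"):
        return word[:-1] + "я"
    if last == "а":
        return word[:-1] + "ы"
    return word + "ы"


# Singular/plural pairs taken from the examples posted earlier in the thread.
EXAMPLES = [
    ("лиса", "лисы"), ("мышь", "мыши"), ("волк", "волки"), ("заяц", "зайцы"),
    ("дом", "дома"), ("река", "реки"), ("луг", "луга"), ("пуля", "пули"),
]
misses = [(s, naive_plural(s)) for s, p in EXAMPLES if naive_plural(s) != p]
print(misses)
```

On this small sample the rule gets five of the eight words right and misses заяц, дом, and луг, which are the kind of words an exception list would have to carry.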

Quote:
Really? Maybe you'll say that the backstories file is a piece of "external code" too? Reading that file is also required at startup, and its data is also placed in memory.
Really. These files too are read and parsed at startup, and their content is stored in memory.
Quote:
name->[badassanimal] [geography]
name->[badassconcept] [geography]
name->[badassanimal]'s [geography]
name->[badassperson]'s [geography]
The first two are noun-as-adjective; see my first reply. This needs fixing via Strings (plus LanguageWorker). The other two need only LanguageWorker: the grammar of cases (падежи) is strict (in most cases).

wwWraith

Quote from: fiziologus on February 19, 2018, 12:39:22 AM
Quote from: wwWraith on February 12, 2018, 02:32:26 PM
That's not true. At least it's rarer than all the others. And we're not holding an election here. And the word does need to change in many cases (probably more often than not). Just a few simple examples:
All masculine words with a zero ending (except those ending in ь or one of the seven magic letters) take "ы" in the plural. (And masculine words have only these two endings.) This is grammar, strict grammar, so autogenerating the plural is possible and very easy. (There may be a little trouble with fleeting vowels, лев -> львы etc.; in most cases it is enough to mark the vowel in the singular form and delete it in the plural. Unchanging words need a list of exceptions.)

No, some of them still take "а" in the plural: "дом - дома", "луг - луга" again, "мех - меха" (fur). And if your list of exceptions is hardcoded, it will require translators' work to be injected into the coding process. It could also become longer than the list of words actually used; otherwise it would make adding new words problematic. If the list is kept as XML, it won't be very different from what was suggested in the OP, just more complex for translators, who would additionally have to check every single word to see whether it belongs in the exception list.

fiziologus

Quote from: wwWraith on February 19, 2018, 02:19:07 AM

No, some of them still take "а" in the plural: "дом - дома", "луг - луга" again, "мех - меха" (fur).

And if your list of exceptions is hardcoded, it will require translators' work to be injected into the coding process. It could also become longer than the list of words actually used; otherwise it would make adding new words problematic. If the list is kept as XML, it won't be very different from what was suggested in the OP, just more complex for translators, who would additionally have to check every single word to see whether it belongs in the exception list.
The exception list is needed only for words used in the game, not for every word in the language. Do you need this word in a ruleString?
There is no need for a comparison tool or a large language processor, just a simple tool that replaces words from a limited word list.
And the list need not be hardcoded: store it in a file as plain text ("original<separator>replace") and load it as needed (as static). No need for XML or anything else. A simple task needs only simple tools. The mechanism is the same as for the Strings files.
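The plain-text exception list described here could be loaded and consulted along these lines. This is a sketch only: the "=" separator, the sample entries, and the helper names are assumptions, and the fallback rule is deliberately trivial.

```python
# Hypothetical exception file content, one "original<separator>replace" pair per line.
EXCEPTIONS_TEXT = "дом=дома\nлуг=луга\nлев=львы\n"


def load_exceptions(text, separator="="):
    """Parse 'original<separator>replace' lines into a lookup table."""
    table = {}
    for line in text.splitlines():
        if separator not in line:
            continue  # skip blank or malformed lines
        original, replacement = line.split(separator, 1)
        table[original.strip()] = replacement.strip()
    return table


def pluralize(word, exceptions, fallback):
    """Use the listed exception if there is one, otherwise the general rule."""
    if word in exceptions:
        return exceptions[word]
    return fallback(word)


def append_y(word):
    """Trivial stand-in for the general rule: always append 'ы'."""
    return word + "ы"


exceptions = load_exceptions(EXCEPTIONS_TEXT)
print(pluralize("дом", exceptions, append_y), pluralize("куст", exceptions, append_y))
```

The table is consulted first, so only words the general rule gets wrong need an entry.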
And about "more complex for translators, who would additionally have to check every single word to see whether it belongs in the exception list".
Elevator says: "all language forms (plural, case, and gender) need to be stored in Strings". Then for each new word you need to check which forms are used in the ruleStrings and add those forms to Strings. Moreover, if a word does not change in some form (очки, брюки), it still has to be added in every form used (очки must be in wordFile, in wordPluralFile, and in wordOtherFormFile), because the engine knows nothing about grammar and just picks one random word from a file. "Yeah, this is good," says wwWraith, "nothing needs to be done."
I say: "store the exceptions; the engine must do everything else". "No," screams wwWraith, "you need to know school grammar; transforming a word in your head and comparing it with the rules used in the engine is a very great amount of work."
Logic?

I'll say it one more time. 'Strings' needs expanding, as this database is very language-specific now, but expanding wisely. 'Strings' is just a word database for reducing RulePack size and should be used as a database. Turning Strings into a tool for translators, like replacing Male/Female with a gender sign, is good on paper and very bad in practice.

PS: For the dev team: on paper, a compromise is possible between ruleFiles size and translators who do not want to think. Just store the path in the keyword, e.g. Word_WordPath_Syll for Strings/Word/WordPath/Syll.txt (Replace("_", "path_separator")) (initially the worst and craziest idea).
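The keyword-as-path idea from the PS amounts to a one-line mapping. The sketch below assumes a "Strings" root folder and a ".txt" extension as in the example; `keyword_to_path` is a hypothetical name.

```python
from pathlib import PurePosixPath


def keyword_to_path(keyword, root="Strings"):
    """Turn 'Word_WordPath_Syll' into 'Strings/Word/WordPath/Syll.txt'."""
    parts = keyword.split("_")
    return str(PurePosixPath(root, *parts[:-1], parts[-1] + ".txt"))


print(keyword_to_path("Word_WordPath_Syll"))
```

One consequence of this scheme is that the underscore becomes reserved: a keyword such as badassadjective_plural would resolve to Strings/badassadjective/plural.txt rather than to a single file name.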

wwWraith

Quote from: fiziologus on February 19, 2018, 05:47:36 PM
Elevator says: "all language forms (plural, case, and gender) need to be stored in Strings". Then for each new word you need to check which forms are used in the ruleStrings and add those forms to Strings. Moreover, if a word does not change in some form (очки, брюки), it still has to be added in every form used (очки must be in wordFile, in wordPluralFile, and in wordOtherFormFile), because the engine knows nothing about grammar and just picks one random word from a file. "Yeah, this is good," says wwWraith, "nothing needs to be done."
I say: "store the exceptions; the engine must do everything else". "No," screams wwWraith, "you need to know school grammar; transforming a word in your head and comparing it with the rules used in the engine is a very great amount of work."
Logic?

You are wrong again. "Брюки" (trousers) is not a word that "does not change in some form"; it simply cannot be used in the singular, because it has none (unless you count "брючина", "trouser leg"). Something like "пальто" (coat) would be a better example of what you were trying to say. But "очки" is another interesting case. It is actually a homonym (two words with the same spelling but different meanings: "eyeglasses" and "points"). While "points" has a singular form, "eyeglasses" has none at all. There are other examples: "часы" ("clock" or "hours", no singular form for the first), "лук" ("onion" or "bow", no plural form for the first, apart from the stricter "луковицы", "onion bulbs"). Good luck implementing an algorithm capable of lexical analysis.

I think "knowing school grammar" is certainly required when adding various word forms to the lists. But to add only the exceptions, the translator must additionally learn the autogenerating algorithm in detail, keep it in mind, and "run" it mentally for every word to check whether it should be treated as an exception. So the translator would actually have to think like a coder, and that is not a very common combination. And don't forget that there would have to be several algorithms for the other word forms. So yes, I think simple "intuitive" writing would be much easier.

Don't forget also that there are other languages besides English and Russian. Do you think the Ludeon coders know all of them? Implementing such algorithms for every language would take a lot of work with hired coders, plus ongoing problems with updates and bug fixing. That's why such algorithms are only "good on paper".