2014-08-29

Intervals

The current interval notation has notoriety for clashing with other notations and having weird unmatched brackets (the two cases being mutually exclusive). For instance, \(x\in[0,1]\) is ambiguous as to whether \(x\) is a fuzzy logic value (being a member of an interval) or a strict boolean value (being a member of a list). That's quite a large ambiguity for this otherwise useful notation. Also, the unacquainted can easily confuse whether a bracket is for an open end or a closed end. I've always managed to remember, but that's no excuse to keep things as they are.

I propose a simple alternative: replace \((\) with \({<}|\) and \([\) with \({\leq}|\), and reflect for the right-hand end. So, our fuzzy logic interval would be \({\leq}|0,1|{\geq}\). Intervals of integers can keep the double-dot notation, so \({\leq}|0..1|{\geq}\) represents the same thing as the list \([0,1]\). It's a list because it corresponds to a portion of \(\mathbb Z\), which I consider a list (see last post for justification).

That's about it. The best way to approximate this in ASCII is by writing things like “<=|0,1|=>”. It's better to keep the ‘=’ next to the ‘|’, so that one avoids the misparse “(<=|0,1|>) =”. And a note for \(\LaTeX\): put the ‘<’-like symbols in curly braces to avoid them spacing as if they were infix operators.

2014-08-28

Zip lists

I've laid off the maths for a while in preparation for this post. You may remember that the last post ended in a rather unsatisfactory manner. Here, I try to rectify the situation. The heart of the issue is that I want to be able to express two different types of equality between functions: the weaker set-wise complete closure and the stronger point-wise extensional equivalence.

I define a function \(f\) as being completely closed over a set \(S\) iff it satisfies these conditions (in non-mapping notation):\[\begin{aligned}
\forall x\in S.\,& f\ x\in S \\
\forall y\in S.\,\exists x\in S.\,& f\ x=y
\end{aligned}\]In other words, the domain and image of \(f\) are equal. If \(S\) is a finite set, \(f\) essentially reorders the elements. If \(S\) is infinite, there may be other things that \(f\) could do. For example, if \(f=\text{succ}\wedge S=\mathbb Z\), \(f\) is a kind of translation function. In any case, \(f\) is surjective. My aim is to be able to write this in a concise and generalisable way in mathematical notation.

If two functions are extensionally equivalent, they do the same job, and thus one can replace the other without changing the logic of the situation. However, they may have different computational properties, and hence can't be considered to be totally equal. The conditions for this relation between \(f\) and \(g\) with respect to set \(S\) can be written:\[
\forall x\in S.\,f\ x=g\ x
\]This, too, I want to have a concise and generalisable way of writing, since it is a very common relation.

So, why are these related? Well, let's start naïvely applying the principles of the notation without variables. For extensional equivalence, it is intuitive to write \(f=g\). Now, we want to limit that to some set? Maybe we say \((f=g)\ S\). That expands to \(f\ S=g\ S\) by the fork rule (since equality doesn't act upon functions). But, by implicit mapping, \(f\ S\) and \(g\ S\) are sets, so correspondences between individual elements are lost. All we've equated are the images for the functions, given a domain. If \(g\) is the identity function, the statement is that \(f\) is completely closed over \(S\).

As the title suggests, the solution lies in using lists, rather than sets. These retain an order, so there is still correspondence between individual elements in the domain and image. In fact, a function can be specified as a pair of lists, as well as a set of pairs. To say that \(f\) and \(g\) are extensionally equivalent over domain \(S\), we just have to make sure that \(S\) is cast to be a list. In symbols, \((f=g)\ \text{list}\ S\).

Both \((f=g)\ \text{list}\ S\) and \((f=g)\ S\) are useful. \(D\ {}_\cdot^2=2\cdot{}\) is an example of the former, where the implicit argument for the expression is \(\text{list}\ \mathbb C\). \((\Pi=(!))\ \text{list}\ \mathbb N\) is an example of where the list argument cannot be reliably inferred, and has a significant effect on the truth of the statement. Finally, \((\text{floor}\ \frac 2 \iota=\iota)\ \mathbb N\), where \(\iota\) is the identity function (an example of a non-bijective completely closed function).

List can occur in other situations, too. For example, \(\pm\) can be formalised by saying that it returns a 2-element list (as too does \(\sqrt \iota\), but not \({}_\cdot^{2/1}\); see an earlier post). Explicitly, \(x\pm y=[x+y,x-y]\). This gives \(\cos(A\pm B)=\cos A\cdot\sin B\mp\sin A\cdot\cos B\), which boils down to an equivalence between two lists. In fact, lists are so useful that it may be worth assuming that the standard sets are lists, which can be converted to sets as needed. This makes more sense, too, because the \(\text{list}\) function essentially introduces new information, whereas functions should lose information (like \(\text{set}\) does on lists). Also, in general, expressions like \(\text{head}\ \text{list}\ S\) are meaningless. However, given the axiom of choice, \(\text{list}\) can be defined for any set, it's just that some operations on it are meaningless.

2014-08-15

Phonotactics

This is the last piece of documentation I need before I can start creating words. That's not to say that I'll go about word creation particularly quickly. For the time being, brivla are likely to be taken from Lojban without any attempt at making them fit the correct word patterns. Remember that these rules don't apply to cmevla.

Phonotactically, but not phonemically, I distinguish between the onset and coda of a syllable. Codas can contain 0 or 1 approximants followed by 0 or 1 nasals. There are some further restrictions:
  • If followed by an obstruent, a nasal must have the same place of articulation.
  • Sequences “ij” and “uw” are not allowed unless immediately followed by a vowel.
Onsets are where most difficulty arises. There are a lot of rules to make, but with some guidelines, the number of explicit rules needed can be lessened. The main rule is that all onsets must be pronounceable at the start of a word (i.e, after a pause). This implies that all onsets start with obstruents; sonorants would be part of the preceding coda. There are two other important rules:
  • If an onset is deemed pronounceable, so is its reflection.
  • If an onset is deemed pronounceable, so is its revoicing.
The functions “reflect” and “revoice” are both self-inverse, and defined by these tables:

originalpbtdfszmnwliea
reflectionkgtdxszrnjluoa

originalptkfsxmnrwljoiaue
revoicingbdgfzxmnrwljoiaue

The name “reflect” refers to the fact that, for its input, it picks the letter on the opposite side of the corresponding table about a vertical line of symmetry. “revoice” is a more obvious name. If one imagines the consonant table being 3-dimensional, revoice is similar to reflect, but works on a different axis (the voice axis, rather than the place-of-articulation axis).

These restrictions have some interesting corollaries. For instance, it can be seen that coronal (alveolar) consonants reflect to themselves. This gives them more freedom than labial and dorsal (velar) consonants, which lives up to our expectations of them. Sonorants, by revoicing to themselves, also enjoy similar freedom. ‘f’ and ‘x’ also revoice to themselves, but are probably heavily restricted, anyway.

To simplify things further, ‘o’, ‘a’ and ‘e’ are mutually considered phonotactically equivalent. ‘i’ and ‘u’ are special cases because they are close to ‘j’ and ‘w’. For example, I don't expect ‘pji’ to be valid, since it's too similar to ‘pi’. However, ‘pja’, ‘pju’, ‘pwi’ and ‘tji’ are all valid, being distinct enough from ‘pa’, ‘pu’, ‘pi’ and ‘ti’, respectively. Of course, for ‘pja’, we need to test ‘bja’, ‘kwa’ and ‘gwa’ before we accept it (and similar for the others).

Beyond that, there is only one more easily-stated rule I can think of: onsets cannot mix voicings. All of the other phonotactic rules are left, unfortunately, to common sense. That's implicitly my common sense, but I promise to not screw up!

2014-08-05

Morphology

Before moving onto the main morphology, I'd like to get cmevla out of the way. These are practically the same as dotside Lojban's cmevla. They start with a pause (‘.’) and end with a consonant or schwa followed by a pause. The requirement for a consonant/schwa comes about because pauses can arbitrarily occur between words, so cmevla need a feature that binds them together. When using Shwa, IPA or similar orthographies, any letters can be used in cmevla (except pauses and glottal stops), even those that are not part of the language. Phonotactic rules are also done away with.

All other words begin with an obstruent (“pbtdkgfszh”) or pause, and end with a normal vowel (“oiaue”). In other words, word breaks occur in and only in these environments:
  • at a pause or glottal stop (which is part of neither word)
  • at the gap between a normal vowel and an adjacent following obstruent
This resembles Ceqli, but has a major difference: obstruents can occur in the middle of a word, in which case they were preceded by a sonorant (“mnrwlj”) or schwa (“y”). This allows much easier importation of loan words. Also – it nearly goes without saying – the scheme here is much simpler than Lojban's, with tosmabru and all. I don't rely on stress, which frees things up for songwriters. I suggest that the reason why there are barely any Lojban versions of existing songs, whether performed or not, is that it's so difficult to get the stress right (I've tried it a few times).

Other than the vowel-obstruent restriction, native words are free in form. There are no formal equivalents to lujvo, with more tanru-like structures taking their place. To keep the classes separate, brivla are required to have at least 1 nasal, and cmavo & tags have none. Like how one can recognise types of Lojban words after little practice, there shouldn't be problems telling the two types apart.

I haven't decided on an equivalent to the gismu creation algorithm yet, though I aim to have a Lojban-style one. With brivla being free-form, I'll need some way of deciding what concepts get a short word, and which get longer ones. I will likely base this on the length of the source words (by some metric).

Just to give an idea of what loan words would look like, here are some language names:
nativecmevlaIPA-style cmevlabrivla
.lojban..lozban..loʒban..lonzba
Ceqli.tjerlis..tʃeŋlis.tjerli
English.irglis..ɪŋɡlɪʃ..irgli
Esperanto.espelanton..esperanton..enspelanto
Español.espanjol..espaɲol..enspanjo
日本語.nihorgos..nihõŋɡos..nirhorgo

There are, of course, ISO code names, but I haven't worked out that system yet. Ceqli doesn't have a code, anyway. Also, stress and tone can be marked in any way that seems fit. For brivla, these have no phonemic value, but they may have in cmevla.

2014-08-04

Phonology

This should be a steady post. I post it now because I'm sick of using Lojban's FA tags when giving examples for my language. I need words, and for that I need phonemes, phonotactic rules and a writing system.

Ideally, the writing system would be Shwa (used phonemically, because things like assimilation are optional, and can vary between speakers). However, a romanisation scheme is needed. That is all I'll document here; Shwa and IPA substitutions should be obvious. Here are the consonants:
LabialCoronalDorsalGlottal
Plosivep bt dk g.
Fricativefs zh(h)
Nasalmnr
Approximantwlj

There are a few sticking points. Firstly, ‘r’ represents /ŋ/. This is just an ASCIIism derived from the shape of the lowercase letter (‘m’ → ‘n’ → ‘r’). I want newcomers to be able to write in the language without having to fiddle with their keyboard layout (though it is fun).

Then, there are the glottal consonants. ‘.’ is almost exactly the same as ‘.’ in Lojban. Notably, the dotside conventions for cmevla carry over as expected. With ‘h’, I decided to stick to the convention of many natural languages, which tend to have either /x/ or /h/, but not both. In Lojban, they're roughly in complementary distribution anyway. Instead of using Lojban's ‘'’, I use approximants to join vowels. It's your choice which of [x] and [h] you want to use, and you can mix them in your speech. Even English speakers may prefer [x] in certain circumstances.

The final thing to mention is about approximants. Unlike Lojban, I use on-glides regularly, so distinguishing between vowels and semivowels becomes necessary. Also, it should be noted that the approximants' positions on the chart are more to make the abstractions work, rather than to give phonetic information. ‘w’ and ‘j’ are the semivowels of ‘u’ and ‘i’, respectively. ‘l’ is any lateral or rhotic.

Now, the vowels:
FrontCentralBack
Closeiu
Mideyo
Opena

These are essentially the vowels of Lojban, and serve similar purposes. ‘y’ represents the schwa, which is morphologically distinct from the other vowels. In an IPAisation, it can be written as ‘ə’, but you may also want to change ‘e’ and ‘o’ to ‘ɛ’ and ‘ɔ’ to avoid them looking too similar.

The order of the vowels is “oiaue”. Tracing this out makes a star shape, thus guaranteeing maximum phonetic distance between vowels close in that sequence. The subsequence “iau” is also used (as it is in Lojban). You may or may not have noticed that the various classes of phonemes have different numbers of phonemes each. For instance, there are 3 approximants, 4 fricatives, 5 vowels and 6 plosives. These can come together to make sequences of semantically similar cmavo.

Where the Latin alphabet needs to be transcribed (like when using ISO 3166-1 codes for countries), the code letters are usually given sounds based on the ASCII orthography. I'll have to go into the morphology before describing this system, though. I will cover morphology and phonotactics soon, in that order.