PROSODIC BOUNDARY INCONGRUENCES IN ORAL READING

Background: Characteristics of oral readings are well studied in school-aged children and teenagers, but not in educated adults. Objectives: Assess the prevalence of prosodic boundary incongruences in oral readings of adult, native, educated, Brazilian Portuguese speakers and analyze their correlations with specific linguistic features. Design, settings, and participants: We studied an online video corpus of political speeches delivered by house members of the Brazilian parliament between 2017 and 2018, and their respective written texts. Measurements: We assessed a) prosodic boundary incongruences between oral readings and written texts, b) actor prototypicality of the subjects, c) thematic continuity of the sentences, and d) a variable called “sufficiency”, related to the concept of argumenthood, assorting each word according to its need for complementary words. The inter-rater reliability of the author's perceptions of incongruences was assessed with Cohen's Kappa test. Results: In 5 hours of oral readings, we found a median of 1.4 prosodic boundary incongruences per minute (interquartile range: 0.766-2.212). Insertions of non-terminal or terminal boundaries accounted for 80% of the incongruences. Prosodic boundary incongruency correlated positively with a) thematic continuity of the incongruent sentences (p-value = 0.0006345) and b) the concept of “sufficiency” (p-value < 2.2e-16), and correlated negatively with c) first-person subjects (p-value = 0.0002584). Limitations: The assessment of the variables was subjective, and we did not control sentences for their lengths when analyzing variables “b” and “c”. Conclusions: Prosodic boundary incongruences were relatively common in our corpus. We introduced some hypotheses to explain the results.


Introduction
be proscribed as legitimate autonomous linguistic units in oral reading in the appropriate linguistic, pragmatic, and prosodic context.
If oral readers segment speech incongruently, they forge incongruent boundaries and units, and listeners with access to the written text should be able to identify the incongruences between the writing and the reading. The first question we ask is whether listeners can reliably identify incongruent boundaries/units in oral reading. If they can, we would like to know why readers produce those incongruent boundaries/units.
It is legitimate to guess that incongruent units result from some sort of language processing difficulty. For example, when a reader is reading a sentence, he may, at first, not know "what" or "whom" that sentence is about. As every sentence should ordinarily talk about "who is doing what", that information constitutes relevant semantic knowledge that should translate into coherent units and prosodic boundaries. Bornkessel-Schlesewsky and Schlesewsky (11) consider the "actor" in a sentence (the "who") to be a universal cardinal category "that provides an optimal and neurobiologically plausible solution to the demands of real-time information processing". According to their model, named "actor identification strategy", listeners/readers would search for prominence features associated with agenthood, identifying actors as prototypically human, animate, definite, first-person, nominative (in nominative-accusative languages), and positioned at the first argument position in sentences. Predicates (the "what") would be inferred by exclusion.
Another source of semantic information that readers can access is the affiliation of the current sentence or segment with other sentences or segments in the discourse. Grosz and Sidner (12) proposed a well-known "computational theory of discourse structure" to analyze these affiliations, but it has some shortcomings when applied to oral reading. First of all, it includes every sentence and segment (current, previous, and forthcoming) in discourses, whereas readers do not have access to forthcoming sentences or segments. Additionally, the analyses it suggests are too subjective and may yield biased and conflicting results between judges. A less subjective approach would be to consider only the semantic affiliations of the current sentence with the sentence that came just before it, as in the topic-focus (topic-comment, theme-rheme, etc.) approach (13). A sentence that talks about something already known could have a better chance of being appropriately segmented.
A third possibility is that oral readers construct prosodic boundaries responding to linguistic features that are local and narrowly focused. For example, since prosodic boundaries exist between words, the relation between adjacent words may have the upper hand when deciding to insert or not insert terminal or non-terminal boundaries. From this angle, any word in a sentence might call for a prosodic boundary or not, and the verdict for a boundary insertion would take into account its relations with previous words in the same sentence and, eventually, some additional information from the discourse context.
That brings us back to Izre'el's ideas (8) about speech segmentation. A read sentence would be segmented into information modules or, apart from its terminal boundary, would not be segmented at all. At each word, the oral reader would decide whether to insert a prosodic boundary. If he judges the current word to be the last one in an information module, he could insert a boundary. Otherwise, he would not. Additionally, if he thinks the word is not the last one in an information module, he presumes something is missing in the current speech segment. When we speculate that the oral reader assumes the current word needs other words (or a single word) to fulfill its semantic or syntactic needs, our reasoning touches on the distinction between arguments and adjuncts.
According to Haspelmath (14), a verbal argument may be defined as "a phrase whose occurrence is made possible by a specific verb, and which therefore cannot occur with a generic verb". The author proposes a method to identify verbal arguments:
a. I wrote a letter. > *I wrote, and I did a letter.
b. I wrote with a pen. > I wrote, and I did it with a pen.
The sentence in (a) shows that "letter" cannot be moved into a neighboring clause with an anaphoric verb because it is an argument of the verb "to write". On the other hand, sentence (b) shows that "pen", an adjunct, can be freely moved away from the verb. This so-called "argument-adjunct dichotomy" has also been applied to nouns, adjectives, and even prepositions (15), but it may not be as unambiguous as it seems (16). As a matter of fact, many researchers have "abstracted away from this distinction, because identifying arguments and adjuncts is a notoriously difficult task, taxing many native speakers' intuitions" (17).
Furthermore, when applied to oral reading, the approach suggested by Haspelmath (14) suffers from the same shortcoming we pointed out above in Grosz and Sidner's theory (12): it includes segments of the speech that are yet to come to the readers' eyes. For example, if "with a pen" in sentence (b) is an adjunct, it could be wrongly omitted when reading the sentence. The reader could be led to think that the sentence ended after the verb and insert a terminal boundary, which would be incongruous with the written text. In sentence (a), "a letter" is an argument, the verb "calls for it", but its fate in reading may be the same as any adjunct. In both situations, when the reader gets to "wrote", he has precisely the same information. In that respect, all that matters to the reader is whether the word he is reading at any moment needs additional word(s) to complete the current segment, regardless of the arbitrary category (information module or utterance, argument or adjunct) that linguists attribute to them.
To better understand how oral readers segment their readings and why they do so, we established as our primary objectives to 1) assess the prevalence of prosodic boundary incongruences between written texts and their respective oral readings and 2) investigate the correlations between the incidence of prosodic boundary incongruences in oral reading and the following elements of the written texts: a) the prototypicality of the actor; b) the thematic continuity from one sentence to another; c) the need for other words to syntactically, semantically, or pragmatically complete the current speech segment. We hypothesize that sentences that a) are thematically continuous with the previous sentence or b) bear prototypical actors have a better chance to be prosodically congruent with the written text. Additionally, we conjecture that if the word or segment under reading does not need other words to be syntactically, semantically, or pragmatically fulfilled, the reader will have a bias to insert a prosodic boundary and incongruently segment his speech.

Methods
We performed a cross-sectional study on a corpus of oral readings, identifying prosodic boundary incongruences and some pre-determined linguistic features in incongruent and congruent sentences. Then, we investigated the statistical correlations between prosodic congruency and linguistic features.

Corpus and participants
The corpus consisted of a public online database of video recordings of speeches given on the Brazilian Senate floor from 2017 to 2018 by native Brazilian Portuguese (BP) speakers (18). The speakers are Brazilian senators, and the speeches are political. The files were in MPEG-4 format with a mean bitrate of 500 kbps. Each speech was delivered by one individual speaker at one particular moment of a specific day. Most speeches consisted of both orally read sentences and off-the-cuff, non-read sentences, so we extracted speeches with at least one sentence that was read. Then, we selected one speech per speaker, choosing the speech with the largest number of read sentences. After these steps, we ended up with a corpus of 39 speeches, delivered by 39 different speakers, with at least one orally read sentence per speech.
The speeches, as mentioned, are political, and have some specificities. They may talk about many issues but typically employ persuasive elements. Prosodically, they may use higher and more variable pitches (fundamental frequency) and tend to highlight emotions more often than not (19).
It must be noted that the written texts were drafted by professional speechwriters and specifically targeted to oral delivery (reading aloud). They were accessible to the study but are not available to the general public due to ethical reasons and institutional policies. Nonetheless, speech transcriptions by professional stenographers are publicly available under the label "notas taquigraficas" (18). As the transcriptions avoid reproducing reading errors that are evident to the stenographers, they happen to be close replicas of the written texts.

Variables
Along with demographic statistics like age, sex, educational attainment, and birthplace, we collected the following variables: a) incongruent prosodic boundaries (oral reading differs from written text), b) actor prototypicality, c) thematic continuity, and d) need for other words to syntactically, semantically, or pragmatically complete the current speech segment. Similar to a typical cross-sectional study, we could say that the presence of variable "a" determined the "cases", and its absence determined the "non-cases". Correspondingly, the presence of variables "b", "c", and "d" defined the "exposed," and their absence determined the "not exposed".
Exposure to the linguistic features of variables "b" and "c" applied to the entire sentences where the variables appeared. In these situations, we had "exposed" and "not exposed" sentences that would have an incongruent prosodic boundary (variable "a") or not. On the other hand, exposure to the linguistic features of variable "d" applied both to the sentences and to the exact word transitions where they appeared. Altogether, variables "b" and "c" looked at associations with incongruences at the sentence level, and variable "d" also looked at associations at a local, word level. That is, with variable "d" there were also "exposed" and "not exposed" word transitions, in addition to "exposed" and "not exposed" sentences.
We assessed the inter-rater reliability of the judgments of incongruent prosodic boundaries in sentences, comparing the authors with a group of native BP speakers, by means of Cohen's Kappa test.

Assessment of the variables
We assessed each variable as follows.

Incongruent prosodic boundaries
The delimitation of sentences observed the punctuation (periods) defined by the written text. One of the authors read each sentence and then watched the video of its oral reading. Incongruences in speech segmentation between the oral readers and ourselves were annotated. Our measures for incongruences were very tolerant. We admitted as congruent all prosodic boundaries (or their absences) that could eventually be considered an acceptable speech segmentation by the intuitions of a native BP speaker. We only treated a boundary as incongruent when the speaker read the sentence without a "phrasing that was consistent with the author's syntax" (6), i.e., when the speech segmentation of the sentence as a whole, meaning the combination of its prosodic boundaries, was inconsistent with the proper delivery of the message intended by the written sentence. Formal syntactic criteria, per se, were not applied when evaluating prosodic boundary congruence.
Additionally, we categorized the boundaries as terminal or non-terminal. A congruent terminal boundary could only be inserted at the end of a written sentence and a non-terminal one at any point where, according to the indulgent definition we adopted, it would fit. Hence, four types of prosodic boundary incongruences were annotated: insertions of prosodic boundaries (1. terminal, 2. non-terminal) and deletions of prosodic boundaries (3. terminal, 4. non-terminal). Some annotated samples extracted from the corpus are provided in the Appendix, including audio files and English translations.

Actor prototypicality
We analyzed the syntactic subject of each sentence's clause along the lines followed by the actor identification strategy model (20). The classification included the following binary features: 1) person (first vs. other), 2) human (yes vs. no), 3) animacy (animate vs. inanimate), 4) position (before vs. after the verb), and 5) definiteness (definite vs. indefinite). We added the following features to the classification: 6) voice (active vs. passive), and 7) subject drop (yes vs. no). Regarding item 7, BP, in contrast with English or French, allows the subject to remain unrealized. When the subject is not overtly present (subject drop), it can be figured out based on pragmatic or grammatical elements (e.g., agreement on the verb).

Thematic continuity
We labeled each sentence as thematically continuous or thematically discontinuous. Thematic continuity meant the current sentence topic (theme) had been a topic or comment (focus, rheme) in the previous sentence. In multi-clause sentences, we analyzed the main clause or the first coordinate clause. Additionally, we considered a first-person subject consistent with thematic continuity since the first person is always known and positioned at the center of any discourse.

The need for other words to syntactically, semantically, or pragmatically complete the current speech segment
The approach here was less orthodox and will be arbitrarily named "sufficiency". We tried to put ourselves in the readers' shoes, simulating an extreme situation where the reader would be completely blind to whatever would come after the word he was reading at any specific point in time. At that moment, with only the syntactic, semantic, and pragmatic information gathered so far in the current speech and sentence, could that word be assumed to be the last word in an utterance or information module? For instance, in "I wrote a letter", "letter" could be the last word in the sentence/utterance, and the reader could be led to insert a terminal prosodic boundary. However, the sentence/utterance went on: "I wrote a letter with a pen". Again, it would be possible to think that "pen" was the last word in the sentence/utterance, and a terminal prosodic boundary could be inserted. But, again, the sentence could be much longer, offering many new opportunities for prosodic boundary incongruences: "I wrote a letter with a pen my father gave me as a birthday present last year when my mother came back home from abroad." In this case, we would say that, sequentially, the words "letter", "pen", "me", "present", "year", "home", and maybe "back" are "sufficient": they do not need other words to syntactically, semantically, or pragmatically complete the current speech segment.
We must not forget that our study deals with Brazilian Portuguese, which has some grammatical features of its own and, as such, behaves differently from English. For instance, adjectives in BP usually follow nouns, and that shapes opportunities for incongruent insertion of boundaries after nouns, as in "comprei um carro velho" (I bought an old car), where "velho" is the adjective (old) qualifying the noun "carro" (car). The reader may think the sentence goes as far as "carro", as in "I bought a car that is old", and insert an incongruent prosodic boundary after it. Nevertheless, we provide additional examples of putative sentences in English.
1) "Without spoken words, facial expression and gesture must carry the meaning." In this sentence, "facial expression and gesture" is the subject of the clause. However, the reader may think they are coordinated with "spoken words", as in "Without spoken words, facial expression and gesture, [something else] must carry the meaning", and insert a non-terminal boundary after "gesture".
2) "To begin with, remember what a word is: a long-term memory linking of pieces of phonological, syntactic, and conceptual structures." The reader may read "memory" and think that "a long-term memory" is "what a word is", as in "a word is a long-term memory". He could insert a non-terminal or terminal prosodic boundary after "memory", but that would be incongruent because "a word is: a long-term memory linking".
3) "Phrasing (also referred to as grouping) is associated with the segmentation of utterances into variable prosodic units and prosodic theory and phonological studies refer to several prosodic categories and units ranging from syllable to utterance." (21). Here, "prosodic theory" is part of the subject of the verb "refer" but the reader may think it is coordinated with "variable prosodic units", as in "the segmentation of utterances into variable prosodic units and prosodic theory". In this case, he could incongruently delete a non-terminal prosodic boundary after "prosodic units" and insert a non-terminal (or even a terminal) one after "prosodic theory".
In a way, this reasoning could also apply to the so-called garden path sentences, as in the well-known example: "While Mary bathed the baby played in the crib." Here, the reader could insert an incongruent non-terminal boundary after "baby".
Taking all that into account, we labeled each word of every sentence according to its "sufficiency", as in the examples below, where "/" indicates "sufficiency" (of the previous word):
a) I ate/ my soup/ with a spoon/ my father gave me/ as a birthday present/ last year/ when my mother returned home/ from abroad.
b) To begin with/, remember what a word is/: a long-term memory/ linking/ of pieces/ of phonological, syntactic, and conceptual structures.
The labeling of sentence (a) is straightforward, as every "/" marks a possible end to the utterance, but sentence (b) needs additional explanation. For example, "with" in "to begin with" was considered "sufficient" because "to begin with" is a usual phrase, a fixed expression, and the reader does not need to look for other words to complement the segment (an "information module"), as he would need with "to" and "begin". The other marked words ("is", "memory", "linking", and "pieces") follow the same pattern as sentence (a) and could be the last word of an utterance.
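The "/" notation described above lends itself to a simple machine-readable form. The following is a minimal sketch, not the procedure actually used in the study: it assumes that annotated sentences are stored as plain strings in which "/" immediately follows each "sufficient" word, and the function name `label_sufficiency` is hypothetical.

```python
import string

def label_sufficiency(annotated_sentence):
    """Return (word, is_sufficient) pairs from a sentence annotated with "/".

    Assumption: a "/" attached to a token marks the preceding word as
    "sufficient"; punctuation around the word is discarded.
    """
    labels = []
    for token in annotated_sentence.split():
        is_sufficient = "/" in token
        # Drop the marker, then strip leading/trailing punctuation only,
        # so that internal hyphens (e.g. "long-term") are preserved.
        word = token.replace("/", "").strip(string.punctuation)
        if word:
            labels.append((word, is_sufficient))
    return labels

SENTENCE_A = ("I ate/ my soup/ with a spoon/ my father gave me/ "
              "as a birthday present/ last year/ "
              "when my mother returned home/ from abroad.")
SENTENCE_B = ("To begin with/, remember what a word is/: a long-term memory/ "
              "linking/ of pieces/ of phonological, syntactic, "
              "and conceptual structures.")

SUFFICIENT_A = [w for w, s in label_sufficiency(SENTENCE_A) if s]
SUFFICIENT_B = [w for w, s in label_sufficiency(SENTENCE_B) if s]
print(SUFFICIENT_A)
print(SUFFICIENT_B)
```

In this representation, only words explicitly followed by "/" are flagged; whether the sentence-final word should also count as "sufficient" is left open here, since the text does not mark it.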
Additionally, as can be seen (in italics) in sentences (a) and (b), we identified in every sentence the first segment that could be considered an utterance, that is, the smallest linguistic unit with pragmatic autonomy and interpretability in isolation, "the counterpart to a speech act", "the primary reference unit for the analysis of speech", akin to the Language into Act Theory (22). For instance, "I ate" in sentence (a), and "Remember what a word is" in sentence (b) are the smallest (and first to appear) linguistic units that have interpretability in isolation and pragmatic autonomy. Thus, readers could eventually (and erroneously) interpret them as utterances. Note that this procedure is an adaptation of concepts originally applied to speech; but, as we are dealing with written sentences destined to be read and acquire prosodic features, we take the liberty and run the risk of expanding its conventional applicability.
Summing up, we categorized each word along two axes: a) sufficiency (yes vs. no) and b) autonomy (belongs to the first segment in the sentence with pragmatic autonomy and interpretability in isolation: yes vs. no).

Validation of the variables
We compared the author's perception of incongruent prosodic boundaries with four other BP speakers' perceptions and assessed the inter-rater reliability with Cohen's Kappa test (23) performed in the software RStudio (24). Since participants had no formal linguistic education, they were shown, as a preparatory step, four sentences (from the corpus) with all types of incongruent prosodic boundaries. The sentences were annotated to represent the authors' specific perceptions of the incongruences. The annotations included only insertions ("/") and deletions ("*") of prosodic boundaries, regardless of the terminality or non-terminality of the boundary. The protocol each participant individually followed at the preparatory phase was: 1) read the sentence, 2) listen to the audio of the actual oral reading of the sentence, 3) read the sentence with annotations showing the authors' perceptions of prosodic boundary incongruences, 4) repeat any of the steps at will, if needed.
After the preliminary phase, each participant received a set of audio files with the same 40 pairs of sentences from the corpus and their respective written texts, each pair containing one sentence with and one without prosodic boundary incongruence(s). The procedure they had to follow was: 1) read the sentence, 2) listen to the audio of the oral reading, 3) if one of the sentences of the pair is incongruent, mark it. They were asked to apply the procedure to at least ten pairs of sentences. The findings of the test are described in the results section.
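For readers unfamiliar with the statistic, the agreement between the author and one participant can be sketched as below. This is an illustrative, pure-Python computation of Cohen's Kappa for two raters, not the R code used in the study; the function name `cohens_kappa` and the two rating vectors are hypothetical (1 = sentence judged incongruent, 0 = judged congruent).

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters labeling the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("Both raters must label the same non-empty item set.")
    n = len(rater_a)
    # Observed agreement: share of items on which the raters coincide.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal proportions,
    # summed over all categories.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    categories = set(counts_a) | set(counts_b)
    expected = sum(counts_a[c] * counts_b[c] for c in categories) / n ** 2
    if expected == 1.0:
        # Both raters used a single identical category; kappa is undefined.
        raise ValueError("Kappa is undefined when chance agreement is 1.")
    return (observed - expected) / (1 - expected)

# Hypothetical judgments on ten sentences (not data from the study).
AUTHOR = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
PARTICIPANT = [1, 0, 0, 0, 1, 0, 1, 1, 1, 0]

KAPPA = cohens_kappa(AUTHOR, PARTICIPANT)
print(f"kappa = {KAPPA:.2f}")
```

With four participants, one such coefficient would be obtained per author-participant pair.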

Study size and sources of bias
Awareness of the value of the variable that defined non-cases and cases (prosodic boundary incongruence) could distort the assessment of the exposure variables (variables b, c, and d). Thus, we tried to avoid information bias by evaluating the exposure variables without information about the value of the case-defining variable.
We analyzed many orally read sentences extracted from a database of speeches delivered by a few dozen speakers within a limited time frame. There was a potential selection bias towards speakers that gave more speeches during that time. We avoided this selection bias by choosing only one speech per speaker. Even so, there was also a selection bias towards speakers that read more material during their speeches. In order to have enough sentences to analyze, we did not try to eliminate that last bias.
We did not perform formal, a priori sample size calculations. However, we had an a priori estimate of the incidence of prosodic boundary incongruences and of the number of speeches speakers usually delivered. Therefore, we believed a two-year timeframe of the database would yield enough read sentences to analyze.

Statistical methods
We used descriptive statistics to summarize information about demographics and corpus characteristics. Then, as we deal with associations between categorical nominal variables, we applied chi-squared tests. Finally, when analyzing actor prototypicality, we tested the variables as a group (all characteristics lumped together) and, in order to discriminate between joint effect and individual effects, we tested each relevant subgroup.

Results

The speakers
From a universe of more than 81 potential speakers, we selected 39. The criteria for the selection were a) the speaker had read aloud any sentence of his speech, and b) the original written text upon which the oral reading was based was available to the authors. Table 2 shows the speakers' demographic characteristics. Note that "ages" reflect the moment speeches were delivered.

The corpus
The 39 speeches amounted to 8h28min of video recordings (individuals speaking with and without reading) with 5 hours of oral readings. The median oral reading time was 473 seconds (7.88 minutes). Table 6 shows reading times per speaker.

The inter-rater reliability
The Cohen's Kappa test included four participants, with the following demographic characteristics:

Prosodic boundary incongruences
Table 6 shows the prosodic boundary incongruences we identified, distributed by speaker. Speakers produced a median of 1.4 prosodic boundary incongruences (PBI) per minute across speeches. Most incongruences (80%) were prosodic boundary insertions, notably of non-terminal boundaries (54% of all incongruences, 67% of boundary insertions). Figure 1 shows a histogram of the number of speakers in each category of PBI per minute, and Figure 2 depicts the median and interquartile range of PBI per minute in the corpus.
Another way to look at the data is to take into account the speech rate (in words per minute) and the PBI incidence per spoken word. The basal speech rate for each speaker was measured on stretches of fluent oral readings to avoid unwanted effects of incongruent prosodic boundaries and other dysfluencies on the calculations. The mean speech rate was 119.7 words per minute (interquartile range 109.5-127), and the incidence of PBI per spoken word was 1.39 PBI per 100 words (interquartile range 0.7-1.9). Pearson's product-moment correlation test showed no significant correlation between basal speech rate and incidence of PBI per word (t = 0.051908, p-value = 0.9589).

Thematic continuity
Table 3 shows the number of incongruent and congruent sentences in each category of thematic continuity: continuous and discontinuous. There was a disproportionately high number of thematically continuous incongruent sentences. Pearson's Chi-squared test with Yates' continuity correction suggests an association between thematic continuity and prosodic boundary incongruence (p-value = 0.0006345).
Table 4 shows the number of each type of prosodic boundary incongruence (PBI) in each category of thematic continuity. There seemed to be a disproportionately higher incidence of insertions of non-terminal boundaries (INT) associated with thematic continuity, but Pearson's Chi-squared test did not confirm it (p-value = 0.1426).

Actor prototypicality
We identified 38 different types of "actors" from the combination of the prototypicality features we chose to analyze. Table 5 shows the distribution of the most frequent types, bundling all the features in a six-position string of characters (XXXXXX) corresponding, sequentially, to the following features: 1) Voice: active (A) vs. passive (P) vs. subject drop (D); 2) Position: before (1) vs. after (0) the verb; 3) Person: first (1) vs. other (0); 4) Human: yes (1) vs. no (0); 5) Animate: yes (1) vs. no (0); 6) Definite: yes (1) vs. no (0). The last column of Table 5 (actor congruency ratio) shows the ratio of the counts of each type of actor in each type of sentence. Negative ratios indicate that actors of that type were more frequent in incongruent sentences. The most frequent type of actor was "A10001": active voice, pre-verbal, non-first person, non-human, inanimate, and definite. Subject-drop actors occupy the next three positions. Passive voice was relatively rare, representing only 3% of actors in any kind of sentence.
Pearson's Chi-squared test with simulated p-value applied to the numbers of actors in congruent and incongruent sentences in Table 5 yields a p-value of 0.0002332, suggesting an association between actor prototypicality and congruency. The actor congruency ratio (last column of the Table 5) indicates the most congruent type of actor was A11111: active voice, preverbal, first-person, human, animate, and definite. Conversely, the most incongruent actor was A10111, which differs from A11111 only at the first-person feature (it is non-first person).
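The six-position actor codes can be unpacked mechanically, which makes comparisons such as A11111 vs. A10111 easier to follow. The sketch below is only an illustration of the coding scheme described for Table 5; the names `decode_actor` and `differing_features` are hypothetical and not part of the study's tooling.

```python
# First character: voice, with subject drop coded in the same slot.
VOICE = {"A": "active voice", "P": "passive voice", "D": "subject drop"}

# Remaining five characters: binary features, in the order given in the text.
BINARY_FEATURES = [
    ("position", {"1": "pre-verbal", "0": "post-verbal"}),
    ("person", {"1": "first person", "0": "non-first person"}),
    ("human", {"1": "human", "0": "non-human"}),
    ("animate", {"1": "animate", "0": "inanimate"}),
    ("definite", {"1": "definite", "0": "indefinite"}),
]

def decode_actor(code):
    """Translate a six-character actor code into named feature values."""
    if len(code) != 6 or code[0] not in VOICE or set(code[1:]) - {"0", "1"}:
        raise ValueError(f"Not a valid actor code: {code!r}")
    decoded = {"voice": VOICE[code[0]]}
    for (name, values), flag in zip(BINARY_FEATURES, code[1:]):
        decoded[name] = values[flag]
    return decoded

def differing_features(code_a, code_b):
    """List the features on which two actor codes disagree."""
    a, b = decode_actor(code_a), decode_actor(code_b)
    return [name for name in a if a[name] != b[name]]

MOST_FREQUENT = decode_actor("A10001")
DIFFERENCE = differing_features("A11111", "A10111")
print(MOST_FREQUENT)
print(DIFFERENCE)
```

Applied to the most congruent and most incongruent types, the comparison isolates the person feature as their only difference.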
In order to pinpoint the differences in congruency ratios between actors, we reclassified the actors into subgroups. Table 7 shows the first reclassification, considering only the voice and subject drop features. Pearson's Chi-squared test suggests no association between those features and congruency (p-value = 0.5115). The next reclassification segregated the results of each of the other features. Table 8 shows them as a stack of contingency tables.

Feature    Value   Congruent   Incongruent
Position   0       59          48
Position   1       651         451
Person     0       528         416
Person     1       182         83
Human      0       472         342
Human      1       238         157
Animate    0       472         342
Animate    1       238         157
Definite   0       172         136
Definite   1       538         363

We see, at first, that the results for "human" and "animate" are identical. This shows that our corpus did not have animals or other credible non-human animate entities as actors. Next, we applied Pearson's Chi-squared test with Yates' continuity correction to each contingency table, obtaining the following p-values: position (0.4925), first person (0.0002584), human (0.4909), and definite (0.2614). These results indicate there is an association between first-person subjects and prosodic boundary congruency. Although we did not analyze interactions between variables, the association we found between A11111 subjects and congruency may stem from the first-person feature alone.
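The p-values reported for Table 8 can be checked from the 2x2 counts alone. The sketch below is a pure-Python re-computation, not the original R analysis: it implements the standard Yates-corrected chi-squared statistic for a 2x2 table and uses the fact that, with one degree of freedom, the p-value equals erfc(sqrt(chi2 / 2)). The function name `yates_chi2_2x2` is hypothetical, and the reading of the first count in each row as "congruent" is an assumption (swapping the columns does not change the statistic).

```python
import math

def yates_chi2_2x2(a, b, c, d):
    """Chi-squared test with Yates' correction for the table [[a, b], [c, d]]."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    # Yates' correction shrinks |ad - bc| by n/2 (never below zero).
    diff = max(abs(a * d - b * c) - n / 2, 0.0)
    chi2 = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # Upper tail of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value

# Counts from Table 8: (value 0 col 1, value 0 col 2, value 1 col 1, value 1 col 2).
# "Animate" is omitted because its counts are identical to "human".
TABLE_8 = {
    "position": (59, 48, 651, 451),
    "person": (528, 416, 182, 83),
    "human": (472, 342, 238, 157),
    "definite": (172, 136, 538, 363),
}

P_VALUES = {}
for feature, cells in TABLE_8.items():
    chi2, p_value = yates_chi2_2x2(*cells)
    P_VALUES[feature] = p_value
    print(f"{feature:9s} chi2 = {chi2:7.3f}   p-value = {p_value:.4g}")
```

Only the person feature falls below conventional significance thresholds, in line with the p-values listed above.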
The first-person feature had an additional characteristic that we must consider (and will discuss later): only 12% of first-person subjects were overtly expressed. The other 88% were subject-dropped.

Sufficiency
By "sufficiency", we refer to whether a word needs other words to syntactically, semantically, or pragmatically complete the current speech segment. Words that did not need other words were "sufficient". We analyzed every word of the written texts to check whether it was sufficient or not (in the context in which it appeared). Table 9 shows the number of words in each category of sufficiency (yes/no) in each type of sentence (with or without PBI). Pearson's Chi-squared test with Yates' continuity correction suggests an association between sufficiency and incongruency (p-value = 0.001381), which means that incongruent sentences had comparatively more sufficient words than congruent sentences.
Next, we analyzed only incongruent sentences, labeling every word for 1) PBI (yes/no) and 2) sufficiency (yes/no). Note that PBI here does not refer to sentences, but to each specific PBI we found in the readings. Table 10 shows the results, and Pearson's Chi-squared test with Yates' continuity correction suggests a strong association between word sufficiency and PBI (p-value < 2.2e-16).
Table 10 included all types of PBI, and arguably PBIs of the type "deletion" should have been excluded from the analysis. Prosodic boundaries can only be deleted where prosodic boundaries should exist, and they cannot exist after words that need other words to complete their speech segments. Indeed, we found only one PBI of the type "deletion" after a word that was not sufficient. For this reason, we excluded PBI deletions from the count and present the results in Table 11. Pearson's Chi-squared test with Yates' continuity correction still yields a p-value < 2.2e-16, confirming the association between sufficiency and prosodic boundary incongruence of the type "insertion".
Finally, Table 12 shows the association between PBI and the "autonomy" of the segment to which the word belongs. Considering only words without "sufficiency" (words that need others), a word has "autonomy" if it belongs to a segment that can be interpreted as an utterance. As we have already seen, there were not many PBIs associated with words without sufficiency. From Table 12, we see that, for those words, "autonomy" also does not correlate with congruency: it did not matter whether the word belonged to a potentially autonomous utterance or a nonautonomous information module (Pearson's Chi-squared test with Yates' continuity correction, p-value = 0.1585).
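For readers who wish to reproduce this kind of test on their own counts, the computation behind a Pearson's Chi-squared test with Yates' continuity correction on a 2x2 table can be sketched as follows. This is a minimal illustration in Python, not the software used in the study; the function name chi2_yates_2x2 and the counts in hypothetical_table are assumptions made up for the example and are not the values of Tables 9 to 12.

```python
import math


def chi2_yates_2x2(a, b, c, d):
    """Chi-squared statistic and p-value (1 degree of freedom) for the
    2x2 table [[a, b], [c, d]], with Yates' continuity correction."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one observation")
    # Yates' correction: shrink |ad - bc| by n/2, but never below zero.
    diff = max(0.0, abs(a * d - b * c) - n / 2)
    stat = n * diff ** 2 / (row1 * row2 * col1 * col2)
    # With one degree of freedom, the upper tail of the chi-squared
    # distribution equals erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts (NOT the study's data):
# rows = word is sufficient / not sufficient, columns = PBI yes / PBI no.
hypothetical_table = (30, 70, 10, 90)
stat, p_value = chi2_yates_2x2(*hypothetical_table)
print(f"chi-squared = {stat:.3f}, p-value = {p_value:.6f}")
```

With only one degree of freedom the p-value can be obtained from the complementary error function, so the sketch needs nothing beyond the standard library.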

Discussion
We aimed to assess the prevalence of prosodic boundary incongruences in oral readings and to investigate their associations with some linguistic features of the written texts.
The material we analyzed amounted to 5 hours of political oral readings, delivered by 39 native BP speakers from all regions of Brazil, most of them male, in their fifties or sixties, with at least a college degree. Results are briefly summarized below.

Prevalence of prosodic boundary incongruences
Prosodic boundary incongruences (PBI) were relatively common, arising more than once per minute of reading, with an interquartile range of 0.766 to 2.212 per minute, and only one speaker (who read only one sentence) performing a PBI-free reading. Incongruent prosodic boundary insertions accounted for 80% of all PBIs. Inter-rater reliability of the prosodic incongruence of sentences, measured by Cohen's Kappa test with four participants, showed agreement with the author's judgments ranging from fair to substantial.
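As an illustration of how agreement between the author and one additional rater can be quantified, the sketch below computes Cohen's Kappa for two sets of sentence-level judgments. It is a minimal example under stated assumptions: the names cohens_kappa and agreement_band and the two label lists are hypothetical, and the verbal bands follow the commonly used Landis and Koch scale, which we assume (the labels "fair" and "substantial" suggest it) rather than know to be the scale applied in the study.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Cohen's Kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    # Observed proportion of items on which the raters agree.
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    # Agreement expected by chance, from each rater's label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / n ** 2
    if expected == 1.0:
        raise ValueError("kappa is undefined when both raters use one identical label")
    return (observed - expected) / (1 - expected)


def agreement_band(kappa):
    """Verbal label on the Landis and Koch scale (an assumed convention)."""
    if kappa < 0:
        return "poor"
    for upper, label in ((0.20, "slight"), (0.40, "fair"),
                         (0.60, "moderate"), (0.80, "substantial")):
        if kappa <= upper:
            return label
    return "almost perfect"


# Hypothetical judgments (1 = sentence perceived as incongruent, 0 = congruent).
author = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
rater = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
kappa = cohens_kappa(author, rater)
print(f"kappa = {kappa:.3f} ({agreement_band(kappa)})")
```

With four additional raters, the same function would be applied once per rater against the author's labels, which yields a range of Kappa values rather than a single figure.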

Prosodic boundary incongruences vs. thematic continuity
We hypothesized that thematic-continuous sentences would have a better chance of being prosodically congruent with the written texts. Surprisingly, we saw the opposite: thematic-continuous sentences had more PBI than thematic-discontinuous sentences. In other words, familiarity correlated with incongruency.

Prosodic boundary incongruence vs. actor prototypicality
Our hypothesis was that prototypical actors would be associated with prosodic congruency, and, indeed, we found that active voice, pre-verbal, first-person, human, definite actors were more prevalent in congruent sentences. However, analyzing each of those features individually, we found that only the first-person feature had a statistically significant association with congruency.

Actor prototypicality vs. thematic continuity
Our thematic continuity assessment included first-person actors as a criterion for continuity. Since first-person actors were independently associated with congruency, as seen in the assessment of actor prototypicality, the first-person feature of thematic-continuous sentences may have reduced their association with incongruency. Nonetheless, we still found a statistically significant association between thematic continuity and incongruence. Furthermore, if we had not included the first person as a criterion for thematic continuity, we might have found some association between thematic continuity and subtypes of PBI, as seen in Table 4.

Sufficiency
We hypothesized that prosodic boundary incongruency could be related to local word-level features and that "sufficiency", as we defined it, could therefore be associated with a reader's bias towards prosodic boundary incongruence. Our findings confirmed this, showing significantly more sufficient words in incongruent sentences and significantly more PBI after sufficient words. In addition, we found that it did not matter whether a non-sufficient word belonged to a potential utterance or an information module.

Limitations
In addition to the non-experimental design, our study has limitations we will try to diagnose and report.

Corpus and speakers
The speakers represent only a particular stratum of BP speakers: skewed to male, older, educated, upper class. Speakers who read more were overrepresented compared to those who read less or did not read at all. The speeches were also very specific, as they were political.

Variables
There was no hard science in the measuring of our variables. Firstly, even though we assessed the inter-rater reliability of prosodic boundary incongruence perceptions, they are still perceptions. Secondly, the concept and criteria we proposed to measure "sufficiency" are still fuzzy and need more clarification and inter-rater validation. Therefore, when it comes to the association between prosodic incongruency and word sufficiency, what we can say for sure is that our perception of prosodic boundary incongruence is strongly associated with our perception of word sufficiency. Thus, replications of this study would be reassuring.

Confounding
We analyzed variables associated with the congruency of entire sentences. However, sentences differ in length, and longer sentences may contain more incongruences than shorter ones. Since we did not control for sentence length, actor prototypicality and thematic continuity may be associated with sentence length, and sentence length may be the intermediary between those variables and congruency. For sufficiency, we also analyzed specific locations inside sentences, and thus, in that setting, sentence length was not a confounding factor.
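One standard way to check for this kind of confounding, which we did not apply in this study, is to stratify sentences by length and pool the within-stratum associations with the Mantel-Haenszel odds ratio. The sketch below shows the idea in Python. The function names, the two length bands, and all counts are hypothetical and were chosen only to show how a crude association can vanish once length is held constant.

```python
def crude_odds_ratio(strata):
    """Odds ratio after collapsing all strata into a single 2x2 table."""
    a = sum(t[0] for t in strata)
    b = sum(t[1] for t in strata)
    c = sum(t[2] for t in strata)
    d = sum(t[3] for t in strata)
    return (a * d) / (b * c)


def mantel_haenszel_odds_ratio(strata):
    """Mantel-Haenszel pooled odds ratio across strata.

    Each stratum is a tuple (a, b, c, d):
      a = feature present, incongruent   b = feature present, congruent
      c = feature absent, incongruent    d = feature absent, congruent
    """
    numerator = 0.0
    denominator = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n == 0:
            continue  # an empty stratum carries no information
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("pooled odds ratio is undefined for these strata")
    return numerator / denominator


# Hypothetical counts (NOT the study's data), one table per length band.
hypothetical_strata = [
    (10, 90, 20, 180),   # short sentences
    (120, 80, 30, 20),   # long sentences
]
print("crude OR:", round(crude_odds_ratio(hypothetical_strata), 3))
print("length-adjusted OR:", round(mantel_haenszel_odds_ratio(hypothetical_strata), 3))
```

In these made-up counts the feature is more frequent in long sentences and long sentences are more often incongruent, so the crude odds ratio is well above 1 while the length-adjusted one is exactly 1. Whether anything similar holds for thematic continuity or actor prototypicality in our corpus remains an open question.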

Thematic continuity
Supposing the association between thematic continuity and prosodic segmentation incongruence is genuine, it would be interesting to speculate why. The speculations we put forward are consistent with the interpretation that written language is incongruous with spoken language, and mechanisms of language processing designed by nature to help the latter do not necessarily benefit the former.
One line of thought would be the idea of priming. Current sentences tend to be biased towards previous sentences' syntactic, semantic, or even prosodic (26) characteristics. Therefore, when the current sentence topic has been mentioned in the previous sentence, that may reactivate latent primed features that do not help the current sentence's speech segmentation and may even disrupt it. Another possibility is that active mechanisms like predictive processing that "exploits multiple constraints in parallel across the different levels of linguistic representation" may play a role in misguiding the reader's interpretation (27). Finally, we could propose that a more semantic mechanism, like the concept of "preparedness", meaning "information that is given makes contact with linguistic material that came before, as well as with background knowledge, and integration of the input with preceding context and knowledge leads to the creation of a rich semantic representation", induces a misrepresentation of the forthcoming sentence (28).

Actor prototypicality
Actor prototypicality, the next variable we studied, showed an association between first-person actors and sentence congruence. A reanalysis of the data revealed that most first-person actors were not overtly expressed (subject drop, 88%). However, subject drop alone was not associated with congruency. Reviewing the data, 265 of 325 subject drops were first person, suggesting that maybe the 60 second- and third-person subject-dropped actors were associated with incongruency.
Either way, it seems that the Actor Identification Strategy model (20) does not help prosodic speech segmentation in oral reading. What does seem to help speech segmentation is a higher degree of grammaticalization of the actor: BP marks first-person subjects on the verb, which means BP has morphosyntactic properties that unequivocally relate an argument (in this case, the first-person subject) to its clause, helping readers understand the sentence's grammatical relations and apply proper speech segmentation.

Sufficiency and autonomy
From the analyses of "autonomy" and "sufficiency", we learned that a) belonging to the smallest linguistic unit with interpretability in isolation and pragmatic autonomy (the minimal utterance extracted from a larger sentence) does not bring about prosodic congruency, and b) prosodic speech segmentation appears to be highly responsive to local word-level properties. Whenever a word in any written phrase or clause or sentence was interpretable as the last one in its respective segment, the odds were that an incongruent prosodic boundary (usually, a non-terminal prosodic boundary insertion) could surface.

Beyond the bounds
A guideline for reporting observational studies (29) advises researchers to give "a cautious overall interpretation of results". Nevertheless, even at the risk of lacking in moderation, we will put forward a particular interpretation of our results. As we have seen, readers inserted, overall, more than one incongruent prosodic boundary per minute, revealing a bias toward incongruent prosodic boundary insertion and hyper-segmentation of their readings. But why did they segment more instead of less?
Chomsky (30) holds that boundless expressions are the "most basic property of human language", and recursive structures can yield sentences with infinite words. Christiansen and Chater (27) maintain that language processing works under pressure and that the fundamental constraint on language is working memory. Cowan (31) clarifies that working memory has a mean capacity of 3.5 independent items, ranging from a minimum of 2 to a maximum of 6 items. So, how can sentences have infinite words if working memory has such a limited capacity?
One possibility is that sentences incorporate each new word into the bulk of previous words in a straightforward merge operation. Then, working memory would never be under pressure because it would have only one (the bulk of words) or two (the bulk plus a new word, before the merge operation) of its slots occupied at any time. However, we know that sentences, as defended by Chomsky (30), are composed of "hierarchically structured expressions", and dependencies between words are not always contiguous.
As a matter of fact, hierarchical structures have been interpreted as a domain-general cognitive response to memory constraints. In language, they materialize as progressive merging of phonetic, phonological, word, phrase, clause, sentence, and discourse-level units, from lower to higher levels (32). Besides, these levels are not abstract or arbitrary, as they have tangible neurophysiological signatures in the brain (33). The most studied of those signatures is a centroparietal electroencephalographic positive wave that has been traced to domain-general cognitive phrasing, or segmenting, of any flow of sequential units that must be dealt with by the human brain (34).
In language, there is a correlation between neurophysiological markers of cognitive segmentation and prosodic boundaries. It is not a cause-effect relationship but an association: apparently, prosodic boundaries happen simultaneously with the closure of linguistic segments (35). Our finding that readers are biased to prosodically over-segment their oral readings suggests that they are forming shorter linguistic segments in their working memories. This bias may emerge from a universal cognitive pressure to chunk words into units, transfer those units to long-term memory, and free space in the reader's working memory.

Generalizability
Our results have some characteristics that may hinder their universality. Firstly, we dealt with a particular language, and, as we saw, grammatical features may be relevant to the readers' inclination to produce prosodic boundary incongruences. Thus, other languages may show a different prevalence of the phenomenon.
Secondly, we dealt with a particular group of BP speakers and a specific kind of oral reading: political speeches. Therefore, our results may not apply to other speakers in other circumstances.
However, since we proposed a universal mechanism behind the bias towards incongruent prosodic boundary insertion, it is fair to stipulate that the bias must be ubiquitous if the proposal is to have any merit.

Foreword
Samples of sentences from the corpus are provided below. Each example includes a) the original written text in Brazilian Portuguese (in italics) with annotations showing the authors' perceptions of incongruent prosodic boundaries, b) an audio file with the corresponding oral reading, and c) an English translation. In order to give an idea of the original constituent arrangements, the translations try to keep constituents in the same relative positions as in the original texts, at the expense of more idiomatic renderings.

3) Essa ampliação/ do atendimento é muito bem-vinda, e deve ser buscada e estimulada. (audio example 3)
The increment of services is welcome and must be pursued and encouraged.

4) Esses cálculos foram tabulados/ pelos auditores com base nos dados de empregos formais do Ministério do Trabalho. (audio example 4)
The estimates were tabulated by the auditors, based on employment data from the Ministry of Work.

(audio example 7)
To some people, States that are economically weaker hold ruling powers that are beyond what should be expected in proper federations.

8) É preciso, porém, avaliar se a grande maioria da população vem sendo prejudicada em razão da irresponsabilidade de alguns poucos perdulários.** (audio example 8)
It is necessary to check if most people are not being harmed because of some profligates.

About 75% of our country's water is on rivers of the Amazon Basin, which is inhabited by less than 5% of our population.

19) É a recompensa pelo trabalho que permite o consumo.** (audio example 19)
It is the earnings from work that allow consumption.