Publications

Left-branching tree in CCG with D combinator

Masaya Taniguchi (JAIST) and Satoshi Tojo

LENLS 19, 2022

The essence of incremental parsing is to construct a partial syntactic structure stepwise from the head of a sentence. Thus, the parsing algorithm is preferable when we analyze the process in which human listen to/ read a sentence in the temporal order, and beneficial to reveal an intermediate state of natural language understanding and to parse a long natural language sentence. In categorial gram- mar (CG) and its variants, there are incremental parsing algorithms, but not all grammatical sentences could be parsed by these algorithms. It is empirically known that all the grammatical sentences might be parsed incrementally by CG with combinator rules. As the previous researches were software simulations of the incremental parsing, which did not show that they work sentences of any length. Thus far, we have empirically shown 4 that any grammatical sentences could be incrementally parsed, however, no proofs to this is known to us. In this paper, we show the main theorem; For the certain parsing tree generated by CG, we can generate the equivalent parsing tree without binary nodes on the right branch by the combintor rules 𝐵, 𝐷, 𝑇, where the condition is to except the backward long reference.

Decidable Parsing Algorithm for Categorial Grammar with Type-raising

Masaya Taniguchi (JAIST)

TPS 2022, 2022

One of the combinatory rules is the T combinator called a type-raising rule to correspond with swapping a head in X-bar theory, where the head takes another component. In CG and its variants, we parse a sentence by proving the theorem Γ ⊢ S where Γ is a sequence of categories, e.g., “He walks” is given by NP, NP\S ⊢ S. This proof system is known as a non-cut-free system. The non-cut-free proof is a problem for the decidability of the sequent calculus. Hence, there is a limitation in the usage of the rule in most CCG parsing algorithms. For example, a parser allows the type raising only for the noun phrase. In the present paper, we eliminate the limitation of the type-raising rule by the proof-theoretic analysis.

Losing a Head in Grammar Extraction

Masaya Taniguchi (JAIST) and Satoshi Tojo (JAIST)

KSE 2022, 2022

The treebank corpus is a collection of the tree that represents a sentence constituency and dependency relation. We are motivated to extract grammar rules from the treebank, that is to decompose the tree data structure and to find grammar rules. After the extraction, we need to validate the adequacy of the grammar so that we inspect the generative power of the obtained grammar. In this phase, the syntactic head is a significant feature, however, in the obtained grammar the head information is missing. Hence, we propose to supplement the lost head information with the type-raising rule of categorial grammar (CG). We extend the same issue to combinatory categorial grammar (CCG) and solve it using the generalized type-raising. Furthermore, we verify our grammar by the formal proof written in the proof assistant system, Isabelle/ HOL.

Unprovabiliy of Continuation-Passing Style Transformation in Lambek Calculus

TANIGUCHI Masaya (JAIST)

33rd ESSLLI Student Session, 2022

Combinatory categorial grammar (CCG) has been able to accommodate various linguistic phenomena with added combinators to categorial grammar (CG). In particular, the type-raising rule realized by such a combinator in CCG and has solved the scope change of quantifiers, and the rule is generalized as continuation-passing style (CPS) transformation. However, there is concern that CPS may exceedingly accept ungrammatical sentences. In this paper, we analyze the expanded grammar rules using the Lambek calculus, that is a formal system of CG, to restrict CPS transformation. First, we show that Barker’s CPS transformation is provable and Plotkin’s CPS transformation is unprovable in Lambek calculus. Second, we show that a subset of Plotkin’s CPS transformations which is provable in Lambek calculus. Due to the complexity of proving unprovability, we formalize the proof in Isabelle/HOL and verify it. We regard this subset represents the grammatical class in terms of Lambek calculus, and call it type-restricted CPS transformation.

Interactive CCG Parsing with Incremental Trees

TANIGUCHI Masaya (JAIST), Satoshi Tojo (JAIST), Koji Mineshima (Keio University)

33rd ESSLLI BriGaps, 2022

We aim at constructing a partial semantics structure incrementally from the beginning word of a sentence. To achieve this property, we employ combinatory categorial grammar (CCG), which enables us to acquire a semantic structure and a parsing tree simultaneously. Therefore, we introduce new rules as a natural extension of existing formalisms, and in addition, we implement an interactive parser that can externalize a parsing tree for each word input. As a result, we find a left-branching tree without employing extra memory for floating tokens that are un-treed words. This is because categorial grammar and its derivations have a confluence property in the simply-typed lambda calculus; that is, the reduction steps do not affect the resultant sequence of semantics terms. Generally, we build a constituency tree structure from a sentence using the parser with some given grammar rules, i.e., the building process is a sequence of unification between applicable Grammar rules and a given Sentence. However, we can execute the parser in the reversal way, as in definite clause grammar; when we execute the unification between an input Sentence and a parsed Tree, we can replicate a set of innate Grammar rules. This system takes a sentence as input and displays the possible parsing results in CCG. For each word input, the partial sentence is processed as follows; 1. The tree and the sentence are used to extract the grammar rules employed in the parser. 2. Based on the extracted grammar, the parser outputs all possible parsing results in CCG. The type of the partial tree is not uniquely determined, and so all the possible grammar rules are exhaustively consulted.

Incremental Derivations with Q Combinator in CCG

TANIGUCHI Masaya (JAIST) and Satoshi Tojo (JAIST)

LENLS 18, 2021

We show that there exists a left-recursive parsing tree for any given sentence generated by Combinatory Categorial Grammar (CCG). In order to do this, we add four binary deduction rules Q that correspond to Geach’s rule based on the mixed composition, which was introduced in the early study. These rules correspond to Q combinator in Combinatory Logic, and it corresponds to the exchange of the head in linguistics. This paper gives a procedure of translation from a given derivation to a left-recursive one. Our contribution includes the visualization of the transition of semantics in the incremental readings.

Interactive Grammar Extraction from a Treebank

TANIGUCHI Masaya (JAIST) and Satoshi Tojo (JAIST)

KICSS 2021, 2021

The notion of normative grammar is helpful for tagging sentences of a large corpus, that is to annotate each word by a part of speech (POS). In this research, we aim at obtaining categorial grammar rules, where the category is a generalized notion of POS. However, to find a proper set of grammar rules is computationally exponential regarding the length of sentence, and thus, a reliable but exhaustive search method is in demand. Here, we present a support system for the annotation by the CCG parser based on the bidirectionality and the non-determinism in logic programming. Contrary to the common usage of the parser, we extract a set of grammar rules from a syntax tree, and moreover, we retrieve all the probable readings with the system. In this paper, we show the parsing algorithm and employ it for the grammar extraction. Further, we consider the application of the algorithm to generate a sentence and to fill a masked sentence.

Generic Framework to Uncross Dependency

TANIGUCHI Masaya (JAIST) and Satoshi Tojo (JAIST)

AROB 25, 2020

Combinatory Categorial Grammar (CCG) called Mildly Context-Sensitive Grammar (MCSG) deals with dependencies. CCG uses the Categorial Grammar for the syntax types, and it uses the extension of simply-typed lambda calculus for the semantics types of each word. Most of the natural languages stay in this grammar as the syntax type. However, we propose the new grammar, in which the semantic type is the extension of System F to analyze the cross-serial dependencies that are the relationship of word reference over the other references. Our grammar has the CPS-transformation rules instead of the Dutch forward crossed composition rule in original CCG. That rule works as the generic framework to uncross dependency.