
    @PhilosTEI: Building Corpora for Philosophers

    Arianna Betti, Martin Reynaert, Hein van den Berg

    Chapter from the book: Odijk J. & van Hessen A. 2017. CLARIN in the Low Countries.


    For philosophers to take a computational turn in their field, especially if that field relies heavily on historical material, they must be able to build high-quality, easily and freely accessible corpora in a sustainable format, composed of multi-language, multi-script books from different historical periods. At present, corpora meeting these needs are virtually non-existent. In the CLARIN-NL project @PhilosTEI, we addressed the problem of building such corpora by developing an open-source, web-based, user-friendly workflow that goes from textual images to TEI. The workflow is based on the state-of-the-art open-source OCR engine Tesseract and on a multi-language version of TICCL, a powerful OCR post-correction tool. We demonstrated the utility of the @PhilosTEI tool by applying it to a multilingual, multi-script corpus of important 18th- to 20th-century European philosophical texts.
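    To make the end point of such an images-to-TEI workflow concrete, here is a minimal Python sketch of the final step only: wrapping already OCR-ed and post-corrected page text in a bare-bones TEI P5 document. It is not the @PhilosTEI implementation. The function name `pages_to_tei`, the input shape (one list of paragraphs per page) and the placeholder header text are illustrative assumptions, and the Tesseract and TICCL stages are not modelled.

```python
import xml.etree.ElementTree as ET

# TEI P5 namespace; registering it with an empty prefix makes it the default xmlns.
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)


def _q(tag: str) -> str:
    """Return the namespace-qualified ElementTree name for a TEI tag."""
    return f"{{{TEI_NS}}}{tag}"


def pages_to_tei(title: str, pages: list[list[str]]) -> str:
    """Hypothetical helper: wrap corrected OCR text (one list of paragraphs
    per page) in a minimal TEI document and return it as a string."""
    tei = ET.Element(_q("TEI"))

    # Minimal required header: titleStmt, publicationStmt, sourceDesc.
    file_desc = ET.SubElement(ET.SubElement(tei, _q("teiHeader")), _q("fileDesc"))
    title_stmt = ET.SubElement(file_desc, _q("titleStmt"))
    ET.SubElement(title_stmt, _q("title")).text = title
    pub = ET.SubElement(file_desc, _q("publicationStmt"))
    ET.SubElement(pub, _q("p")).text = "Unpublished draft (placeholder)."
    src = ET.SubElement(file_desc, _q("sourceDesc"))
    ET.SubElement(src, _q("p")).text = "Transcribed from page images via OCR (placeholder)."

    # Body: one <pb/> milestone per page image, followed by that page's paragraphs.
    body = ET.SubElement(ET.SubElement(tei, _q("text")), _q("body"))
    div = ET.SubElement(body, _q("div"))
    for page_no, paragraphs in enumerate(pages, start=1):
        ET.SubElement(div, _q("pb"), n=str(page_no))
        for para in paragraphs:
            ET.SubElement(div, _q("p")).text = para

    return ET.tostring(tei, encoding="unicode")


if __name__ == "__main__":
    print(pages_to_tei("Sample", [["First paragraph.", "Second paragraph."], ["Third paragraph."]]))
```

    Keeping page breaks as `<pb n="..."/>` milestones rather than one `<div>` per page lets paragraphs that straddle a page boundary be merged later without restructuring the tree.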


    How to cite this chapter
    Betti, A et al. 2017. @PhilosTEI: Building Corpora for Philosophers. In: Odijk J. & van Hessen A. (eds.), CLARIN in the Low Countries. London: Ubiquity Press. DOI: https://doi.org/10.5334/bbi.32
    License

    This is an Open Access chapter distributed under the terms of the Creative Commons Attribution 4.0 license (unless stated otherwise), which permits unrestricted use, distribution and reproduction in any medium, provided the original work is properly cited. Copyright is retained by the author(s).

    Peer Review Information

    This book has been peer reviewed. See our Peer Review Policies for more information.

    Additional Information

    Published on Dec. 28, 2017

    DOI
    https://doi.org/10.5334/bbi.32


    comments powered by Disqus