🌍 Live Open Source Explorer
Explore live open-source projects and AI models.
Search public open-source repositories from GitHub and AI models from Hugging Face. Results are paginated, 10 per page.
🔎 Live Search
Search live open-source data
Search GitHub repositories and Hugging Face models directly, then explore stars, downloads, source links and project details.
Live Results
GitHub Open Source Repositories
Search: linguistic-data
Page 1
Showing 10 of 30 results
proycon/pynlpl
GitHub · Python · GNU General Public License v3.0
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common and less common NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There ar...
ChangdeDu/BraVL
GitHub · Python · MIT License
Code and data for "Decoding Visual Neural Representations by Multimodal Learning of Brain-Visual-Linguistic Features"
EticaAI/linguistic-datasets-portuguese
GitHub · The Unlicense
Linguistic Datasets for Portuguese: a list of linguistic datasets for the Portuguese language with flexible licenses, covering databases, word lists, synonyms, antonyms, thematic dictionaries, thesauri, linked data, semantics, ontologies and knowledge representation.
Maximax67/Words-CEFR-Dataset
GitHub · Jupyter Notebook · MIT License
A dataset mapping English words to CEFR levels based on the CEFR-J dataset, word lemmas, stems, parts of speech (POS), and frequency data from the Google N-Gram dataset. Ideal for NLP tasks, language proficiency assessment, and linguistic research.
clld/clld
GitHub · Python · Other license
A web framework to display Cross-Linguistic Linked Data.
proycon/folia
GitHub · Python · GNU General Public License v3.0
FoLiA: Format for Linguistic Annotation. FoLiA is a rich XML-based annotation format for the representation of language resources (including corpora) with linguistic annotations. A wide variety of linguistic annotations are supported, making FoLiA a useful format for NLP tasks and data interchan...
cldf/cldf
GitHub · Python · Apache License 2.0
CLDF: Cross-Linguistic Data Formats, the specification
microsoft/CodeMixed-Text-Generator
GitHub · Jupyter Notebook · MIT License
This tool automatically generates grammatically valid synthetic code-mixed data using linguistic theories such as Equivalence Constraint Theory and Matrix Language Theory.
vered1986/UnsupervisedHypernymy
GitHub · Python · Other license
Data and code for the experiments in: "Hypernyms under Siege: Linguistically-motivated Artillery for Hypernymy Detection". Vered Shwartz, Enrico Santus and Dominik Schlechtweg. EACL 2017.
dowobeha/ldc_downloader
GitHub · Shell · GNU General Public License v3.0
Script to download corpora from the Linguistic Data Consortium (LDC)
Showing the first 30 accessible GitHub results.
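The pagination on this page (10 results per page, 30 results in total, so three pages) comes down to simple offset arithmetic. The sketch below is a minimal illustration under that assumption, not the site's actual implementation; the function names `total_pages` and `page_slice` are hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def total_pages(total_results: int, page_size: int = 10) -> int:
    """Number of pages needed to show total_results items, page_size per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    if total_results < 0:
        raise ValueError("total_results cannot be negative")
    # Integer ceiling division: 30 results at 10 per page -> 3 pages.
    return (total_results + page_size - 1) // page_size


def page_slice(items: Sequence[T], page: int, page_size: int = 10) -> List[T]:
    """Return the items shown on a 1-indexed page (empty if past the end)."""
    if page < 1:
        raise ValueError("page numbers start at 1")
    start = (page - 1) * page_size
    return list(items[start:start + page_size])


results = list(range(30))        # stand-in for 30 search hits
print(total_pages(len(results)))  # 3
print(page_slice(results, 1))     # the first 10 hits
```

Pages are 1-indexed to match the "Page 1" label above; requesting a page past the last one returns an empty list rather than raising.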