
๐ŸŒ Live Open Source Explorer

Explore live open-source projects and AI models.

Search public open-source repositories from GitHub and AI models from Hugging Face. Every page shows 10 results with clean pagination.

Live Search

Search live open-source data

Search GitHub repositories and Hugging Face models directly, then explore stars, downloads, source links and project details.


Try keywords like automation, CRM, analytics, chatbot, llama or workflow.

Choose where to search live data.

Live Results

GitHub Open Source Repositories

Search: DATAGEN

Page 1

Showing 10 of 55 results


starpig1129/DATAGEN

GitHub Python MIT License

DATAGEN: AI-driven multi-agent research assistant automating hypothesis generation, data analysis, and report writing.

★ 1,751 Forks 238 starpig1129 Updated 12 Jun 2026

databrickslabs/dbldatagen

GitHub Python Other

Generate relevant synthetic data quickly for your projects. The Databricks Labs synthetic data generator (aka `dbldatagen`) may be used to generate large simulated / synthetic data sets for test, POCs, and other uses in Databricks environments including in Delta Live Tables pipelines

★ 475 Forks 97 databrickslabs Updated 13 Jun 2026

firmai/datagene

GitHub Jupyter Notebook

DataGene - Identify How Similar TS Datasets Are to One Another (by @firmai)

★ 205 Forks 26 firmai Updated 08 Jun 2026

ldbc/ldbc_snb_datagen_spark

GitHub Java Apache License 2.0

Synthetic graph generator for the LDBC Social Network Benchmark, running on Spark

★ 183 Forks 61 ldbc Updated 12 Jun 2026

MaterializeInc/datagen

GitHub TypeScript Apache License 2.0

Generate authentic looking mock data based on a SQL, JSON or Avro schema and produce to Kafka in JSON or Avro format.

★ 168 Forks 20 MaterializeInc Updated 25 May 2026

FINRAOS/DataGenerator

GitHub Java Apache License 2.0

DataGenerator is a Java library for systematically producing large volumes of data. DataGenerator frames data production as a modeling problem, with a user providing a model of dependencies among variables and the library traversing the model to produce relevant data sets.

★ 164 Forks 165 FINRAOS Updated 25 May 2026

jrnd-io/jr

GitHub Go MIT License

JR: streaming quality random data from the command line

★ 145 Forks 30 jrnd-io Updated 05 Jun 2026

hassan-mahmood/TIES_DataGeneration

GitHub Python MIT License

Dataset Generation Code for: S.R. Qasim, H. Mahmood, and F. Shafait, Rethinking Table Parsing using Graph Neural Networks (2019)

★ 123 Forks 40 hassan-mahmood Updated 29 Dec 2025

youdao-ai/SRNet-Datagen

GitHub Python Apache License 2.0

This is a data generator of SRNet which is the model of paper Editing Text in the wild.

★ 115 Forks 31 youdao-ai Updated 16 Mar 2026

maropu/spark-tpcds-datagen

GitHub Scala Apache License 2.0

All the things about TPC-DS in Apache Spark

★ 112 Forks 44 maropu Updated 23 May 2026
Page 1 of 6

10 results on this page · 55 total found

Showing the first 55 accessible GitHub results.
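
The page counts above follow from simple arithmetic: 55 results at 10 per page give 6 pages, with the last page holding the remaining 5. A minimal Python sketch of that calculation is shown here. The function names page_count and page_bounds are hypothetical and are not taken from the explorer itself.

```python
import math


def page_count(total_results: int, per_page: int = 10) -> int:
    """Number of pages needed to show total_results at per_page each."""
    return math.ceil(total_results / per_page)


def page_bounds(page: int, total_results: int, per_page: int = 10) -> tuple[int, int]:
    """1-based (first, last) result positions shown on the given page."""
    first = (page - 1) * per_page + 1
    last = min(page * per_page, total_results)  # the last page may be short
    return first, last


print(page_count(55))      # 6
print(page_bounds(1, 55))  # (1, 10)
print(page_bounds(6, 55))  # (51, 55)
```

With these assumptions, page 1 covers results 1 to 10 and page 6 covers results 51 to 55, which matches "Page 1 of 6" and "55 total found".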