veya2ztn/sci-bert-finetune

This is the embedding training code for the scientific embedding project: [2405.11461] DocReLM: Mastering Document Retrieval with Language Model (arxiv.org).

The dataset is synthetic data built via veya2ztn/Synthetic-Science (scripts that efficiently create synthetic science question-answer pairs with reasoning) (github.com), which is in turn based on veya2ztn/uparxive (an LLM-friendly dataset of the full arXiv .tex source) (github.com).

llm_train

This repo integrates the following embedder training methods:

  • ART
  • SGPT
  • Finetune
    • Pipeline training
    • Tensor Parallel: 1D, 2D, and higher
  • Gradient Cache
  • QLoRA
  • Uniem
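Of the methods above, Gradient Cache is the one whose mechanics benefit most from a sketch: it lets contrastive embedder training use large batches under a fixed memory budget by encoding in small chunks, caching the loss gradient with respect to each embedding, and re-encoding each chunk in a second pass to accumulate parameter gradients. The following is a minimal, framework-free illustration of that two-pass pattern; the toy encoder `f(x) = w * x` and the squared-error loss are assumptions for demonstration, not this repo's actual model or loss.

```python
def encode(w, xs):
    """Toy encoder: maps each input to a 1-d 'embedding' w * x."""
    return [w * x for x in xs]

def grad_cache_step(w, xs, ys, chunk_size=2):
    """Two-pass Gradient Cache step: returns dL/dw for L = sum_i (emb_i - y_i)^2."""
    # Pass 1: encode the full batch chunk-by-chunk (no gradient tracking needed),
    # so only one chunk's activations would ever be live at a time.
    embs = []
    for i in range(0, len(xs), chunk_size):
        embs.extend(encode(w, xs[i:i + chunk_size]))
    # Cache the loss gradient w.r.t. each embedding: dL/d(emb_i) = 2 * (emb_i - y_i)
    cached = [2.0 * (e - y) for e, y in zip(embs, ys)]
    # Pass 2: re-encode each chunk and accumulate parameter gradients via the
    # chain rule: dL/dw = sum_i dL/d(emb_i) * d(emb_i)/dw = sum_i cached_i * x_i
    dw = 0.0
    for i in range(0, len(xs), chunk_size):
        for x, g in zip(xs[i:i + chunk_size], cached[i:i + chunk_size]):
            dw += g * x
    return dw

def full_batch_grad(w, xs, ys):
    """Reference: the same gradient computed over the whole batch at once."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys))
```

The chunked two-pass gradient matches the full-batch gradient exactly; the saving is that the encoder's forward activations only need to be held for one chunk at a time.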

About

Fine-tune your embedding model for science papers.
