
Chat with PDFs

The name is self-explanatory: it's like ChatGPT or Claude, but built around documents such as PDFs, text files, and Markdown. I started out using ChatPDF and tried it on documents in different languages. It parsed them, but it could not format the text correctly, which is why I built Chat with PDFs 😉

Tech Stack

  • Framer Motion, TypeScript, Tailwind CSS
  • OpenAI API Integration
  • Pinecone Vectorization
  • Vercel for hosting, Clerk for authentication

⚠️ Development Challenges

My biggest challenge was making this cost-effective. I wanted the service to be free, but I still had to pay for the OpenAI API, so I worked out which OpenAI API plan best fits my personal use so far. Oh, and formatting the text 😜

Format error for reference

[Image: Pinecone vectorization]
  • The formatting was wrong: Arabic (AR) text, for example, must run right to left.

Pinecone - ML Integration

[Image: Pinecone vectorization]

Development Process

I have experience with ML and with breaking data down into chunks for an algorithm (in this case, PDFs and other documents). I wasn't very familiar with Pinecone, though, so I followed Sonny Sangha's YouTube tutorial and customized the app to my liking, focusing on letter formatting for the chat.
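As a rough illustration of the chunking step, here is a minimal sketch that splits extracted document text into overlapping pieces before they are embedded. The function name `chunkText` and the default `chunkSize` and `overlap` values are hypothetical, chosen only for the example; they are not taken from the app or from the tutorial.

```typescript
// Hypothetical helper: split extracted text into overlapping chunks.
// chunkSize and overlap are illustrative defaults, not the app's real settings.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (chunkSize <= 0 || overlap < 0 || overlap >= chunkSize) {
    throw new Error("Require chunkSize > 0 and 0 <= overlap < chunkSize");
  }
  // Split by code point so surrogate pairs (emoji, rare scripts) are never cut in half.
  const chars = Array.from(text);
  const chunks: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < chars.length; start += step) {
    chunks.push(chars.slice(start, start + chunkSize).join(""));
    // Stop once a chunk has reached the end of the text.
    if (start + chunkSize >= chars.length) break;
  }
  return chunks;
}
```

The overlap means a sentence that straddles a chunk boundary still appears whole in at least one chunk, which tends to help retrieval. Each chunk would then be embedded and stored in the vector index.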

What Letter Formatting Involves:

  • Character Shaping (Contextual Forms): Arabic, for instance, requires shaping letters based on their position within a word. Example: isolated ب, initial بـ, medial ـبـ, final ـب. Modern browsers apply this shaping automatically, provided the font contains the required glyphs; the direction attribute controls the order and alignment of the text rather than the letter shapes.
  • Bidirectional (Bidi) Text Handling: Mixed LTR and RTL text can render out of order or misaligned unless the right Unicode control characters and CSS properties are applied. Example: "مرحباً Hello" might look odd without proper formatting.
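To make the contextual-forms point concrete: the four shapes of ب are normally stored as the same code point (U+0628), and the renderer picks the shape. Unicode also has legacy "presentation form" code points that hard-code one shape each. The sketch below shows that NFKC normalization folds those back to the base letter. Whether this app's input ever contained presentation forms is an assumption, not something stated here; the names `BEH_PRESENTATION_FORMS` and `toBaseLetters` are illustrative.

```typescript
// U+0628 is ARABIC LETTER BEH as it is normally stored in text.
const BEH = "\u0628";

// Legacy presentation-form code points that hard-code a single shape of beh.
const BEH_PRESENTATION_FORMS: Array<[string, string]> = [
  ["isolated", "\uFE8F"],
  ["final", "\uFE90"],
  ["initial", "\uFE91"],
  ["medial", "\uFE92"],
];

// NFKC (compatibility) normalization maps presentation forms back to base letters,
// leaving the choice of shape to the browser's text shaper.
function toBaseLetters(text: string): string {
  return text.normalize("NFKC");
}

for (const [name, form] of BEH_PRESENTATION_FORMS) {
  console.log(`${name}: normalizes to base letter = ${toBaseLetters(form) === BEH}`);
}
```

NFKC also rewrites other compatibility characters (Latin ligatures and full-width forms, for example), so it is worth applying only where that side effect is acceptable.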

Letter Formatting Implementation

To handle letter formatting efficiently, I used a combination of CSS properties and JavaScript. Here's a breakdown:

  • Direction Attribute: Ensured that all text elements had the correct dir attribute (ltr or rtl) based on the language.
  • Unicode Bidirectional Algorithm: Used Unicode directional marks such as \u200E (left-to-right mark) and \u200F (right-to-left mark) in inline text to ensure proper alignment.
  • Font Selection: Chose fonts designed for multilingual support, ensuring proper rendering of contextual letter forms.
  • Testing: Used sample texts in various languages to validate alignment and character shaping.
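The first two steps above can be sketched roughly as follows. This is a simplified first-strong-character heuristic, not the full Unicode Bidirectional Algorithm and not necessarily the app's actual code: the character ranges are approximations covering the Hebrew and Arabic blocks and basic Latin letters, and the names `detectDirection` and `withDirectionMark` are hypothetical.

```typescript
type Direction = "ltr" | "rtl";

const LRM = "\u200E"; // LEFT-TO-RIGHT MARK
const RLM = "\u200F"; // RIGHT-TO-LEFT MARK

// Approximation: Hebrew through Arabic Extended-A, plus Hebrew/Arabic presentation forms.
const RTL_CHAR = /[\u0590-\u08FF\uFB1D-\uFDFF\uFE70-\uFEFC]/;
// Approximation of "strong LTR": basic and extended Latin only.
const LTR_CHAR = /[A-Za-z\u00C0-\u024F]/;

// Pick a base direction from the first strongly directional character.
// Digits, spaces, and punctuation are skipped; the fallback is "ltr".
function detectDirection(text: string): Direction {
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    if (RTL_CHAR.test(ch)) return "rtl";
    if (LTR_CHAR.test(ch)) return "ltr";
  }
  return "ltr";
}

// Prefix inline text with the mark matching its detected direction.
function withDirectionMark(text: string): string {
  return (detectDirection(text) === "rtl" ? RLM : LRM) + text;
}

console.log(detectDirection("مرحباً Hello"));
console.log(detectDirection("Hello مرحباً"));
```

The result of `detectDirection` can be assigned to a message element's dir attribute, while `withDirectionMark` suits short inline fragments embedded in text of the opposite direction. Browsers also offer dir="auto", which applies a similar first-strong rule natively.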

Core Features

The app supports LTR and RTL text with precise letter shaping for cursive scripts such as Arabic, plus context-aware responses powered by OpenAI and Pinecone. It supports multiple document formats, parses languages accurately, and is built for responsive, cost-effective performance with Tailwind CSS and Vercel hosting.


Hassan Mango