- Tokenization: The lexer tokenizes C code, recognizing keywords, identifiers, operators, constants, and preprocessor directives.
- Preprocessing: A preprocessing stage strips comments and handles preprocessor directives before tokenization.
- Operator and Literal Handling: All common operator categories (arithmetic, relational, logical, bitwise, and ternary) are recognized, along with character constants, string literals, and invalid characters.
- Output Formatting: Tokens are formatted for output, including their type and lexeme.
- Effective Use of the Standard Library: The code leverages the C++ Standard Library for string manipulation (`<string>`), input/output operations (`<iostream>`), and data structures (`<unordered_set>`, `<unordered_map>`).
- Assertions for Testing and Debugging: The test cases use `assert` to verify the lexer's behavior, catching unexpected conditions early and providing clear feedback during debugging.
- Object-Oriented Design: The lexer embraces object-oriented principles; classes such as `Token` and `Lexer` use encapsulation to keep the code readable, modular, and easy to maintain and test (see the sketch below).
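
As a rough illustration of how these pieces might fit together, here is a minimal sketch of a token type with formatted output and an assertion-style check. The names and exact API are assumptions for illustration, not the repository's actual code:

```cpp
#include <cassert>
#include <iostream>
#include <string>
#include <utility>

// Hypothetical sketch of the Token abstraction; the class in the
// repository may name its members and token categories differently.
enum class TokenType { KEYWORD, IDENTIFIER, CONSTANT, OPERATOR,
                       STRING_LITERAL, SPECIAL_SYMBOL, INVALID_TOK };

class Token {
public:
    Token(TokenType type, std::string lexeme)
        : type_(type), lexeme_(std::move(lexeme)) {}
    TokenType type() const { return type_; }
    const std::string& lexeme() const { return lexeme_; }
private:
    TokenType type_;      // category assigned during scanning
    std::string lexeme_;  // exact source text of the token
};

// Output formatting: one "TYPE: lexeme" line per token.
std::ostream& operator<<(std::ostream& os, const Token& t) {
    static const char* kNames[] = { "KEYWORD", "IDENTIFIER", "CONSTANT",
                                    "OPERATOR", "STRING_LITERAL",
                                    "SPECIAL_SYMBOL", "INVALID_TOK" };
    return os << kNames[static_cast<int>(t.type())] << ": " << t.lexeme();
}

int main() {
    Token t(TokenType::KEYWORD, "while");
    assert(t.type() == TokenType::KEYWORD); // assertion-style check, as in the test cases
    std::cout << t << '\n';                 // prints "KEYWORD: while"
}
```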
- A C++ compiler
- A build system (e.g., CMake)
- Clone the project:
git clone https://github.com/vivekkdagar/c-lexical-analyzer
- Navigate to the project directory:
cd c-lexical-analyzer
- Install dependencies: ensure a C++ compiler and a build system are installed
- Build the project using your preferred build system (I used CMake); see the example below
- Run the built executable. You can change the text in the input string in main.cpp to tokenize different code
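
For example, with CMake installed, `cmake -B build` followed by `cmake --build build` will configure and compile the project. The sketch below condenses what the input-string-to-tokens flow looks like end to end; it is a simplified, self-contained stand-in for the real main.cpp and Lexer, so all names and the toy tokenizing logic are assumptions, not the repository's actual implementation:

```cpp
#include <cctype>
#include <iostream>
#include <string>
#include <vector>

// Minimal stand-in for the project's token model.
enum class TokenType { KEYWORD, IDENTIFIER, CONSTANT, OPERATOR, SPECIAL_SYMBOL };

struct Token {
    TokenType type;
    std::string lexeme;
};

// Toy tokenizer covering identifiers, integer constants, and single-character
// operators/symbols -- just enough to show the flow.
std::vector<Token> tokenize(const std::string& src) {
    std::vector<Token> tokens;
    std::size_t i = 0;
    while (i < src.size()) {
        unsigned char c = src[i];
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalpha(c) || c == '_') {                       // identifier or keyword
            std::size_t start = i;
            while (i < src.size() && (std::isalnum((unsigned char)src[i]) || src[i] == '_')) ++i;
            std::string word = src.substr(start, i - start);
            bool kw = (word == "int" || word == "return");       // tiny keyword table
            tokens.push_back({kw ? TokenType::KEYWORD : TokenType::IDENTIFIER, word});
        } else if (std::isdigit(c)) {                            // integer constant
            std::size_t start = i;
            while (i < src.size() && std::isdigit((unsigned char)src[i])) ++i;
            tokens.push_back({TokenType::CONSTANT, src.substr(start, i - start)});
        } else {                                                 // operator or symbol
            bool sym = (c == ';' || c == '(' || c == ')' || c == '{' || c == '}');
            tokens.push_back({sym ? TokenType::SPECIAL_SYMBOL : TokenType::OPERATOR,
                              std::string(1, src[i])});
            ++i;
        }
    }
    return tokens;
}

int main() {
    std::string input = "int x = 42;"; // edit this string to tokenize different code
    for (const Token& t : tokenize(input))
        std::cout << static_cast<int>(t.type) << ": " << t.lexeme << '\n';
}
```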
- What is the purpose of the Token class?
- The Token class represents a single lexical token in the input C code. It encapsulates the token's type and its corresponding lexeme.
- How are tokens categorized in the lexer?
- Tokens are categorized into different types such as keywords, identifiers, constants, operators, and special symbols. The lexer analyzes the input code and assigns appropriate token types based on predefined rules.
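
One common way to implement the keyword-vs-identifier rule is a hash-set lookup, which matches the `<unordered_set>` header the project uses. This is a sketch under that assumption; the actual keyword table and function name may differ:

```cpp
#include <iostream>
#include <string>
#include <unordered_set>

// Hypothetical sketch: after scanning a word, look it up in a keyword table;
// anything not found is classified as an identifier instead.
bool isKeyword(const std::string& word) {
    static const std::unordered_set<std::string> kKeywords = {
        "int", "char", "float", "double", "void", "if", "else",
        "while", "for", "return", "struct", "switch", "case", "break"
    };
    return kKeywords.count(word) != 0;
}

int main() {
    std::cout << isKeyword("while") << ' ' << isKeyword("total") << '\n'; // prints "1 0"
}
```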
- How does the lexer handle invalid characters or tokens?
- If the lexer encounters invalid characters or tokens, it categorizes them as INVALID_TOK and includes them in the token stream. These tokens can be identified and handled appropriately during further processing.
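
A sketch of how such a fallback might look; the classification rules and names below are illustrative assumptions, with only INVALID_TOK taken from the project:

```cpp
#include <cctype>
#include <iostream>
#include <string>

enum class TokenType { IDENTIFIER, CONSTANT, OPERATOR, INVALID_TOK };

// Hypothetical sketch: choose a token class from the first character and fall
// through to INVALID_TOK for anything unrecognized (e.g. '@' outside a string),
// so scanning can continue past the bad character instead of aborting.
TokenType classifyStart(char c) {
    if (std::isalpha((unsigned char)c) || c == '_') return TokenType::IDENTIFIER;
    if (std::isdigit((unsigned char)c))             return TokenType::CONSTANT;
    if (std::string("+-*/%<>=!&|^~?").find(c) != std::string::npos)
        return TokenType::OPERATOR;
    return TokenType::INVALID_TOK;
}

int main() {
    std::cout << (classifyStart('@') == TokenType::INVALID_TOK) << '\n'; // prints "1"
}
```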
- What is lexical analysis, and why is it important in compiler design?
- Lexical analysis, also known as scanning or tokenization, is the first phase of the compilation process. It breaks the input code into a sequence of tokens representing the language's fundamental elements; for example, `int x = 42;` becomes the tokens `int` (keyword), `x` (identifier), `=` (operator), `42` (constant), and `;` (special symbol). This structured representation of the source code is what the subsequent phases of the compiler, starting with the parser, actually consume.
- How does the lexer handle whitespace and comments in the input code?
- The lexer skips whitespace characters such as spaces, tabs, and newlines, and it recognizes and discards comments so that neither contributes to the token stream. By filtering these out, the lexer extracts only the meaningful tokens from the code.
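
A sketch of that skipping pass (the function and variable names are illustrative assumptions, not the project's API):

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical sketch: advance pos past spaces, tabs, newlines, and both C
// comment styles, stopping at the first character that starts a real token.
void skipWhitespaceAndComments(const std::string& src, std::size_t& pos) {
    while (pos < src.size()) {
        char c = src[pos];
        if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
            ++pos;                                               // plain whitespace
        } else if (c == '/' && pos + 1 < src.size() && src[pos + 1] == '/') {
            while (pos < src.size() && src[pos] != '\n') ++pos;  // line comment
        } else if (c == '/' && pos + 1 < src.size() && src[pos + 1] == '*') {
            pos += 2;                                            // block comment
            while (pos + 1 < src.size() && !(src[pos] == '*' && src[pos + 1] == '/'))
                ++pos;
            pos = (pos + 1 < src.size()) ? pos + 2 : src.size();
        } else {
            break;                                               // start of a token
        }
    }
}

int main() {
    std::string code = "  /* header */  int x;";
    std::size_t pos = 0;
    skipWhitespaceAndComments(code, pos);
    std::cout << code.substr(pos) << '\n'; // prints "int x;"
}
```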
- Can the lexer handle complex tokenization scenarios, such as string literals or nested expressions?
- Yes, the lexer is designed to handle tokenization scenarios encountered in real-world programming languages. It accurately tokenizes string literals and character constants, and it handles nested expressions and control structures. The lexer's robust design ensures that it can tokenize complex code constructs while maintaining efficiency and accuracy.
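
For instance, string-literal scanning has to treat escaped quotes as part of the literal rather than as its end. Here is a sketch of that logic (the names and exact details are illustrative assumptions):

```cpp
#include <cstddef>
#include <iostream>
#include <string>

// Hypothetical sketch: consume from the opening quote to the matching closing
// quote, keeping backslash escapes such as \" and \\ inside the lexeme.
std::string scanStringLiteral(const std::string& src, std::size_t& pos) {
    std::size_t start = pos++;                // src[start] is the opening '"'
    while (pos < src.size() && src[pos] != '"') {
        if (src[pos] == '\\' && pos + 1 < src.size()) ++pos; // skip escaped char
        ++pos;
    }
    if (pos < src.size()) ++pos;              // consume the closing '"'
    return src.substr(start, pos - start);    // lexeme includes both quotes
}

int main() {
    std::string code = "\"she said \\\"hi\\\"\" + rest";
    std::size_t pos = 0;
    std::cout << scanStringLiteral(code, pos) << '\n'; // prints "she said \"hi\"" with quotes
}
```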
Distributed under the MIT License. See LICENSE for more information.
Vivek Dagar - [email protected]
Project Link: https://github.com/vivekkdagar/c-lexical-analyzer