Semantic type analyser brainstorming #248
-
Semantic Type Analysis - brainstorming (Issue #247)
In my attempt to understand the complexities of transpiling Matlab to Python, I have found three useful sources of insight. Perhaps they have ideas you can use?
There is also a thesis that reviews many of the best attempts so far and details their methods for producing consistent results. An interesting-looking link:
Unfortunately, this project has lost its leader: Prof. Laurie Hendren died about two years ago.
Background papers here: Bysiek wrote typed-astunparse, an unparser for Python 3 ASTs with type comments. Converting Matlab to Bysiek's extended Python AST could open up many possibilities for creators of Matlab code.
I hope these suggestions are helpful.
-
I will need to take some time to answer this properly. But on arrays, I believe all array indices in MATLAB are integral (unlike e.g. Ada), so you can just translate
Let me get back to you on the rest. Since there is now official interest, I will put the semantic analysis at the top of my priority list.
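If indices are always integral and the target is already known to be an array, the index translation can be done mechanically by shifting the base from 1 to 0. A minimal sketch, assuming the subscript arrives as source text; `translate_index` is a hypothetical helper, and it deliberately rejects `end`, a bare `:` and stepped ranges rather than guessing:

```python
import re


def translate_index(name: str, subscript: str) -> str:
    """Translate a MATLAB array subscript into 0-based Python source.

    Hypothetical helper: assumes `name` is known to be an array and
    `subscript` is the source text between the parentheses.
    """
    if re.search(r"\bend\b", subscript):
        # `end` would need the array length, which this sketch does not track.
        raise NotImplementedError("'end' needs the array length")
    parts = [part.strip() for part in subscript.split(":")]
    if len(parts) == 1 and parts[0]:
        # Plain integral index: MATLAB counts from 1, Python from 0.
        return f"{name}[{parts[0]} - 1]"
    if len(parts) == 2 and all(parts):
        # lo:hi is inclusive in MATLAB; Python slices exclude the stop,
        # so only the start needs shifting.
        lo, hi = parts
        return f"{name}[{lo} - 1:{hi}]"
    raise NotImplementedError(f"unsupported subscript: {subscript!r}")
```

For example, `translate_index("a", "2:4")` produces `a[2 - 1:4]`, which selects the same three elements as MATLAB's `a(2:4)`.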
-
Obviously what I said doesn't work if we don't even know whether something is an array or a function call. I really think I need to get sem working...
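One way to get that distinction, sketched under loud simplifying assumptions: treat `x(...)` as indexing when `x` is a parameter or was assigned earlier in the same function, and as a call otherwise. The `Assign`/`Apply` records below are a hypothetical flattened statement list, not MISS_HIT's AST, and the heuristic ignores `eval`, `load`, `clear`, scripts sharing a workspace, and variables assigned only on some branches.

```python
from dataclasses import dataclass


@dataclass
class Assign:
    """Hypothetical statement: `target = ...` (right-hand side already flattened)."""
    target: str


@dataclass
class Apply:
    """Hypothetical expression: `name(args...)`, ambiguous between index and call."""
    name: str


def resolve_applies(params, statements):
    """Label each Apply as "index" or "call", walking statements in order."""
    variables = set(params)  # parameters are variables from the start
    labels = []
    for stmt in statements:
        if isinstance(stmt, Assign):
            variables.add(stmt.target)
        elif isinstance(stmt, Apply):
            kind = "index" if stmt.name in variables else "call"
            labels.append((stmt.name, kind))
    return labels


def brackets(kind):
    """Brackets to emit in the generated Python code."""
    return "[]" if kind == "index" else "()"
```

Here `[Apply("zeros"), Assign("x"), Apply("x")]` models `x = zeros(3); x(2)`: `zeros` is labelled a call, the later `x` an index.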
-
Also, moving this to a discussion item, since it's not really a feature/bug.
-
So, I am not a fan of even considering machine learning for type inference. I think it would be both too slow to do in Python and very hard to qualify later on. I think that with some language constraints (MathWorks also has some), reasonable type inference is possible. For example:
I think all three of these are also limitations of the MathWorks code generator, so people are already used to them. There are a few more, and maybe a few different ones we'd want to choose.
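As an illustration of the kind of constraint meant here, a sketch of checking one rule assumed for this example: the first assignment fixes a variable's class (double, char, logical, ...) for the rest of the function. `Assignment` is a hypothetical record that an earlier inference pass would produce; the real constraint list may differ.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    """Hypothetical: one assignment with the inferred class of its right-hand side."""
    name: str
    cls: str   # e.g. "double", "char", "logical", "cell"
    line: int


def check_fixed_class(assignments):
    """Report variables whose class changes after their first assignment.

    Enforces the assumed rule that a variable's first assignment
    fixes its class for the rest of the function.
    """
    first = {}
    errors = []
    for a in assignments:
        if a.name not in first:
            first[a.name] = a
        elif first[a.name].cls != a.cls:
            prev = first[a.name]
            errors.append(
                f"line {a.line}: '{a.name}' changes class from "
                f"{prev.cls} (line {prev.line}) to {a.cls}"
            )
    return errors
```

Under this rule, `x = []` followed by `x = 'data.txt'` is rejected up front instead of becoming a NumPy array that later reaches a string API.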
-
I think my plan is as follows (but take this with a grain of salt, as I have had neither coffee nor breakfast :D). We still parse in parallel, like we do right now. Then, for sem, we can do this:
Special care needed for:
Once we have name resolution working correctly, we can:
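Keeping the parallel parse and then running a name-resolution pass could be sketched as two phases. Everything here is an assumption for illustration: the regex stands in for a real parser and only finds `function` headers, and `build_function_table` is a hypothetical name. A thread pool keeps the sketch short; for CPU-bound parsing in CPython a process pool would usually be the better choice because of the GIL.

```python
import re
from concurrent.futures import ThreadPoolExecutor

# Toy stand-in for a real parser: matches `function name`, `function y = name(...)`
# and `function [a, b] = name(...)` headers at the start of a line.
FUNCTION_HEADER = re.compile(
    r"^\s*function\s+(?:(?:\[[^\]]*\]|\w+)\s*=\s*)?(\w+)", re.MULTILINE
)


def parse_unit(item):
    """Phase 1 (parallelisable): parse one file in isolation."""
    filename, source = item
    return filename, FUNCTION_HEADER.findall(source)


def build_function_table(sources):
    """Phase 2 (sequential): merge per-file results into one global table.

    Maps each function name to the files defining it, so later name
    resolution can tell calls from variables and flag duplicate names.
    """
    with ThreadPoolExecutor() as pool:
        parsed = list(pool.map(parse_unit, sources.items()))
    table = {}
    for filename, names in parsed:
        for name in names:
            table.setdefault(name, []).append(filename)
    return table
```

Because `pool.map` preserves input order, the table is deterministic regardless of which file finishes parsing first.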
-
This list of Matlab flaws may be useful for testing your thinking.
-
@florianschanda said
Neither am I. I am a generalist who used the SMOP project to extend my limited knowledge of Matlab and improve my understanding of Python. This has led me down some interesting alleyways over the last few years. It does surprise me that there are no really successful attempts at bridging the Matlab<>Python gap. I see your MISS_HIT as part of that bridge. An easy-to-use and effective Matlab>Python tool helps break down barriers of cost and complexity for students, independent researchers and small developers. It's a rewarding open source project.
Re the previous reference I sent about MatJuice: there is Matlab>Java code for that paper here: https://github.com/Sable/matjuice On the Sable GitHub site I also noticed some tests of Matlab semantics which may be helpful:
These may give you clues to help your understanding.
-
A while ago I learned about Tree-sitter, a parser generator and incremental parsing library developed by GitHub and used in the Atom editor. Its use to assist code transformation is mentioned in some of the references listed below. A few months ago I jotted down some helpful web links, which I have updated today.
The idea: Incremental Analysis of Real Programming Languages
Video of the GitHub announcement of Tree-sitter (42 min)
Tree Sitter and Syntax Highlighting - Petersen
tree-sitter-syntax-visualizer
A map of the tree-sitter ecosystem
A Matlab grammar is available, but not on the Tree-sitter site.
Create a Tree-sitter grammar
Combobulate is an Emacs package that provides a standardized framework for manipulating and navigating your source code using Tree-sitter's concrete syntax tree. Combobulate is language agnostic and should work, with little modification, with almost all languages supported by Tree-sitter itself. https://github.com/mickeynp/combobulate
Parser Parser Combinators for Program Transformation
-
Woah, that's a lot more responses than I expected :D. Looking at these papers, it doesn't make sense to share my semantic type analyser or code generator, as they're super basic, although the generator does output clean code. The code generation itself isn't that hard, though, and could easily be integrated into MISS_HIT once the semantic type analyser is working. (Also, @RobBW, your URLs are sometimes broken: they redirect to https://github.com/florianschanda/miss_hit/discussions/url instead.)
-
Do you know about OctMiner? It turned up for the first time in one of my Matlab-related searches today. It looks as if it may have some relevant observations for your work on semantics and quality.
-
Heya! So, I've been working on a MATLAB to Python/NumPy converter for the past few months, using your excellent tools. As you've said multiple times in issues and code, you need to get a semantic type analyser working, so I wanted to share some ideas about analysis and code generation.
For example, what would happen if you have the following (nonsense) code:
If you translate that to Python, you would expect something like
which of course doesn't work, as `os.path.exists` can't take a NumPy array. You could change the initialization, but the array might also be used as a proper array elsewhere, and then the code breaks. So what are your thoughts on this?
Also, indexing. Yay. A single (linear) index is perfectly fine in MATLAB, but it won't give the same result in NumPy: on a matrix, MATLAB's `A(k)` counts elements in column-major order, whereas NumPy's `A[k]` selects a whole row. What kind of code do you think should be generated?
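One way to handle both problems is to let the inferred class and shape drive code generation. A minimal sketch, assuming the analyser has already decided each variable's class and whether it is a vector (mapped to a 1-D NumPy array) or a matrix; `emit_empty` and `emit_linear_index` are hypothetical names:

```python
def emit_empty(cls: str) -> str:
    """Python initialiser for a MATLAB empty value of a known class."""
    if cls == "char":
        # Text stays a str, so APIs such as os.path.exists keep working.
        return "''"
    # MATLAB's [] is a 0x0 double.
    return "np.empty((0, 0))"


def emit_linear_index(name: str, k: str, kind: str) -> str:
    """Translate a MATLAB single-subscript read A(k) into NumPy source.

    `kind` is the inferred shape category, "vector" or "matrix". On a
    matrix, MATLAB's single subscript is a column-major linear index,
    whereas NumPy's A[k] would select a whole row.
    """
    if kind == "vector":
        return f"{name}[{k} - 1]"
    if kind == "matrix":
        return f'{name}.ravel(order="F")[{k} - 1]'
    raise ValueError(f"unknown shape kind: {kind!r}")
```

For `A = [1 2 3; 4 5 6]`, MATLAB's `A(2)` is `4`; the generated `A.ravel(order="F")[2 - 1]` also gives `4`, while plain `A[1]` returns the row `[4, 5, 6]`. This form only suits reads, because `ravel` may return a copy; for `A(k) = v`, something like `A[np.unravel_index(k - 1, A.shape, order="F")] = v` writes into `A` itself.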
Regarding my semantic type analyser, I have one main function, which looks like this:
which dispatches each node to the right visitor method (the other functions in the same class). So far, so good. But as you said, it is a hard task: how do you deal with types and sizes? How do you track and store them in the tree? When do you visit functions? How do you handle array indexing, or function calls?
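A dispatching visitor that records a class and size per variable could look roughly like this. It mirrors the one-main-function layout described above (a `visit` method forwarding to `visit_<NodeType>`, as Python's `ast.NodeVisitor` does) and keeps types in a symbol table rather than on the tree. The node classes are hypothetical toys, not MISS_HIT's AST, and matrix concatenation is reduced to scalar doubles.

```python
from dataclasses import dataclass


# Hypothetical toy AST nodes; a real analyser would walk MISS_HIT's tree instead.
@dataclass
class Number:
    value: float

@dataclass
class String:
    value: str

@dataclass
class Matrix:          # [a, b; c, d] stored as a list of rows
    rows: list

@dataclass
class Ident:
    name: str

@dataclass
class Assign:
    target: str
    expr: object

@dataclass
class Type:
    cls: str           # "double", "char", ...
    shape: tuple       # (rows, cols)


class TypeAnalyser:
    def __init__(self):
        self.env = {}  # symbol table: variable name -> Type

    def visit(self, node):
        # Dispatch to visit_<ClassName>, like ast.NodeVisitor does.
        method = getattr(self, f"visit_{type(node).__name__}", None)
        if method is None:
            raise NotImplementedError(type(node).__name__)
        return method(node)

    def visit_Number(self, node):
        return Type("double", (1, 1))

    def visit_String(self, node):
        return Type("char", (1, len(node.value)))

    def visit_Matrix(self, node):
        if not node.rows:
            return Type("double", (0, 0))
        for row in node.rows:
            for element in row:
                # Simplification: only scalar doubles may be concatenated here.
                if self.visit(element) != Type("double", (1, 1)):
                    raise NotImplementedError("only scalar doubles handled here")
        return Type("double", (len(node.rows), len(node.rows[0])))

    def visit_Ident(self, node):
        # KeyError for unknown names; a real analyser would try function lookup next.
        return self.env[node.name]

    def visit_Assign(self, node):
        t = self.visit(node.expr)
        self.env[node.target] = t
        return t
```

Keeping one `env` per function makes it easy to reset between functions; the alternative is annotating each node with its `Type`, which helps when the code generator later needs the type of an arbitrary subexpression.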
I could share my entire type analyser/code generator if you like :) So far it produces syntactically correct output (including whitespace and comments), but running the generated program gives a ton of errors. For example, I can't properly detect whether you call a function or index an array, so it always emits `()` instead of switching to `[]` where necessary, which obviously won't work.
So, TL;DR: I'm curious what your thoughts are on this :) (especially on the semantic type analyser). Thanks!